
Heads up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Golang Driver | Neo4j Performance Issue

I have to believe I MUST be doing something wrong. I am a database expert, by the way, having worked in engineering and consulting at database companies for 24 years, and I'm fluent in over 13 programming languages. So I know what I am doing.

But when I run a simple benchmark of simple inserts (records with two columns, or properties) into an ODBMS, an RDBMS, or just about any other DBMS, I consistently get between 10,000 and 30,000 TPS. With Neo4j I get an appalling 11 TPS. I was getting over 10,000 TPS 23 years ago with ObjectStore, so I cannot understand why, when I follow the documentation, I only get 11 TPS with Neo4j. Can someone point out what I am doing incorrectly?

I have to be doing something terribly wrong. This same hardware (64 GB RAM, 2 TB NVMe, 10 Gbps Ethernet) gets bleeding-edge performance with any other database. I have assigned 16 GB to the database. And I am simply trying to insert a few dozen records, up to a few thousand. So this is a VERY SMALL database; there is no reason for it to be slow. And I am not setting up ANY relationships, only inserting nodes.

Can someone explain to me what I could possibly be doing wrong in the code below that would result in this sort of performance? I am literally copying the code out of the golang driver "benchmark" (for the most part).

	ctx := context.Background()

	// create an auth token...
	neo4jAuthToken := getNeoAuthToken(neo4jUsername, neo4jPassword, neo4jRealm)

	// create a driver...
	driver, err := neo4j.NewDriverWithContext(neo4jUri, neo4jAuthToken)
	if err != nil {
		log.Fatal(err)
	}
	defer driver.Close(ctx)

	if err := driver.VerifyConnectivity(ctx); err != nil {
		log.Fatalf("failed to verify connection: %s", err)
	}

	// create the session configuration...
	config := neo4j.SessionConfig{
		AccessMode: neo4j.AccessModeWrite,
	}

	count := 100
	start := time.Now()
	for i := 0; i < count; i++ {
		session := driver.NewSession(ctx, config)
		defer session.Close(ctx)

		name := fmt.Sprintf("jim_%d", i)
		_, err = session.Run(ctx, `CREATE (n:Account {name: $name, hash: $hash}) RETURN n`,
			map[string]interface{}{
				"name": name,
				"hash": hash(name),
			})
		if err != nil {
			log.Fatal(err)
		}
	}
	elapsed := time.Since(start).Milliseconds()
	rate := float64(count) * 1000 / float64(elapsed)
	fmt.Printf("Time to load: %d ms\n", elapsed)
	fmt.Printf("Rate in load: %f tps\n", rate)


1 ACCEPTED SOLUTION

fwiw ~1800 tps:

print(datetime.now())

with GraphDatabase.driver(uri, auth=auth) as driver:
    with driver.session(database="neo4j") as session:
        with session.begin_transaction() as tx:
            for i in range(10000):
                tx.run("CREATE (a:Account {name: $name})", name="Jim_"+str(i))
            
            tx.commit()
print(datetime.now())


17 Replies

It would probably be faster to post this as an issue on the Go bolt driver github here:

https://github.com/neo4j/neo4j-go-driver

Also, you might want to test with code aligned more closely with the example presented in the README, and see if its timing matches yours or is faster.

We would want to know the version of Neo4j you are using, the version of the Go driver, whether this is enterprise or community, and whether this is a single instance or cluster.

Thanks. Unless I can find a reasonable benchmark or simple test illustrating acceptable transaction performance, I will have to drop Neo4j from consideration and look at Allegro | Arango | Neptune | Cambridge Semantics. I just ran the equivalent code using the Python client, and I get 40.15 TPS. While that is 4x faster than the Go driver, it is nowhere near the 10,000 to 20,000 TPS I should expect from such a simple example.

steggy (Neo4j)

@rbuck-som

You're seeing the overhead of back-and-forth with every transaction. I tested this on a pretty puny VM running on my laptop (both python and the DB are in the VM):

from neo4j import GraphDatabase
from datetime import datetime
class Neo4jConnection:
    
    def __init__(self, uri, user, pwd):
        self.__uri = uri
        self.__user = user
        self.__pwd = pwd
        self.__driver = None
        try:
            self.__driver = GraphDatabase.driver(self.__uri, auth=(self.__user, self.__pwd))
        except Exception as e:
            print("Failed to create the driver:", e)
        
    def close(self):
        if self.__driver is not None:
            self.__driver.close()
        
    def query(self, query, db=None):
        assert self.__driver is not None, "Driver not initialized!"
        session = None
        response = None
        try: 
            session = self.__driver.session(database=db) if db is not None else self.__driver.session() 
            response = list(session.run(query))
        except Exception as e:
            print("Query failed:", e)
        finally: 
            if session is not None:
                session.close()
        return response

    def driver(self):
        d = self.__driver
        return d

conn = Neo4jConnection(uri="redacted", user="redacted", pwd="redacted")

batch = []
print(datetime.now())

for i in range(100000):
    batch.append({"name": "Jim_"+str(i)})
print(datetime.now())


s=conn.driver().session()
s.run("unwind $batch as row create (n:Account {name: row['name']})", batch=batch);
s.close()

print(datetime.now())

Don't critique my Python please 🙂

Results are about 62k tps

Running this twice in a row... 100K nodes in less than a second. Repeated with 500k rows (in a single txn; this could probably be optimized for even better performance):


2022-11-22 11:42:58.046505
2022-11-22 11:43:02.242657

>119k tps. Now do this with multithreading and proper batch sizing 🙂

Oh, and I forgot to add: I was running this on a little autonomous cluster I had set up for a blog I'm working on. The cluster had 3 primaries, so my TPS above also includes the overhead of guaranteed writes to the other 2 primary nodes. A single instance would be even faster.

Thanks to my co-worker Rouven, who made this into some actually reasonable Python code:

from datetime import datetime

from neo4j import GraphDatabase


uri = "neo4j://localhost:7687"
auth = ("username", "password")

print(datetime.now())

names = [f"Jim_{i}" for i in range(100000)]

print(datetime.now())

with GraphDatabase.driver(uri, auth=auth) as driver:
    with driver.session(database="neo4j") as session:
        session.run("UNWIND $names AS name CREATE (n:Account {name: name})",
                    names=names)

print(datetime.now())

I decided I'm not going to respond to the thread. I will find a different database to work with.


Hello,

Thanks for raising concerns (Go driver maintainer here).

I spotted a couple of issues with your program:

  • You are stacking up defer calls when closing the session. Each session will only be closed at program completion, not at the end of each iteration.
  • Creating a session without a configured database name (see SessionConfig#DatabaseName) means a home-database resolution occurs every time, and that involves a network round trip. The documentation has recently been updated to document this. I would advise creating the session only once and/or explicitly setting the database name.
  • You are not consuming the results of the auto-commit transaction. I would advise not ignoring the result: call Consume on it, so the server does not keep auto-commit transactions active needlessly.

That should improve the situation.

Thanks, will make these changes. Do you have a best-practice (reference) example that illustrates this?

Florent,

It would be really nice if the driver project presented a decent goroutine-based example that illustrates a high insert rate and follows best practices, for both single-record and batched inserts. If you know of a complete example, I can rewrite mine to follow it, to save time.

I will try to come up with something and keep you posted.

@rbuck-som One question I had for you based on some internal testing we've done: is a Mac involved in this test in any way?

Yes, I have a pretty high-end MacBook Pro, with 2 TB NVMe and 64 GB RAM. I'm running through the loopback adapter, but have 10 GbE locally. When I run batched mode with Python I get "okay" performance, upwards of 1,000/s. But having worked in the database industry for 24 years, I am honestly surprised how slow Neo4j is; folks shouldn't need to resort to batching except as a last resort. We were getting upwards of 10,000 TPS using ObjectStore over the course of a one-weekend consulting gig at Thompson Financial, head-to-head against Oracle (creamed their butts), back in 1998, on a single server with (by today's standards) crappy hardware (64-bit SPARC). Neo4j on the fancy hardware we have these days SHOULD BE blazingly faster than that.

The golang driver... well, that's another story entirely. I rewrote it to use channels and goroutines, with 1 producer and 3 consumers (each with their own session/transactions), and at best I get upwards of 100 TPS. Again, that is off by 2-3 orders of magnitude from what I would consider reasonable.

I really would love to see TPS numbers for single-record inserts of upwards of 1,000 TPS per thread. From there I can scale out across many threads. But I am unsure how well Neo4j would handle lots of concurrency (again, I know SQL databases can safely handle upwards of 1k concurrent connections at scale without skipping a beat). I've read online that Neo4j is not so great with lots of concurrency.

I am also trying to assess the usefulness of front-ending this with Spark or Kafka, to stream inputs faster. One thing I am also unsure of is how well this will hold up on the READ side of the house. If this is used in an operational environment with heavy concurrent reads, will Neo4j fall over from all the load? Will it slow down at scale like AWS Neptune does?

As a point of comparison, we have been running SiteWise (built on AWS Neptune internals, as you may be aware), streaming real-time metrics into it from NOAA (weather) ground-based stations and from building sensors (indoor air quality), and we're finding that SiteWise falls over at scale, over longer time spans. Too much data. We're having to split the knowledge graph from the time-series data. So the graph does not have to handle insert rates of 100K/s to 1M/s, but it does have to support merge very efficiently (to sync models and assets), support in excess of 10M nodes and relationships, and support highly concurrent reads and writes (a 95/5 mix).

@rbuck-som thanks for getting back to us on the thread. Your question sparked us to have a deeper look into what was going on here, and we found some things very similar to what you did. Without going into a huge amount of detail or knocking the hardware, Mac NVMe is a big part of the culprit here. When we (in the database) commit, we need to ensure that things are actually committed; bottom line: an fsync call. I have a similar bit of hardware to yours (half the RAM and half the disk), and this is the best I can get, so the hardware is really getting in the way here. Is it possible for you to take the Mac out of the equation? We've done some testing, and that gets much better results:

  fsync/fdatasync/sync_file_range:
    sync (usec): min=13, max=209, avg=29.08, stdev= 7.99
    sync percentiles (usec):
     |  1.00th=[   20],  5.00th=[   21], 10.00th=[   22], 20.00th=[   22],
     | 30.00th=[   26], 40.00th=[   28], 50.00th=[   29], 60.00th=[   30],
     | 70.00th=[   33], 80.00th=[   37], 90.00th=[   38], 95.00th=[   39],
     | 99.00th=[   47], 99.50th=[   75], 99.90th=[  114], 99.95th=[  131],
     | 99.99th=[  161]

Yes, I recall this conversation from my NuoDB days, as they supported multiple kernel instruction types for commit. You might not support all the kernel instructions others do; that's OK. Anyway, I can test on a Linux machine in AWS. Also, most databases prohibit the use of transparent huge pages; do you require that it be turned off too? Thank you!!!
