Neo4j

Rogie · ‎06-20-2020

I have code that reads data from the web, cleans it up, and stores it in Neo4j. I'm wondering how to "parallelize" this process, since getting the data from the web can be slow sometimes. My current setup is something like this:

In config.py:

from neo4j import GraphDatabase


class cfg_holder():
    ''' Container for global variables.'''
    def __init__(self, params):
        self.params = params
        self.uri = "bolt://localhost:7687"
        self.driver = GraphDatabase.driver(self.uri, auth=("user", "pass"))
        self.db = self.driver.session()

def init(param):
    return cfg_holder(param)

In main.py:

import concurrent.futures
import config

def func(h):
    # get data
    # build queries
    # when enough data has been collected:
    with h.db.begin_transaction() as tx:
        tx.run(q)
        tx.success = True


if __name__ == "__main__":
    holders = []
    for i in [10, 20]:
        holders.append(config.init(i))
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        executor.map(func, holders)

So each cfg_holder has access to its own db connection. I'm not sure this is the correct way to set things up.

It's possible that my design pattern is entirely off here. What's the right way to set this kind of thing up? Do I need to be locking the threads somewhere? Are threads even the right way to go about this? Looking for some general advice here...
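For reference, one common arrangement for this kind of workload is sketched below. It rests on two points from the Neo4j Python driver documentation: a driver object can be shared between threads, while a session should not be. The sketch keeps the slow web fetches in worker threads and does every write from the calling thread through a short-lived session, so no locking is needed. The names `fetch_and_clean`, `make_neo4j_writer`, `run_pipeline`, and the Cypher in `QUERY` are all hypothetical placeholders and not part of the code above.

```python
import concurrent.futures

# Hypothetical Cypher statement; the label and property names are placeholders.
QUERY = "UNWIND $rows AS row MERGE (n:Item {id: row.id}) SET n.value = row.value"


def fetch_and_clean(param):
    """Stand-in for the slow web download plus cleanup (hypothetical)."""
    return [{"id": f"{param}-{i}", "value": i} for i in range(3)]


def make_neo4j_writer(driver):
    """Build a writer around one shared driver (assumed to be a neo4j driver object).

    Each call opens its own short-lived session, so no session is ever
    shared between threads.
    """
    def write_batch(rows):
        with driver.session() as session:
            # consume() waits for the statement to finish and discards the result.
            session.run(QUERY, rows=rows).consume()
    return write_batch


def run_pipeline(params, write_batch, max_workers=2):
    """Fetch in worker threads; write each finished batch from the calling thread."""
    written = 0
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(fetch_and_clean, p) for p in params]
        for future in concurrent.futures.as_completed(futures):
            rows = future.result()  # re-raises any exception from the worker
            write_batch(rows)
            written += len(rows)
    return written
```

With a real database this would be called roughly as `run_pipeline([10, 20], make_neo4j_writer(driver))`, where `driver` is created once with `GraphDatabase.driver(...)` and closed at the end. Collecting the futures and calling `result()` also surfaces worker exceptions, which `executor.map` only raises when its results are iterated.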

jggomez · ‎06-20-2020

Hi, I tried this too, but with Python's multiprocessing, and it didn't work. The reason is that the driver doesn't support that. I don't know whether the new driver for Neo4j 4.0 supports it.

Rogie · ‎06-20-2020

Hello, can you please tell me what it is exactly that the driver doesn't support?

jggomez · ‎06-22-2020

Multiprocessing and threads in Python.

Workflow for threads and writing to Neo4j?