06-20-2020 08:54 AM
I have code that reads data from the web, cleans it up, and stores it in Neo4j. I'm wondering how to "parallelize" this process, since getting the data from the web can be slow sometimes. My current setup is something like this:
In config.py:
from neo4j import GraphDatabase

class cfg_holder():
    ''' Container for global variables.'''
    def __init__(self, params):
        self.params = params
        self.uri = "bolt://localhost:7687"
        self.driver = GraphDatabase.driver(self.uri, auth=("user", "pass"))
        self.db = self.driver.session()

def init(param):
    return cfg_holder(param)
In main.py:
import concurrent.futures
import config

def func(h):
    # get data
    # build queries
    # when enough data has been collected:
    with h.db.begin_transaction() as tx:
        tx.run(q)
        tx.success = True

if __name__ == "__main__":
    holders = []
    for i in [10, 20]:
        holders.append(config.init(i))
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        executor.map(func, holders)
So each cfg_holder has access to its own db connection. I'm not sure this is the correct way to set things up.
It's possible that my design pattern is entirely off here. What's the right way to set this kind of thing up? Do I need to be locking the threads somewhere? Are threads even the right way to go about this? Looking for some general advice here...
06-20-2020 01:45 PM
Hi, I tried this too, but with Python's multiprocessing, and I ran into problems and it didn't work. I think the reason is that the driver doesn't support that. I don't know whether the new driver for Neo4j 4.0 supports it.
06-20-2020 05:50 PM
Hello, can you please tell me what it is exactly that the driver doesn't support?
06-22-2020 07:37 PM
Multiprocessing and threads in Python.
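One arrangement often suggested for the setup in the original question is a minimal sketch like the following. It assumes, as the Python driver's documentation describes, that a single driver object can be shared across threads while a session should stay within one thread. The `fetch` function, the `Item` label, and the `param`/`value` properties are hypothetical placeholders, not part of the original code. The results of `executor.map` are collected with `list()` because exceptions raised in a worker only surface when its result is read.

```python
import concurrent.futures


def fetch(param):
    # Hypothetical stand-in for the slow web request and clean-up step.
    return {"param": param, "value": param * 2}


def make_neo4j_writer(uri="bolt://localhost:7687", auth=("user", "pass")):
    # Imported lazily so the rest of the sketch runs without the driver installed.
    from neo4j import GraphDatabase

    # One driver for the whole program, shared by every worker thread.
    driver = GraphDatabase.driver(uri, auth=auth)

    def write(row):
        # Each call opens its own short-lived session, so no session is
        # ever shared between threads.
        with driver.session() as session:
            session.run(
                "MERGE (n:Item {param: $param}) SET n.value = $value",
                {"param": row["param"], "value": row["value"]},
            )
        return row["param"]

    return driver, write


def run_pipeline(params, write, max_workers=2):
    def job(param):
        # Fetch and write happen in the same worker thread.
        return write(fetch(param))

    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # list() forces evaluation so worker exceptions are raised here.
        return list(executor.map(job, params))
```

With a database running, this would be used as `driver, write = make_neo4j_writer()`, then `run_pipeline([10, 20], write)`, followed by `driver.close()`. Since the slow part is waiting on the network, threads are a reasonable fit here. No explicit locking appears in the sketch because the only shared object is the driver.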