Neo4j

FourMoBro · 07-22-2022

I am looking for the correct syntax to load data into Neo4j, specifically how to use periodic commits when loading from a Python/pandas DataFrame. My general workflow is as follows:

1. Within a Jupyter notebook, load the 1M+ line tab-delimited text file into a DataFrame.
2. Clean the data.
3. Create a smaller DataFrame to be used as the input parameter for a function.
4. Run the function.

In general my functions look like this:

def add_data(df1):
    query = """
    UNWIND $rows as row
    MERGE
    SET
    RETURN COUNT(*) as total
    """
    return conn.query(query, parameters = {'rows':df1.to_dict('records')})

columns = []
df1 = pd.DataFrame(df[columns])
df1 = df1.explode(columns).drop_duplicates()
add_data(df1)

This works great for creating nodes and relationships when the total count is under 1000, but with 1M+ nodes/relationships the query tends not to finish.

I know there are server parameters in neo4j.conf that can be adjusted and may help with the load. I know I can save the DataFrame to CSV and load it from disk with USING PERIODIC COMMIT. I know I can split my DataFrame and process the pieces in a for loop from within Python. But I don't want to go those routes. I want to get apoc.periodic.commit to work within the add_data function.

I have tried several variations in an attempt to get it to work, to no avail. I am hoping the community can help.

Thanks in advance.
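
For readers following along, here is a rough sketch of what a batched version of add_data might look like. Note that it swaps in apoc.periodic.iterate rather than apoc.periodic.commit: as far as I understand, apoc.periodic.commit re-runs a single statement (with a LIMIT) until it returns 0, which does not map cleanly onto UNWIND over a parameter list, whereas apoc.periodic.iterate accepts the rows through its params config and commits every batchSize rows. The label Item, the key id, the function name add_data_batched, and the run_query callable are all assumptions for illustration, because the MERGE and SET clauses in the post are elided.

```python
from typing import Any, Callable, Dict, List

# ASSUMPTION: `Item` and `id` are placeholder names; the original post
# does not show its MERGE/SET clauses, so substitute your own.
BATCHED_MERGE = """
CALL apoc.periodic.iterate(
  'UNWIND $rows AS row RETURN row',
  'MERGE (n:Item {id: row.id}) SET n += row',
  {batchSize: $batch_size, parallel: false, params: {rows: $rows}}
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages
"""


def add_data_batched(
    run_query: Callable[[str, Dict[str, Any]], Any],
    rows: List[Dict[str, Any]],
    batch_size: int = 10_000,
) -> Any:
    """Send all rows in one call; APOC commits them in batches server-side.

    `run_query` is a hypothetical callable taking (cypher, parameters),
    for example a thin wrapper around the post's `conn.query`.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return run_query(BATCHED_MERGE, {"rows": rows, "batch_size": batch_size})
```

With the conn object from the post, this could be called as add_data_batched(lambda q, p: conn.query(q, parameters=p), df1.to_dict('records')). The whole row list still travels to the server in a single request, so very large frames may still need the memory settings in neo4j.conf to be adjusted.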

bennu_neo · 07-23-2022

Hi @FourMoBro,

Quick question: what does your MERGE statement look like? Do you have an index on the properties it matches on?

Regards
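
To make the index question concrete: without an index on the property used in MERGE, each merged row can trigger a label scan, which gets slow at 1M+ rows. Below is a small sketch of a helper that builds a CREATE INDEX statement, assuming Neo4j 4.x or later syntax. The helper name index_statement and the example label/property (Item, id) are hypothetical, not taken from the original post.

```python
def index_statement(label: str, prop: str) -> str:
    """Build a Cypher statement creating an index on (label, prop).

    Labels and property names cannot be passed as query parameters,
    so they are validated here before being interpolated.
    """
    for name in (label, prop):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    index_name = f"{label.lower()}_{prop}"
    return (
        f"CREATE INDEX {index_name} IF NOT EXISTS "
        f"FOR (n:{label}) ON (n.{prop})"
    )
```

The resulting string could be run once before the load, for example conn.query(index_statement("Item", "id")) with the conn object from the question.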


apoc.periodic.commit within python/pandas help