
issues with dbms.memory.transaction.global_max_size

I'm a backend engineer with the sole task of implementing a knowledge graph for SNOMED CT in Neo4j. We have a Neo4j Aura instance (the cheapest one: 1 GB RAM, 2 GB storage). When running the upload of these very large CSV files using pyingest I get the following error:

{message: The allocation of an extra 15.1 MiB would use more than the limit 100.0 MiB. Currently using 88.4 MiB. dbms.memory.transaction.global_max_size threshold reached}

Am I not paying for 1 GB? Am I doing something terribly stupid or wrong? I've read the graph databases book (the one with the octopus). What else should I read?

[3:55 PM]

I've fallen into this beautiful rabbit hole and find new insights each day, but I have to deliver and be practical.
I'll appreciate any practical guidance and book/learning resource recommendations.

[3:56 PM]

Again, sorry if this is the wrong place to ask!

[3:58 PM]
server_uri: neo4j+s://id:7687
admin_user: neo4j
admin_pass: pass

files:
  # concepts
  - url: /home/gocandra/workspace/uma/deep-learning/research/graphs/snomed-loader/csv/Concept_Snapshot.csv
    compression: none
    skip_file: false
    chunk_size: 100
    cql: |
      WITH $dict.rows as rows UNWIND rows as row
        MERGE (c:Concept {conceptId:row.id,term:row.term,descType:row.descType})
        ON CREATE SET c.conceptId = row.id, c.term = row.term, c.descType = row.descType
        ON MATCH SET c.conceptId = row.id, c.term = row.term, c.descType = row.descType
  
  ## concept synonym generator
  - url: /home/gocandra/workspace/uma/deep-learning/research/graphs/snomed-loader/csv/Concept_Snapshot_add.csv
    compression: none
    skip_file: false
    chunk_size: 50
    cql: |
      WITH $dict.rows as rows UNWIND rows as row
        MATCH (dest:Concept) WHERE dest.conceptId = row.id 
        CREATE (c:Concept:Synonym{
          conceptId: row.id,
          term: row.term,
          descType: row.descType
          })-[r:IS_A {
            relId:'116680003',
            term:'Is a (attribute)',
            descType:'900000000000003001'
          }]->(dest);

  # relationships
  - url: /home/gocandra/workspace/uma/deep-learning/research/graphs/snomed-loader/csv/Concept_Snapshot_add.csv
    compression: none
    skip_file: false
    chunk_size: 50
    cql: |
      WITH $dict.rows as rows UNWIND rows as row
        MATCH (source:Concept) WHERE source.conceptId = row.sourceId
        MATCH (dest:Concept:FSA) WHERE dest.conceptId = row.destinationId
        CREATE (source)-[r:row.relLabel{relId: row.typeId, term: row.term, descType: row.descType}]->(dest)
[3:59 PM]

That's the config.yml with all the queries (I'm chunking to try and avoid this issue).

[4:00 PM]
 {code: Neo.TransientError.General.MemoryPoolOutOfMemoryError} {message: The allocation of an extra 7.3 MiB would use more than the limit 100.0 MiB. Currently using 99.0 MiB. dbms.memory.transaction.global_max_size threshold reached}
[4:01 PM]

Now I get this error. I'm not running any other queries on the database, nor is anyone else (I'm the only one with credentials).

3 Replies

Hi @gocampo!

You may want to check the RAM usage of your queries. Your RAM is also used for the page cache, which is why the transaction memory limit is well below the 1 GB you pay for.
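One way to see how much memory a query needs is to PROFILE a single batch by hand; recent Neo4j versions report estimated memory usage in the profile output. A minimal sketch, with one made-up row standing in for what pyingest binds to $dict (the sample values are not from your CSV):

PROFILE
// one hard-coded sample row instead of the $dict parameter pyingest sends
WITH [{id: '404684003', term: 'Clinical finding (finding)', descType: '900000000000003001'}] AS rows
UNWIND rows AS row
MERGE (c:Concept {conceptId: row.id, term: row.term, descType: row.descType})
RETURN count(*)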

Bennu

Oh, y’all wanted a twist, ey?

You can try to reduce your chunk sizes.
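For example, in config.yml (25 here is just an arbitrary lower starting point, tune as needed; keep the cql block as it is):

  - url: /home/gocandra/workspace/uma/deep-learning/research/graphs/snomed-loader/csv/Concept_Snapshot.csv
    compression: none
    skip_file: false
    chunk_size: 25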

It might also be good to merge on a single property (with a constraint) only -> here, as your id is row.id, the other fields should not be part of the MERGE but an ON CREATE SET ...

MERGE (c:Concept {conceptId:row.id}) ON CREATE SET ...
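A sketch of both pieces (the constraint name is a placeholder; on Neo4j 4.x the syntax is CREATE CONSTRAINT ... ON (c:Concept) ASSERT ... instead of FOR/REQUIRE):

// unique constraint so the MERGE lookup is an index hit, not a scan
CREATE CONSTRAINT concept_id IF NOT EXISTS
FOR (c:Concept) REQUIRE c.conceptId IS UNIQUE;

// merge on the key only; set the other fields separately
WITH $dict.rows AS rows UNWIND rows AS row
MERGE (c:Concept {conceptId: row.id})
ON CREATE SET c.term = row.term, c.descType = row.descType
ON MATCH SET c.term = row.term, c.descType = row.descType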

Do you see which of the import queries causes the memory issue?

Sometimes AuraDB Free works better in terms of memory limits; give it a try, as it doesn't have to support a clustered environment.

 I noticed one thing with your query:

 # relationships
  - url: /home/gocandra/workspace/uma/deep-learning/research/graphs/snomed-loader/csv/Concept_Snapshot_add.csv
    compression: none
    skip_file: false
    chunk_size: 50
    cql: |
      WITH $dict.rows as rows UNWIND rows as row
        MATCH (source:Concept) WHERE source.conceptId = row.sourceId
        MATCH (dest:Concept:FSA) WHERE dest.conceptId = row.destinationId
        CREATE (source)-[r:row.relLabel{relId: row.typeId, term: row.term, descType: row.descType}]->(dest)

 The [r:row.relLabel] won't resolve. pyingest uses regular Cypher parameter substitution, and parameters can't supply a relationship type. To create a relationship with a dynamic type, you need to use something like apoc.create.relationship.
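For example, the cql of that last file entry could become (a sketch, assuming the APOC plugin is available on your instance; apoc.create.relationship takes start node, type string, property map, end node):

WITH $dict.rows as rows UNWIND rows as row
  MATCH (source:Concept) WHERE source.conceptId = row.sourceId
  MATCH (dest:Concept:FSA) WHERE dest.conceptId = row.destinationId
  // row.relLabel becomes the relationship type at runtime
  CALL apoc.create.relationship(source, row.relLabel,
    {relId: row.typeId, term: row.term, descType: row.descType}, dest) YIELD rel
  RETURN count(rel)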