Nodes are not saving when created with Python

sberk10
Node Link

I have a method called create_node that will create a new node if it doesn't already exist in my database. That method works when I test it with hard-coded values in the code snippet below:

next_node = session.read_transaction(get_node_by_name, 'Greve')
if next_node is None:
    wikidata_results = find_in_wikidata('Q2044', 'Greve') # check if this is a valid entity in wikidata
    if wikidata_results:
        next_uri = wikidata_results[0]['location']['value']
        node_result = session.write_transaction(create_node, next_uri)
        if node_result:
            print("Created node in db for {n}".format(
                n=node_result['name']))

I know that method works, because I'm able to find the node that was created by querying my db in the Desktop app using cypher like this:
MATCH (n:Resource {name: 'Greve'}) RETURN n

But when I plug this into this part of my code, it is not saving the nodes:

cities = session.run("MATCH (a:`Location:City`) RETURN a.name AS name")
city_names = [record["name"] for record in cities]

for city in city_names:
    current_node = session.read_transaction(get_node_by_name, city)
    current_uri = os.path.basename(current_node[0]["uri"])
    go_next_list = get_go_next_list(city) 
    
    if go_next_list:
        for go_next_location in go_next_list:
            go_next_location = clean_place(go_next_location)
            # check if current link already has a node in the graph
            next_node = session.read_transaction(get_node_by_name, go_next_location)
            if next_node is not None:
                result = session.write_transaction(create_relationship, city, go_next_location, "GO_NEXT")
                for record in result:
                    print("Created {r} relationship between: {n1} and {n2}".format(
                        r=record['r'], n1=record['n1'], n2=record['n2']))                
            else:
                wikidata_results = find_in_wikidata(current_uri, go_next_location)    
                if wikidata_results:
                    next_uri = wikidata_results[0]['location']['value']
                    node_result = session.write_transaction(create_node, next_uri)
                    if node_result:
                        print("Created node in db for {n}".format(
                            n=node_result['name']))
                        city_names.append(go_next_location)
                        session.write_transaction(create_relationship, city, go_next_location, "GO_NEXT")
                else:
                    print("No wikidata results for item %s" % (go_next_location))

It seems to be working, because it prints the statement I added after the transaction that creates the node (i.e. "Created node in db for {n}"). But none of the nodes created here are saved to the database. If I try to reference them later I get an error, and when I look for them in the Desktop app (as before: MATCH (n:Resource {name: 'Greve'}) RETURN n), they aren't there.

I don't understand why this would work in the first case, but not in the larger context. I've also tried this with the py2neo library, and I run into the same issue where nodes I create don't get saved to the database. Is there anything I can do to debug this?

1 ACCEPTED SOLUTION

sberk10
Node Link

Solved the issue! I printed out the node result immediately after running the method to create the node, and noticed that the node id looked rather small considering the size of my database. Looking up the node in Neo4j Desktop by its name or uri field returned no results, but looking it up by id returned a completely different node that had the same types as the node I was trying to create and that already existed in the database.
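The check described above (comparing node ids across creations) could be automated along these lines. This is only a sketch: it assumes you collect an (id, uri) pair for every node that create_node reports as created, and `find_reused_ids` is a hypothetical helper, not part of the neo4j driver.

```python
from collections import defaultdict


def find_reused_ids(created):
    """Return {node_id: sorted uris} for ids that were reported for more than one uri.

    `created` is an iterable of (node_id, uri) pairs gathered after each
    create_node call (hypothetical bookkeeping, not a driver feature).
    """
    uris_by_id = defaultdict(set)
    for node_id, uri in created:
        uris_by_id[node_id].add(uri)
    # An id that shows up with several uris means an existing node was
    # overwritten instead of a new node being created.
    return {
        node_id: sorted(uris)
        for node_id, uris in uris_by_id.items()
        if len(uris) > 1
    }
```

Any id in the returned dict is a node whose properties were replaced at least once, which matches the symptom of earlier nodes "disappearing" when searched for by name or uri.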

As @kees.vegter pointed out, I need to use MERGE on unique key values. My create_node method was not working as expected because there were already nodes with the same types (labels) in the database, so MERGE matched those instead of creating a new node. Changing that portion of my method to include the wikidata uri solved the issue:

MERGE (l:Location:Resource:%s {uri: uri})
SET l.image = image,
l.name = name,
l.article = article


4 REPLIES 4

Hi,

Just a short remark. Do you know that you can use MERGE in Neo4j? It is a combination of MATCH and CREATE.
So with MERGE, if the node is not there, it will be created.
Tip: you should only MERGE on unique key values and use the SET command to set the other properties.

regards.
Kees
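To illustrate the tip above, here is a toy in-memory model of MERGE followed by SET. It is not how Neo4j is implemented, and the uris are made-up placeholders. It only mimics the matching rule: a pattern without key properties matches every node that has the given labels, so the SET that follows overwrites those nodes instead of creating a new one.

```python
def merge(graph, labels, key_props, set_props):
    """Toy MERGE + SET: match on labels and key properties, create if nothing matches."""
    labels = set(labels)
    matches = [
        node for node in graph
        if labels <= node["labels"]
        and all(node["props"].get(k) == v for k, v in key_props.items())
    ]
    if not matches:
        node = {"labels": set(labels), "props": dict(key_props)}
        graph.append(node)
        matches = [node]
    # SET applies to every matched (or newly created) node.
    for node in matches:
        node["props"].update(set_props)
    return matches


def existing_graph():
    """A graph that already holds one node with the same labels (placeholder data)."""
    return [{"labels": {"Location", "Resource", "City"},
             "props": {"uri": "uri-A", "name": "Existing"}}]


labels = ["Location", "Resource", "City"]

# MERGE on labels only: the existing node matches and is overwritten.
unkeyed = existing_graph()
merge(unkeyed, labels, {}, {"uri": "uri-B", "name": "New"})

# MERGE keyed on uri: nothing matches, so a second node is created.
keyed = existing_graph()
merge(keyed, labels, {"uri": "uri-B"}, {"name": "New"})

print(len(unkeyed), len(keyed))
```

In the unkeyed case the graph still holds a single node, now carrying the new uri and name. In the keyed case the original node is untouched and a new node sits alongside it.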

dkm1006
Node Clone

Are you sure that no nodes are created or is it possible that just not all nodes you would expect are created?

I haven’t worked with the driver directly, so I’m basically just guessing but maybe there’s an issue with your transactions not being committed or something else with your session.

Without seeing your create_node function it’s hard to tell what the problem is though.

sberk10
Node Link

Thanks, kees.vegter and dkm1006. I actually am using merge to create the nodes, but the method is a bit more complex:

  1. First check if the node exists, and return it if it does
  2. Do some text processing to format the uri and types labels for the node
  3. Build a SPARQL query to get all the values I want for the node from wikidata
  4. Use apoc.load.jsonParams to get the SPARQL results and unwind the values
  5. Use MERGE to create the node and then set all the properties with the SPARQL results

def create_node(tx, uri):
    existing_node = get_node_by_uri(tx, uri)
    if existing_node:
        print("node already exists for uri %s"%(uri))
        return existing_node[0]
    base_uri = os.path.basename(uri)
    types = ':'.join(get_types(base_uri))
    create_node_query = """
                        WITH "SELECT DISTINCT * WHERE {
                                            ?location wdt:P18 ?image .
                                            ?location wdt:P17 wd:Q38 .
                                            ?location rdfs:label ?locationName .
                                            FILTER(lang(?locationName) = 'en')
                                            FILTER(?location=wd:%s)
                                            OPTIONAL {
                                            ?article schema:about ?location .
                                            ?article schema:isPartOf <https://en.wikivoyage.org/> .}
                                            }" AS sparql
                        CALL apoc.load.jsonParams(
                          "https://query.wikidata.org/sparql?query=" + apoc.text.urlencode(sparql),
                          { Accept: "application/sparql-results+json"},
                          null)
                        YIELD value 
                        UNWIND value['results']['bindings'] as row
                        WITH row['image']['value'] as image, 
                             row['location']['value'] as uri, 
                             row['locationName']['value'] as name,
                             row['article']['value'] as article
                        MERGE (l:Location:Resource:%s)
                        SET l.image = image,
                            l.uri = uri,
                            l.name = name,
                            l.article = article
                            """%(base_uri, types)
    tx.run(create_node_query)
    result = get_node_by_uri(tx, uri)
    try:
        return result[0]
    except ServiceUnavailable as exception:
        logging.error("{query} raised an error: \n {exception}".format(
            query=create_node_query, exception=exception))
        raise     

At the end of the create_node method, I try to retrieve the newly created node and return it with get_node_by_uri, which is below. My thought was that if it doesn't successfully create the node, it wouldn't be able to retrieve it and it would log an error, but I'm not seeing that happen.

def get_node_by_uri(tx, uri):
    query = (
        "MATCH (r:Resource) "
        "WHERE r.uri = $uri "
        "RETURN r"    
    )
    result = tx.run(query, uri=uri)
    return result.single()
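One likely reason no error is logged: when the query yields no records, `result.single()` returns `None`, so indexing it would raise a `TypeError` rather than the `ServiceUnavailable` that the `except` clause in create_node is waiting for. A small sketch of an explicit check follows; `first_value_or_raise` is a hypothetical helper, and raising `LookupError` is just one possible choice.

```python
def first_value_or_raise(record, uri):
    """Return the first value of a record, or raise if the lookup found nothing.

    `record` is whatever get_node_by_uri returned: a record-like object
    supporting record[0], or None when no node matched.
    """
    if record is None:
        raise LookupError(f"no Resource node found for uri {uri!r}")
    return record[0]
```

Used at the end of create_node, this would turn a silent miss into a visible failure: `return first_value_or_raise(get_node_by_uri(tx, uri), uri)`.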
