08-18-2020 03:03 AM
Hello good neo4j community!
I am working on an NLP project in which I want to use Neo4j. I am dealing with several hundred long text documents from a project database. I am using Python to process the text and metadata, and to extract and append entities with spaCy in my pipeline.
One example project looks like this.
Here the turquoise node is the Project ID, the purple node contains the project document and the orange node displays the text of one extracted entity from the document.
Now, let's assume I query for several projects and want to return a network map of the entities - I query for example:
Match (n:PIMS_ID)-[*1]->(b:project_level)-[r:has_entity]->(f)
where n.name in ['5844', '5696', '5438', '5437', '5413', '3298']
with n,b,r,f
return (b:project_level)-[r:has_entity]->(f)
This returns the following output:
I need help in doing the following:
I hope I can get some insights. Certainly not expecting to have all my questions answered, but anything helps!
08-18-2020 12:23 PM
You're making a meta-graph of connectedness. There are many ways to do this.
@alicia.frame touched on it in Nodes 2019 "Graph Embedding and Machine Learning"
There's also a simpler, but fairly effective guide to knowledge graphs, which is very similar to the problem you're trying to solve. "Knowledge Graph Cancer Modeling"
The simplest approach, though not the best, that does what you're asking is the following (I don't think that's really what you want to do; you need research, and probably GDS and Alicia's help):
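// Reset the per-entity weight before counting.
MATCH (e:Entity) SET e.weight = 0;
// Pair up entities that share a project; id(e) < id(e2) visits each pair once
// instead of once per direction, so the weights are not double-counted.
MATCH (e:Entity)<-[:PROJECT_TERMS]-(:Project)-[:PROJECT_TERMS]->(e2:Entity)
WHERE id(e) < id(e2)
MERGE (e)-[r:META]-(e2)
ON CREATE SET r.weight = 1
ON MATCH SET r.weight = r.weight + 1
SET e.weight = e.weight + 1
SET e2.weight = e2.weight + 1;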
That will give you the meta-graph you've asked for, which you can then retrieve via:
MATCH p=(:Entity)-[:META]-(:Entity)
RETURN p
However, changing the style according to data-content isn't built into Neo4j Browser. You'll have to make something custom to do that.
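To make the counting idea behind the meta-graph concrete, here is a minimal pure-Python sketch of it, outside the database. It is only an illustration, not a replacement for the Cypher: the `project_entities` sample data and the `build_meta_graph` helper are hypothetical names made up for this example. An edge weight is the number of projects in which two entities appear together, and a node weight is the sum of the weights of that entity's edges.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: project ID -> entities extracted from its documents.
project_entities = {
    "5844": {"climate", "water", "energy"},
    "5696": {"climate", "water"},
    "5438": {"climate", "energy"},
}


def build_meta_graph(project_entities):
    """Count entity co-occurrence across projects.

    Returns (edge_weights, node_weights). edge_weights maps a sorted entity
    pair to the number of projects containing both entities. node_weights
    maps an entity to the sum of the weights of its edges.
    """
    edge_weights = Counter()
    node_weights = Counter()
    for entities in project_entities.values():
        # sorted() gives every unordered pair one canonical form, so each
        # pair is counted once per project.
        for a, b in combinations(sorted(entities), 2):
            edge_weights[(a, b)] += 1
            node_weights[a] += 1
            node_weights[b] += 1
    return edge_weights, node_weights


edge_weights, node_weights = build_meta_graph(project_entities)
for pair, weight in sorted(edge_weights.items()):
    print(pair, weight)
```

Weights computed this way could drive node size and edge thickness in whatever custom visualisation you build.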
08-20-2020 05:57 AM
Thank you @tony.chiboucas!
I marked your answer as the solution since it allowed me to definitely move further in the process! It does what I was looking for.
I started to use Neovis.js to output the meta-graph with weighted nodes and edges, and it seems to work fine if set up properly!
Doing some research, I also agree that using GDS would be super interesting. I will run some experiments and get back to @alicia.frame's answer once I get stuck!
Thank you two!
08-18-2020 02:46 PM
@jonas-nothnagel you might want to check out the nodeSimilarity algorithm in the GDS library: https://neo4j.com/docs/graph-data-science/current/algorithms/node-similarity/
Given source and target nodes (entity and document), you can calculate similarity based on neighboring nodes; you can even use weights (e.g. the number of times a term occurs in a given document) in your similarity calculation.
Node Similarity creates new relationships in your graph, where two nodes are above a similarity threshold, and adds a weight property indicating how similar documents are. I think if you had that, people could easily query the results and interact with your conclusions. You wouldn't necessarily want to delete the text_document nodes, but instead add new information to the graph.
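As a rough illustration of the idea (not the GDS implementation itself), the sketch below compares documents by the entities they share, using plain Jaccard similarity, which is the measure Node Similarity is based on as far as I know. The `doc_entities` sample data, the `similar_pairs` helper and the `cutoff` value are all hypothetical and chosen only for this example; inside Neo4j the algorithm does this work for you and writes the resulting relationships back to the graph.

```python
from itertools import combinations

# Hypothetical sample data: document -> set of entities it links to.
doc_entities = {
    "doc_a": {"climate", "water", "energy"},
    "doc_b": {"climate", "water"},
    "doc_c": {"finance"},
}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(doc_entities, cutoff=0.5):
    """Return {(doc1, doc2): score} for document pairs scoring >= cutoff."""
    result = {}
    for d1, d2 in combinations(sorted(doc_entities), 2):
        score = jaccard(doc_entities[d1], doc_entities[d2])
        if score >= cutoff:
            result[(d1, d2)] = score
    return result


pairs = similar_pairs(doc_entities)
for (d1, d2), score in pairs.items():
    print(f"{d1} ~ {d2}: {score:.2f}")
```

Each surviving pair corresponds to one new weighted relationship between two documents, which is the extra information you would be adding to the graph.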
08-25-2020 02:13 AM
Hi @alicia.frame. I wanted to try your suggestion and just wanted to check if I understood you correctly.
Assuming I have a source node (document) and target nodes (entities) that are connected with the relationship:
(document)-[:has_entity]->(entity)
Are you suggesting that I calculate the similarity between documents based on their neighbouring nodes, in this case all entities per document?
What would I gain from having this information in my graph? Could you offer some additional guidance on how to produce this example (perhaps with an example line of code) and elaborate on what I could gain from it?
Thank you so much again!