12-21-2021 10:46 PM
Hi all,
I was wondering if anyone has done a project that involves two independent sets of nodes.
The plan is to use a similarity algorithm to compute the Jaccard score between my POSITIONS_1 nodes (1st day of data) and POSITIONS_2 nodes (2nd day of data).
Both sets of nodes contain the same type of information: latitude, longitude, and time (in string format).
In addition, I am planning to use it at production scale.
Any suggestions?
12-22-2021 05:57 AM
It depends on how big "production scale" is - and how quickly you want the computation to finish.
Node Similarity is a brute-force similarity algorithm that uses Jaccard similarity to score nodes based on their neighbors. To use that algorithm, you'd need the information you're comparing (latitude, longitude, time) to be nodes.
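To make the scoring concrete, here is a small sketch of the Jaccard computation Node Similarity performs per node pair (the `loc_*` neighbor names are purely illustrative, not from the thread):

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Example: neighbor sets of a day-1 position node and a day-2 position node
day1_neighbors = {"loc_a", "loc_b", "loc_c"}
day2_neighbors = {"loc_b", "loc_c", "loc_d"}
score = jaccard(day1_neighbors, day2_neighbors)  # 2 shared / 4 total = 0.5
```

This is also why the compared information has to exist as nodes: Jaccard operates on sets of shared neighbors, not on raw property values.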
You could also use K-Nearest Neighbors - an approximate similarity algorithm that uses cosine similarity to compare nodes based on their properties. It's much faster than Node Similarity, because it doesn't default to comparing every node with every other node.
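By contrast, cosine similarity works directly on numeric property vectors. A minimal sketch, with made-up latitude/longitude/epoch-time values (note that an unscaled epoch timestamp will dominate the vector, so some normalization is usually needed in practice):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

# Hypothetical property vectors: [latitude, longitude, epoch seconds]
p1 = [1.3521, 103.8198, 1640044800.0]
p2 = [1.3644, 103.9915, 1640131200.0]
score = cosine_similarity(p1, p2)  # close to 1.0 because the epoch term dominates
```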
12-22-2021 05:39 PM
Hi @alicia.frame1 ,
Thanks for the advice,
In regards to "production scale": it is at least 1 million rows of data being fed in on a monthly basis, and ideally the computation should take less than 2 minutes to finish.
Currently I am applying the methodology from this article
In short summary, I replaced the info used in the article above and created a new metric based on latitude and longitude, since they are float numbers.
After the similarity relationships are developed, I use a centrality algorithm to filter out the most interconnected nodes and focus on the nodes with the desired latitude and longitude.
My main objective is to find common stop locations for logistics purposes.
Not sure if my approach is right for this use case.
Any advice would be appreciated
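The article referenced above isn't shown here, but one common way to turn raw latitude/longitude floats into a distance-style metric (a sketch of a standard approach, not necessarily what the article does) is the haversine formula for great-circle distance:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points (degrees)."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Two hypothetical stop candidates, roughly 19 km apart
d = haversine_km(1.3521, 103.8198, 1.3644, 103.9915)
```

A small distance could then be thresholded into a "same stop location" relationship before running the similarity or centrality steps.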
12-28-2021 12:24 AM
Actually I do have a concern.
Can the graph algorithms in Neo4j process Point and DateTime data types?
In a lot of the tutorials and lessons I have seen, it seems that the algorithms can only process float or integer formats, whereas geospatial data is what I am dealing with now.
Does the statement above make sense? Some further advice on this would be appreciated.
12-28-2021 07:37 AM
For GDS, you'll want to convert your date/time data into a numerical format that GDS can interpret. @jennifer.reif has a great blog post on handling time data in Neo4j that's worth taking a look at: Cypher Sleuthing: Dealing with Dates, Part 1
For lat/long data, the only algorithm that can explicitly use lat/long data is A* (pathfinding). If you want to use lat/long for a similarity comparison, it can be compared as two numerical values, but we don't have any concept of spatial similarity built into the library.
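For the time strings, one simple numeric encoding (a sketch, assuming the strings are ISO-8601; adjust the parsing to your actual format) is epoch seconds, which can be stored as a plain numeric node property:

```python
from datetime import datetime, timezone

# Assumed ISO-8601 input; a different source format would need strptime instead.
ts = "2021-12-21T22:46:00+00:00"
epoch_seconds = datetime.fromisoformat(ts).timestamp()  # float, usable as a GDS property
```

Latitude and longitude are already floats, so they can be passed through as two separate numeric properties.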
01-02-2022 08:15 PM
Greatly appreciate the advice, @alicia.frame1.
Currently my graph has 5 million nodes.
I tried using K-Nearest Neighbors in the Graph Data Science playground (NEuler).
However, I ran into the error shown below:
Algorithm failed to complete
Error: Neo4jError: Unable to start new transaction since limit of concurrently executed transactions is reached. See setting dbms.transaction.concurrency.maximum
Any suggestions on how to resolve this issue?
01-04-2022 03:26 PM
That looks like a database setting - Configuration settings - Operations Manual
You'll want to go into the config file (instructions here: File locations - Operations Manual) and change the dbms.transaction.concurrency.maximum
setting to a higher value - or only run one thing at a time.
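For reference, the change is a one-line edit in neo4j.conf (the value 2000 below is just an example - pick a cap that fits your workload, and restart the database if the setting isn't picked up dynamically in your version):

```
# neo4j.conf - raise the cap on concurrently executing transactions
dbms.transaction.concurrency.maximum=2000
```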