Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
08-09-2020 10:48 PM
So, for my bachelor thesis I have to import a distance matrix in Neo4j and got the following message:
This query builds a cartesian product between disconnected patterns.
my data set is destributed into two files, one for nodes one for edges.
Nodes:
ID; Name
1;"AANAUSF000000AG0149"
2;"AANAUSF000000AG0150"
3;"AANAUSM000000000Z24"
4;"AANAUSM000000000Z28"
5;"AANAUSM000000AG0152"
Edges:
startID;endID;distance
1;2;17
1;3;21
1;4;23
1;5;30
2;3;24
2;4;28
2;5;35
3;4;18
3;5;27
4;5;17
starID
and endID
refere to the same kind of nodes and distance
is the value how far apart those samples are from each other. The relationship file represents a (n-1) x (n-1) matrix in which symmetric pairings are not significant ( [a,b] == [b,a] ) as the diagonal , which is always 0 ( [a,a] == 0 ). The final matrix represents alsmot 60 million relations and many of them could have the same distance like [1,2] and [4,5].
I tried the following instruction, leading me to the inevitable cartesian warning:
LOAD CSV WITH HEADERS FROM "file:///Kanten.csv" AS row
FIELDTERMINATOR ';'
Match (c1:Cow {id:toInteger(row.startID)}), (c2:Cow{id:toInteger(row.endID)})
CREATE (c1)-[d:DISTANCE]->(c2);
here is also my plan
My question is now: Is there a work around for this problem or is this the end for my thesis?
I appreciate every advice I can take!
Solved! Go to Solution.
08-09-2020 11:18 PM
Hello @andrej.liebert257
LOAD CSV WITH HEADERS FROM "file:///Kanten.csv" AS row FIELDTERMINATOR ';'
MATCH (c1:Cow {id: toInteger(row.startID)})
MATCH (c2:Cow {id: toInteger(row.endID)})
CREATE (c1)-[d:DISTANCE]->(c2)
To make the load faster, did you create a UNIQUE CONSTRAINT on the id
property?
Regards,
Cobra
08-09-2020 11:18 PM
Hello @andrej.liebert257
LOAD CSV WITH HEADERS FROM "file:///Kanten.csv" AS row FIELDTERMINATOR ';'
MATCH (c1:Cow {id: toInteger(row.startID)})
MATCH (c2:Cow {id: toInteger(row.endID)})
CREATE (c1)-[d:DISTANCE]->(c2)
To make the load faster, did you create a UNIQUE CONSTRAINT on the id
property?
Regards,
Cobra
08-11-2020 01:45 PM
Hey @Cobra,
It solved the problem! I implemented a few CONSTRAINTS:
(1) CREATE CONSTRAINT cow_constraint ON (c:Cow) ASSERT c.id IS UNIQUE
(2) CREATE CONSTRAINT cow_exists_constraint ON (c:Cow) ASSERT EXISTS (c.id)
(3) CREATE CONSTRAINT cow_name_exists_constraint ON (c:Cow) ASSERT EXISTS (c.name)
(4) CREATE CONSTRAINT node_key_constraint ON (c:Cow) ASSERT (c.id, c.name) IS NODE KEY
(5) CREATE CONSTRAINT distance_exists ON ()-[d:DISTANCE]-() ASSERT EXISTS (d.distance)
(1)
to (4)
worked just fine, but (5)
, creation of a relationship property existence constraint. resolved in the following message:
Relationship(0) with type
DISTANCE
must have the propertydistance
To be honest, I don't know if everyone was necessary. I wonder how those CONSTRAINTS resulted into a faster loading. In the manual I found this:
"When using MERGE
or MATCH
with LOAD CSV
we need to make sure we have an index or a unique constraint on the property that we are merging on. This will ensure that the query executes in a performant way."
On 5.4 Constraints it sais:
(I)
Adding a unique property constraint on a property will also add a single-property index on that property, so such an index cannot be added separately.
(II)
Adding a node key constraint for a set of properties will also add a composite index on those properties, so such an index cannot be added separately.
So it has something to be with indexing. I found this quote on 5.2.1 Indexing Introduction
"This [Indexing] comes at the cost of additional storage space and slower writes, so deciding what to index and what not to index is an important and often non-trivial task."
Concidering that my project has to deal with a huge data volume, it would make sense to have only the required constraints. But which constraints are necessary and which don't? I also build an index
CREATE INDEX FOR (c:COW) ON (c.name)
just in case I'll need it for performance. In the end, it might be redundant - because of (I)
/(II)
... ? I will try to get into Indexes, but wouldn't decline a more understandable source than 5.2 to dig into the theme.
After the relationship constraint (4)
was deleted, I used your code and it worked. Howerever,
I am still curious why
MATCH (c1:Cow {id:toInteger(row.startID)}), (c2:Cow{id:toInteger(row.endID)})
had to be separated into
MATCH (c1:Cow {id: toInteger(row.startID)})
MATCH (c2:Cow {id: toInteger(row.endID)})
08-11-2020 11:28 PM
You need one UNIQUE CONSTRAINT on a propery, usually, it's an ID.
You can also create multiple indexes that will be used in WHERE clauses, so if you always use a property in your WHERE clauses, you can add an index on it
Between the two codes, it should not have any difference but I prefer the second syntax, it's more clear
08-12-2020 03:16 AM
Your dataset matches the format neo4j admin import format. If this is the first time import you can do this pretty fast with this option.
Please take a look at
https://neo4j.com/docs/operations-manual/current/tools/import/syntax/
All the sessions of the conference are now available online