Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
02-04-2021 12:28 AM
If I have a query along the lines of MATCH (g:Gene)-[r]-() RETURN DISTINCT type(r), count(r)
I can get a breakdown of the connections a gene node has, what type of relationships and how many of these.
Say I have a list of genes with which I want to do the same query, to get a pivot table, relationship types in rows, genes in columns, each individual cell in the table being the count of one type of relationship that gene has. How could I do this? I tried to connect two queries with a UNION ALL:
MATCH (g:Gene {name: "G1"})-[r]-(n)
RETURN DISTINCT type(r), count(r) as g1
UNION ALL MATCH (g:Gene {name: "G2"})-[r]-(n)
RETURN DISTINCT type(r), count(r) as g2
this returns an error about column names not being the same. If I make sure both returns have the same columns:
MATCH (g:Gene {name: "G1"})-[r]-(n)
RETURN DISTINCT type(r), null as g2, count(r) as g1
UNION ALL MATCH (g:Gene {name: "G2"})-[r]-(n)
RETURN DISTINCT type(r), null as g1, count(r) as g2
then I get duplicate rows, values being null for one gene and then the other. I also tried using UNWIND with a list then I get the two tables returned concatenated vertically instead (i.e. rows duplicated).
What I am trying to do should not be that difficult, what I am doing wrong?
02-04-2021 04:49 AM
Though it may not be entirely critical to the issue at hand, what version of Neo4j is in play here?
Regarding the error with
MATCH (g:Gene {name: "G1"})-[r]-(n)
RETURN DISTINCT type(r), count(r) as g1
UNION ALL MATCH (g:Gene {name: "G2"})-[r]-(n)
RETURN DISTINCT type(r), count(r) as g2
this is expected but can be avoided if you run
RETURN DISTINCT type(r), count(r) as g1
UNION ALL MATCH (g:Gene {name: "G2"})-[r]-(n)
RETURN DISTINCT type(r), count(r) as g1
admittedly your output may not allow you to differentiate data. Maybe use
MATCH (g:Gene {name: "G1"})-[r]-(n)
RETURN 'G1' as gene, DISTINCT type(r), count(r) as g1
UNION ALL MATCH (g:Gene {name: "G2"})-[r]-(n)
RETURN 'G2' as gene, DISTINCT type(r), count(r) as g1
but further I think we dont need the DISTINCT
on type(r) but instead can simply use type(r). This is so because count(r) is going to result in aggregation. For example with test data
i// create 3 Gene nodes
create (n:Gene) set n.name='G1';
create (n:Gene) set n.name='G2';
create (n:Gene) set n.name='G3';
// Gene with name G1 will have 4 relationships to Gene named G2. The
// relationship are as follows
// 2 named R1
// 1 named R2
// 1 named R3
match (n:Gene {name:'G1'}), (n2:Gene {name:'G2'}) create (n)-[:R1]->(n2);
match (n:Gene {name:'G1'}), (n2:Gene {name:'G2'}) create (n)-[:R1]->(n2);
match (n:Gene {name:'G1'}), (n2:Gene {name:'G2'}) create (n)-[:R2]->(n2);
match (n:Gene {name:'G1'}), (n2:Gene {name:'G2'}) create (n)-[:R3]->(n2);
// Gene with name G2 will have 3 relationships to Gene named G3. The
// relationship are as follows
// 3 named R1
match (n:Gene {name:'G2'}), (n2:Gene {name:'G3'}) create (n)-[:R1]->(n2);
match (n:Gene {name:'G2'}), (n2:Gene {name:'G3'}) create (n)-[:R1]->(n2);
match (n:Gene {name:'G2'}), (n2:Gene {name:'G3'}) create (n)-[:R1]->(n2);
and now
match (n:Gene)-[r]->() return distinct type(r), count(r);
+--------------------+
| type(r) | count(r) |
+--------------------+
| "R3" | 1 |
| "R2" | 1 |
| "R1" | 5 |
+--------------------+
3 rows available after 15 ms, consumed after another 5 ms
and if I remove DISTINCT
I get the same output.
and maybe what you are really after is
match (n:Gene)-[r]->() return n.name,type(r), count(r);
+-----------------------------+
| n.name | type(r) | count(r) |
+-----------------------------+
| "G1" | "R3" | 1 |
| "G1" | "R2" | 1 |
| "G1" | "R1" | 2 |
| "G2" | "R1" | 3 |
+-----------------------------+
02-04-2021 06:52 AM
Hi Dana, this is Neo4j 4.0.5 I believe.. Does it matter in this context?
If I understand your reply correctly, it is not possible to get "wide" data rather than "long" data, borrowing the terminology from the tidyverse? Ideally I was hoping to get a table that contains three columns, type(r), counts for G1 and counts for G2
02-04-2021 06:58 AM
version.. in this case doesnt not matter except as you initially reported that an error was encountered I wanted to make sure this was / wasnt reproducible on said version
regarding your desired output are you expecting output similar to
RelationshpName G1 G2
R1 2 3
R2 1 0
R3 1 0
but where RelationshipName can be any value on the Y axis and the list on Genes on the X axis can be N number of Genes ??
02-05-2021 03:21 AM
Yes precisely, that was the idea. Maybe not the approach?
All the sessions of the conference are now available online