Neo4j

bjoerngohlke · ‎03-03-2020

Dear all,

I have a network of people (author) who publish together (scientific publications).
So it's a very simple graph (author-PUBLISHED_IN-publication).
I need to search for an author by ID and get back all the people (co-authors) he has published with, along with the number of common publications for each.
I also need the connections among the co-authors themselves, again with their counts of common publications! This is where I struggle.

My query so far:

MATCH (p1:Author{author_id:"thisAuthorsID"})-[:PUBLISHED_IN]->(m:Publication)<-[:PUBLISHED_IN]-(p2:Author)
WITH p1, p2, size(collect(m)) as commonPublications
WHERE commonPublications >= 2
RETURN p1, p2, commonPublications
order by commonPublications desc
LIMIT 25

intouch_vivek · ‎03-04-2020

Hi @bjoerngohlke,

Your query looks good.
Are you getting any specific error?

bjoerngohlke · ‎03-04-2020

Yes, the query works, but the result is not what I need. So far the results only contain the connections from p1 to his co-authors, not those between the co-authors themselves.
For the p1 author I need to filter the co-authors based on the number of common publications (subquery).
Let's assume I want to have 3 authors: a, b and c.
In the end I would like a table like

author, coauthor, common publication
p1, a, 10
p1, b, 5
p1, c, 3
a, b, 20
b, c, 2

Something like this.
The problem is that I still struggle with limiting the result to the direct co-authors based on the number of publications, and then using this set in combination with p1 to get all the results shown in my table above.
I hope this explanation makes clear what I need to get out.

Best

intouch_vivek · ‎03-04-2020

Sorry,
I am still confused.
I tried with the dataset below
author,publisher
A,Pub1
B,Pub1
B,Pub2
C,Pub2
D,Pub3
E,Pub4
A,Pub2
and your query gave
╒════════════╤════════════╤════════════════════╕
│"p1" │"p2" │"commonPublications"│
╞════════════╪════════════╪════════════════════╡
│{"name":"B"}│{"name":"A"}│2 │
└────────────┴────────────┴────────────────────┘
Please explain your requirement with an example.

bjoerngohlke · ‎03-04-2020

Sorry for that!

The query itself works for connections from Input-Author to HIS Co-Authors.
But I additionally need the connections from each co-author to every other co-author.

I think that I need some kind of subquery or something like this.
See my very simple schema.

My workflow:

1. Select a node by name, e.g. B.
2. Get all his co-authors and their common publications.
3. Filter, e.g., for the top 10 co-authors based on common publications. (Up to here, my query works fine.)
4. Create a list of the filtered co-authors and check their common publications (e.g. A, C, D, E <- co-authors).
5. Calculate their common publications (ONLY each co-author to every other co-author) and combine them with those of the initial author (B). (Steps 4+5 are MISSING SO FAR.)

I hope it is clear now.
Sorry again for the confusion!

Otherwise I can draw a figure of what is still missing.

intouch_vivek · ‎03-04-2020

Sorry to say,
it would be really helpful if you could draw it and give me a small data set.

bjoerngohlke · ‎03-04-2020

Hey,

Attached you can find the input CSV files (authors_test.txt (213 Bytes), published_in_test.txt (638 Bytes), publications_test.txt (218 Bytes)).

The queries to import:
queries_to_include.txt (1012 Bytes)

The results I need are explained in the following picture.

Hope it will become clear now.
Sorry for that confusion. Somehow hard to explain.
Best

intouch_vivek · ‎03-04-2020

Please try this, although I am not very satisfied with my solution:

match(n:Author) with collect(n.name) as name unwind name as name1

MATCH (p1:Author{name:name1})-[:PUBLISHED_BY]->(m:Publication)<-[:PUBLISHED_BY]-(p2:Author)
WITH p1, p2, size(collect(m)) as commonPublications
WHERE commonPublications >= 2
RETURN distinct p1,p2, commonPublications
order by commonPublications desc
LIMIT 25

bjoerngohlke · ‎03-04-2020

Thanks for taking the time to try to solve my issue!!
Unfortunately, your query did not fix my problem.
Here is my workflow again as a list.

Select a node by name: Here e.g. "aA"
Get all his co-authors and their common publications
e.g. Filter for the top 4 co-authors based on common publications (The problem is that an author can have hundreds of co-authors. Therefore I am only interested in those the author publishes the most with!!)

p1.name	p2.name	cP
"aA"	"aC"	4
"aA"	"aB"	3
"aA"	"aJ"	3
"aA"	"aG"	3

Create a list of the filtered co-authors and check their common publications (HERE: aB, aC, aJ, aG <- co-authors)
Calculate their common publications (ONLY each co-author to every other co-author) and combine them with those of the initial author.

p1.name	p2.name	cP
"aB"	"aC"	4
"aB"	"aJ"	2
"aB"	"aG"	4
"aC"	"aJ"	3
"aC"	"aG"	3
"aJ"	"aG"	1

Combine those two result sets!!

I worked on my query, but unfortunately could not fix it so far:

MATCH (u:AuthorTest{name:"aA"})-[:TEST_PUBLISHED_IN*2]-(coauth) 
WITH u + COLLECT (DISTINCT coauth) AS coauths UNWIND coauths AS a1  **order by count(*) LIMIT 4**
MATCH (a1)-[:TEST_PUBLISHED_IN*2]-(a2) 
WHERE a2 in coauths and id(a1)<id(a2)
WITH a1, a2, count(*) as Degree 
ORDER BY Degree DESC 
RETURN a1.name, a2.name, Degree

The part marked with ** needs to be added in a valid way.
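
One possible way to write the marked part, offered as an untested sketch rather than a definitive answer: in Cypher, ORDER BY and LIMIT attach to a WITH (or RETURN) clause, so the count has to be computed per co-author first, then sorted and limited, and only afterwards collected into a list. The label and relationship type are taken from the query above; the aliases pub, cP, topCoauths and top4 are made up for illustration, and counting DISTINCT publications is an assumption about what "common publication" should mean.

```cypher
// Count shared publications per co-author of "aA"
MATCH (u:AuthorTest {name: "aA"})-[:TEST_PUBLISHED_IN]-(pub)-[:TEST_PUBLISHED_IN]-(coauth:AuthorTest)
WHERE coauth <> u
WITH u, coauth, count(DISTINCT pub) AS cP
// Sort and limit BEFORE collecting, so only the top 4 survive
ORDER BY cP DESC
LIMIT 4
WITH u, collect(coauth) AS topCoauths
RETURN u.name, [c IN topCoauths | c.name] AS top4
```

If several co-authors share the same count, which of them make the cut is arbitrary unless a second sort key (for example coauth.name) is added to the ORDER BY.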

intouch_vivek · ‎03-05-2020

Ah, sorry it did not help you much.
Can you please try this? It does return some duplicate records, though.
//match(n:Author) with collect(n.name) as name unwind name as name1
MATCH (p1:Author{name:'B'})-[:PUBLISHED_BY]->(m:Publication)<-[:PUBLISHED_BY]-(p2:Author)
WITH p1, collect( distinct p2) as listCoAuthor, collect(distinct m) as commonPublications, size(collect(distinct m)) as sizecommonPublications
WHERE sizecommonPublications >= 2
unwind listCoAuthor as newlist
Match(p3:Author)-[:PUBLISHED_BY]->(m1:Publication)<-[:PUBLISHED_BY]-(p4:Author)
Where p3.name=newlist.name
with p1, p3, p4,collect(distinct p4) as cp4, size(collect(distinct m1)) as scm1
RETURN p1,p3,cp4,scm1
//order by sizecommonPublications desc
//LIMIT 25

bjoerngohlke · ‎03-10-2020

Thanks for that try!

Unfortunately, I'm not sure I can do anything with it.
The main purpose is to identify the top "x" co-authors as a first step. (A limit at the end does not help me, because those networks can be quite large.)
Afterwards, I'm only interested in the interactions between those top x co-authors.

Workflow once again:

1. Select a node by name (e.g. "aA")
2. Select his top-4 co-authors (by common publications) <- (Here: "aB", "aC", "aJ", "aG")
3. Calculate the common publications between those co-authors from point 2.
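
The three steps could be sketched in a single query roughly as follows. This is a hedged sketch, not a verified solution: it assumes the AuthorTest label and TEST_PUBLISHED_IN relationship type from the test data, and the aliases (pub, cP, top, authors) are invented for illustration. The selected author is added to the list so that his own rows appear in the same result as the co-author pairs.

```cypher
// Steps 1 + 2: pick the author and his top-4 co-authors
MATCH (u:AuthorTest {name: "aA"})-[:TEST_PUBLISHED_IN]-(pub)-[:TEST_PUBLISHED_IN]-(coauth:AuthorTest)
WHERE coauth <> u
WITH u, coauth, count(DISTINCT pub) AS cP
ORDER BY cP DESC
LIMIT 4
WITH u, collect(coauth) AS top
WITH [u] + top AS authors
// Step 3: build every unordered pair from that list exactly once
UNWIND authors AS a1
UNWIND authors AS a2
WITH a1, a2
WHERE id(a1) < id(a2)
// Count the publications each pair shares
MATCH (a1)-[:TEST_PUBLISHED_IN]-(pub)-[:TEST_PUBLISHED_IN]-(a2)
RETURN a1.name, a2.name, count(DISTINCT pub) AS cP
ORDER BY cP DESC
```

Because the LIMIT is applied before the pairwise MATCH, the second half only ever touches five authors, however large the network is. Pairs without any shared publication drop out of the result; replacing the last MATCH with OPTIONAL MATCH should keep them with a count of 0.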

Sorry for my unclear description!!

Some questions regarding your query:

If I interpret the results correctly, p1 is not needed. Correct?
Why is cp4 a list, if there is only one element in it at a time?

Best

How to query for full graph of common events