02-15-2022 05:57 AM
Hi all!
I'm facing a performance issue on a rather small graph (~30K nodes and ~270K relationships).
I'm exposing the database through Spring Data REST, using Spring 2.6.2 and Neo4j 4.4.2-community.
The source code is freely available at Jedidex / public API · GitLab
For example, if I query the appearances of a character, it can take up to 30 seconds to get a response. Meanwhile, if the same query is executed directly on the database, it is almost instantaneous.
For reference, this is the Character model:
package com.holodex.publicapi.model.resource.character;

@Node("Character")
public class Character extends SWElement {

    private String gender;
    private Double height;
    private Double mass;
    private String hair;
    private String eyes;
    private String skin;
    private String cyber;

    @JsonSerialize(contentAs = SWElement.class)
    @Relationship(type = "BORN_IN")
    private SWElement homeWorld;

    @JsonSerialize(contentAs = SWElement.class)
    @Relationship(type = "OF_SPIECES")
    private SWElement species;

    @JsonSerialize(contentAs = SWElement.class)
    @Relationship(type = "AFFILIATED_TO")
    private Set<SWElement> affiliation;

    @JsonSerialize(contentAs = SWElement.class)
    @Relationship(type = "APPRENTICE_OF")
    private Set<SWElement> masters;

    @JsonSerialize(contentAs = SWElement.class)
    @Relationship(type = "MASTER_OF")
    private Set<SWElement> apprentices;

    @JsonSerialize(contentAs = SWElement.class)
    @Relationship(type = "APPEARS_IN")
    private Set<SWElement> appears_in;
}
To keep things lightweight, I expect the target of each relationship to be of type "SWElement", which is the supertype of every entity. SWElement itself does not declare any relationships.
An example of a result can be found here: https://api.jedidex.com/v1/characters/452217/appears_in. It returns ~350 elements, which are the various media in which Luke Skywalker appears.
This request is performed by the following method in the Neo4jRepository:
package com.holodex.publicapi.repository.character;
@Override
@Query("MATCH path=(n:Character {element_id:$id})-[r]-(x) RETURN n, collect(nodes(path)), collect(relationships(path))")
Optional<Character> findById(Integer id);
The element_id property is indexed. I think this is more of an object-mapping problem, since the query is quite fast when executed directly on Neo4j. I have no clue how to optimize this case.
I'm using Spring Data REST since the goal is to expose all the kinds of resources through this API without writing too much boilerplate code.
Thanks in advance
Niko
02-22-2022 02:05 AM
The problem for me is to get the data.
I see the problem that you are somehow limited to the Spring Data REST structure and need to use findById.
My first thoughts/questions are:
I think MATCH (n:Character {element_id:$id})-[r]-(x) RETURN n, collect(r), collect(x) would already improve the mapping, because collect(nodes(path)) might contain duplicates and SDN needs time to filter out the already mapped or irrelevant data.
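To illustrate the duplication point, here is a minimal sketch in plain Java (no Neo4j involved). It assumes the one-hop pattern from the query above, where nodes(path) is [n, x] for every matched path, and it flattens the collected lists for simplicity. The class and method names are made up for this illustration and are not part of SDN.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for what collect(nodes(path)) hands to the mapper.
class PathDuplicates {

    // For a one-hop pattern (n)-[r]-(x), every path contributes [n, x],
    // so the start node shows up once per matched path.
    static List<Long> collectNodesOfPaths(long startId, List<Long> neighbourIds) {
        List<Long> collected = new ArrayList<>();
        for (long neighbourId : neighbourIds) {
            collected.add(startId);
            collected.add(neighbourId);
        }
        return collected;
    }

    // The deduplication work a mapper has to do before building objects.
    static Set<Long> distinct(List<Long> ids) {
        return new LinkedHashSet<>(ids);
    }

    public static void main(String[] args) {
        List<Long> neighbours = new ArrayList<>();
        for (long i = 1; i <= 350; i++) {
            neighbours.add(i);
        }
        List<Long> collected = collectNodesOfPaths(0L, neighbours);
        System.out.println("entries returned: " + collected.size());
        System.out.println("distinct nodes:   " + distinct(collected).size());
    }
}
```

With ~350 neighbours, the path-based form returns roughly twice as many node entries as there are distinct nodes, while RETURN n, collect(r), collect(x) returns each node once.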
02-23-2022 05:07 AM
Hi @gerrit.meier
Thanks for the kind answer!
Well, I was using a path because of an example I found online. My bad, I guess.
I tried your suggested query, but I still experience poor performance.
To answer your second question, I used a custom query because otherwise it would fetch the whole database.
For the sake of completeness, I have deployed a test database (containing all the 30K nodes and relationships).
The database is exposed at: bolt://jedidex.com:8087
using neo4j as both username and password; maybe it can be useful.
The explorer is available here: https://neo4test.jedidex.com/
Executing the query you mentioned took around 72 ms:
Started streaming 1 records after 5 ms and completed after 72 ms.
02-23-2022 08:42 AM
Please run the query with the PROFILE or EXPLAIN clause in the Neo4j explorer or Bloom. It will show the actual amount of work done by the graph database engine, in terms of db hits and milliseconds. Please also check that your indexes are set up correctly.
This will give you enough clue to optimise your query.
Many thanks
Sameer S Gijare
02-23-2022 08:50 AM
Hi @sameer.gijare14
thanks for answering
Well, I ran the profile and it reports 19099 db hits in 59 milliseconds.
These are the indexes I have when I execute the :schema command
As I said, the query is quite fast, or at least it seems to be. But through the API it takes quite some time to return results.
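One way to narrow down where the time goes is to time the individual steps separately (the raw query, the repository call, the JSON serialization). The helper below is only a sketch in plain Java; the Stopwatch and Timed names are made up for this illustration.

```java
import java.util.function.Supplier;

// Hypothetical helper: runs a piece of work and reports how long it took.
class Stopwatch {

    // Result of the work together with its wall-clock duration.
    static final class Timed<T> {
        final T value;
        final long millis;

        Timed(T value, long millis) {
            this.value = value;
            this.millis = millis;
        }
    }

    static <T> Timed<T> time(Supplier<T> work) {
        long start = System.nanoTime();
        T value = work.get();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new Timed<>(value, elapsedMillis);
    }

    public static void main(String[] args) {
        // Stand-in workload; in the application this would be the call under test.
        Timed<Long> timed = time(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            return sum;
        });
        System.out.println("result " + timed.value + " took " + timed.millis + " ms");
    }
}
```

Wrapping something like time(() -> repository.findById(452217)) (with repository being the injected Neo4jRepository) and comparing it with the 59 to 72 ms reported by the database would show whether the mapping step accounts for the difference.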
02-23-2022 11:34 AM
The bottleneck is in the concrete class determination in the mapping code. And with SWElement you are really challenging SDN.
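To give an idea of why determining the concrete class for every node can become expensive, and how remembering the answer per label combination helps, here is a generic sketch. It is not SDN's actual code and not necessarily what the fix does; the registry, the label names and the class TypeResolutionCache are assumptions made for this illustration only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical resolver: maps a node's labels to the most specific entity type.
class TypeResolutionCache {

    // Required labels -> name of the entity type (stand-in for a Class<?>).
    private final Map<Set<String>, String> registry = new LinkedHashMap<>();
    private final Map<Set<String>, String> cache = new ConcurrentHashMap<>();
    final AtomicInteger scans = new AtomicInteger();

    void register(String typeName, String... labels) {
        registry.put(new HashSet<>(Arrays.asList(labels)), typeName);
    }

    // Slow path: scan all registered types and keep the one matching the most labels.
    String resolveByScan(Set<String> nodeLabels) {
        scans.incrementAndGet();
        String best = null;
        int bestSize = -1;
        for (Map.Entry<Set<String>, String> entry : registry.entrySet()) {
            Set<String> required = entry.getKey();
            if (nodeLabels.containsAll(required) && required.size() > bestSize) {
                best = entry.getValue();
                bestSize = required.size();
            }
        }
        return best;
    }

    // Fast path: each label combination is scanned once, then served from the cache.
    // Callers must not mutate nodeLabels afterwards, since it is used as a map key.
    String resolve(Set<String> nodeLabels) {
        return cache.computeIfAbsent(nodeLabels, this::resolveByScan);
    }

    public static void main(String[] args) {
        TypeResolutionCache resolver = new TypeResolutionCache();
        resolver.register("SWElement", "SWElement");
        resolver.register("Character", "SWElement", "Character");

        Set<String> labels = new HashSet<>(Arrays.asList("SWElement", "Character"));
        for (int i = 0; i < 350; i++) {
            resolver.resolve(labels);
        }
        System.out.println("resolved 350 nodes with " + resolver.scans.get() + " scan(s)");
    }
}
```

In this sketch, 350 nodes sharing the same labels trigger a single scan instead of 350.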
But hey, challenge accepted: Inheritance determination performance improvements · Issue #2487 · spring-projects/spring-data-neo4j...
In ~30 minutes there should be a snapshot available, 6.2.3-GH-2487-SNAPSHOT, that should improve the performance a lot. The request for "Luke" went down to something like 1 - 1.5, but this was also with a profiler running.
Would be great if you could give us feedback here or (even better) on the issue, if you also have a GitHub account.
Your pom should then contain this:
<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-neo4j</artifactId>
    <version>6.2.3-GH-2487-SNAPSHOT</version>
</dependency>
<repositories>
    <repository>
        <id>spring-milestones</id>
        <name>Spring Milestones</name>
        <url>https://repo.spring.io/milestone</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
    <repository>
        <id>spring-snapshots</id>
        <name>Spring Snapshots</name>
        <url>https://repo.spring.io/snapshot</url>
        <releases>
            <enabled>false</enabled>
        </releases>
    </repository>
</repositories>