05-17-2021 01:53 PM
I am using Neo4j for analyzing LDAP servers, and would appreciate some help / feedback on the data model as well as the query for building relationships.
Currently, I am creating a node for each LDAP entry, with a label equal to its objectClass attribute (e.g. organizationalUnit, user, group, etc.).
The problem is, I want to create relationships between nodes based on the RDN (e.g. (OU=Users,DN=foo)-[:CONTAINS]->(CN=Bob,OU=Users,DN=foo)). So far I have this:
EXPLAIN MATCH (child)
WHERE child
WITH
child,
substring(reduce(parent_dn = "", rdn IN tail(split(child.dn, ",")) | parent_dn + "," + rdn), 1) as parent_dn
MATCH (parent {dn: parent_dn})
MERGE (parent)-[:CONTAINS]->(child)
RETURN count(parent);
Which is obviously not good, since it doesn't specify labels and therefore can't use indexes.
I'm stuck on how I should optimize this because of that. Should I not use objectClass as a label, since I have maybe 20 unique values? Should I just use one label for everything?
I originally thought it would be good to use objectClass as a label, since Neo4j would color the nodes differently and allow me to differentiate between types.
05-17-2021 02:07 PM
I'm far from an LDAP master, but whenever you have embedded strings like this, your data model probably isn't right. You shouldn't ever need to parse text in Cypher as you're doing right now. This is a strong indication that what you need is 3 different properties and possibly label types.
For example, what you reference as (CN=Bob,OU=Users,DN=foo), you could consider modeling as:
(c:CN { id: "Bob" }), (o:OU { id: 'Users' }), (d:DN { id: 'foo' }), (entry)-[:REF]->(c), (entry)-[:REF]->(o), (entry)-[:REF]->(d)
In other words, if you can do that text parsing, do it once upfront when you load the model, and then never again.
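As a rough illustration of doing the parsing at load time, here is a minimal Python sketch. The function names `parent_dn` and `to_row`, and the `parent_dn` property itself, are hypothetical and not from this thread. It assumes DNs contain no escaped commas, which is the same simplification the `split(child.dn, ",")` call in the question makes.

```python
from typing import Dict, Optional


def parent_dn(dn: str) -> Optional[str]:
    """Return the DN with its leftmost RDN removed, or None for a top-level entry."""
    # Naive split on the first comma; escaped commas ("\,") are NOT handled.
    _, sep, rest = dn.partition(",")
    return rest if sep else None


def to_row(dn: str, object_class: str) -> Dict[str, Optional[str]]:
    """Build one load row with the parent DN precomputed (hypothetical layout)."""
    return {"dn": dn, "parent_dn": parent_dn(dn), "objectClass": object_class}


row = to_row("CN=Bob,OU=Users,DN=foo", "user")
print(row["parent_dn"])  # OU=Users,DN=foo
```

With the parent DN stored as a plain property at load time, a later relationship-building query would only need an equality match on `dn`, with no string manipulation inside Cypher.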
05-17-2021 11:05 PM
Hi David,
This is exactly the right approach. I want to emphasize the fact that the underlying benefit of this approach is 'Scalability'.
05-18-2021 05:38 AM
@david.allen @ameyasoft The entire DN (CN=Bob,OU=Users,DN=foo) is actually what is the "ID", in the sense that CN=Bob,OU=Users,DN=foo should be unique across all types of nodes (not just CN), but "Bob" isn't necessarily unique across anything, even CN. There could be CN=Bob,OU=Users,DN=foo as well as CN=Bob,OU=SomethingElse,DN=foo. Are you mainly just suggesting to use CN / OU / DN as node labels, and I could do (c:CN { dn: 'CN=Bob,OU=Users,DN=foo' }), (o:OU { dn: 'OU=Users,DN=foo' }), (d:DN { dn: 'DN=foo' }), (entry)-[:REF]->(c), (entry)-[:REF]->(o), (entry)-[:REF]->(d) so they can be indexed by dn (formerly id) / guaranteed to be unique? Or is there a benefit to keeping (c:CN { id: 'Bob' }) instead of the whole DN?
Also, regarding properties, I'm not sure how I would store them as such. The DN (the entire string) is kind of like a URL / path in the sense that the order matters. I may have OU=foo and OU=bar for one node, but OU=foo,OU=bar is totally different from OU=bar,OU=foo. Think of it like components to a URL.
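To make the ordering point concrete, here is a small Python sketch that splits a DN into an ordered list of (attribute, value) pairs. The helper name `rdn_components` is made up for illustration, and the sketch assumes no escaped commas and no multi-valued RDNs (those joined with "+").

```python
from typing import List, Tuple


def rdn_components(dn: str) -> List[Tuple[str, str]]:
    """Split a DN into (attribute, value) pairs, left to right, preserving order."""
    pairs = []
    for rdn in dn.split(","):
        # Split each RDN on its first "=" only, so values may contain "=".
        attr, _, value = rdn.strip().partition("=")
        pairs.append((attr, value))
    return pairs


a = rdn_components("OU=foo,OU=bar")
b = rdn_components("OU=bar,OU=foo")

print(a == b)                  # False: same components, different order
print(sorted(a) == sorted(b))  # True: order is the only difference
```

The two DNs only compare equal once the order is thrown away, which is why separate unordered properties per component would lose information here, while an ordered list (or the full DN string) would not.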
The relationships which would be created are really related to substrings. CN=Bob,OU=Users,DN=foo is part of OU=Users,DN=foo, which is part of DN=foo.
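The "is part of" chain described above can be sketched in Python as follows. The names `ancestors` and `contains_edges` are hypothetical, and the sketch again assumes DNs without escaped commas. Each edge is a (parent DN, child DN) pair, i.e. the two ends of one CONTAINS relationship.

```python
from typing import Iterator, List, Tuple


def ancestors(dn: str) -> Iterator[str]:
    """Yield the DN itself, then each successive parent DN up to the top-level entry."""
    current = dn
    while current:
        yield current
        # Drop the leftmost RDN; becomes "" once no comma remains.
        _, _, current = current.partition(",")


def contains_edges(dn: str) -> List[Tuple[str, str]]:
    """Return (parent_dn, child_dn) pairs for every level of the DN."""
    chain = list(ancestors(dn))
    return [(parent, child) for child, parent in zip(chain, chain[1:])]


for parent, child in contains_edges("CN=Bob,OU=Users,DN=foo"):
    print(f"({parent})-[:CONTAINS]->({child})")
```

Because every pair uses full DNs, each end of an edge can be looked up by a single unique value, regardless of what kind of entry it is.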