06-10-2019 08:25 PM
Hello,
I'm having some issues building proper relationships in Neo4j, and I assume this is due to how I load the data. I basically want to import something that is almost like netflow:
user,src_ip,dst_ip,src_port,dst_port,protocol,packets,bytes,start_time,end_time,action,log-status
tom,10.0.1.243,10.0.1.185,88,64304,6,4,378,1763709753,1763709811,ACCEPT,OK
tom,10.0.1.185,10.0.1.243,64309,88,6,4,478,1763709753,1763709811,ACCEPT,OK
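A minimal sketch of how the two sample rows could be parsed before loading. It assumes each row carries twelve values in the order user, src_ip, dst_ip, src_port, dst_port, protocol, packets, bytes, start_time, end_time, action, log-status; the two port columns are inferred from the SRC_PORT and DST_PORT parameters used in the Cypher further down. The names `COLUMNS`, `INT_COLUMNS` and `parse_flows` are hypothetical.

```python
import csv
import io

# Assumed column order, inferred from the sample rows and the Cypher parameters.
COLUMNS = [
    "user", "src_ip", "dst_ip", "src_port", "dst_port", "protocol",
    "packets", "bytes", "start_time", "end_time", "action", "log_status",
]
# Columns converted to int so they can be compared and summed numerically.
INT_COLUMNS = {"src_port", "dst_port", "protocol", "packets", "bytes",
               "start_time", "end_time"}

SAMPLE = """\
tom,10.0.1.243,10.0.1.185,88,64304,6,4,378,1763709753,1763709811,ACCEPT,OK
tom,10.0.1.185,10.0.1.243,64309,88,6,4,478,1763709753,1763709811,ACCEPT,OK
"""


def parse_flows(text):
    """Turn headerless CSV text into a list of dicts with typed values."""
    flows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:  # skip blank lines
            continue
        if len(record) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(record)}")
        flow = dict(zip(COLUMNS, (value.strip() for value in record)))
        for name in INT_COLUMNS:
            flow[name] = int(flow[name])
        flows.append(flow)
    return flows


flows = parse_flows(SAMPLE)
```

Storing ports, packets and bytes as integers rather than strings matters later: numeric filters and sums only behave as expected on numeric properties.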
I created my nodes as follows:
MERGE (a:attribute {ip:{IP}})
ON CREATE SET a = {ip:{IP}, user_id:{USER_ID}}
And the relationship:
MATCH (a: attribute), (b: attribute)
WHERE a.ip = {SRC_IP} AND b.ip = {DST_IP}
MERGE (a)-[rel: flow {start_time:{START_TIME},end_time:{END_TIME}}]->(b)
ON CREATE SET rel += {
proto:{PROTO},
src_port:{SRC_PORT},
dst_port:{DST_PORT},
packets:{PACKETS},
bytes:{BYTES},
action:{ACTION}}
RETURN rel
Then I use py2neo with:
graph.schema.create_uniqueness_constraint("attribute", "ip")
A couple of problems: it loads, but very slowly, and it fails after 200k relationships (there should be around 3M). With what is loaded I can see the relationship type "flow" that I added, but I can't query it in a meaningful way (e.g. I can't seem to do something like any ip.src_port to ip.dst_port =~ "443", or find the volume of traffic between nodes).
Could someone point me in the right direction?
Thanks!
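One possible reason the port filter finds nothing is that src_port and dst_port are set on the flow relationship, not on the attribute nodes, so they have to be read from the relationship variable. The sketch below holds two example queries as Python strings. It assumes the ports, packets and bytes were stored as integers; if they were stored as strings, the comparison would need '443' or a toInteger() conversion, and the =~ regex operator only applies to strings. The constant names are hypothetical.

```python
# Flows whose destination port is 443. The port is a relationship property,
# so the filter is on rel.dst_port rather than on a node.
FLOWS_TO_443 = """
MATCH (a:attribute)-[rel:flow]->(b:attribute)
WHERE rel.dst_port = 443
RETURN a.ip AS src_ip, b.ip AS dst_ip, rel.src_port AS src_port
"""

# Total traffic per (source, destination) pair, largest first.
VOLUME_BY_PAIR = """
MATCH (a:attribute)-[rel:flow]->(b:attribute)
RETURN a.ip AS src_ip, b.ip AS dst_ip,
       sum(rel.bytes) AS total_bytes,
       sum(rel.packets) AS total_packets
ORDER BY total_bytes DESC
"""

# With py2neo these could be run as, for example:
#   graph.run(FLOWS_TO_443).data()
```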
06-12-2019 07:46 AM
You should create your unique constraint prior to loading the data. The MATCH otherwise gets slower and slower.
Also consider transaction sizes - adding 200k rels in one tx might be too large.
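A minimal sketch of that advice, assuming the uniqueness constraint on attribute.ip has already been created and the nodes are loaded. Rows are sent in fixed-size batches, one statement per batch, with UNWIND expanding the list parameter. The helper names `batches` and `load`, the `BATCH_SIZE` default and the row keys are hypothetical; `run` stands in for a callable such as py2neo's `graph.run`, which accepts query parameters as keyword arguments. The statement uses `+=` so that the start_time and end_time matched by MERGE are kept on the relationship.

```python
from itertools import islice

BATCH_SIZE = 5000  # assumed batch size; tune for the available heap

# One statement per batch. Row keys (src_ip, dst_ip, ...) are assumptions.
LOAD_FLOWS = """
UNWIND $rows AS row
MATCH (a:attribute {ip: row.src_ip})
MATCH (b:attribute {ip: row.dst_ip})
MERGE (a)-[rel:flow {start_time: row.start_time, end_time: row.end_time}]->(b)
ON CREATE SET rel += {
  proto: row.protocol, src_port: row.src_port, dst_port: row.dst_port,
  packets: row.packets, bytes: row.bytes, action: row.action}
"""


def batches(rows, size=BATCH_SIZE):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def load(run, rows, size=BATCH_SIZE):
    """Call run(query, rows=chunk) once per batch; return the batch count."""
    count = 0
    for chunk in batches(rows, size):
        run(LOAD_FLOWS, rows=chunk)
        count += 1
    return count

# Example wiring (not executed here):
#   graph.schema.create_uniqueness_constraint("attribute", "ip")  # before loading
#   load(graph.run, flows)
```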
06-12-2019 08:28 AM
Hi @stefan.armbruster, I don't load 200k in one commit; I batch transactions of 5000.
I just can't believe no one has loaded netflow into Neo4j. I can't seem to find anything meaningful about netflow and Neo4j.