01-07-2023 06:00 AM - edited 01-07-2023 06:18 AM
Hello all,
I recently implemented a preferential attachment algorithm in Neo4j using the Python driver. I wanted to simulate social media growth, where new users are more likely to connect to users who already have more connections (as per the Barabási-Albert model).
I had issues using the APOC procedure for this purpose, so I decided to implement it on my own. My algorithm allows nodes to be created without any connections. In my use case that is potentially valuable: I'm simulating social media growth, where a user who is exposed to another user may still decide not to connect.
However, when I ran some analysis on the resulting graph, I noticed that this scenario appears more often than I expected. I think this may be because the probability of a node connecting to a preceding node is not properly calculated, but I'm not sure.
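A quick way to check that expectation, independent of Neo4j, is to compute the chance that a new node ends up with no link at all. The sketch below assumes every existing node i is tested independently with probability k_i / E, where k_i is its number of links and E is the sum of all k_i; the function name `isolation_probability` is made up for illustration. Under that assumption the expected number of new links per node is 1, and the chance of getting none is the product of (1 - k_i / E).

```python
import math

def isolation_probability(degrees):
    """Chance that a new node links to none of the existing nodes when each
    existing node i is tested independently with p_i = k_i / E."""
    total = sum(degrees)  # E: sum of all degrees
    p_none = 1.0
    for k in degrees:
        p_none *= 1.0 - k / total
    return p_none

# Seed network of two nodes with one link each
print(isolation_probability([1, 1]))      # 0.25

# Many nodes, none of them dominant: the value is close to exp(-1)
print(isolation_probability([1] * 1000))
print(math.exp(-1))
```

So with independent tests of this kind, roughly a third of new nodes staying unconnected would be expected behaviour rather than a sign of a miscalculation. The value never exceeds exp(-1), about 0.37, and is close to it when no single node holds a large share of the links.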
Another issue to note is that E, the variable that keeps track of the total number of edges, is currently just the sum of the n.link property, which is twice the total number of links since each link is counted twice (once on each of the connected nodes).
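The doubling is not necessarily a problem. In preferential attachment the probability of choosing node i is k_i divided by the sum of all degrees, and the sum of all degrees is always twice the number of links. A small sketch on a made-up edge list (the names `edges`, `degree` and `degree_sum` are illustrative only):

```python
from collections import Counter

# Hypothetical toy graph: node1 has three links, the others one each
edges = [("node1", "node2"), ("node3", "node1"), ("node4", "node1")]

degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

degree_sum = sum(degree.values())
# Every link contributes to the degree of both of its endpoints
assert degree_sum == 2 * len(edges)

# Dividing by the degree sum gives probabilities that add up to 1
probabilities = {name: k / degree_sum for name, k in degree.items()}
print(probabilities["node1"])   # 0.5
```

Dividing by the number of links instead would double every probability, so the sum of degrees is the denominator to keep; only the description of E as "number of edges" is off.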
I was wondering if anyone has any thoughts on how to address these issues, or if there are any other considerations I should take into account when implementing preferential attachment in Neo4j.
Thanks in advance for any feedback!
from neo4j import GraphDatabase
import random

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "network"))

# Create a seed network of nodes and relationships
with driver.session() as session:
    # Create the first two nodes and connect them
    session.run(
        "CREATE (:Node {idx: 1, name: 'node1', link: 1, probability: 0.5})"
        "-[:CONNECTED_TO]->"
        "(:Node {idx: 2, name: 'node2', link: 1, probability: 0.5})"
    )
    # Create the rest of the nodes and set their link and probability to 0
    for i in range(3, 100):
        session.run(
            "CREATE (:Node {idx: $idx, name: $name, link: 0, probability: 0.0})",
            idx=i, name="node" + str(i),
        )

# Calculate the sum of all node degrees (twice the number of relationships)
with driver.session() as session:
    result = session.run("MATCH (n:Node) WHERE n.link > 0 RETURN SUM(n.link) AS E")
    E = result.single()["E"]

# Iterate through the nodes and update the links based on the preferential attachment probability
for i in range(3, 100):
    with driver.session() as session:
        # Find all the preceding nodes. Compare the numeric idx, not the name:
        # as strings, "node10" < "node3", so a name comparison skips most predecessors.
        result = session.run("MATCH (n:Node) WHERE n.idx < $idx RETURN n", idx=i)
        nodes = [record["n"] for record in result]
        # Iterate through the preceding nodes and update the links based on the probability
        for node in nodes:
            # Calculate the probability that the new node will be connected to this node
            probability = node["link"] / E
            # Generate a random number and compare it to the probability
            if random.random() < probability:
                # Increment the links on both nodes
                session.run("MATCH (n:Node {idx: $idx}) SET n.link = n.link + 1", idx=node["idx"])
                session.run("MATCH (n:Node {idx: $idx}) SET n.link = n.link + 1", idx=i)
                session.run(
                    "MATCH (n:Node {idx: $new_idx}), (m:Node {idx: $prev_idx}) "
                    "CREATE (n)-[:CONNECTED_TO]->(m)",
                    new_idx=i, prev_idx=node["idx"],
                )
                E += 2  # Each new relationship adds one to the degree of both endpoints

# Calculate the new probability for each node
with driver.session() as session:
    # toFloat avoids Cypher's integer division, which would truncate every value to 0
    session.run("MATCH (n:Node) WHERE n.link > 0 SET n.probability = toFloat(n.link) / $E", E=E)

driver.close()
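For comparison, here is a minimal pure-Python sketch of the Barabási-Albert procedure itself, in which every new node attaches to exactly m existing nodes chosen with probability proportional to their degree. It is a sketch under stated assumptions, not a Neo4j or APOC implementation: the function name `barabasi_albert_edges`, the fully connected seed of m + 1 nodes, and the integer node ids are all choices made for this example.

```python
import random
from itertools import combinations

def barabasi_albert_edges(n, m=1, seed=None):
    """Return (new_node, target) pairs for a BA-style graph on nodes 1..n."""
    rng = random.Random(seed)
    # Assumed seed network: nodes 1..m+1, all connected to each other
    edges = list(combinations(range(1, m + 2), 2))
    # Each node appears in `endpoints` once per link it has, so a uniform
    # draw from this list picks node i with probability k_i / sum(k).
    endpoints = [node for edge in edges for node in edge]
    for new in range(m + 2, n + 1):
        targets = set()
        while len(targets) < m:          # redraw until m distinct targets
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            edges.append((new, target))
            endpoints.extend((new, target))
    return edges

edges = barabasi_albert_edges(n=99, m=1, seed=42)
degree = {}
for a, b in edges:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1
isolated = [v for v in range(1, 100) if degree.get(v, 0) == 0]
print(len(edges), len(isolated))   # 98 0
```

Because each new node always receives m links, this variant never produces isolated nodes. If a chance of not connecting is part of the model, one option would be to keep this degree-proportional draw and apply a separate acceptance probability to each drawn target.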
01-17-2023 05:20 PM
Hi @bluabaleno, what was the problem with the APOC procedure, and did you already give the GDS preferentialAttachment function a try?