

Seeking feedback on preferential attachment implementation using Neo4j and Python

bluabaleno
Node Clone

Hello all,

I recently implemented a preferential attachment algorithm in Neo4j using the Python driver. I was inspired to do so because I wanted to simulate social media growth, where new users are more likely to connect to users with more connections (as per the Barabasi-Albert model).
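For reference, here is a minimal in-memory sketch of the textbook Barabási-Albert growth step, assuming each new node attaches to exactly m distinct existing nodes drawn with probability proportional to degree. The function name `barabasi_albert_edges` and the star-shaped seed are illustrative choices, not part of my Neo4j code.

```python
import random


def barabasi_albert_edges(n, m, seed=None):
    """Return an edge list for n nodes where each new node attaches to m
    distinct existing nodes, chosen with probability proportional to degree."""
    rng = random.Random(seed)
    # Seed network (an assumption): a star on m + 1 nodes, so every
    # existing node starts with degree >= 1.
    edges = [(0, j) for j in range(1, m + 1)]
    # Every node appears in this list once per incident edge, so a uniform
    # draw from the list is a degree-proportional draw over nodes.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            edges.append((new, target))
            endpoints.extend((new, target))
    return edges


edges = barabasi_albert_edges(100, 2, seed=1)
```

In this variant a new node can never end up isolated, which is the main difference from the independent-trial approach I describe below.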

I had issues using the APOC procedure for this purpose, so I decided to implement it on my own. My algorithm can create nodes that have no connections at all. In my use case this is potentially valuable: I'm simulating social media growth, where a user who is exposed to another user may still decide not to connect.

However, when I ran some analysis on the resulting graph, I noticed that this scenario appears more often than I expected. I think this may be because the probability of a node connecting to a preceding node is not properly calculated, but I'm not sure.
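To check this without the database, the scheme can be mimicked in memory. The sketch below assumes one independent trial per earlier node with probability degree / (sum of degrees), as in my code further down; the helper names are hypothetical. Because the trial probabilities of the earlier nodes add up to 1, a new node gets about one link on average, and the chance that every trial fails should lie between 0.25 and roughly 1/e (about 0.37), so a large share of isolated nodes may be a property of the scheme rather than a miscalculation.

```python
import random


def simulate_independent_trials(n, rng):
    """Mimic the scheme in memory: each new node runs one independent
    trial against every earlier node with p = degree / degree_sum."""
    degree = [1, 1] + [0] * (n - 2)  # seed: two connected nodes
    degree_sum = 2                   # plays the role of E in the post
    for new in range(2, n):
        for old in range(new):
            if rng.random() < degree[old] / degree_sum:
                degree[old] += 1
                degree[new] += 1
                degree_sum += 2
    return degree


def isolated_fraction(n=100, runs=200, seed=0):
    """Average share of nodes that end up with no link at all."""
    rng = random.Random(seed)
    isolated = 0
    for _ in range(runs):
        isolated += simulate_independent_trials(n, rng).count(0)
    return isolated / (n * runs)


print(f"isolated fraction: {isolated_fraction():.3f}")
```

Note that a node that ends up with zero links can never be chosen later, since its attachment probability stays at 0.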

Another thing to note: E, the variable that is meant to track the total number of edges, is currently just the sum of the n.link property over all nodes. That sum is twice the actual number of links, since each link is counted once at each of its two endpoints.
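A small sketch of that relationship, using a made-up edge list rather than data from my graph: the sum of degrees always equals twice the number of links. As far as I understand the Barabási-Albert model, the sum of degrees is also the usual normalizer for the attachment probability, so dividing n.link by this E makes the probabilities over the existing nodes add up to 1.

```python
def degree_sum_and_edge_count(edges):
    """Return (sum of node degrees, number of edges) for an edge list."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return sum(degree.values()), len(edges)


# Hypothetical example graph: node1 is linked to three other nodes.
example_edges = [("node1", "node2"), ("node3", "node1"), ("node4", "node1")]
degree_sum, edge_count = degree_sum_and_edge_count(example_edges)
assert degree_sum == 2 * edge_count
```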

I was wondering if anyone has any thoughts on how to address these issues, or if there are any other considerations I should take into account when implementing preferential attachment in Neo4j.

Thanks in advance for any feedback!

from neo4j import GraphDatabase
import random

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "network"))

# Create a seed network of nodes and relationships
with driver.session() as session:
    # Create the first two nodes and connect them
    result = session.run("CREATE (:Node {name: 'node1', link: 1, probability: 0.5})-[:CONNECTED_TO]->(:Node {name: 'node2', link: 1, probability: 0.5})")

    # Create the rest of the nodes and set their link and probability to 0
    for i in range(3, 100):
        result = session.run("CREATE (:Node {name: $name, link: 0, probability: 0})", name="node" + str(i))

# Calculate the sum of node degrees (twice the number of links, since each link is counted at both endpoints)
with driver.session() as session:
    result = session.run("MATCH (n:Node) WHERE n.link > 0 RETURN SUM(n.link) as E")
    E = result.single()["E"]

# Iterate through the nodes and update the links based on the preferential attachment probability
for i in range(3, 100):
    with driver.session() as session:
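        # Find all the preceding nodes. Compare the numeric suffix of the name:
        # a plain string comparison would sort "node10" before "node3".
        result = session.run("MATCH (n:Node) WHERE toInteger(substring(n.name, 4)) < $index RETURN n", index=i)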
        nodes = [record["n"] for record in result]

        # Iterate through the preceding nodes and update the links based on the probability
        for node in nodes:
            # Calculate the probability that the new node will be connected to this node
            probability = node["link"] / E

            # Generate a random number and compare it to the probability
            if random.random() < probability:
                # Increment the links on both nodes
                session.run("MATCH (n:Node {name: $name}) SET n.link = n.link + 1", name=node["name"])
                session.run("MATCH (n:Node {name: $name}) SET n.link = n.link + 1", name="node" + str(i))
                session.run("MATCH (n:Node {name: $new_name}), (m:Node {name: $prev_name}) CREATE (n)-[:CONNECTED_TO]->(m)", new_name="node" + str(i), prev_name=node["name"])
                E += 2  # Each new link adds 1 to the degree of both endpoints

# Calculate the new probability for each node.
# Cypher's / truncates when both operands are integers, so convert to float first.
with driver.session() as session:
    result = session.run("MATCH (n:Node) WHERE n.link > 0 SET n.probability = toFloat(n.link) / $E", E=E)

driver.close()
1 REPLY

jalakoo
Node Clone

Hi @bluabaleno, what was the problem with the APOC procedure, and have you already given the GDS preferentialAttachment function a try?
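For context, in link prediction the preferential attachment score of a node pair is usually defined as the product of the two nodes' neighbor counts. The sketch below computes that score in plain Python, on the assumption that the GDS function follows this standard definition; it scores existing pairs and does not grow a graph. The function name and the edge list are illustrative only.

```python
def preferential_attachment_score(edges, u, v):
    """Product of the neighbor counts of u and v in an undirected graph."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return len(neighbors.get(u, ())) * len(neighbors.get(v, ()))


# Hypothetical graph: "c" has three neighbors, "a" has two.
example_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
score = preferential_attachment_score(example_edges, "a", "c")
```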
