11-07-2022 07:06 PM - edited 11-07-2022 07:08 PM
Hi,
I am working with Twitter data and creating nodes for Tweets. More specifically, each row in my CSV has two tweets, some tweet properties (likes, retweets, and such), and a similarity score computed earlier in Python. I then connect the two tweets based on their score. Since rows with scores below a certain threshold were already removed from the dataset, I create a relationship for every row in the CSV.
I do the following:
The very last step is taking insanely long. I've tried a few times now and I get a web socket error (I think my PC falls asleep and the Neo4j Browser disconnects). Steps 2 and 3 take about 20 minutes each. The dataset has 230k rows and is 200 MB. Any ideas on how to optimize my query?
11-08-2022 12:30 AM
Hello @NewGraphGuy 😊
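You must have a unique constraint on the id property (TweetID here), so that MERGE and MATCH can use the backing index instead of scanning every Tweet node.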
I also updated your queries:
// Delete Everything
CALL apoc.periodic.iterate("
MATCH (n) RETURN n
", "
DETACH DELETE n
", {batchSize: 1000, parallel: false});
// Create unique constraints
CREATE CONSTRAINT constraint_Tweet_id IF NOT EXISTS FOR (tweet:Tweet) REQUIRE tweet.TweetID IS UNIQUE;
// Create Tweets from TweetA
LOAD CSV WITH HEADERS FROM 'file:///Tweet2Tweet.csv' AS row
WITH row
WHERE NOT row.UserA IS NULL
MERGE (tweet:Tweet {TweetID: row.TweetTokenA})
SET tweet += {
Tweet: row.text_x,
AuthorToken: row.AuthorTokenA,
AuthorHandle: row.UserA,
CreatedAt: row.a_created_at,
Retweets: row.a_rt_cnt,
Replies: row.a_reply_cnt,
Likes: row.a_like_count,
Quotes: row.a_qt_count,
Aspect: row.AspectsA
};
// Create Tweets from TweetB
LOAD CSV WITH HEADERS FROM 'file:///Tweet2Tweet.csv' AS row
WITH row
WHERE NOT row.UserB IS NULL
MERGE (tweet:Tweet {TweetID: row.TweetTokenB})
SET tweet += {
Tweet: row.text_y,
AuthorToken: row.AuthorTokenB,
AuthorHandle: row.UserB,
CreatedAt: row.b_created_at,
Retweets: row.b_rt_cnt,
Replies: row.b_reply_cnt,
Likes: row.b_like_count,
Quotes: row.b_qt_count,
Aspect: row.AspectsB
};
// Create Tweet to Tweet relationship
LOAD CSV WITH HEADERS FROM 'file:///Tweet2Tweet.csv' AS row
WITH row
WHERE NOT row.UserB IS NULL
MATCH (a:Tweet {TweetID: row.TweetTokenA})
MATCH (b:Tweet {TweetID: row.TweetTokenB})
MERGE (a)-[r:SIMILAR_TO]-(b)
SET r += {
Score: row.SimilarityScore
};
Regards,
Cobra
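A possible variation on the relationship step, not part of the original reply: since the thread already uses apoc.periodic.iterate for the delete, the same procedure can commit the relationships in batches so that a dropped Browser session does not lose the whole run. This is a sketch under assumptions: APOC is installed, the batch size of 1000 is only a starting point, and the toFloat conversion is an addition (LOAD CSV reads every field as a string).

```cypher
// Sketch: batch the relationship creation with APOC (assumes APOC is installed)
CALL apoc.periodic.iterate("
  // Outer statement: stream the CSV rows, skipping rows without a second tweet
  LOAD CSV WITH HEADERS FROM 'file:///Tweet2Tweet.csv' AS row
  WITH row
  WHERE row.UserB IS NOT NULL
  RETURN row
", "
  // Inner statement: runs once per row, committed every batchSize rows
  MATCH (a:Tweet {TweetID: row.TweetTokenA})
  MATCH (b:Tweet {TweetID: row.TweetTokenB})
  MERGE (a)-[r:SIMILAR_TO]-(b)
  // Assumption: store the score as a number rather than the raw CSV string
  SET r.Score = toFloat(row.SimilarityScore)
", {batchSize: 1000, parallel: false});
```

Keeping parallel set to false avoids lock contention when several rows touch the same Tweet node. The lookups on TweetID are only fast if the unique constraint exists before this step runs.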
11-11-2022 09:39 PM
Thanks @Cobra. I got this to work before your response by adding this line prior to each merge statement.
Neo.ClientError.Statement.SyntaxError: Invalid input 'F': expected whitespace, comment or ON (line 1, column 53 (offset: 52)) "CREATE CONSTRAINT constraint_Tweet_id IF NOT EXISTS FOR (tweet:Tweet) REQUIRE tweet.TweetID IS UNIQUE"
11-12-2022 03:15 AM
Make sure you have unique constraints on the id properties. Also, you should use my queries and add what you did in front of them.
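One way to check that the constraint is in place and actually used, sketched here under assumptions: SHOW CONSTRAINTS requires Neo4j 4.2 or later (older versions would use CALL db.constraints() instead), and 'some-id' is a placeholder value, not a real TweetID from the dataset.

```cypher
// List the constraints defined in the database (Neo4j 4.2+)
SHOW CONSTRAINTS;

// Inspect the plan for a lookup by TweetID without running it.
// 'some-id' is a placeholder value for illustration only.
EXPLAIN
MATCH (t:Tweet {TweetID: 'some-id'})
RETURN t;
```

If the constraint is active, the plan should start with an index seek operator such as NodeUniqueIndexSeek. A NodeByLabelScan followed by a Filter suggests the lookup is scanning every Tweet node.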
For the error, I don't see anything wrong so I don't know 😕
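A likely cause of that syntax error, stated as an assumption because the thread never mentions the Neo4j version: the FOR ... REQUIRE form of CREATE CONSTRAINT was introduced in Neo4j 4.4, and earlier 4.x releases expect ON ... ASSERT. The message "expected whitespace, comment or ON" right after IF NOT EXISTS is consistent with that. On such a version, the equivalent statement would look like this:

```cypher
// Assumption: Neo4j 4.x older than 4.4, where constraints use ON ... ASSERT
CREATE CONSTRAINT constraint_Tweet_id IF NOT EXISTS
ON (tweet:Tweet) ASSERT tweet.TweetID IS UNIQUE;
```

On Neo4j 4.4 and later, the FOR ... REQUIRE form from the queries above is the one to use.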