Neo4j

richiesui · ‎12-04-2019

 I've already use APOC code to export all the data to one csv or json. But when I import to MySQL csv is fine, but after import the data looks not right. Cause node has id, so all the node data are perfect. But relationship data just have start_id and end_id etc.. So MySQL database just add a new columns to save all the relationship data.

I just use these two queries to export CSV and json data:
CALL apoc.export.json.all("/graph_db.json",{useTypes:true})
CALL apoc.export.csv.all('/graph_db.csv',{stream:true,batchSize:12000}) and also change true to false as well. But after export it's no changed.

  The json file, if I import to MySQL, can't import it. And it says "Not a valid json"

How do I export or do something can make it looks good? Should I export all the nodes as a table and export all the relationship as the other table ? Or do Neo4j has a kind of code to match with id and start_id and just append relationship data after nodes' data.
By the way, if all the problems are solved, do use the MySQL data to rebuild a graph? In this case, it can be showed on website etc..
Please help me. Thanks!

julie_fisher · ‎12-05-2019

Graph and relational databases address different needs and thus hold data differently. How you store the data in MySQL will depend on what you want to do with it and what you want to see.

The easiest one-to-one translation would be to store the nodes in one table with their properties, then the relationships (start node column, end node column) with any properties in a second table. Then you could look up nodes by node id and join to the relationship table for any relationships.

This would also allow for easy loading into a graph database. However, I wouldn't recommend that as the best solution if you want to store and reload the graph. The best way to reload the full graph would be to do a data dump. Instructions can be found here.

What's your end goal of putting the data into MySQL?

richiesui · ‎12-05-2019

Dear Julie,

Thank you for your reply.From your last email, you want to know my end goal in order to let my question be solved. Here is my end goal:

I want to export all the informations by csv or json and import them to MySQL successfully. In this case, I could load all the data from MySQL to continue Deep Learining research. And I need this step.
I just want to know that, if all the informations are exported and imported to MySQL, do use MySQL to draw a relation map?

Sincerely,

Richie Sui

julie_fisher · ‎12-06-2019

'if all the informations are exported and imported to MySQL, do use MySQL to draw a relation map?'
Essentially, yes.

I've been working on putting the Panama Papers data into Postgres so I can do some data cleaning and deduping. My experience is that you have to put the nodes with their properties in one table, then create a separate relationship table to show which nodes are related to each other.

If your relationships are directional, you don't need to worry about whether they're in the source or target column. However, if the relationships are bi-directional (the connection goes both from A to B and from B to A), you might need to duplicate some of the relationships so you can get correct joins (ie when joining nodes from the node table to the relationships table, you'd need to make sure all applicable relationships are in your foreign key column).

I'd recommend using the CSV method instead of json. CSVs are the fastest solution that I've gotten to work. I tried json a while back and I think it got removed from APOC.

To download a CSV using a cypher query to pull the nodes and properties you want, first make sure APOC is installed and enabled (let me know if you need directions on how to do this). The apoc function you want is apoc.export.csv.query(). Example using the Marvel database:

// node example
call apoc.export.csv.query("MATCH (hero:Hero) RETURN ID(hero) as unique_id, hero.name as name", "file_path_to_save.csv", {});
// edge example
call apoc.export.csv.query("MATCH (hero1:Hero)-[e:KNOWS]-(hero2:Hero) WHERE ID(hero1) < ID(hero2) RETURN distinct ID(hero1) as source, ID(hero2) as target, "file_path_to_save.csv", {});

The ID() is to have a unique identifier. If you've already got one in your node properties, use that instead because the node id doesn't always remain the same if you delete and add nodes to your graph.

For the Panama Papers, I took the CSVs and dropped them into Postgres without a problem. I think I played with MariaDB and experienced some errors there, but when you reach that point, any errors should be specific to the database and you'd need to do trouble shooting from that side.

richiesui · ‎12-06-2019

Dear Julie,
Thank you for writing back and helping me. I’ll try as what you said .

By the way, I was try to use this query to export: call apoc.export.csv.query('match (n) optional match(n)-[r]->(m) return id(n) as start_id,labels(n) as labels,n.name as nname,type(r) as rel_tpye,id(m) as end_id,labels(m) as mlabels,m.name as mname','/graph_db4.csv',{}) is that still okay? And I saw all in one table( node and relation) , but I’m still not sure is a correct way or not. Could you tell me if this query is correct or incorrect.

Sincerely,

Richie Sui

julie_fisher · ‎12-06-2019

That looks right if you're trying to put it all in one table. If you want to separate out the relationships and the nodes, you'll need to change it. Are you trying to create 2 separate tables or did you decide to put it all in the same table?

richiesui · ‎12-06-2019

Dear Julie,

Happy to get your email. My end goal is no matter one table or two, I could use MySQL query to get the values from my code in python. Then use the relationship to get end to head or head to the end easily.

Sincerely,

Richie Sui

julie_fisher · ‎12-09-2019

Then the query you put in earlier should have all the information you need and just need one table. Just keep in mind whether your relationships are directed or not and whether any of the properties may add duplicate rows.

Duplicates are easy to overlook and not always intuitive, so here's an example: if you were looking for John Smith and email addresses associated with him, if he has more than one email, he'll have more than one row in the results:
Name Email
John Smith john.smith@annon.com
John Smith jsmith@annon.com

Neo4j

I've already exported all the data to csv or json,but can't import to mysql