
Heads up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

MERGE logic with multilanguage data

ivor
Node Link

I have some data in two languages and I'm trying to write a query to add it. I already started a thread about modelling this data: "Best way to model this data".

Because the data can come in English only, Chinese only, or both, I need to add it in different ways to merge people correctly, but I'm not sure I'm going about this the best way. If it makes any difference, I am using Node.js to run the queries.

Here's a simplified example of the incoming data:

{
  title: { zh: "《如夢之夢》", en: "A Dream Like a Dream" },
  entities: [
    {
      name: { zh: "賴聲川", en: "Stan Lai" },
      role: { zh: "編劇及導演", en: "Playwright / Director" },
    },
    {
      name: { zh: "馮蔚衡", en: "Fung Wai Hang" },
      role: { zh: "聯合導演", en: "Co-director" },
    },
    {
      name: { zh: "蘇玉華", en: "Louisa So" },
      role: { zh: "特邀主演", en: "Guest Leading Cast" },
    },
    {
      name: { zh: "雷思蘭", en: "Lui Si Lan" },
      role: { zh: "演員", en: "Cast" },
    },
  ],
}

This describes a 'Show' with several people credited in it. I want to create an Entity node for each of them.
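One common way to handle this from Node.js is to first flatten the show into one plain row per credited entity, so each row can be processed in a loop or passed to an `UNWIND` query. A minimal sketch, assuming the object shape above; `toEntityRows` is a hypothetical helper, and treating missing or empty translations as `null` is an assumption chosen so every row has the same keys.

```javascript
// Hypothetical helper: flatten a show object (shape as in the example above)
// into one row per credited entity.
function toEntityRows(show) {
  // Treat undefined, empty or whitespace-only strings as "no value" (null)
  const clean = (value) =>
    typeof value === "string" && value.trim() !== "" ? value.trim() : null;

  return show.entities.map((entity) => ({
    name_zh: clean(entity.name?.zh),
    name_en: clean(entity.name?.en),
    role_zh: clean(entity.role?.zh),
    role_en: clean(entity.role?.en),
  }));
}

// Usage: one fully bilingual credit and one Chinese-only credit
const rows = toEntityRows({
  title: { zh: "《如夢之夢》", en: "A Dream Like a Dream" },
  entities: [
    { name: { zh: "賴聲川", en: "Stan Lai" }, role: { zh: "編劇及導演", en: "Playwright / Director" } },
    { name: { zh: "雷思蘭", en: undefined }, role: { zh: "演員", en: undefined } },
  ],
});
console.log(rows);
```

Normalizing to `null` up front means the later merge logic only has to distinguish "has a value" from `null`, rather than also handling `undefined` and empty strings.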

For data like this: if an entity with the same Chinese name but no English name exists, I want to merge and add the English name. If an entity with the same English name but no Chinese name exists, I want to merge and add the Chinese name. If an entity with the same English name but a different Chinese name exists, I want to create a new entity. The same goes for roles.
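The three rules above can be read as a single compatibility test: an existing entity matches the incoming one when at least one language agrees exactly and no language has two different, non-empty values. A minimal sketch in plain JavaScript (no database involved), assuming entities are `{ zh, en }` objects with `null` for a missing name; `isCompatible` and `findCandidates` are hypothetical names, and the same test could be applied to roles.

```javascript
const LANGS = ["zh", "en"];

// True when `existing` and `incoming` can describe the same entity:
// at least one language has equal non-null values on both sides,
// and no language has two different non-null values.
function isCompatible(existing, incoming) {
  let agreesSomewhere = false;
  for (const lang of LANGS) {
    const a = existing[lang] ?? null;
    const b = incoming[lang] ?? null;
    if (a !== null && b !== null) {
      if (a !== b) return false; // e.g. same English name, different Chinese name
      agreesSomewhere = true;
    }
  }
  return agreesSomewhere;
}

// All existing entities the incoming one could be merged into:
// 0 candidates -> create a new node, 1 -> update it, 2+ -> nodes to merge.
function findCandidates(existingEntities, incoming) {
  return existingEntities.filter((e) => isCompatible(e, incoming));
}

const existing = [
  { zh: "賴聲川", en: null },         // Chinese-only credit
  { zh: null, en: "Stan Lai" },       // English-only credit
  { zh: "蘇玉華", en: "Louisa So" },  // fully named credit
];

console.log(findCandidates(existing, { zh: "賴聲川", en: "Stan Lai" }).length); // 2: merge both partial nodes
console.log(findCandidates(existing, { zh: "雷思蘭", en: "Louisa So" }).length); // 0: create a new node
```

Because a single-language incoming credit never conflicts in the missing language, this test also covers the "keep the existing English name" case.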

If my data had only Chinese:

      name: { zh: "賴聲川", en: undefined },
      role: { zh: "編劇及導演", en: undefined },

I would want to merge but keep the English name if it already exists, which is simple enough, and the same in reverse if I only had English. But doing this, I will sometimes end up with two nodes for one person, because they are sometimes credited with only an English name and sometimes with only a Chinese name.
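For the single-language case, MERGE-ing on just the known name is enough to keep any existing name in the other language, because MERGE only touches the properties in its pattern. One detail when choosing the property in JavaScript: Cypher parameters cannot stand in for property keys, so the key has to be put into the query text. A minimal sketch, assuming the `name_zh`/`name_en` properties used later in the question; `singleLanguageUpsert` is a hypothetical helper that picks the key from a fixed whitelist rather than interpolating arbitrary input.

```javascript
// Hypothetical helper: build a parameterized upsert for a credit that only
// has one language. Only whitelisted property keys are ever put into the text.
const NAME_KEYS = { zh: "name_zh", en: "name_en" };

function singleLanguageUpsert(lang, name) {
  if (!Object.prototype.hasOwnProperty.call(NAME_KEYS, lang)) {
    throw new Error(`Unsupported language: ${lang}`);
  }
  const key = NAME_KEYS[lang];
  return {
    // MERGE only on the known name: an existing node keeps whatever
    // name in the other language it already has.
    query: `MERGE (e:Entity {${key}: $name}) RETURN e`,
    params: { name },
  };
}

console.log(singleLanguageUpsert("zh", "賴聲川").query);
// MERGE (e:Entity {name_zh: $name}) RETURN e
```

If several nodes already share that name (for example two people with the same Chinese name but different English names), this MERGE matches all of them, so it does not by itself resolve the duplicate problem described above.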

Later in the data, I may have a case with both their English and Chinese names together. In that case I want to find those two nodes and merge them. I found I can do something like this:

MATCH (a:Entity {name_zh: $name_zh}), (b:Entity {name_en: $name_en})
WHERE a <> b
CALL apoc.refactor.mergeNodes([a,b], {mergeRels: true, properties:"combine"}) YIELD node
RETURN node

But to be honest I'm not sure if I'm using this correctly, or if this is a 'bad idea'.
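One way to make this merge less risky is to only merge a Chinese-only node into an English-only node, so two fully named (and possibly different) people are never combined. A minimal sketch that builds the query in JavaScript; `mergeTranslationsQuery` is a hypothetical name, the `name_zh`/`name_en` properties follow the question, and it assumes APOC is installed and at most one node exists per name.

```javascript
// Hypothetical helper: merge a Chinese-only Entity node with an
// English-only one when a credit finally supplies both names.
function mergeTranslationsQuery(nameZh, nameEn) {
  return {
    query: [
      // MATCH (not MERGE): only act on nodes that already exist
      "MATCH (a:Entity {name_zh: $name_zh}), (b:Entity {name_en: $name_en})",
      // Only a Chinese-only node and a separate English-only node qualify,
      // so nodes that disagree in one language are never merged
      "WHERE a <> b AND a.name_en IS NULL AND b.name_zh IS NULL",
      "CALL apoc.refactor.mergeNodes([a, b], {mergeRels: true, properties: 'combine'})",
      "YIELD node",
      "RETURN node",
    ].join("\n"),
    params: { name_zh: nameZh, name_en: nameEn },
  };
}

const { query, params } = mergeTranslationsQuery("賴聲川", "Stan Lai");
console.log(query);
```

If the query returns no rows, there was nothing to merge and the credit can go through the normal upsert path. With duplicate nodes per name, the MATCH would produce several pairs, which is why the one-node-per-name assumption matters.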

I'm having a hard time working out how to write the above logic using FOREACH and CASE, which I've read is the way to do conditional logic in Cypher. I'd be grateful for any help, but I will also keep trying to work it out myself, and I'll add a query example later.
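The FOREACH/CASE idiom works because FOREACH runs its update clause once per list element: a CASE that returns `[1]` when the condition holds and `[]` otherwise turns it into an "if". A minimal sketch for the Chinese-name case, kept as query strings for the Node.js driver and using the `name_zh`/`name_en` properties from the question; it MERGEs on the Chinese name only, so it does not cover the English-only case or the "same English name, different Chinese name" rule. For this particular update, `coalesce` gives the same result more simply.

```javascript
// FOREACH + CASE as a conditional: the SET runs once if the CASE yields [1],
// and not at all if it yields [].
const fillEnglishIfMissing = `
MERGE (e:Entity {name_zh: $name_zh})
FOREACH (ignored IN CASE WHEN $name_en IS NOT NULL AND e.name_en IS NULL
                         THEN [1] ELSE [] END |
  SET e.name_en = $name_en)
RETURN e`;

// Same effect without FOREACH: coalesce keeps an existing English name and
// otherwise uses the parameter (a null parameter leaves the property unset).
const fillEnglishWithCoalesce = `
MERGE (e:Entity {name_zh: $name_zh})
SET e.name_en = coalesce(e.name_en, $name_en)
RETURN e`;

const params = { name_zh: "賴聲川", name_en: "Stan Lai" };
// e.g. session.writeTransaction((tx) => tx.run(fillEnglishWithCoalesce, params))
```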


accounts
Node Clone

Given how you want to solve this problem, my preference would be to deploy this procedure and simply call it:


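    // Assumes the Neo4j 3.x procedure API, with the database injected via
    // @Context public GraphDatabaseService db; and a NodeResult class such as APOC's.
    @Procedure(mode = Mode.WRITE)
    public Stream<NodeResult> createOrPerson(@Name("zh") String zh, @Name("en") String en){
        Label person = Label.label("Person");
        // Look the person up by Chinese name first, then by English name,
        // skipping a lookup when that name was not supplied
        Node personNode = zh == null ? null : db.findNode(person, "zh", zh);
        if(personNode == null && en != null){
            personNode = db.findNode(person, "en", en);
        }
        if(personNode == null){
            personNode = db.createNode(person);
        }
        // setProperty does not accept null, so only write the names that were supplied;
        // an existing name in the other language is left untouched
        if(zh != null) personNode.setProperty("zh", zh);
        if(en != null) personNode.setProperty("en", en);

        return Stream.of(new NodeResult(personNode));
    }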

As a side note, I'd look at the Movies example database to get a good idea of a structure (your example seems to be from a similar domain).

Thank you for your help. I have looked at the Movies database, and I have been playing with the Node/React example app, since it's very close to my intended use and uses a language and framework I know: https://github.com/neo4j-examples/neo4j-movies-template

However, the way it builds queries is not at all like what you are doing in Java: their example passes Cypher queries as strings:

var rate = function (session, movieId, userId, rating) {
  return session.writeTransaction((txc) =>
    txc.run(
      "MATCH (u:User {id: $userId}),(m:Movie {id: $movieId}) \
      MERGE (u)-[r:RATED]->(m) \
      SET r.rating = $rating \
      RETURN m",
      {
        userId: userId,
        movieId: parseInt(movieId),
        rating: parseInt(rating),
      }
    )
  );
};

This made me doubt whether I should be writing my queries in Cypher, or whether I can write my logic in JavaScript. I'm sure the answer is to read the documentation, but I remain confused.
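Both approaches can be combined: keep the branching in JavaScript and pass small parameterized Cypher strings to `tx.run()`, as the template does. A minimal sketch for a credit that has both names, assuming the `Entity`/`name_zh`/`name_en` model from the question; `upsertBothNames`, `UPDATE_COMPATIBLE` and `CREATE_ENTITY` are hypothetical names. It tries to update a node that fits the merge rules described earlier and creates a new node only when that update matched nothing.

```javascript
// Update one existing node that is compatible with the incoming names:
// same Chinese name with no/same English name, or same English name with no Chinese name.
const UPDATE_COMPATIBLE = `
MATCH (e:Entity)
WHERE (e.name_zh = $zh AND (e.name_en IS NULL OR e.name_en = $en))
   OR (e.name_en = $en AND e.name_zh IS NULL)
WITH e LIMIT 1
SET e.name_zh = $zh, e.name_en = $en
RETURN e`;

const CREATE_ENTITY = "CREATE (e:Entity {name_zh: $zh, name_en: $en}) RETURN e";

// The if/else lives in JavaScript; each step is a plain Cypher statement.
async function upsertBothNames(tx, zh, en) {
  const updated = await tx.run(UPDATE_COMPATIBLE, { zh, en });
  if (updated.records.length > 0) return "updated";
  // Nothing compatible (e.g. same English name, different Chinese name)
  await tx.run(CREATE_ENTITY, { zh, en });
  return "created";
}

// Usage with the driver (not run here):
// await session.writeTransaction((tx) => upsertBothNames(tx, "賴聲川", "Stan Lai"));
```

The update-then-create sequence is not atomic, so two concurrent transactions could each create the same person. If both a Chinese-only and an English-only node exist, this updates only one of them, so the APOC merge from the question would still need to run first. Newer driver versions (5.x) name the session method `executeWrite` rather than `writeTransaction`.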

Cypher is awesome, but when queries get complicated and involve many steps, I find them easier to maintain and understand in forms other than Cypher (my 2 cents, anyway).

I've written long-winded queries, and without fail, when I look at them a few months down the road, I have no clue what I was thinking.