Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
12-31-2021 08:12 AM
OS: Windows Server 2019 Datacenter
Java 1.8.0_292
neo4j-jdbc-driver-4.0.4
Pentaho Data Integration 9.2
OLD version: community 4.2.13 (problem/error DOES NOT arise)
(attempt) New Version: community 4.4.2
I also tested, and the problem is also in neo4j 4.3.x
The error DOES NOT OCCUR in the neo4j browser, only when I attempt to run a job through PDI (using the neo4j jdbc driver)
The error does NOT OCCUR using the SAME jdbc driver with the older version of neo4j
PDI Transform fails with:
This feature is deprecated and will be removed in future versions. (WARNING)
Neo.ClientNotification.Statement.FeatureDeprecationWarning : Coercion of list to boolean is deprecated. Please consider using NOT isEmpty(...)
instead
I believe THIS is the statement it is complaining about:
where split(c.email,'@')[1]=~domainname and
not (domainname in publicdomains) and not (domainname in cemaildomains)
I did some searching, and tried some alternatives... Every option I tried complained that I was trying to ask things of a list, but was providing a boolean.
The original error claims I'm trying to coerce a list INTO a boolean, so I feel like I'm caught in a circular logic problem here.
Here's some of the alternatives I tried (that also failed:)
//not all(x in publicdomains WHERE x=domainname) and not all(x in cemaildomains WHERE x=domainname)
//isEmpty(domainname in publicdomains) AND isEmpty(domainname in cemaildomains)
Here's the entire query (basically trying to identify NON-public email domains, that ONLY are associated with ONE client in our CRM - essentially determining email domains that are likely unique to a particular client and then creating a relationship to :Clientdomain):
// Creating (:Clientdomain)-[:DOMAIN_OF]-(:Company) relationships for accounts that exclusively use the domain (from Contacts).
MATCH (cd:Clientdomain)-[]-(:Company)
WITH COLLECT(distinct cd.name) as cemaildomains
MATCH (c:Contact)-[:ACCOUNT_STATE]-(:Recordstatus {state:'A'})
WHERE c.email IS NOT NULL
WITH cemaildomains,collect(distinct split(c.email,"@")[1]) as emaildomains,
['gmail.com','yahoo.com','hotmail.com','aol.com','hotmail.co.uk','msn.com','comcast.net','live.com','rediffmail.com','ymail.com','outlook.com','cox.net','sbcglobal.net','verizon.net','googlemail.com','rocketmail.com','att.net','facebook.com','usinternet.com','charter.net','sky.com','earthlink.net','inbox.com','icloud.com','mail.com','qq.com','me.com','mail.com','frontiernet.net','juno.com','windstream.net','mac.com','attlocal.net','zyxel.com','airstreamcomm.net','example.com'] as publicdomains
UNWIND emaildomains as domainname
MATCH (c:Contact)-[:WORKS_FOR]-(a:Company)
where split(c.email,'@')[1]=~domainname and
not (domainname in publicdomains) and not (domainname in cemaildomains)
with domainname,collect(distinct a.name) as colcomp
MATCH (a:Company {name:colcomp[0]})
where size(colcomp)=1
MERGE (cd:Clientdomain {name:trim(domainname)})
WITH * WHERE not (cd)-[]-(:Company)
MERGE (cd)-[:DOMAIN_OF]->(a)
RETURN distinct domainname,a.name order by a.name
;
01-01-2022 09:14 PM
I don't see a syntax error. As you stated, the statement '(domain name in public domains)' works in the browser.
I did have two minor refactoring suggestions that may improve efficiency. For the first, I believe you can filter your emaildomains list to remove those values in the publicdomains and cemaildomains before the query, since the where clause that uses them is not dependent on the query; it is dependent on the values in the emaildomains list. The second, is to move the 'where size(clomp)=1' clause to above the match, so all those rows are filtered out before the match. Something like this. I don't have the data, so I was not able to test it.
MATCH (cd:Clientdomain)--(:Company)
WITH COLLECT(distinct cd.name) as cemaildomains
MATCH (c:Contact)-[:ACCOUNT_STATE]-(:Recordstatus {state:'A'})
WHERE c.email IS NOT NULL
WITH cemaildomains,
['gmail.com','yahoo.com','hotmail.com','aol.com','hotmail.co.uk','msn.com','comcast.net','live.com','rediffmail.com','ymail.com','outlook.com','cox.net','sbcglobal.net','verizon.net','googlemail.com','rocketmail.com','att.net','facebook.com','usinternet.com','charter.net','sky.com','earthlink.net','inbox.com','icloud.com','mail.com','qq.com','me.com','mail.com','frontiernet.net','juno.com','windstream.net','mac.com','attlocal.net','zyxel.com','airstreamcomm.net','example.com']as publicdomains, [i in collect(distinct split(c.email,"@")[1] where not(i in publicdomains) and not(i in cemaildomains)]as filteredemaildomains
UNWIND filteredemaildomains as domainname
MATCH (c:Contact)-[:WORKS_FOR]-(a:Company)
where split(c.email,'@')[1]=~domainname
with domainname,collect(distinct a.name) as colcomp
where size(colcomp)=1
MATCH (a:Company {name:colcomp[0]})
MERGE (cd:Clientdomain {name:trim(domainname)})
WITH * WHERE not (cd)--(:Company)
MERGE (cd)-[:DOMAIN_OF]->(a)
RETURN distinct domainname,a.name order by a.name
01-02-2022 07:45 AM
Wow - thanks for the query help! - just had to make a small adjustment to your suggestion:
WITH c,cemaildomains,
['msn.com','usinternet.com','att.net','charter.net','gmail.com','hotmail.com','yahoo.com','outlook.com','inbox.com','icloud.com','mail.com','aol.com','comcast.net','attlocal.net','zyxel.com','airstreamcomm.net','example.com'] as publicdomains,collect(distinct split(c.email,"@")[1]) as emaildomains
WITH c,publicdomains,emaildomains,cemaildomains,
[i in emaildomains where not(i in publicdomains) and not(i in cemaildomains)] as filteredemaildomains
3459ms vs 235ms <- a fantastic improvement!
It worked great in my N4j 4.2 server, unfortunately Neo4j 4.3/4.4 (via the JDBC driver within Pentaho/kettle) is still unhappy with it Here's the complete error message it throws:
2022/01/02 09:39:22 - Update (:Clientdomain)-[:DOMAIN_OF]->(:Company) (:Contact).0 - ERROR (version 9.2.0.0-290, build 9.2.0.0-290 from 2021-06-02 06.36.08 by buildguy) : This feature is deprecated and will be removed in future versions. (WARNING)
2022/01/02 09:39:22 - Update (:Clientdomain)-[:DOMAIN_OF]->(:Company) (:Contact).0 - ERROR (version 9.2.0.0-290, build 9.2.0.0-290 from 2021-06-02 06.36.08 by buildguy) : Neo.ClientNotification.Statement.FeatureDeprecationWarning : Coercion of list to boolean is deprecated. Please consider using `NOT isEmpty(...)` instead., position offset=1120, line=17, column=18
2022/01/02 09:39:22 - Update (:Clientdomain)-[:DOMAIN_OF]->(:Company) (:Contact).0 - ERROR (version 9.2.0.0-290, build 9.2.0.0-290 from 2021-06-02 06.36.08 by buildguy) : Unexpected error
2022/01/02 09:39:22 - Update (:Clientdomain)-[:DOMAIN_OF]->(:Company) (:Contact).0 - ERROR (version 9.2.0.0-290, build 9.2.0.0-290 from 2021-06-02 06.36.08 by buildguy) : org.pentaho.di.core.exception.KettleException:
2022/01/02 09:39:22 - Update (:Clientdomain)-[:DOMAIN_OF]->(:Company) (:Contact).0 - Unable to execute batch of cypher statements (1)
01-03-2022 09:29 AM
Determined this error DOES NOT occur using the Pentaho native "Execute SQL Script" with the JDBC Neo4j driver, but appears to be an issue only using the Neo4jOutput 5.0.9 plugin "Neo4j Cypher" task.
I've opened a github issue on the neo4j output repository.
All the sessions of the conference are now available online