
Is there a way to run a query that handles API data concurrently?

damisg7
Node Clone

Suppose I want to create a node for each CPE (Common Platform Enumeration from NVD), fetching the data from the official NVD API. Is there a way to run this task concurrently?

I already created a uniqueness constraint on the ID and saw a big difference, but it's still very slow.
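For reference, the constraint looks something like this (assuming Neo4j 5 syntax; the constraint name is only illustrative):

// assumed Neo4j 5 syntax; on 4.x the older CREATE CONSTRAINT ON ... ASSERT form applies
CREATE CONSTRAINT cpe_name_id IF NOT EXISTS
FOR (cpe:CPE) REQUIRE cpe.cpeNameId IS UNIQUE

The load query itself: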

WITH $APIurl + '?' + $ParameterName + '=' + $ParameterValue AS url,
     $AuthKey AS apiKey
CALL apoc.load.jsonParams(url, { apiKey: apiKey }, null) YIELD value
UNWIND value.products AS cpes_values
MERGE (cpe:CPE { cpeNameId: cpes_values.cpe.cpeNameId })
  ON CREATE SET cpe.uri          = cpes_values.cpe.cpeName,
                cpe.created      = cpes_values.cpe.created,
                cpe.lastModified = cpes_values.cpe.lastModified
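The query parameters are set beforehand, e.g. in Neo4j Browser or cypher-shell. The endpoint and values below are just illustrative, not the exact ones I use:

// illustrative values only
:param APIurl => 'https://services.nvd.nist.gov/rest/json/cpes/2.0'
:param ParameterName => 'resultsPerPage'
:param ParameterValue => '10000'
:param AuthKey => '<NVD API key>'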

 

3 Replies

There's nothing native to Cypher that I'm aware of. You could try apoc.periodic.iterate, as shown below. It can be configured to execute batches in parallel.

WITH $APIurl + '?' + $ParameterName + '=' + $ParameterValue AS url, $AuthKey AS apiKey
CALL apoc.periodic.iterate(
  "
  CALL apoc.load.jsonParams($url, { apiKey: $apiKey }, null) YIELD value
  UNWIND value.products AS cpes_values
  RETURN cpes_values
  ",
  "
  MERGE (cpe:CPE { cpeNameId: cpes_values.cpe.cpeNameId })
    ON CREATE SET cpe.uri = cpes_values.cpe.cpeName,
                  cpe.created = cpes_values.cpe.created,
                  cpe.lastModified = cpes_values.cpe.lastModified
  ",
  { batchSize: 10000, parallel: true, params: { url: url, apiKey: apiKey } }
) YIELD total, timeTaken
RETURN *

Thanks. I saw a difference of about 20 seconds, but pulling 10,000 CPEs through the API calls still takes a long time, not something like 4-5 seconds. I played with the batch size, but since the API returns 10,000 records per call by default, batchSize: 10000 amounted to a single batch.

Update: I added the parameter concurrency: 200. The result was significantly better.

With a batchSize of 5000 and concurrency: 200, I need only 5-6 seconds to load 10,000 records as nodes.
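For anyone landing here: the only change from the suggested query above is the config map passed to apoc.periodic.iterate (both inner queries stay the same). concurrency is a documented apoc.periodic.iterate option that caps how many batches run at once when parallel is true:

// same queries as above, only the config map changes
{ batchSize: 5000, parallel: true, concurrency: 200,
  params: { url: url, apiKey: apiKey } }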