
Is there a way to run a query that handles API data concurrently?

damisg7
Node Clone

Suppose I want to create a node for each CPE (Common Platform Enumeration from NVD), fetching the data from the official NVD API. Is there a way to run this task concurrently?

I already created a uniqueness constraint on the ID and saw a big difference, but it's still very slow.
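For reference, the constraint looks something like this (assuming Neo4j 5 syntax; the constraint name is only illustrative):

// assumed Neo4j 5 syntax; on 4.x the older CREATE CONSTRAINT ON ... ASSERT form applies
CREATE CONSTRAINT cpe_name_id IF NOT EXISTS
FOR (cpe:CPE) REQUIRE cpe.cpeNameId IS UNIQUE

The load query itself: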

WITH $APIurl + '?' + $ParameterName + '=' + $ParameterValue AS url,
     $AuthKey AS apiKey
CALL apoc.load.jsonParams(url, { apiKey: apiKey }, null) YIELD value
UNWIND value.products AS cpes_values
MERGE (cpe:CPE { cpeNameId: cpes_values.cpe.cpeNameId })
  ON CREATE SET cpe.uri          = cpes_values.cpe.cpeName,
                cpe.created      = cpes_values.cpe.created,
                cpe.lastModified = cpes_values.cpe.lastModified
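The query parameters are set beforehand, e.g. in Neo4j Browser or cypher-shell. The endpoint and values below are just illustrative, not the exact ones I use:

// illustrative values only
:param APIurl => 'https://services.nvd.nist.gov/rest/json/cpes/2.0'
:param ParameterName => 'resultsPerPage'
:param ParameterValue => '10000'
:param AuthKey => '<NVD API key>'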

 

3 Replies

There's nothing native to Cypher that I'm aware of. You could try apoc.periodic.iterate, as shown below. It can be configured to execute batches in parallel.

WITH $APIurl + '?' + $ParameterName + '=' + $ParameterValue AS url, $AuthKey AS apiKey
CALL apoc.periodic.iterate(
  "
  CALL apoc.load.jsonParams($url, { apiKey: $apiKey }, null) YIELD value
  UNWIND value.products AS cpes_values
  RETURN cpes_values
  ",
  "
  MERGE (cpe:CPE { cpeNameId: cpes_values.cpe.cpeNameId })
    ON CREATE SET cpe.uri = cpes_values.cpe.cpeName,
                  cpe.created = cpes_values.cpe.created,
                  cpe.lastModified = cpes_values.cpe.lastModified
  ",
  { batchSize: 10000, parallel: true, params: { url: url, apiKey: apiKey } }
) YIELD total, timeTaken
RETURN *

Thanks. I saw a difference of about 20 seconds, but pulling 10,000 CPEs through the API calls still takes a long time, not something like 4-5 seconds. I played with the batch size, but since the API returns 10,000 records per call by default, batchSize: 10000 amounted to a single batch.

Update: I added the parameter concurrency: 200. The result was significantly better.

With a batchSize of 5000 and concurrency: 200, I need only 5-6 seconds to load 10,000 records as nodes.
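For anyone landing here: the only change from the suggested query above is the config map passed to apoc.periodic.iterate (both inner queries stay the same). concurrency is a documented apoc.periodic.iterate option that caps how many batches run at once when parallel is true:

// same queries as above, only the config map changes
{ batchSize: 5000, parallel: true, concurrency: 200,
  params: { url: url, apiKey: apiKey } }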