cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Counting nodes based on date

I have a set of nodes with a creationDate property that is indexed, and I am trying to count how many nodes were created for every day in a given date range, including days for which none were created. To do this for the month of December, I tried:

// Query 1
MATCH (m:item)
WHERE datetime('2019-12-01') <= m.creationDate < datetime('2020-01-01')
RETURN date(m.creationDate) AS date, COUNT(*) AS count
ORDER BY date

However, this query does not return dates for which there are no nodes. I managed to get this working:

// Query 2
WITH date('2019-12-01') AS startDate, date('2019-12-31') AS endDate
WITH startDate, duration.inDays(startDate, endDate).days AS days
WITH [day in range(0, days) | startDate + duration({days: day})] AS dates
UNWIND dates AS date
MATCH (m:item)
WHERE date(m.creationDate) = date
RETURN date, COUNT(*) AS count

The problem is that query 2 takes 27 seconds, while query 1 takes 1 second. This makes sense–query 2 is essentially running query 1 once for each day in the range.

I know I can post-process the data in my code to fill in the gaps, but I would prefer not to have to do the extra step if possible. Is there a way to do this in Cypher without the time penalty of rerunning query 1 once for each day in the range?

5 REPLIES 5

unwind(['2020-01-01', '2020-01-01', '2020-01-02', '2020-01-03']) as dt 
create (:Sample { creationDate: datetime(dt) });

(Note that these are datetimes in there and not dates)

match (s:Sample) 
return distinct(date(s.creationDate)) as date, count(s) as nodesWithThisDate

Yields

"2020-01-01"    2
"2020-01-02"    1
"2020-01-03"    1

If there's a node in there with a null, it will show up and count nulls.

Thanks for the response David!

In your example, if I changed the first statement to be:

unwind(['2020-01-01', '2020-01-01', '2020-01-02', '2020-01-04']) as dt 
create (:Sample { creationDate: datetime(dt) });

then run the same query you give, the results would be:

"2020-01-01"    2
"2020-01-02"    1
"2020-01-04"    1

The results I need are:

"2020-01-01"    2
"2020-01-02"    1
"2020-01-03"    0
"2020-01-04"    1

In a relational database, this would be an OUTER JOIN operation between the aggregated data and a list of dates. In Neo4j, I'm not sure if an analogous operation is possible. Query 2 isn't the same–it's more of an iterated approach. OPTIONAL MATCH seems like the closest thing to OUTER JOIN, but

The only option I see (other than post-processing) is to create "date bin" nodes for each possible creation date:

UNWIND (['2019-12-01', '2019-12-02', ..., '2019-12-31') AS dt
CREATE (:DateBin {date: date(dt)})

Then I can establish relationships between those nodes and the item nodes:

MATCH (m:item)
WITH m
MATCH (n:DateBin {date: date(m.creationDate})
MERGE (m)-[:CREATED_ON]->(n)

Then the report can be run using those nodes:

MATCH (n:DateBin)
WHERE date('2019-12-01') <= n.date < date('2020-01-01')
WITH n
OPTIONAL MATCH (m:item)-[:CREATED_ON]->(n)
RETURN n.date AS date, COUNT(m) AS count
ORDER BY date

(I haven't actually tried this. Apologies if there are errors; I just wanted to illustrate the idea.)

The only real downside to this approach is the additional structure needed in the database to support these queries. Any chance there's an easier way that isn't O(n) on the size of the date range?

just faced the same issue and got a solution:

MATCH (m:item)
return date(m.creationDate) as date,count(*) as total
UNION
WITH date('2022-06-01') AS startDate, date('2022-07-07') AS endDate
WITH startDate, duration.inDays(startDate, endDate).days AS days
WITH [day in range(0, days) | startDate + duration({days: day})] AS dates
UNWIND dates AS date
MATCH (m:item)
where date(m.creationDate)<>date
return date,0 as total



glilienfield
Ninja
Ninja

This seems to work. It may look a little convoluted, but it does not require a second search to fill in the missing dates. Instead it does a little processing to get the missing dates and creates a map using a apoc procedure. I borrowed @Umer's logic for generating all the dates between two given dates. 

 

with date('2020-01-01') AS startDate, date('2020-01-10') AS endDate
with duration.inDays(startDate, endDate).days AS days, startDate, endDate
with [day in range(0, days) | toString(startDate + duration({days: day}))] AS allDates, startDate, endDate
match (s:Sample) where startDate <= s.creationDate <= endDate
with toString(s.creationDate) as date, count(s) as dateCounts, allDates
with collect(date) as dateList, collect(dateCounts) as countList, allDates
with [i in allDates where not i in dateList | i] as zeroDates, dateList, countList
with zeroDates, [i in zeroDates | 0] as zeros, dateList, countList
return apoc.map.fromLists(dateList + zeroDates, countList + zeros) as result

 

 Test Data (I used @daveb's script):

 

unwind(['2020-01-01', '2020-01-01', '2020-01-02', '2020-01-03', '2020-01-06', '2020-01-09', '2020-01-09', '2020-01-02', '2020-01-03', '2020-01-06', '2020-01-09']) as dt 
create (:Sample { creationDate: date(dt) });

 

Result:

 

{
  "2020-01-10": 0,
  "2020-01-01": 2,
  "2020-01-02": 2,
  "2020-01-03": 2,
  "2020-01-04": 0,
  "2020-01-05": 0,
  "2020-01-06": 2,
  "2020-01-07": 0,
  "2020-01-08": 0,
  "2020-01-09": 3
}

 

 

I refactored the query using a different apoc function. I think it is slightly less complicated as a result.

 

with date('2020-01-01') AS startDate, date('2020-01-10') AS endDate
WITH duration.inDays(startDate, endDate).days AS days, startDate, endDate
WITH [day in range(0, days) | toString(startDate + duration({days: day}))] AS allDates, startDate, endDate
match (s:Sample) where startDate <= s.creationDate <= endDate
with toString(s.creationDate) as date, count(s) as dateCounts, allDates
with collect([date, dateCounts]) as dateCounts, collect(date) as datesWithCounts, allDates
with [i in allDates where not i in datesWithCounts | [i, 0]] as datesWithZeroCounts, dateCounts
return apoc.map.fromPairs(dateCounts + datesWithZeroCounts) as result