Neo4j

tms · ‎04-18-2022

I want to get a DATE corresponding to the first Monday in 2020.

I expected the following Cypher to answer that:

WITH DATE({year: 2020, week: 1, weekday: 1}) as date
RETURN date

The answer that Neo4J returns is "2019-12-30" (I'm using Neo4J Enterprise v4.4.4).

That's at least unexpected, if not just plain wrong. Sorting out this sort of complexity is precisely why I want use a pre-existing function rather than rolling my own.

I expected the answer to be 2020-01-06.

What am I misunderstanding about Neo4J date handling?

What Cypher should I use to get the first Monday in 2020 (or any other year)?

glilienfield · ‎04-18-2022

It looks like neo4j is using 'week-year' for the 'week' units. The first week of the year is the first week that overlaps with January. In the case of 2020, the first week-year of 2020 is from Dec 30, 2019 - Jan 5, 2020, assuming a Monday through Sunday week; therefore, the Monday of that first week-year in 2020 is Dec 30, 2019.

The following operations seem to work to calculate the first Monday of any given month.

WITH DATE({year: 2020, month: 1, day: 1}) as date
WITH date + duration({days: 6}) as offset
RETURN date.truncate('week', offset)

tms · ‎04-19-2022

This is very helpful, and shows that I need to provide more context for the problem I'm trying solve.

I'm handling COVID data reported by each of the US states. At the moment, 26 of those states report data on "weekdays". This means that there is no data for weekends and holidays.

I have defined a "reportingDays" property (for a given year) that contains a list of the ordinal days (each an integer between 1 and 366) for which a given state reports on a given date in a given year.

I'm therefore attempting to write a Cypher query that answers that list for a given year.

The expected reportingDays list for 2020 is:

[1, 2, 3, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17, ..., 363, 364, 365, 366]

There are 366 ordinal days in 2020 because it is a leap year.

Note that it begins with "1" (no dates from 2019) and ends on a Thursday (no dates from 2021). It is important that each item in the list pertain only to 2020.

I thought that I understood the "Date" behavior of Neo4J, and it's clear that I don't. Between the "aggregation" behavior of Neo4J and the behavior of its Date protocols, I'm stumped.

My next step is to try this in Python and then store the result in Neo4J.

glilienfield · ‎04-19-2022

I suggest you review the date functionality. It does support ordinal day, so you can determine the ordinal day from a date or add the ordinal days to the first day of the year to get the specific date. With a date, you can then get the day of the week. It’s pretty extensive.

If I understand your model, you will maintain for each state a list of ordinal days that state reported on. If you are interested in finding the states that reported on a specific ordinal day, you will need to interrogate each list to determine if it contains the ordinal day. If this is a requirement, you may want to consider having ‘Reporting Day’ nodes that represent each reporting day each year. You can then create a relationship between the state’s node and the reporting date node each time a state reports on that day. The benefit is you can store information about their reporting in the relationship’s properties. You can quickly find all states that reported on a specific day too.

Just a thought.

tms · ‎04-19-2022

Ah, that's an interesting thought -- a labeled node for each reporting day.

The most common use happens during ingestion of new data. The data is loaded from an aggregator (in this case, the New York Times daily covid dataset). Each record from the aggregator is used to make an instance of a labeled Datapoint node. Each Datapoint has a pertainsDate property (the date for which the data applies), a case count for that day, and labeled relationships to its countyFeature (another labelled node) and its Dataset. There is one Dataset instance for each file ingested from the aggregator. Each Dataset node has an originationDate property.

Each countyFeature has a labeled relationship to a stateFeature that contains the countyFeature. The reportingDays list is a property on each stateFeature (it will actually be a reportingDaysMap with each reportingDays list keyed by its year, but that's a different topic).

I need to compare the originationDate for each Dataset with the reportingDays for its stateFeature (special processing is required for non-reporting counties).

I had hoped to apply a restriction such as mumble.originationDate NOT IN stateFeature.reportingDays to identify nodes needing special processing.

I appreciate your attention, and apologize in advance for my thick-headedness in grokking all of this.

glilienfield · ‎04-19-2022

No worries. Reach out if you have any other specific questions.

Neo4j

Date constructor answers unexpected result