Neo4j

silvio_jonas · ‎02-18-2022

I have a csv like the following:

And I want to query it filtering dates. But the column 'Data' which are dates, contains something like 2016-01-01 06:47:00 which is a date plus a timestamp. I have no idea on how to query it filtering the dates only.

Are there some clause like
MATCH ()
WHERE date(2016-01-01) ?

Cobra · ‎02-18-2022

Hello @silvio.jonas

You first need to parse the date then convert ms as datetime with datetime() function then you use date() function on it to compare the date with another date.

WITH apoc.date.parse("2016-01-01 06:47:00", "ms", "YYYY-MM-dd HH:MM:SS") AS ms
WITH datetime({epochmillis: ms}) AS datetime
RETURN date(datetime) <= date("2016-01-01")

Regards,
Cobra

silvio_jonas · ‎02-18-2022

it returns a table with True and Falses. How can I create one graph for each unique date?
In other worlds: can I write a filter with something like FOR each date WHERE date is unique RETURN etc?

If you look at the data, you would see that what I want is a graph for 2016-01-01 , another one for 2016-01-02 , and another one for 2016-01-03 etc.

Is it possible? If so, what is the way?

Cobra · ‎02-18-2022

You can use DISTINCT + collect():

WITH DISTINCT date, collect(row) AS rows

Like this you will have one unique date per row and the list of items in the list.
I hope it helps.

silvio_jonas · ‎02-18-2022

Sorry, I don't get it.
Where in the query should WITH DISTINCT go?

My first attempt was this:

LOAD CSV WITH HEADERS FROM 'file:///2022-02-10__movimentacao_regular.csv' AS row

MERGE (conta:Conta {nome: row.Conta Origem})
MERGE (contad:Conta {nome: row.Conta Destino})
SET conta:contad
MERGE (conta)-[op:Movimentação]->(contad)
ON CREATE SET op.Valor = toFloat(row.Valor), op.Tipo de Gasto = row.Tipo de Gasto, op.Tipo de movimentacao=row.Tipo de movimentacao;

And it does work,, giving me a graph but it completely ignores the dates (expected since I never coded anything referring to the dates). But now, I want to split the data from the csv file into various graphs each one for a distinct date. Where, in my query, should I put and what is the clause (SELECT, WHERE, DISTINCt?) that I should use?

glilienfield · ‎02-18-2022

One good thing about representing dates in yyyy-mm-dd format is that the dates are in both chronological and alphabetical order. As such, you can compare them with string comparison and don’t need to convert them to a date. There is a benefit of converting them though, as it guarantees a valid date. If that is not an issue, you can consider just sticking with strings. You will need to strip off the date and store it as it’s own property.

Cobra · ‎02-18-2022

Can you give us an example of what you would like to get as a result?

silvio_jonas · ‎02-18-2022

Yes, of course.

This is the output I have:

Note that there is no filter on dates. From the dataset, there are different dates where 'movimentação' was made between accounts (the nodes).

I want to filter the dates, and, for example, for date 2016-01-01 I want this graph:

That's the first two lines from the dataset. And for date 2016-01-02 I want this as the output:

etc

This was imputed by hand, with code like this:

CREATE (conta1: Conta {nome: 'Conta1'})
CREATE (conta25: Conta {nome: 'Conta25'})
CREATE (conta5001: Conta {nome: 'Conta5001'})
CREATE
(conta1)-[:movimentação {tipo:'pagamento', valor: 10, gasto:'padaria'}]->(conta25),
(conta1)-[:movimentação {tipo:'transferência', valor: 86, gasto:'transfer_50001_out'}]->(conta5001)

WITH conta1 AS a
MATCH (a)-[:movimentação]->(m) RETURN a, m;

but this is cumbersome and not a good approach for thousands of line. Thus, what I want is a script that will read each line from the csv file, and separate the dates creating a graph for each date. Thus, if there are n different dates the outcome would be n different graphs

My problem is essentially: I don't know cypher very well, it means that it is very hard for me to find the right functions that can do what I want. I mean: are there something like WHERE clause, or DISTINCT etc that can do this? if so, how? The examples I found on neo4j website didn't worked (or I didn't understand how to use...)

glilienfield · ‎02-18-2022

If I understand this, each row looks to represent a transaction on a date between two entities ('Conta Origem' and 'Conta Destino'). The columns 'tips de gusto', 'valor', and 'tips de movimentacao' are attributes of the transaction. If this is correct, you can add the date as an attribute of the relationship and use it in the where clause to filter for transactions that occurred on your target date, or between two dates For performance, you can add an index for the relationship type and date property.

I would strip away the time component and store the date in YYYY-MM-DD format, so you can query on it.

Does this sound like what you are looking for?

glilienfield · ‎02-18-2022

Another option, is to use a node to represent the transaction, and have the two entities connected to it. The transaction attributes would be properties of the 'Transaction' node. You could either put the date as a property of the 'Transaction' node, or create 'Date' nodes, and connect the 'Transaction' nodes to its corresponding 'Date' node. Then you could query for something like this:

match (d:Date)<-[:OCCURED_ON]-(t:Transaction)<-[:BELONGS_TO]-(c:Conta)
where d.date = '2021-10-05'

You would get all transactions occurring on that day connected as a subgraph.

silvio_jonas · ‎02-18-2022

yes, this sounds like what I am looking for. You understand correctly the whole thing (indeed, the data are transactions between two entities and everything you described is what I am building).

The idea of adding the date as attribute of the relationship is good. May I use apoc.date.parse for this stuff?

something like:

MERGE (conta)-[op:Movimentação]->(contad)
ON CREATE SET op.Valor = toFloat(row.Valor), op.Tipo de Gasto = row.Tipo de Gasto, op.Tipo de movimentacao=row.Tipo de movimentacao, op.Data = apoc.date.parse(row.Data);
?

I mean, I need something else, or this is not what you were suggesting?

glilienfield · ‎02-18-2022

That is one approach you can look into. You can also just simply truncate the timestamp you have and use it as a string. This works because it is in YYYY-MM-DD format, so the string 'YYYY-MM-DD' is also in alphabetic order; therefore, you can do something like n.date = '2021-01-01' or n.date < '2021-01-01'. You need to remember to format your literal string values padded with zero. The following would not work n.date < '2021-1-12'

A solution would be to store the timestamp in a property 'timestamp' (if you want to retain the exact time of the transaction) and set another property 'date' that has only the date component stored as string in 'YYYY-MM-DD' format. You can get that from your spreadsheet by something like:

set n.date = left(row.data, 10)

glilienfield · ‎02-18-2022

MERGE (conta)-[op:Movimentação]->(contad)
ON CREATE SET op.Valor = toFloat(row.Valor), op. Tipo de Gasto = row. Tipo de Gasto , op. Tipo de movimentacao =row. Tipo de movimentacao , op.date = left(row.Data, 10), op.timestamp = row.Data;

if you want the original timestamp for auditing.

You should also try not to use property names that have spaces. Use underscores or camel case if want to retain the original names.

Neo4j

Filter date where data also contains timestamp