
Heads up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

APOC S3 URL isn't working for me

dbeaumon

I'm trying to use APOC to load a CSV file from S3, and I can't get it to work.

My setup:
Neo4j 3.6.3 (Docker); tried both Enterprise and Community
APOC 3.5.0.4

The plugins directory (per the documentation) contains:
apoc-3.5.0.4-all.jar
aws-java-sdk-core-1.11.250.jar
aws-java-sdk-s3-1.11.250.jar
httpclient-4.5.4.jar
httpcore-4.4.8.jar
joda-time-2.9.9.jar

I can call dbms.procedures() and see that APOC is being loaded successfully upon startup. I am setting NEO4J_apoc_import_file_enabled=true, and I can test against a local file successfully (CALL apoc.load.csv('/test.csv') yield lineNo, map, list RETURN *).
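Since this setup runs in Docker, it may help to spell out how the Neo4j image maps environment variables onto settings: drop the NEO4J_ prefix, read each single underscore as a period, and read a double underscore as a literal underscore. A minimal sketch of that rule follows; the class and method names are made up for illustration, and the escaping behavior is assumed from the image's documented naming convention.

```java
/**
 * Illustrative sketch of the Neo4j Docker naming convention (assumed):
 * prefix "NEO4J_", a period is written as "_", a literal "_" is written as "__".
 * DockerEnvNames is a hypothetical helper, not part of Neo4j or APOC.
 */
public class DockerEnvNames {
    private static final String PREFIX = "NEO4J_";

    /** e.g. NEO4J_apoc_import_file_enabled -> apoc.import.file.enabled */
    public static String toSettingName(String envVar) {
        if (!envVar.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not a NEO4J_ variable: " + envVar);
        }
        return envVar.substring(PREFIX.length())
                .replace("__", "\0")   // protect escaped (literal) underscores
                .replace("_", ".")     // a single underscore stands for a period
                .replace("\0", "_");   // restore the literal underscores
    }

    /** e.g. apoc.import.file.use_neo4j_config -> NEO4J_apoc_import_file_use__neo4j__config */
    public static String toEnvVar(String settingName) {
        return PREFIX + settingName
                .replace("_", "__")    // escape literal underscores first
                .replace(".", "_");    // then encode periods
    }

    public static void main(String[] args) {
        System.out.println(toSettingName("NEO4J_apoc_import_file_enabled"));
        System.out.println(toEnvVar("apoc.import.file.use_neo4j_config"));
    }
}
```

By this rule, the apoc.import.file.use_neo4j_config setting discussed in the edit would be passed to the container as NEO4J_apoc_import_file_use__neo4j__config.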

When I try the same thing against S3, using the URL format s3://accessKey:secretKey@endpoint:port/bucket/key:

CALL apoc.load.csv("s3://MY_URL_STRING") yield lineNo, map, list RETURN *;

All I get is a variation on the theme of "cannot find the file locally in your import directory":
Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure apoc.load.csv: Caused by: java.io.FileNotFoundException: /import/mybucket/misc/test.csv (No such file or directory)
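To see where that path comes from, here is a small sketch of how java.net.URI splits a URL in the format above. The credentials and endpoint in the example are placeholders, and the class is hypothetical rather than APOC's code. The last method shows that prefixing the import directory to the URL's path alone gives exactly the path in the error message, which suggests the S3 URL was treated as a local file.

```java
import java.net.URI;

/**
 * Illustrative only (not APOC's code): how java.net.URI splits
 * s3://accessKey:secretKey@endpoint:port/bucket/key, and what you get if the
 * URL is mistaken for a local file and re-rooted under the import directory.
 */
public class S3UrlParts {
    public static String accessKey(URI uri) {
        return uri.getUserInfo().split(":", 2)[0];
    }

    public static String secretKey(URI uri) {
        // A '/' in the secret key must be percent-encoded (%2F), otherwise it
        // ends the authority part early; getUserInfo() returns the decoded form.
        return uri.getUserInfo().split(":", 2)[1];
    }

    public static String endpoint(URI uri) {
        return uri.getHost() + ":" + uri.getPort();
    }

    // The first path segment is the bucket; the rest is the object key.
    private static int bucketEnd(URI uri) {
        int end = uri.getPath().indexOf('/', 1);
        if (end < 0) {
            throw new IllegalArgumentException("Expected /bucket/key in " + uri);
        }
        return end;
    }

    public static String bucket(URI uri) {
        return uri.getPath().substring(1, bucketEnd(uri));
    }

    public static String key(URI uri) {
        return uri.getPath().substring(bucketEnd(uri) + 1);
    }

    /** Treating the URL as a file drops scheme, credentials and endpoint; only the path survives. */
    public static String asIfLocal(URI uri, String importDir) {
        return importDir + uri.getPath();
    }

    public static void main(String[] args) {
        URI uri = URI.create("s3://AKIAEXAMPLE:secretEXAMPLE@s3.example.com:9000/mybucket/misc/test.csv");
        System.out.println(endpoint(uri) + " " + bucket(uri) + " " + key(uri));
        System.out.println(asIfLocal(uri, "/import"));
    }
}
```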

What am I missing?

EDIT:
After poking through the code, I'd say FileUtils.isFile should probably treat S3 URLs as "not a file", so that the URL doesn't get mangled by changeFileUrlIfImportDirectoryConstrained when apoc.import.file.use_neo4j_config is set to true. As far as I can tell, the only way to make this work otherwise is to set it to false, which may not be what you want.
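A minimal sketch of the suggested change, assuming the fix is simply "anything with a non-file scheme is not a local file". The names ImportUrlGuard, isLocalFile and resolveForImport are hypothetical stand-ins for FileUtils.isFile and changeFileUrlIfImportDirectoryConstrained, not APOC's real code, and the constrained flag plays the role of apoc.import.file.use_neo4j_config=true.

```java
import java.net.URI;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: only local files get re-rooted under the import directory. */
public class ImportUrlGuard {
    // URI scheme at the start of the string; requiring 2+ characters keeps
    // Windows drive letters such as "C:" from being read as a scheme.
    private static final Pattern SCHEME = Pattern.compile("([A-Za-z][A-Za-z0-9+.-]+):");

    static String schemeOf(String url) {
        Matcher m = SCHEME.matcher(url);
        return m.lookingAt() ? m.group(1).toLowerCase(Locale.ROOT) : null;
    }

    /** No scheme, or file:, means local; s3:, http:, etc. are remote. */
    public static boolean isLocalFile(String url) {
        String scheme = schemeOf(url);
        return scheme == null || scheme.equals("file");
    }

    /** Re-root local paths when constrained; pass remote URLs through untouched. */
    public static String resolveForImport(String url, String importDir, boolean constrained) {
        if (!constrained || !isLocalFile(url)) {
            return url;
        }
        String path = url.startsWith("file:") ? URI.create(url).getPath() : url;
        return importDir + (path.startsWith("/") ? path : "/" + path);
    }

    public static void main(String[] args) {
        System.out.println(resolveForImport("/test.csv", "/import", true));
        System.out.println(resolveForImport("s3://ak:sk@host:9000/mybucket/misc/test.csv", "/import", true));
    }
}
```

With this guard in place, the constrained import directory still applies to local files, so there would be no need to turn apoc.import.file.use_neo4j_config off just to reach S3.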

The larger question I have is this: is forcing the static credentials provider (via URL parsing, which means S3 access only works when the client supplies its own credentials) a deliberate design decision, or could I:
a) add a config parameter that allows server-config-based fall-through for S3, and
b) allow an S3 URL without credentials, falling through to the server's default credentials provider, so that environment variables, credentials files, instance profiles, etc. can be used?

I agree it's not a good default (hence making it something you have to enable), but for some POC work it would make my life easier.
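A minimal sketch of how options (a) and (b) could fit together, assuming a hypothetical opt-in server setting (modeled here as the boolean allowDefaultChain): static credentials are used when the URL carries them, and the default chain only when the setting allows it. In AWS SDK for Java v1 terms, STATIC_FROM_URL would correspond to an AWSStaticCredentialsProvider wrapping BasicAWSCredentials, and DEFAULT_CHAIN to DefaultAWSCredentialsProviderChain; the SDK is deliberately left out so the sketch runs on its own.

```java
import java.net.URI;

/** Hypothetical sketch of choosing an S3 credentials source; not APOC's code. */
public class S3CredentialChoice {
    public enum Source { STATIC_FROM_URL, DEFAULT_CHAIN }

    /**
     * @param s3Url             e.g. s3://accessKey:secretKey@endpoint:port/bucket/key
     * @param allowDefaultChain hypothetical opt-in server setting (option a)
     */
    public static Source choose(String s3Url, boolean allowDefaultChain) {
        String userInfo = URI.create(s3Url).getUserInfo();
        if (userInfo != null && userInfo.contains(":")) {
            // Current behavior: the client supplied accessKey:secretKey in the URL.
            return Source.STATIC_FROM_URL;
        }
        if (allowDefaultChain) {
            // Option b: let the server resolve env vars, credentials files, instance profiles, etc.
            return Source.DEFAULT_CHAIN;
        }
        throw new IllegalArgumentException(
                "S3 URL has no credentials and the default credentials chain is disabled");
    }

    public static void main(String[] args) {
        System.out.println(choose("s3://ak:sk@host:9000/bucket/key.csv", false));
        System.out.println(choose("s3://host:9000/bucket/key.csv", true));
    }
}
```

Keeping the fall-through behind an explicit flag preserves today's behavior by default, which matches the point that this should not be on out of the box.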

If I do open a pull request, I'll probably do all of the above, but I'd rather not write the code if it's going to be rejected anyway.


Although this doesn't cover APOC, it does explain how to get LOAD CSV to read from S3: https://neo4j.com/developer/kb/load-csv-data-from-csv-files-on-aws-s3-bucket/

Does this suffice?

Not really (although it's a neat trick regardless), since I was planning to both read from and write to an S3 bucket, and this only covers reading (via HTTP).

For now I'll just fork APOC core and make it do what I want for my POC. I'm comfortable with the AWS libraries, and I'm not sure anyone should be following my example on this one anyway.