
ArrayIndexOutOfBoundsException when loading CSV

Hi, I want to import a CSV table of tweets from Twitter, but I get this error message:

Failed to invoke procedure apoc.load.csv: Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 4 out of bounds for length 4

My code is:

CALL apoc.load.csv("conversations_until_2021_06_18.tsv", {
  sep: "TAB",
  arraySep: ",",
  skip: 100000,
  mapping: {
    hashtags: {array: true},
    mentions: {array: true},
    ref_id: {array: true},
    reply_count: {type: "int"},
    retweet_count: {type: "int"},
    quote_count: {type: "int"},
    like_count: {type: "int"}
    }
  }
)
YIELD map AS tweet
CREATE (t:Tweet)
SET t = tweet

I'm using Neo4j v4.3.1, Desktop v1.4.5

An example (header line, then one data row):

comment_type	conversatoin_id	text	author_id	tweet_id	ref_type	ref_id	in_reply_to_user_id	created_at	mentions	url	hashtags	like_count	quote_count	reply_count	retweet_count	reply_settings
side	1234  @url https://t.co/...   345  5678  replied_to	564465   4566   2021-04-28T15:55:42.000Z	ABaerbock, ArminLaschet	https://twitter.com/... 	NaN	0	0	0	0	everyone

One of my files works fine, but the second produces this error. According to this question, there may be a problem with a line in the file. But how do I find that line? I have no idea where an array of length 4 could come from.

1 ACCEPTED SOLUTION

I think I found my problem. Some of my strings ended with a backslash (\). This escapes the following tab, so the separator is no longer recognized. Writing the backslash as \\ seems to solve this.
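
To locate such lines before importing, a small script can help. The following is only a sketch, assuming (as described above) that a backslash directly before a tab is what breaks the parsing; the function names and the sample rows are made up for illustration.

```python
def find_backslash_before_tab(lines):
    """Return 1-based line numbers where a backslash directly precedes a tab."""
    return [n for n, line in enumerate(lines, start=1) if "\\\t" in line]


def escape_backslashes(line):
    """Double every backslash so it is read as a literal character."""
    return line.replace("\\", "\\\\")


# Hypothetical sample rows: the second one has a text field ending in a backslash.
sample = [
    "side\t1234\thello\t345\n",
    "side\t1235\tends with backslash\\\t346\n",
]

print(find_backslash_before_tab(sample))
fixed = [escape_backslashes(line) for line in sample]
```

For a real file, the lines could be read with `open(path, encoding="utf-8")` and the escaped lines written to a new file, which is then passed to apoc.load.csv.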


4 REPLIES

There seem to be some "fake tabs" in your file.
This can be solved by replacing them.
For example, in IntelliJ you can open the file, choose Replace, and substitute [ ]+ (that is, 1 or more spaces) with \t (with the Regex option selected).

Obviously, this could also replace spaces that you want to keep; in that case, replace only the ones you need to.
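
The same replacement can be scripted. This is a rough sketch, not a safe general fix: it assumes that only runs of two or more spaces are misplaced separators, which is a variation on the [ ]+ pattern above so that single spaces inside the tweet text survive. The function name and the `min_run` parameter are hypothetical.

```python
import re


def spaces_to_tabs(line, min_run=2):
    """Replace each run of at least min_run spaces with a single tab."""
    return re.sub(r" {%d,}" % min_run, "\t", line)


# Start of the example row from the question, with spaces where tabs should be.
row = "side\t1234  @url https://t.co/...   345  5678  replied_to"
print(spaces_to_tabs(row).split("\t"))
```

Text that legitimately contains double spaces would still be split, so the result should be checked before importing.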

Yes, the error message is unclear, but unfortunately it seems to come from an external library (http://opencsv.sourceforge.net/).

Thanks for the input. What do you mean by "fake tabs"? I have some text with spaces, so I cannot simply replace them.

I deleted some quotation marks and brackets from the file. Could this cause the problem?

I don't think the missing quotation marks cause the problem.

By "fake tabs" I mean column separators that are not actually tab characters
but runs of multiple spaces.

Put simply, if you copy and paste your example into VS Code, IntelliJ, or any other editor
and search for "\t" (with the "Regular Expression" option enabled),
you will see that some columns are not tab-separated. For example, between 1234 and @url there are 2 spaces instead of 1 tab (this causes the error).
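
To find the affected rows without searching by hand, one option is to compare each row's column count with the header's. This is a minimal sketch that splits on real tab characters only; the function name and the shortened sample header are invented for illustration.

```python
def find_bad_rows(lines, sep="\t"):
    """Return (line_number, column_count) for rows whose count differs from the header."""
    expected = len(lines[0].rstrip("\n").split(sep))
    bad = []
    for number, line in enumerate(lines[1:], start=2):
        count = len(line.rstrip("\n").split(sep))
        if count != expected:
            bad.append((number, count))
    return bad


# Hypothetical data: line 3 uses two spaces where a tab should be.
sample = [
    "comment_type\ttweet_id\ttext\tauthor_id\n",
    "side\t1234\thello world\t345\n",
    "side\t1235  two spaces here\t346\n",
]

print(find_bad_rows(sample))
```

Because this is a plain split, it reports rows with missing or extra tabs but does not model any quoting or escaping done by the CSV parser.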
