Neo4j

ffoschi · 04-16-2021

Good morning everybody, I am trying to deploy a Lambda function on AWS that updates a Neo4j database hosted on my EC2 machine.
This is the code for the Lambda, with some alterations due to my company's policy:

import base64
import gzip
from neo4j import GraphDatabase
import transactions_lambda as txs
import json
from preprocessor_lambda import Preprocessor


def unpack(data):
    # The SNS message body is base64-encoded, gzip-compressed, UTF-8 JSON.
    string = gzip.decompress(base64.b64decode(data))
    json_dict = json.loads(string.decode('UTF-8'))
    return json_dict

def lambda_handler(event, context):
    message = event['Records'][0]['Sns']['Message']
    data = unpack(message)

    print('----------MESSAGE \n\n')
    print(message)
    print('----------DATA \n\n')
    print(data)

    preprocessor = Preprocessor(data)
    events = preprocessor.preprocess_events()
    print('----------EVENTS \n\n')
    print(events)

    driver = GraphDatabase.driver("bolt://14.43.65.34:7687", auth=('neo4j', 'hello'))
    print(driver)
    
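    try:
        with driver.session() as session:
            print('-----------------OPENING SESSION-----------------')
            # txs1 / tx2 were undefined names in the posted (redacted) code; the
            # transaction functions presumably live in transactions_lambda
            # (imported as txs). tx1 / tx2 are hypothetical placeholder names.
            session.write_transaction(txs.tx1)
            session.write_transaction(txs.tx2)
    finally:
        # Release the driver's connections even if a transaction raises.
        driver.close()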

    return {
        'statusCode': 200,
        'body': json.dumps('Loaded event <EVENT NAME>')
    }

Basically, all the prints before the driver is created show up in the log, while those after it don't. I already tried stripping this Lambda down to only instantiating the driver, but I get this output:

Response
{
  "errorMessage": "2021-04-16T08:33:47.185Z 8f9e6616-89fb-4d7d-88c5-aee4739858da Task timed out after 3.55 seconds"
}

Moreover, I tested the exact same code on my local machine and it works flawlessly: I can see the result of running a test transaction on my EC2 machine.
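To reproduce an invocation locally with the same input shape, one can build an SNS-style test event whose `Message` is packed the way `unpack` in the handler expects (base64 of gzip of UTF-8 JSON). This is a minimal sketch: `pack` and `make_sns_event` are hypothetical helper names, and the event contains only the field the handler reads, not a full SNS record.

```python
import base64
import gzip
import json


def pack(payload):
    """Inverse of the handler's unpack(): JSON -> UTF-8 -> gzip -> base64 text."""
    raw = json.dumps(payload).encode("UTF-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def unpack(data):
    """Same logic as the handler's unpack()."""
    string = gzip.decompress(base64.b64decode(data))
    return json.loads(string.decode("UTF-8"))


def make_sns_event(payload):
    """Hypothetical minimal SNS event: only Records[0].Sns.Message is filled in."""
    return {"Records": [{"Sns": {"Message": pack(payload)}}]}


if __name__ == "__main__":
    event = make_sns_event({"event": "example"})
    message = event["Records"][0]["Sns"]["Message"]
    print(unpack(message))  # {'event': 'example'}
```

Since the posted handler never uses `context`, `lambda_handler(make_sns_event(...), None)` would run it locally. Note that this only exercises the code path, not the Lambda network environment, which is where the local and deployed runs differ.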

----EDIT----

After increasing the Lambda function's timeout, I now get a new error:

Function Logs
START RequestId: 09235331-d238-4150-a489-5da005858596 Version: $LATEST
ciao
[ERROR] ServiceUnavailable: Timed out trying to establish connection to IPv4Address(('54.229.49.225', 7687))
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 18, in lambda_handler
    driver = GraphDatabase.driver("bolt://54.229.49.225:7687", auth=('neo4j', 'hello'))
  File "/var/task/neo4j/__init__.py", line 183, in driver
    return cls.bolt_driver(parsed.netloc, auth=auth, **config)
  File "/var/task/neo4j/__init__.py", line 196, in bolt_driver
    return BoltDriver.open(target, auth=auth, **config)
  File "/var/task/neo4j/__init__.py", line 359, in open
    pool = BoltPool.open(address, auth=auth, pool_config=pool_config, workspace_config=default_workspace_config)
  File "/var/task/neo4j/io/__init__.py", line 535, in open
    seeds = [pool.acquire() for _ in range(pool_config.init_size)]
  File "/var/task/neo4j/io/__init__.py", line 535, in <listcomp>
    seeds = [pool.acquire() for _ in range(pool_config.init_size)]
  File "/var/task/neo4j/io/__init__.py", line 549, in acquire
    return self._acquire(self.address, timeout)
  File "/var/task/neo4j/io/__init__.py", line 413, in _acquire
    connection = self.opener(address, timeout)
  File "/var/task/neo4j/io/__init__.py", line 532, in opener
    return Bolt.open(addr, auth=auth, timeout=timeout, routing_context=routing_context, **pool_config)
  File "/var/task/neo4j/io/__init__.py", line 193, in open
    s, pool_config.protocol_version, handshake, data = connect(
  File "/var/task/neo4j/io/__init__.py", line 1052, in connect
    raise last_error
  File "/var/task/neo4j/io/__init__.py", line 1042, in connect
    s = _connect(resolved_address, timeout, keep_alive)
  File "/var/task/neo4j/io/__init__.py", line 940, in _connect
    raise ServiceUnavailable("Timed out trying to establish connection to {!r}".format(resolved_address))
END RequestId: 09235331-d238-4150-a489-5da005858596
REPORT RequestId: 09235331-d238-4150-a489-5da005858596	Duration: 30033.06 ms	Billed Duration: 30034 ms	Memory Size: 256 MB	Max Memory Used: 111 MB

However, I am still able to connect to the remote database by running the same script from my local machine.

mdfrenchman · 04-16-2021

I don't know Python, and I have yet to set up Neo4j on an EC2 instance, so take this with a box of salt...

Is the pool_config size set correctly?
Can you successfully connect to that port on the Neo4j EC2 instance?
Is Neo4j configured to accept unencrypted Bolt connections? (Or do you need bolt+s or bolt+ssc?)
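One way to answer the port question from inside the deployed function is a plain TCP connect with the Python standard library, which is the same step where the traceback fails (`_connect` in `neo4j/io`). This is a sketch: `probe_tcp` is a hypothetical helper, and the outcome-to-cause mapping in the docstring is a rule of thumb, not a diagnosis.

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Try a TCP connect to host:port and classify the outcome.

    Rough interpretation (rule of thumb, not definitive):
      "open"    - something accepted the connection
      "refused" - the host answered, but nothing listens on that port
      "timeout" - no answer at all; packets are likely being dropped
                  (e.g. a firewall / security group, or no route)
      "dns"     - the host name could not be resolved
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror:
        return "dns"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Any other network error, e.g. "network is unreachable".
        return "error: {}".format(exc)
```

Called from the handler with the same host and port 7687 used in `GraphDatabase.driver`, a `timeout` result from Lambda alongside `open` from the local machine would suggest that packets are being dropped on the Lambda-to-EC2 path rather than a problem in Neo4j or the driver.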

Anyway, hope you figure it out, or someone else that has hit this before can chime in.

Cheers,
Mike

ffoschi · 04-16-2021

Yes, I am able to reach the machine at its public URL via a curl request from the shell.
Bolt is configured with the defaults, so it is unencrypted.
What bothers me the most is that this seems to be an AWS-related problem, since I am able to run the script from my local machine.

mdfrenchman · 04-16-2021

Oh, good point that it's the Lambda function, or AWS in general. I think I missed that you could reach the EC2 instance from your local machine.

Possibly raise the connection timeout? I doubt that will help, though. It definitely looks like some connection boundary between the function and the EC2 instance.
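Two different timeouts show up in this thread: the first error ("Task timed out after 3.55 seconds") is Lambda's own function timeout, while the one after the edit ("Timed out trying to establish connection") is raised by the driver's TCP connect. A sketch of keeping the driver's connect timeout below the time the invocation has left, so the handler can log a clear error instead of being killed; `connect_budget_s`, the 2-second reserve, and the 30-second cap are hypothetical choices, while `get_remaining_time_in_millis()` is the method on the Lambda Python context object.

```python
def connect_budget_s(context, reserve_ms=2000, cap_s=30.0):
    """Seconds the driver may spend connecting, leaving `reserve_ms` for cleanup.

    `context` is the Lambda context object; get_remaining_time_in_millis()
    reports how long this invocation has left. The result is capped at
    `cap_s` and never negative.
    """
    remaining_ms = context.get_remaining_time_in_millis()
    budget_s = (remaining_ms - reserve_ms) / 1000.0
    return max(0.0, min(cap_s, budget_s))
```

The result could then be passed as the driver's `connection_timeout` setting, e.g. `GraphDatabase.driver(uri, auth=auth, connection_timeout=connect_budget_s(context))`, skipping the connection attempt entirely when the budget is 0.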

Have you tried connecting from the deployed Lambda function to something else? That might give you another data point for debugging the permission/connection issue.
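A sketch of such a control test, assuming the function is expected to have outbound internet access: fetch a public "what is my IP" endpoint such as https://checkip.amazonaws.com, which returns the caller's public IP as plain text. If this also times out, the function likely has no route out at all (for example, a VPC-attached function without a NAT gateway); if it succeeds, the returned address is likely the source address the EC2 instance would see. `public_ip` is a hypothetical helper name.

```python
import urllib.request


def public_ip(url="https://checkip.amazonaws.com", timeout=5.0):
    """Return the caller's public IP as reported by `url`, or None on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("ascii").strip()
    except OSError as exc:
        # URLError and socket timeouts are both OSError subclasses.
        print("egress check failed: {}".format(exc))
        return None
```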

This is 100% outside my realm of experience, so if I'm just asking stupid things you've already tried, feel free to tell me.

Can't establish connection to Neo4j from AWS Lambda