
Can't get Neo4j 4 running (AWS or any cloud deployment)

mike2
Node Clone

I have been working on converting my project from SQL to Neo4j, and I have all my tests running and everything working in Neo4j Desktop 4.0.4 on my local machine.

I have now been trying for several weeks to get it running on a server so I can actually integrate with my application, and I have run into nothing but trouble.

First I tried to use Graphene, but they don't support 4.x yet, so I thought I would use Aura. There was nothing in the documentation about the version Aura was running, and I assumed it was 4.x (I know, I know, that's what I get for assuming). After much pain I got some of the stuff working on Aura, but a number of my queries apparently use 4.x-only Cypher syntax.

So the next thing I tried was the AWS Marketplace CloudFormation template. Well, that doesn't work either. Every time you run it you get:

The following resource(s) failed to create: [WaitOnPasswordReset]. . Rollback requested by user.
2020-06-09 12:10:28 UTC-0500	WaitOnPasswordReset	CREATE_FAILED	WaitCondition timed out. Received 0 conditions when expecting 1

So now I have several months into this project and I don't know where to go next, as I have no way to actually start a server.

Is 4.x not production ready? What am I doing wrong?

Please note I am a developer and not an ops guy.


Sorry you're running into trouble here.

What that error message means is that the cluster is not starting correctly, so the CloudFormation deploy process cannot succeed. To diagnose what's gone wrong, you need to look at the CloudFormation stack events to find out what failed. If you can post some updates here on what's not working, maybe we can help further. There are a lot of reasons things could fail; for example, the VM creation could fail because of a quota issue, or because something is misconfigured. The best way to figure this out is to get a dump of the CloudFormation log events and see what's there and where to go next.
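For example, here is a rough sketch of pulling the full event history with boto3 (the stack name and region are placeholders, substitute your own):

import boto3

cf = boto3.client("cloudformation", region_name="us-east-1")  # use your region

# "neo4j-cluster" is a placeholder; use the name of the stack that failed.
paginator = cf.get_paginator("describe_stack_events")
for page in paginator.paginate(StackName="neo4j-cluster"):
    for event in page["StackEvents"]:
        print(event["Timestamp"], event["LogicalResourceId"],
              event["ResourceStatus"], event.get("ResourceStatusReason", ""))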

Also, please specify exactly how you're launching it, whether from the marketplace or from templates that you downloaded.

Hello,

I was running the AWS Marketplace install of a causal cluster:
https://aws.amazon.com/marketplace/pp/B07D441G55?qid=1591746181777&sr=0-2&ref_=srh_res_product_title

Timestamp Logical ID Status Status reason
2020-06-09 12:17:33 UTC-0500 GTFNEO4J ROLLBACK_COMPLETE -
2020-06-09 12:17:32 UTC-0500 VPC DELETE_COMPLETE -
2020-06-09 12:17:32 UTC-0500 InternetGateway DELETE_COMPLETE -
2020-06-09 12:17:17 UTC-0500 InternetGateway DELETE_IN_PROGRESS -
2020-06-09 12:17:17 UTC-0500 VPC DELETE_IN_PROGRESS -
2020-06-09 12:17:16 UTC-0500 AttachGateway DELETE_COMPLETE -
2020-06-09 12:17:09 UTC-0500 DNSZone DELETE_COMPLETE -
2020-06-09 12:15:39 UTC-0500 Subnet1 DELETE_COMPLETE -
2020-06-09 12:15:39 UTC-0500 Subnet2 DELETE_COMPLETE -
2020-06-09 12:15:38 UTC-0500 Subnet0 DELETE_COMPLETE -
2020-06-09 12:15:27 UTC-0500 ReadOwnTags DELETE_COMPLETE -
2020-06-09 12:15:25 UTC-0500 sgNeo4jEnterprise DELETE_COMPLETE -
2020-06-09 12:15:25 UTC-0500 ReadOwnTags DELETE_IN_PROGRESS -
2020-06-09 12:15:24 UTC-0500 instProfNeo4jEnterprise DELETE_COMPLETE -
2020-06-09 12:15:24 UTC-0500 StackTokenWaitHandle DELETE_COMPLETE -
2020-06-09 12:15:23 UTC-0500 Subnet1 DELETE_IN_PROGRESS -
2020-06-09 12:15:23 UTC-0500 sgNeo4jEnterprise DELETE_IN_PROGRESS -
2020-06-09 12:15:23 UTC-0500 instProfNeo4jEnterprise DELETE_IN_PROGRESS -
2020-06-09 12:15:23 UTC-0500 StackTokenWaitHandle DELETE_IN_PROGRESS -
2020-06-09 12:15:23 UTC-0500 Subnet2 DELETE_IN_PROGRESS -
2020-06-09 12:15:23 UTC-0500 Neo4jServer1 DELETE_COMPLETE -
2020-06-09 12:15:23 UTC-0500 Neo4jServer2 DELETE_COMPLETE -
2020-06-09 12:15:22 UTC-0500 Subnet0 DELETE_IN_PROGRESS -
2020-06-09 12:15:22 UTC-0500 Neo4jServer0 DELETE_COMPLETE -
2020-06-09 12:14:36 UTC-0500 Neo4jServer1 DELETE_IN_PROGRESS -
2020-06-09 12:14:36 UTC-0500 DNSZone DELETE_IN_PROGRESS -
2020-06-09 12:14:36 UTC-0500 Neo4jServer2 DELETE_IN_PROGRESS -
2020-06-09 12:14:36 UTC-0500 Neo4jServer1DNS DELETE_COMPLETE -
2020-06-09 12:14:35 UTC-0500 Neo4jServer2DNS DELETE_COMPLETE -
2020-06-09 12:14:35 UTC-0500 Neo4jServer0 DELETE_IN_PROGRESS -
2020-06-09 12:14:35 UTC-0500 Neo4jServer0DNS DELETE_COMPLETE -
2020-06-09 12:11:18 UTC-0500 NetworkAcl DELETE_COMPLETE -
2020-06-09 12:11:18 UTC-0500 RouteTable DELETE_COMPLETE -
2020-06-09 12:11:18 UTC-0500 NetworkAcl DELETE_IN_PROGRESS -
2020-06-09 12:11:18 UTC-0500 RouteTable DELETE_IN_PROGRESS -
2020-06-09 12:11:17 UTC-0500 AttachGateway DELETE_IN_PROGRESS -
2020-06-09 12:11:17 UTC-0500 Int3NetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 SubnetNetworkAclAssociation1 DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 HTTPSIngressNetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 Neo4jHTTPSIngressNetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 InboundResponsePortsNetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 SSHEgressNetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 SSHIngressNetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 Neo4jHTTPSEgressNetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 HTTPIngressNetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 Int2NetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 Int1NetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 SubnetNetworkAclAssociation0 DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 SubnetNetworkAclAssociation2 DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 HTTPEgressNetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 BoltIngressNetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 SubnetRouteTableAssociation0 DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 HTTPSEgressNetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 BoltEgressNetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 OutBoundResponsePortsNetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 SubnetRouteTableAssociation1 DELETE_COMPLETE -
2020-06-09 12:11:16 UTC-0500 SubnetRouteTableAssociation2 DELETE_COMPLETE -
2020-06-09 12:11:16 UTC-0500 Route DELETE_COMPLETE -
2020-06-09 12:11:01 UTC-0500 SSHIngressNetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 Neo4jServer2DNS DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 SSHEgressNetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 Neo4jHTTPSEgressNetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 WaitOnPasswordReset DELETE_COMPLETE -
2020-06-09 12:11:01 UTC-0500 Int1NetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 SubnetNetworkAclAssociation0 DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 SubnetRouteTableAssociation0 DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 HTTPSIngressNetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 Int3NetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 SubnetRouteTableAssociation1 DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 Neo4jHTTPSIngressNetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 Int2NetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 SubnetNetworkAclAssociation1 DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 BoltIngressNetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 HTTPIngressNetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 Neo4jServer1DNS DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 HTTPEgressNetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 WaitOnPasswordReset DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 Neo4jServer0DNS DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 HTTPSEgressNetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 InboundResponsePortsNetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 BoltEgressNetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 SubnetNetworkAclAssociation2 DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 OutBoundResponsePortsNetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 SubnetRouteTableAssociation2 DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 Route DELETE_IN_PROGRESS -
2020-06-09 12:10:29 UTC-0500 GTFNEO4J ROLLBACK_IN_PROGRESS The following resource(s) failed to create: [WaitOnPasswordReset]. . Rollback requested by user.
2020-06-09 12:10:28 UTC-0500 WaitOnPasswordReset CREATE_FAILED WaitCondition timed out. Received 0 conditions when expecting 1
2020-06-09 11:40:12 UTC-0500 Neo4jServer0DNS CREATE_COMPLETE -
2020-06-09 11:40:12 UTC-0500 Neo4jServer2DNS CREATE_COMPLETE -
2020-06-09 11:39:21 UTC-0500 Neo4jServer1DNS CREATE_COMPLETE -
2020-06-09 11:35:48 UTC-0500 Neo4jServer1DNS CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:35:48 UTC-0500 Neo4jServer1DNS CREATE_IN_PROGRESS -
2020-06-09 11:35:45 UTC-0500 Neo4jServer1 CREATE_COMPLETE -
2020-06-09 11:35:28 UTC-0500 Neo4jServer2DNS CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:35:28 UTC-0500 Neo4jServer0DNS CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:35:28 UTC-0500 Neo4jServer0DNS CREATE_IN_PROGRESS -
2020-06-09 11:35:28 UTC-0500 Neo4jServer2DNS CREATE_IN_PROGRESS -
2020-06-09 11:35:27 UTC-0500 WaitOnPasswordReset CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:35:27 UTC-0500 WaitOnPasswordReset CREATE_IN_PROGRESS -
2020-06-09 11:35:24 UTC-0500 Neo4jServer0 CREATE_COMPLETE -
2020-06-09 11:35:24 UTC-0500 Neo4jServer2 CREATE_COMPLETE -
2020-06-09 11:35:17 UTC-0500 DNSZone CREATE_COMPLETE -
2020-06-09 11:34:53 UTC-0500 Neo4jServer0 CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:34:52 UTC-0500 Neo4jServer2 CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:34:52 UTC-0500 Neo4jServer1 CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:34:51 UTC-0500 Neo4jServer1 CREATE_IN_PROGRESS -
2020-06-09 11:34:51 UTC-0500 Neo4jServer0 CREATE_IN_PROGRESS -
2020-06-09 11:34:51 UTC-0500 Neo4jServer2 CREATE_IN_PROGRESS -
2020-06-09 11:34:48 UTC-0500 instProfNeo4jEnterprise CREATE_COMPLETE -
2020-06-09 11:33:19 UTC-0500 SubnetRouteTableAssociation1 CREATE_COMPLETE -
2020-06-09 11:33:19 UTC-0500 Route CREATE_COMPLETE -
2020-06-09 11:33:19 UTC-0500 SubnetRouteTableAssociation0 CREATE_COMPLETE -
2020-06-09 11:33:19 UTC-0500 SubnetNetworkAclAssociation2 CREATE_COMPLETE -
2020-06-09 11:33:19 UTC-0500 SubnetRouteTableAssociation2 CREATE_COMPLETE -
2020-06-09 11:33:19 UTC-0500 SubnetNetworkAclAssociation0 CREATE_COMPLETE -
2020-06-09 11:33:18 UTC-0500 SubnetNetworkAclAssociation1 CREATE_COMPLETE -
2020-06-09 11:33:12 UTC-0500 HTTPIngressNetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:04 UTC-0500 SubnetRouteTableAssociation1 CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:33:04 UTC-0500 Route CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:33:04 UTC-0500 SubnetRouteTableAssociation0 CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:33:04 UTC-0500 SubnetNetworkAclAssociation2 CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:33:03 UTC-0500 SubnetRouteTableAssociation2 CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:33:03 UTC-0500 SubnetNetworkAclAssociation0 CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:33:03 UTC-0500 Route CREATE_IN_PROGRESS -
2020-06-09 11:33:03 UTC-0500 SubnetRouteTableAssociation1 CREATE_IN_PROGRESS -
2020-06-09 11:33:03 UTC-0500 SubnetNetworkAclAssociation2 CREATE_IN_PROGRESS -
2020-06-09 11:33:03 UTC-0500 SubnetRouteTableAssociation0 CREATE_IN_PROGRESS -
2020-06-09 11:33:03 UTC-0500 SubnetNetworkAclAssociation1 CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:33:03 UTC-0500 SubnetNetworkAclAssociation0 CREATE_IN_PROGRESS -
2020-06-09 11:33:03 UTC-0500 SubnetRouteTableAssociation2 CREATE_IN_PROGRESS -
2020-06-09 11:33:03 UTC-0500 SSHIngressNetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:03 UTC-0500 HTTPSIngressNetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:03 UTC-0500 OutBoundResponsePortsNetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:03 UTC-0500 InboundResponsePortsNetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:03 UTC-0500 Neo4jHTTPSIngressNetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:03 UTC-0500 HTTPEgressNetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:02 UTC-0500 SubnetNetworkAclAssociation1 CREATE_IN_PROGRESS -
2020-06-09 11:33:02 UTC-0500 Int1NetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:02 UTC-0500 BoltEgressNetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:02 UTC-0500 Neo4jHTTPSEgressNetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:02 UTC-0500 BoltIngressNetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:02 UTC-0500 Int3NetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:02 UTC-0500 HTTPSEgressNetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:02 UTC-0500 Int2NetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:02 UTC-0500 SSHEgressNetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:00 UTC-0500 Subnet2 CREATE_COMPLETE -
2020-06-09 11:33:00 UTC-0500 AttachGateway CREATE_COMPLETE -
2020-06-09 11:33:00 UTC-0500 Subnet1 CREATE_COMPLETE -
2020-06-09 11:33:00 UTC-0500 Subnet0 CREATE_COMPLETE -
2020-06-09 11:32:56 UTC-0500 HTTPIngressNetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:56 UTC-0500 HTTPIngressNetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:49 UTC-0500 sgNeo4jEnterprise CREATE_COMPLETE -
2020-06-09 11:32:48 UTC-0500 sgNeo4jEnterprise CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:48 UTC-0500 instProfNeo4jEnterprise CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 SSHIngressNetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 OutBoundResponsePortsNetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 HTTPSIngressNetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 HTTPEgressNetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 Neo4jHTTPSIngressNetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 instProfNeo4jEnterprise CREATE_IN_PROGRESS -
2020-06-09 11:32:47 UTC-0500 InboundResponsePortsNetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 Int1NetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 BoltEgressNetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 Neo4jHTTPSEgressNetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 BoltIngressNetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 Int2NetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 Int3NetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 HTTPSEgressNetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 SSHEgressNetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 SSHIngressNetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:47 UTC-0500 HTTPSIngressNetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:47 UTC-0500 OutBoundResponsePortsNetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:47 UTC-0500 HTTPEgressNetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:47 UTC-0500 Neo4jHTTPSIngressNetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:47 UTC-0500 Int1NetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:47 UTC-0500 InboundResponsePortsNetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:47 UTC-0500 Neo4jHTTPSEgressNetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:46 UTC-0500 BoltIngressNetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:46 UTC-0500 BoltEgressNetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:46 UTC-0500 HTTPSEgressNetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:46 UTC-0500 Int2NetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:46 UTC-0500 Int3NetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:46 UTC-0500 SSHEgressNetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:45 UTC-0500 ReadOwnTags CREATE_COMPLETE -
2020-06-09 11:32:45 UTC-0500 AttachGateway CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:44 UTC-0500 NetworkAcl CREATE_COMPLETE -
2020-06-09 11:32:44 UTC-0500 RouteTable CREATE_COMPLETE -
2020-06-09 11:32:44 UTC-0500 DNSZone CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:44 UTC-0500 AttachGateway CREATE_IN_PROGRESS -
2020-06-09 11:32:44 UTC-0500 Subnet0 CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:44 UTC-0500 Subnet2 CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:44 UTC-0500 Subnet1 CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:44 UTC-0500 NetworkAcl CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:44 UTC-0500 RouteTable CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:44 UTC-0500 DNSZone CREATE_IN_PROGRESS -
2020-06-09 11:32:44 UTC-0500 Subnet2 CREATE_IN_PROGRESS -
2020-06-09 11:32:43 UTC-0500 Subnet0 CREATE_IN_PROGRESS -
2020-06-09 11:32:43 UTC-0500 sgNeo4jEnterprise CREATE_IN_PROGRESS -
2020-06-09 11:32:43 UTC-0500 Subnet1 CREATE_IN_PROGRESS -
2020-06-09 11:32:43 UTC-0500 NetworkAcl CREATE_IN_PROGRESS -

That's the output in the CloudFormation events. I didn't see any related logs in CloudWatch.

I don't see any more useful details anywhere.

@bledi.feshti1 can you have a look at this? This interaction pattern shows that all of the components were created correctly (no errors), but the wait on cluster formation failed, meaning that the cluster did not properly form before the timeout. There are two possible causes:

  • System misconfiguration (unlikely, as these templates have been through previous testing)
  • Race condition: the cluster does not form quickly enough, and the timeout that signals cluster failure fires seconds or a minute before the cluster would have formed.

My best bet is that it's the second: if you increase the timeout for the CF deploy, it'll take a minute or two longer but the cluster will form. But this needs checking.
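If you want to experiment with a longer timeout, one option (a rough sketch, not the documented launch path) is to download the template and launch it yourself. The file name and parameter keys below are assumptions, so check the template you downloaded for the real ones; the WaitOnPasswordReset WaitCondition also has its own "Timeout" property inside the template body, which you can raise by editing the file before launching.

import boto3

cf = boto3.client("cloudformation", region_name="us-east-1")

# Placeholder file name: a locally downloaded copy of the marketplace template.
with open("neo4j-causal-cluster.json") as f:
    template_body = f.read()

cf.create_stack(
    StackName="neo4j-cluster",
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_IAM"],
    TimeoutInMinutes=60,  # overall stack creation timeout
    Parameters=[
        # Parameter keys are assumptions; check the template for the real names.
        {"ParameterKey": "Password", "ParameterValue": "changeme"},
        {"ParameterKey": "SSHKeyName", "ParameterValue": "my-key"},
    ],
)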

Did you figure out how to fix this or work around it?

I'm trying to set up an Enterprise Causal Cluster in Amazon, using the stack template.
I have tried creating the stack multiple times with 3.5.15, 4.0.3, 4.0.4 and 4.0.5, with a timeout of 60 minutes.

Each time it looks like everything is created, but it hangs on the last step until posting a final event (WaitOnPasswordReset CREATE_FAILED: WaitCondition timed out. Received 0 conditions when expecting 1) and rolling back, exactly like the log above.

If this is a different issue, I apologize for the hijack, but otherwise I would appreciate a heads-up with a template that actually works, if you managed to get one.

Thank you in advance.

Can you please indicate the command or method you're using to launch, and what region you're launching into? We did look at this, and we weren't able to replicate the problem. It's true that the deploy will fail if the timeout is too short. And it's also true that the 4.0 series takes a bit more time than 3.5 to finish the cluster formation. But in either case, 10 minutes maximum should be more than enough, and if it's failing after 60 minutes something else is going on.

To look into this we'd need more information on exactly how the deploy is happening, and whether or not you have other network rules in place in your account. What's failing is the cluster formation step; this requires that the machines in the cluster be able to reach one another on the right ports. So (I'm guessing a bit blindly here) there may be a network situation that is preventing them from contacting one another.
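If you can SSH into the machines, here is a quick sketch for checking that they can reach each other on the cluster ports, run from one of the VMs. The hostnames are the internal DNS names the template is supposed to create, and 5000/6000/7000 are the default causal cluster discovery, transaction, and raft ports.

import socket

def can_connect(host, port, timeout=5):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for host in ("node0.neo4j", "node1.neo4j", "node2.neo4j"):
    for port in (5000, 6000, 7000):
        status = "open" if can_connect(host, port) else "unreachable"
        print(f"{host}:{port} {status}")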

mike2
Node Clone

All I was doing was picking Causal Cluster 4.0.5 from the AWS Marketplace and selecting the us-east-2 region with three t2.medium machines; everything else was the defaults, and it fails every time I try. I have never gotten it to work, and I am currently using a single Enterprise node from an AMI. I do not yet have a plan for how I am going to run in production once I get all my bugs figured out.

And the same here, except us-east-1 and default machines (r4.large x 3).

Product code: 1emg6yskh0jf81czgzfadiu9w, if that helps.

Settings I have changed are:

  • Version
  • Stack name
  • SSH Key name (I could log into the machines with it, when disabling the rollback)
  • IP whitelist (0.0.0.0/0 for testing)
  • Password (changeme for testing)
  • Sometimes also the wait timeout and/or rollback on fail

And then Next, Accept, Next. Creation starts, runs for quite a while, blows up, and rolls back.

On a side note, it just struck me that you could have different password constraints in the system compared to the template, so I am retrying with a more complex password.

Edit:
"Complex" password also failed, same symptoms.

As for the second part of your question, in my case I am running as my developer role on a company account (IAM account?) that also serves the rest of our infrastructure. Neo4j creates its own VPC (Neo4jVPC-{stack name}), subnet (same naming scheme), and security group ({stack name}-sgNeo4jEnterprise-1{12 char HEX}), as far as I can tell.

I can log into the machines via the public IP using my SSH key and jump between the machines using the private IPs (again using ssh and the key).

If you need anything else, please be specific; I'm pretty new to most things AWS myself.

Again, thank you in advance

Give it 10 minutes or so to form. It should have formed by then.

Then, SSH into the machines and grab a copy of /var/lib/neo4j/logs/debug.log from all 3 VMs. Those logs are going to tell you what's going on. Scan them for errors or exceptions, in particular anything that mentions akka. Report back with those errors and we can diagnose.
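If it's easier, a small sketch like this will pull out the interesting lines from each debug.log:

import re

# Default log location on the marketplace AMI; adjust the path if yours differs.
LOG = "/var/lib/neo4j/logs/debug.log"
pattern = re.compile(r"ERROR|Exception|akka", re.IGNORECASE)

with open(LOG) as log:
    for line in log:
        if pattern.search(line):
            print(line.rstrip())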

Node0.txt (21.8 KB) Node1.txt (55.3 KB) Node2.txt (14.9 KB)

I have truncated it a bit for brevity. The "Failed to load" log lines in Node1.txt are on all three instances.

Private IPs:
Node0: 10.0.0.46
Node1: 10.0.1.222
Node2: 10.0.2.84

Public IPs:
Node0: 34.201.44.16
Node1: 54.92.147.57
Node2: 34.234.73.125

Nodes 0 and 2 seem to know the IPs of the others (in the log). Node 1 doesn't, or at least isn't logging that it does. It does, however, log that it is available on 172.31.45.91, which is not part of this cluster.

And it looks like node0 is trying to start the server while it is already running.

Do you need anything else?

I see two different errors in your logs that are directly preventing cluster formation. It looks like you have some customization in either your configuration or network settings. I can't speak to how to fix this, because I'm not sure what your CF looks like, but here is one issue on node0:

2020-06-23 15:57:05.397+0000 ERROR [a.i.TcpListener] Bind failed for TCP channel on endpoint [/10.0.0.46:5000] Address already in use
java.net.BindException: Address already in use

Without port 5000 open for cluster traffic you can't form a cluster. Best guess: some other process is already using port 5000, or possibly you're binding a different Neo4j service to port 5000 in your configuration. Either way, this is a clear problem with node0.
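One way to see what is actually holding port 5000 on node0 is a small sketch like this (it uses the psutil package, which you'd need to install, and usually needs to run as root to see other users' processes):

import psutil

# List every process listening on TCP port 5000 on this machine.
for conn in psutil.net_connections(kind="tcp"):
    if conn.laddr and conn.laddr.port == 5000 and conn.status == psutil.CONN_LISTEN:
        name = psutil.Process(conn.pid).name() if conn.pid else "unknown"
        print(f"{conn.laddr.ip}:{conn.laddr.port} pid={conn.pid} ({name})")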

On node1:

2020-06-23 15:56:51.008+0000 ERROR [a.e.DummyClassForStringSources] Outbound message stream to [akka://cc-discovery-actor-system@node0.neo4j:5000] failed. Restarting it. Tcp command [Connect(node0.neo4j:5000,None,List(),Some(10000 milliseconds),true)] failed because of java.net.UnknownHostException: node0.neo4j Tcp command [Connect(node0.neo4j:5000,None,List(),Some(10000 milliseconds),true)] failed because of java.net.UnknownHostException: node0.neo4j
akka.stream.StreamTcpException: Tcp command [Connect(node0.neo4j:5000,None,List(),Some(10000 milliseconds),true)] failed because of java.net.UnknownHostException: node0.neo4j
Caused by: java.net.UnknownHostException: node0.neo4j

In this case, node1 can't reach node0.neo4j because its DNS name doesn't even resolve. In our CloudFormation templates there are provisions for these private internal DNS addresses (node0.neo4j, node1.neo4j, node2.neo4j). I would look into any custom configuration you have that could interfere with these DNS names.
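A quick check you can run on each VM to confirm whether those names resolve, and to what:

import socket

# The internal DNS names the CloudFormation template is supposed to create.
for name in ("node0.neo4j", "node1.neo4j", "node2.neo4j"):
    try:
        print(name, "->", socket.gethostbyname(name))
    except socket.gaierror as err:
        print(name, "does not resolve:", err)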

It looks like overall your cluster has no chance to form correctly and succeed, due to a combination of port conflicts & network/DNS misconfiguration.

Thank you for the feedback.

I agree with your assessments; however, I'm unsure how to actually do anything about it. Node0 is created by the CloudFormation template; I'm not sure how anything in our setup would inject itself into the container and start running on port 5000 on that specific machine during provisioning. Isn't it more likely that something isn't shut down properly during the initial stopping of Neo4j during setup?
There is nowhere in the CF template where I picked any port, let alone 5000.

As for the other part, the machines are running in their own VPC, and they can reach each other using SSH. I can't say that we do not have any DNS setting that prevents them from registering themselves, however. Would that be enough to prevent the cluster from forming?

All I can tell you is which parts of the CF provide for the things that appear to be broken in your install. This is the bit that creates the DNS records; you will see something similar repeated in the CF with template substitutions:

{
    "Type": "AWS::Route53::RecordSet",
    "Condition" : "{{condition}}",
    "DependsOn" : "DNSZone",
    "Properties": {
        "HostedZoneId": { "Ref" : "DNSZone" },
        "Comment" : "DNS names for neo4j {{groupName}} {{i}}.",  
        "Name" : "{{groupName}}{{i}}.{{INTERNAL_DNS_TLD}}.",
        "Type" : "A",
        "TTL" : "900",
        "ResourceRecords" : [
            {# Map DNS to **private IP** not PublicIp, this because
                # it's inside the VPC and cluster coord traffic isn't allowed
                # outside anyway.
                #}
            { "Fn::GetAtt" : [ "Neo4jServer{{i}}", "PrivateIp" ] }
        ]
    }
}

You can look for corresponding records in AWS.
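For example, here is a rough boto3 sketch for listing the A records in that zone. The zone name "neo4j." is an assumption based on the node0.neo4j names above, so confirm it against your stack's DNSZone resource.

import boto3

r53 = boto3.client("route53")

# Find the private hosted zone the template created (assumed to be "neo4j.").
zones = r53.list_hosted_zones_by_name(DNSName="neo4j.")["HostedZones"]
for zone in zones:
    if zone["Name"] != "neo4j.":
        continue
    zone_id = zone["Id"].split("/")[-1]  # strip the "/hostedzone/" prefix
    records = r53.list_resource_record_sets(HostedZoneId=zone_id)["ResourceRecordSets"]
    for rec in records:
        if rec["Type"] == "A":
            print(rec["Name"], [r["Value"] for r in rec.get("ResourceRecords", [])])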

Similarly, you can inspect the open ports on each of the VMs. You should see 5000, 6000, and 7000 open to the internal VPC only, and 7473, 7687 open outside of the VPC.
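And a similar sketch for inspecting the security group rules; the group-name filter is based on the "{stack name}-sgNeo4jEnterprise-..." naming mentioned earlier in the thread, so adjust it to match your stack.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # use the region you deployed into

resp = ec2.describe_security_groups(
    Filters=[{"Name": "group-name", "Values": ["*sgNeo4jEnterprise*"]}]
)
for sg in resp["SecurityGroups"]:
    print(sg["GroupName"])
    for rule in sg["IpPermissions"]:
        ranges = [r["CidrIp"] for r in rule.get("IpRanges", [])]
        print("  ports", rule.get("FromPort"), "-", rule.get("ToPort"), "from", ranges)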

Sorry -- unfortunately, with customizations and customer tenancies in AWS, there are a lot of ways these deploys can go wrong that we can't see, because we can't see your AWS setup. Between these logs and those bits, you ought to be able to go through your configuration and see what's missing or wrong. As I said, we tried to replicate this on our side and couldn't. That means that either a modification was made to the CF template that interfered with something, or there is something about your AWS tenancy / policy / quotas that interferes, or some exotic third possibility I just don't know.

I realize the complexity involved can be quite high. I have tried reducing it by running the stack creation on my own personal account instead, to see if the result was different. Note that I am still provisioning in N. Virginia (us-east-1) and the remaining choices are identical (I only ran the test with 4.0.5, though).

This time, I managed to get the cluster up and running, and I could connect to it on :7473 through my browser, add a node and find it again using Cypher. So it seems you are right that some setting is preventing the cluster from forming properly on our company account.

However, the WaitOnPasswordReset step never completed, so after roughly 30 minutes (my chosen timeout) everything rolled back again. So it also seems that something in that script is incompatible with my other account (which I have only used for EC2 containers previously).

I am also running into the same issue mentioned by @mike2 and @thomas4 when trying to run the CloudFormation for a Neo4j Enterprise Causal Cluster using 3 nodes on r4.large instances. On the "WaitOnPasswordReset" step of the formation, the following error occurs:

WaitCondition timed out. Received 0 conditions when expecting 1

After that, the entire Stack rolls back and deletes all resources.

All resources seem to be created before the wait on password reset stage.

Any help would be greatly appreciated as this stack is EXACTLY what we need for our APIs and we are no longer at the phase where we can sustain a single neo4j instance with routine snapshots.

Thanks!

roberto1
Graph Buddy

Same error with AWS. @dayel, @mike2, @thomas4, did you manage to solve it?

I never found a solution, no.
We ended up using a different product, since we do not need the clustered version yet and I can't really justify spending any more company resources on this problem.
But I would still like to hear if anyone else finds a solution or the problem goes away.

A bug was recently found in the AWS marketplace template that mostly affected people launching clusters with read replicas. A fix is underway by @bledi.feshti and I think it has been completed; we're mostly waiting for the new version to be listed by the AWS Marketplace. This listing can take up to 10 days to get approved, but should be ready soon.

Hi everyone!

I am experiencing the same situation: apparently the cluster nodes can't reach each other, as @david.allen pointed out (according to my logs). But I didn't modify anything in the CF template; all resources were created by the template.

Did someone find a solution to this? I'd appreciate any help.

Also, I created nodes in one instance and they also appeared on the other two instances. So they do communicate with each other? Or am I not understanding the problem?

debug2.txt (513.2 KB) debug1.txt (584.0 KB) debug0.txt (541.6 KB)