Neo4j + Lucene are a powerful combination. There is one feature I would love to see in this integration. Lucene was designed around documents, where a single document contains a collection of key-value pairs. When we read a document, we expect all related information to be contained in that document. So whether you index documents in a MongoDB database or records in a MySQL database (i.e. a row of a single table), you are limited to the key:value pairs inside that one container.
But a graph database differs from a document store; in a graph you generally benefit from a more exploded view of everything. So a Person node may not contain the fields describing that person. Instead, Person may have relationships to other nodes like Hobbies, Employment, Address...
When I index a Person, I would like to be able to give the Lucene index more context about that Person. I would like to write a query that composes the set of indexed fields for my Person node so that it includes Address.city, Employment.currentEmployer, Hobbies.favorite.
I can't see that there is any way to do this other than to run a query that creates an actual field in my Person node derived from those related nodes, and then base my index on that materialised field. Lucene will accept only one label at a time, and there is no place to specify a query.
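To make the workaround concrete, here is a minimal Python sketch of the "materialise first, then index" step. It uses a plain in-memory stand-in for the graph rather than a real Neo4j connection, and every name in it (`nodes`, `edges`, `WANTED`, `flatten_person`) is made up for illustration. It only shows the shape of the flat document I would want the index to see for a Person.

```python
# Hypothetical in-memory stand-in for the graph; in Neo4j these would be
# real nodes and relationships.
nodes = {
    "p1": {"label": "Person", "name": "Alice"},
    "a1": {"label": "Address", "city": "Oslo"},
    "e1": {"label": "Employment", "currentEmployer": "Acme"},
    "h1": {"label": "Hobbies", "favorite": "chess"},
}
# (source, target) pairs standing in for relationships out of the Person.
edges = [("p1", "a1"), ("p1", "e1"), ("p1", "h1")]

# Which property to pull from each related label (taken from the example above).
WANTED = {"Address": "city", "Employment": "currentEmployer", "Hobbies": "favorite"}


def flatten_person(person_id):
    """Build the flat key:value document that would be indexed for one Person."""
    doc = {k: v for k, v in nodes[person_id].items() if k != "label"}
    for src, dst in edges:
        if src != person_id:
            continue
        related = nodes[dst]
        prop = WANTED.get(related["label"])
        if prop is not None and prop in related:
            # Store the value under a dotted name such as "Address.city".
            doc[f"{related['label']}.{prop}"] = related[prop]
    return doc


print(flatten_person("p1"))
```

In the real workaround, the dictionary returned here corresponds to the derived properties written back onto the Person node before the index is built.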
Perhaps a great feature would be to allow the index creator to include fields from connected nodes.
PS: I do see a challenge in implementing this. The index for the Person node above would have to be aware when the Address node changes, so that every Person node connected to it would be re-indexed.
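The re-indexing problem can be sketched the same way. This is only an illustration in plain Python, not how Neo4j or Lucene does anything internally; the dictionaries and the `update_city` helper are hypothetical. The idea is to keep a reverse map from each Address to the Persons that depend on it, so a change to the Address tells you which index entries are stale.

```python
from collections import defaultdict

# Hypothetical data: which Address each Person is connected to.
person_address = {"p1": "a1", "p2": "a1", "p3": "a2"}
address_city = {"a1": "Oslo", "a2": "Bergen"}

# Reverse map: Address -> set of Persons whose index entry depends on it.
persons_by_address = defaultdict(set)
for person, address in person_address.items():
    persons_by_address[address].add(person)

# Stand-in for the index: one flat document per Person.
index = {p: {"Address.city": address_city[a]} for p, a in person_address.items()}


def update_city(address_id, new_city):
    """Change an Address and refresh every Person document derived from it."""
    address_city[address_id] = new_city
    stale = sorted(persons_by_address[address_id])
    for person in stale:
        index[person]["Address.city"] = new_city
    return stale  # the Persons that had to be re-indexed


print(update_city("a1", "Trondheim"))
```

The cost is that every write to an Address fans out to all connected Persons, which is presumably why this is harder to build into the index than it looks.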