I’m building an ASP.NET MVC site where I plan to use Lucene.Net. I’ve envisioned a way to structure the usage of Lucene, but I’m not sure whether my planned architecture is sound and efficient.
My Plan:
- On the Application_Start event in Global.asax: I check for the existence of the index on the file system – if it doesn’t exist, I create it and fill it with documents extracted from the database.
- When new content is submitted: I create an IndexWriter, fill up a document, write to the index, and finally dispose of the IndexWriter. IndexWriters are not reused, as I can’t imagine a good way to do that in an ASP.NET MVC application.
- When content is edited: I repeat the same process as when new content is submitted, except that I first delete the old content and then add the edits.
- When a user searches for content: I check HttpRuntime.Cache to see if a user has already searched for this term in the last 5 minutes – if they have, I return those results; otherwise, I create an IndexReader, build and run a query, put the results in HttpRuntime.Cache, return them to the user, and finally dispose of the IndexReader. Once again, IndexReaders aren’t reused.
My Questions:
- Is that a good structure – how can I improve it?
- Are there any performance/efficiency problems I should be aware of?
- Also, is not reusing the IndexReaders and IndexWriters a huge code smell?
Answers:
Method 1
The answer to all three of your questions is the same: reuse your readers (and possibly your writers). You can use a singleton pattern to do this (i.e. declare your reader/writer as public static). Lucene’s FAQ tells you the same thing: share your readers, because the first query is reaaalllyyyy slow. Lucene handles all the locking for you, so there is really no reason why you shouldn’t have a shared reader.
It’s probably easiest to just keep your writer around and (using the NRT model) get the readers from that. If it’s rare that you are writing to the index, or if you don’t have a huge need for speed, then it’s probably OK to open your writer each time instead. That is what I do.
Edit: added a code sample:
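    // Shared for the lifetime of the application; IndexWriter is thread-safe.
    // Lucene.Net 3.x constructor shown; myDir is a Lucene.Net.Store.Directory.
    public static IndexWriter writer = new IndexWriter(myDir,
        new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30),
        IndexWriter.MaxFieldLength.UNLIMITED);

    public JsonResult SearchForStuff(string query)
    {
        // Near-real-time reader: also sees changes that are not yet committed.
        using (IndexReader reader = writer.GetReader())
        using (IndexSearcher search = new IndexSearcher(reader))
        {
            // do the search and return the results
        }
    }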
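To make the shared-writer idea a bit more concrete, here is a minimal sketch of how the add, edit, and search steps from the question could sit on top of one writer. It assumes the Lucene.Net 3.0.x API; the SearchIndex class, the "id" and "body" field names, and the index path are hypothetical and only there for illustration.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using LuceneVersion = Lucene.Net.Util.Version;

// Hypothetical helper: the class name, field names and path are assumptions.
public static class SearchIndex
{
    // Assumed location of the index; set it before the first use.
    public static string IndexPath = @"C:\path\to\index";

    private static readonly Analyzer SharedAnalyzer =
        new StandardAnalyzer(LuceneVersion.LUCENE_30);

    // Lazy<T> creates the single shared writer once, in a thread-safe way.
    private static readonly Lazy<IndexWriter> SharedWriter =
        new Lazy<IndexWriter>(() =>
        {
            var dir = FSDirectory.Open(new DirectoryInfo(IndexPath));
            // This overload creates the index if it is missing, else appends.
            return new IndexWriter(dir, SharedAnalyzer,
                                   IndexWriter.MaxFieldLength.UNLIMITED);
        });

    // Covers both "new content" and "edited content": UpdateDocument deletes
    // every document matching the term and then adds the new one.
    public static void AddOrUpdate(string id, string body)
    {
        var doc = new Document();
        doc.Add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("body", body, Field.Store.YES, Field.Index.ANALYZED));
        SharedWriter.Value.UpdateDocument(new Term("id", id), doc);
        SharedWriter.Value.Commit(); // persist the change to disk
    }

    // Returns the ids of up to maxHits matching documents.
    public static string[] Search(string text, int maxHits)
    {
        // QueryParser is not thread-safe, so create one per call.
        var parser = new QueryParser(LuceneVersion.LUCENE_30, "body", SharedAnalyzer);
        Query query = parser.Parse(text);

        // Near-real-time reader obtained from the shared writer.
        using (IndexReader reader = SharedWriter.Value.GetReader())
        using (var searcher = new IndexSearcher(reader))
        {
            TopDocs hits = searcher.Search(query, maxHits);
            var ids = new string[hits.ScoreDocs.Length];
            for (int i = 0; i < ids.Length; i++)
            {
                ids[i] = searcher.Doc(hits.ScoreDocs[i].Doc).Get("id");
            }
            return ids;
        }
    }
}
```

A controller action would then call SearchIndex.AddOrUpdate(id, body) when content is saved and SearchIndex.Search(term, 20) when a user searches. The writer is never disposed per request; it would be disposed once when the application shuts down.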
Method 2
I would probably skip the caching: Lucene is very, very efficient, perhaps so efficient that it is faster to search again than to cache the results.
Building the full index in Application_Start feels a bit off to me. It should probably run in its own thread so that it doesn’t block other expensive startup activities.
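One way to move the initial build off the startup path is sketched below. It is only an illustration: IndexBootstrapper and the two delegates (indexExists, rebuildIndex) are hypothetical names, and the actual index-building code is whatever the application already uses.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper; the names here are assumptions, not part of Lucene.Net.
public static class IndexBootstrapper
{
    // Exposed so other code can check or wait for the initial build.
    public static Task BuildTask { get; private set; }

    // indexExists: returns true if an index is already on disk.
    // rebuildIndex: reads the database and fills the index.
    public static Task Start(Func<bool> indexExists, Action rebuildIndex)
    {
        // Task.Run schedules the work on a thread-pool thread, so the
        // caller (for example Application_Start) returns immediately.
        BuildTask = Task.Run(() =>
        {
            if (!indexExists())
            {
                rebuildIndex();
            }
        });
        return BuildTask;
    }
}
```

In Application_Start this could be called as IndexBootstrapper.Start(() => Directory.Exists(indexPath), RebuildFromDatabase), where indexPath and RebuildFromDatabase are again placeholders. Searches that arrive before the build finishes will see an empty or partial index, so the application has to decide whether to wait on BuildTask or return no results. On .NET 4.5.2 and later, HostingEnvironment.QueueBackgroundWorkItem is an alternative that lets ASP.NET track the work during application shutdown.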
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.