Indexing .PDF, .XLS, .DOC, .PPT using Lucene.NET
I’ve heard of Lucene.NET and of Apache Tika. My question is: how do I index these document types using C# rather than Java? I think the issue is that there is no .NET equivalent of Tika to extract the relevant text from these document types.