Blog

Tagged by 'lucene.net'

  • Over the last few months I have been researching and experimenting to find a way to build my own eCommerce-style search, similar to what eBay and Amazon use. This is otherwise known as “Faceted Search”, whereby the search results are filtered through a series of facets belonging to your search criteria. Each facet typically corresponds to the possible values of a property common to a set of objects.

    Sounds very difficult and complex, doesn’t it! Even to this very day, I am sure eBay and Amazon must use some kind of “magic” to get their search to work so seamlessly and efficiently.

    There are numerous search solutions out there that could help you build this type of search. In my experience, though, I couldn’t find any low-cost, out-of-the-box solution that would help me build my own. The majority of search vendors were not only very expensive, but also required a quote for a tailor-made solution.

    In the early stages I tried expanding my Lucene.NET knowledge, but I couldn’t find a flexible way to introduce facets into my search. I must admit I am not exactly an expert in Lucene, which may well have played a part in my failing miserably.

    When I thought all was lost and there was no chance in hell of figuring this thing out, I luckily came across a few blog and StackOverflow posts by a guy called Mauricio Scheffer. Mauricio seems to be the brains behind SolrNet, a Solr client library built for the .NET Framework. This is one of the strengths of the Solr search platform: it can also be consumed from other development platforms such as Python and Ruby.

    SolrNet just happened to be an ideal solution for what I was looking for, and with just over a week’s development I was able to build my own basic search, which looks something like this:

    [Screenshots: SolrNet1, SolrNet2]

    As you can see from my screenshots, you can carry out a search by report type and/or global text search. In addition, which facets are shown or hidden depends purely on the search results returned.
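
    To make the idea of result-driven facets a little more concrete, here is a minimal sketch in plain C# (no Solr involved). It only illustrates the principle: group the current results by a property and count each value, so a value with no matching results never shows up as a facet. The `ReportResult` type and the `FacetSketch.CountFacet` helper are hypothetical names made up for this illustration. In a real Solr setup the facet counts are computed by Solr itself rather than in your own code.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical result type for illustration only; not part of SolrNet or my project.
    public class ReportResult
    {
        public string Title { get; set; }
        public string ReportType { get; set; }
    }

    public static class FacetSketch
    {
        // Groups the given results by one property and counts each distinct value.
        // Values that no result carries are absent from the dictionary,
        // which is what lets the UI hide that facet option.
        public static Dictionary<string, int> CountFacet(
            IEnumerable<ReportResult> results,
            Func<ReportResult, string> facetSelector)
        {
            return results
                .Select(facetSelector)
                .Where(value => !String.IsNullOrEmpty(value))
                .GroupBy(value => value)
                .ToDictionary(group => group.Key, group => group.Count());
        }
    }
    ```

    Calling `FacetSketch.CountFacet(results, r => r.ReportType)` on three results with the types "Annual", "Annual" and "Quarterly" gives a count of 2 for "Annual" and 1 for "Quarterly".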

    SolrNet is a very flexible package and I know just enough to implement the basics. But I was really surprised by how well the searches performed, even with the most basic implementation. So I am looking forward to adding features over the next few months and perfecting both my Solr search index and code.

    I won’t be posting the code that I used to create my search since it’s quite a big project and tailored specifically to my database architecture. But here are a few links that I found useful to get me started in the world of SolrNet:

  • Over the last few days I have been doing some research on the best way to implement search functionality for a site I am currently building. The site will consist mainly of news articles. The client wanted a search that would allow a user to search across all fields related to a news article.

    Originally, I envisaged writing my own SQL to query a few tables within my database to return some search results. But as I delved further into designing the database architecture in the early planning stages, I found that my original (somewhat closed-minded) approach would be neither flexible nor scalable enough to search and extract all the information I required.

    From what I have researched, the general consensus is to use either SQL Full Text Search or Lucene.NET. Many favour Lucene for its richer query language and greater flexibility, since you can write a search index tailored to your project. From what I gather, Lucene can work with any type of text data. For example, not only can you index rows in your database, but there are also solutions that support indexing physical files in your application. Neat!

    I have written some basic code (below) with a couple of methods to get started in creating a search index and carrying out a multi-field search across your whole index. You would further enhance this code to only carry out a full index once all required records have been added. Most implementations of Lucene use incremental indexing, where documents already in the index are updated individually rather than deleting the whole index and building a new one every time. I plan to optimise my Lucene code and hook it up to a service scheduled to carry out an incremental index every midnight.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using Lucene;
    using Lucene.Net;
    using Lucene.Net.Store;
    using Lucene.Net.Analysis;
    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Index;
    using Lucene.Net.Documents;
    using Lucene.Net.QueryParsers;
    using Lucene.Net.Search;
    using System.Configuration; 
    
    namespace MES.DataManager.Search
    {
        public class Lucene
        {
            public static void IndexSite()
            {
                //The file location of the index
                string indexLocation = ConfigurationManager.AppSettings["SearchIndexPath"];

                //Open the index directory, creating it if it does not exist yet
                bool createDirectory = !System.IO.Directory.Exists(indexLocation);
                Directory searchDirectory = FSDirectory.GetDirectory(indexLocation, createDirectory);

                //Create an analyzer to process the text
                Analyzer searchAnalyser = new StandardAnalyzer();

                //Create the index writer with the directory and analyzer.
                //Passing true rebuilds the whole index from scratch on every run.
                IndexWriter indexWriter = new IndexWriter(searchDirectory, searchAnalyser, true);

                try
                {
                    //Iterate through Article table and populate the index
                    foreach (Article a in ArticleBLL.GetArticleDetails())
                    {
                        Document doc = new Document();

                        //The id is stored untokenized so it can be read back exactly
                        doc.Add(new Field("id", a.ID.ToString(), Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.YES));
                        doc.Add(new Field("title", a.Title, Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));
                        doc.Add(new Field("articletype", a.Type.TypeName, Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));

                        //Optional fields are only added when they have a value
                        if (!String.IsNullOrEmpty(a.Summary))
                            doc.Add(new Field("summary", a.Summary, Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));

                        if (!String.IsNullOrEmpty(a.ByLineShort))
                            doc.Add(new Field("bylineshort", a.ByLineShort, Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));

                        if (!String.IsNullOrEmpty(a.ByLineLong))
                            doc.Add(new Field("bylinelong", a.ByLineLong, Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));

                        if (!String.IsNullOrEmpty(a.BasicWords))
                            doc.Add(new Field("basicwords", a.BasicWords, Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));

                        if (!String.IsNullOrEmpty(a.MediumWords))
                            doc.Add(new Field("mediumwords", a.MediumWords, Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));

                        if (!String.IsNullOrEmpty(a.LongWords))
                            doc.Add(new Field("longwords", a.LongWords, Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));

                        //Write the document to the index
                        indexWriter.AddDocument(doc);
                    }

                    //Optimize the index once all documents have been written
                    indexWriter.Optimize();
                }
                finally
                {
                    //Always close the writer so the index lock is released
                    indexWriter.Close();
                }
            }
    
            public static List<CoreArticleDetail> SearchArticles(string searchTerm)
            {
                Analyzer analyzer = new StandardAnalyzer(); 
    
                //Search by multiple fields
                MultiFieldQueryParser parser = new MultiFieldQueryParser(
                    new string[]
                    {
                        "title",
                        "summary",
                        "bylineshort",
                        "bylinelong",
                        "basicwords",
                        "mediumwords",
                        "longwords"
                    },
                    analyzer);
    
                Query query = parser.Parse(searchTerm); 
    
                //Create an index searcher that will perform the search
                IndexSearcher searcher = new IndexSearcher(ConfigurationManager.AppSettings["SearchIndexPath"]);

                List<int> articleIDs = new List<int>();

                try
                {
                    //Execute the query
                    Hits hits = searcher.Search(query);

                    //Iterate through the hits and collect the article IDs
                    for (int i = 0; i < hits.Length(); i++)
                    {
                        Document doc = hits.Doc(i);

                        articleIDs.Add(int.Parse(doc.Get("id")));
                    }
                }
                finally
                {
                    //Close the searcher to release the index files
                    searcher.Close();
                }

                return ArticleBLL.GetArticleSearchInformation(articleIDs);
            }
    
        }
    }
    

    As you can see, my example allows you to search across as many of your fields as you require, which I am sure you will find useful. It took a lot of research to find out how to carry out a multi-field search. The majority of the examples I found on the internet showed how to search only one field.
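
    As for the incremental indexing mentioned before the listing, a first step would be working out which articles have actually changed since the last index run. The sketch below is plain C# and makes some assumptions: that each article has a last-modified timestamp, and that the timestamps from the previous run are kept somewhere. The class and method names are hypothetical and not part of Lucene.NET. The changed IDs could then be re-indexed one by one (for example with `IndexWriter.UpdateDocument`, if your Lucene.NET version offers it) using a writer opened without recreating the index.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical helper for illustration only; not part of Lucene.NET.
    public static class IncrementalIndexSketch
    {
        // Returns the IDs of articles that are new, or were modified after
        // the timestamp recorded for them at the previous index run.
        // current: article ID -> last modified time in the database
        // indexed: article ID -> last modified time seen when last indexed
        public static List<int> FindChanged(
            IDictionary<int, DateTime> current,
            IDictionary<int, DateTime> indexed)
        {
            return current
                .Where(pair =>
                {
                    DateTime seen;
                    return !indexed.TryGetValue(pair.Key, out seen) || pair.Value > seen;
                })
                .Select(pair => pair.Key)
                .OrderBy(id => id)
                .ToList();
        }

        // Returns the IDs that were indexed previously but no longer exist
        // in the database, so their documents should be removed from the index.
        public static List<int> FindDeleted(
            IDictionary<int, DateTime> current,
            IDictionary<int, DateTime> indexed)
        {
            return indexed.Keys
                .Where(id => !current.ContainsKey(id))
                .OrderBy(id => id)
                .ToList();
        }
    }
    ```

    Only the IDs returned by these two methods need touching, which is what keeps a nightly run cheap compared with rebuilding the whole index.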

    The main advantage I can see straight away from using Lucene is that since the search data is held on disk, there is hardly any need to query the database. The only downside I can see is the possibility of problems caused by a corrupt index.

    For more information on using Lucene, here are a couple of links that you may find useful to get started (I know I did):

    http://www.codeproject.com/KB/library/IntroducingLucene.aspx
    http://ifdefined.com/blog/post/Full-Text-Search-in-ASPNET-using-LuceneNET.aspx