Chapter 8. Searching documents

Table of Contents
Using search front-ends
How to write search result templates
Designing search.html
Relevancy
Search queries tracking
Search results cache
Fuzzy search

Using search front-ends

Performing search

Open your preferred front-end in a Web browser:


http://your.web.server/path/to/search.cgi
or
http://your.web.server/path/to/search.php3
or
http://your.web.server/path/to/search.pl

To find something, just type the words you want to find and press the SUBMIT button. For example, "mysql odbc". Do not type the quotes " in the query; they are used here only to separate the query from the surrounding text. mnoGoSearch will find all documents that contain the word "mysql" and/or the word "odbc". The best documents, those with the highest weights, are displayed first.

Search parameters

mnoGoSearch front-ends support the following parameters, passed in the CGI query string. You may use them in the HTML form on the search page.

Table 8-1. Available search parameters

q     Text parameter with the search query.
ps    Page size: the number of search results displayed on one page, 20 by default. The maximum page size is 100. This limit prevents very large page sizes from overloading the server and can be changed with the MAX_PS definition in search.c.
np    Page number, 0 by default (first page).
m     Search mode. Currently the "all", "any" and "bool" values are supported.
wm    Word match. Use this parameter to choose the word match type. The values "wrd", "beg", "end" and "sub" mean whole word, word beginning, word ending and word substring match, respectively.
o     Search result type, 0 by default. A template may define several variants of each section, for example "res". This allows choosing between, for example, "Long" and "Short" search result output types. Up to 100 different formats are allowed in the same template.
t     Tag limit. Limits the search to documents with the given tag. This parameter has the same effect as the -t indexer option.
cat   Category limit. See the Categories section for details.
ul    URL limit: a URL substring that limits the search to a subsection of the database. It supports the SQL LIKE wildcards % and _. This parameter has the same effect as the -u indexer option. If a relative URL is specified, search.cgi inserts % signs before and after the "ul" value when compiled with SQL support. This makes it possible to write a plain URL substring in the HTML form, for example <OPTION VALUE="/manual/"> instead of VALUE="%/manual/%". When a full URL with a schema is specified, search.cgi adds a % sign only after the value. For example, for <OPTION VALUE="http://localhost/"> search.cgi will pass http://localhost/% to the SQL LIKE comparison.
wf    Weight factors. Allows changing the weights of different document sections at search time. Must be passed as a hex number. See the explanation below.
g     Language limit. A language abbreviation that limits search results by the url.lang field.
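
As an illustration of the parameters above, here is a minimal Python sketch that composes a search URL and mimics the "ul" wildcard expansion. It is not part of mnoGoSearch. The function names build_search_url and ul_like_pattern are hypothetical, and treating any value containing "://" as a "full URL with schema" is an assumption made for this example.

```python
from urllib.parse import urlencode

MAX_PS = 100                          # maximum page size stated in Table 8-1
MODES = {"all", "any", "bool"}        # values of the "m" parameter
WORD_MATCH = {"wrd", "beg", "end", "sub"}  # values of the "wm" parameter


def build_search_url(base, q, **params):
    """Return base + '?' + query string; only explicitly passed parameters are sent."""
    if params.get("ps", 20) > MAX_PS:
        raise ValueError("ps must not exceed %d" % MAX_PS)
    if "m" in params and params["m"] not in MODES:
        raise ValueError("unsupported search mode: %r" % params["m"])
    if "wm" in params and params["wm"] not in WORD_MATCH:
        raise ValueError("unsupported word match type: %r" % params["wm"])
    query = {"q": q}
    query.update(params)
    return base + "?" + urlencode(query)


def ul_like_pattern(ul):
    """Mimic the described "ul" expansion into an SQL LIKE pattern.

    Assumption: a value containing "://" counts as a full URL with schema.
    """
    if "://" in ul:
        return ul + "%"          # full URL: % is added only after the value
    return "%" + ul + "%"        # relative URL: % before and after


if __name__ == "__main__":
    print(build_search_url("http://your.web.server/path/to/search.cgi",
                           "mysql odbc", m="all", ps=10))
    print(ul_like_pattern("/manual/"))
    print(ul_like_pattern("http://localhost/"))
```

The sketch rejects out-of-range values on the client side only for convenience; the front-end itself enforces its own limits.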

Changing different document parts weights at search time

It is possible to pass a "wf" HTML form variable to search.cgi. The "wf" variable holds weight factors for specific document parts. Currently the body, title, keywords, description and url parts, crosswords, and user-defined META tags and HTTP headers are supported. See the "Section" part of indexer.conf-dist.

To use this feature, it is recommended to assign different section IDs to different document parts with the "Section" indexer.conf command. Currently up to 256 different sections are supported.

Imagine that we have these default sections in indexer.conf:

Section body 1
Section title 2
Section keywords 3
Section description 4

The "wf" value is a string of hex digits, ABCD. Each digit is a factor for the corresponding section's weight. The rightmost digit corresponds to section 1. For the sections configuration given above:


      D is a factor for section 1 (body)
      C is a factor for section 2 (title)
      B is a factor for section 3 (keywords)
      A is a factor for section 4 (description)

Examples:


   wf=0001 will search through body only.

   wf=1110 will search through title, keywords and description, but not through the body.

   wf=F421 will search through:
          Description with factor 15  (F hex)
          Keywords with factor 4
          Title with factor  2
          Body with factor 1

By default, if the "wf" variable is omitted from the query, all section factors are 1, which means that all sections have the same weight.
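
The digit layout described above can be sketched in Python as follows. This is only an illustration of the encoding, not mnoGoSearch code. The function names encode_wf and decode_wf are hypothetical, and giving sections that are not listed a factor of 0 is an assumption of this sketch.

```python
def encode_wf(factors, width):
    """Build a "wf" string of `width` hex digits from {section_id: factor}.

    The rightmost digit belongs to section 1. Sections missing from
    `factors` get factor 0 here (an assumption of this sketch).
    """
    digits = []
    for section in range(width, 0, -1):   # leftmost digit = highest section ID
        factor = factors.get(section, 0)
        if not 0 <= factor <= 15:
            raise ValueError("a factor must fit into one hex digit (0..15)")
        digits.append(format(factor, "X"))
    return "".join(digits)


def decode_wf(wf):
    """Return {section_id: factor} for a "wf" string of hex digits."""
    return {index + 1: int(digit, 16)
            for index, digit in enumerate(reversed(wf))}


if __name__ == "__main__":
    # body=1, title=2, keywords=3, description=4, as in the example sections
    print(encode_wf({1: 1, 2: 2, 3: 4, 4: 15}, width=4))
    print(decode_wf("1110"))
```

With the example sections, the first call reproduces the wf=F421 example, and decoding "1110" gives factor 0 for the body and factor 1 for the other three sections.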

Using front-end with an shtml page

When using a dynamic shtml page containing an SSI directive that calls search.cgi (that is, search.cgi is not called directly as a CGI program), it is necessary to override Apache's SCRIPT_NAME environment variable so that all the links on search pages lead to the dynamic page and not to search.cgi.

For example, when an shtml page contains the line <!--#include virtual="search.cgi" -->, the SCRIPT_NAME variable will still point to search.cgi rather than to the shtml page.

To override the SCRIPT_NAME variable, we implemented a UDMSEARCH_SELF variable that you may add to Apache's httpd.conf file. search.cgi checks the UDMSEARCH_SELF variable first and then SCRIPT_NAME. Here is an example of setting the UDMSEARCH_SELF environment variable with the SetEnv/PassEnv commands in Apache's httpd.conf:

SetEnv UDMSEARCH_SELF /path/to/search.cgi
PassEnv UDMSEARCH_SELF
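
The lookup order described above (UDMSEARCH_SELF first, then SCRIPT_NAME) can be illustrated with a short Python sketch. It only models the documented order; the function name self_link and the sample paths are hypothetical, and the real front-end may handle empty values differently.

```python
def self_link(environ):
    """Return the path used for links on the search page.

    UDMSEARCH_SELF takes precedence; SCRIPT_NAME is the fallback.
    An empty UDMSEARCH_SELF is treated as unset (an assumption).
    """
    return environ.get("UDMSEARCH_SELF") or environ.get("SCRIPT_NAME", "")


if __name__ == "__main__":
    # Hypothetical environments, as a Web server might pass them to the program
    print(self_link({"SCRIPT_NAME": "/cgi-bin/search.cgi"}))
    print(self_link({"SCRIPT_NAME": "/cgi-bin/search.cgi",
                     "UDMSEARCH_SELF": "/path/to/search.shtml"}))
```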

Using several templates

It is often necessary to use several templates with the same search.cgi. There are several ways to do it. They are listed here in the order in which search.cgi detects the template name.

  1. search.cgi checks the UDMSEARCH_TEMPLATE environment variable, so you can put the path to the desired search template into this variable.

  2. search.cgi also supports Apache internal redirects. It checks the REDIRECT_STATUS and REDIRECT_URL environment variables. To enable this method, add these lines to Apache's srm.conf:

    AddType text/html .zhtml
    AddHandler zhtml .zhtml
    Action zhtml /cgi-bin/search.cgi

    Put search.cgi into your /cgi-bin/ directory. Then put the HTML template into your site directory structure under any name with the .zhtml extension, for example template.zhtml. Now you may open the search page at http://www.site.com/path/to/template.zhtml
    You may, of course, use any unused extension instead of .zhtml.

  3. If the above two ways fail, search.cgi opens a template that has the same name as the script being executed, using the SCRIPT_NAME environment variable. search.cgi will open the template ETC/search.htm, search1.cgi will open ETC/search1.htm and so on, where ETC is the mnoGoSearch /etc directory (usually /usr/local/mnoGoSearch/etc). So you can use the same search.cgi with different templates without having to recompile it. Just create one or several hard or symbolic links to search.cgi, or copy it, and put the corresponding search templates into the /etc directory of the mnoGoSearch installation.

    See also the "Making multi-language search pages" section.
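
The detection order in the list above can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual search.cgi logic: the function name resolve_template is hypothetical, the sketch only reports which of the three methods applies, and for the redirect case it returns the REDIRECT_URL value as-is because the mapping from that URL to a template file is not described here.

```python
import posixpath


def resolve_template(environ, etc_dir="/usr/local/mnoGoSearch/etc"):
    """Return (method, value) following the three documented steps."""
    # 1. An explicit template path in UDMSEARCH_TEMPLATE wins.
    template = environ.get("UDMSEARCH_TEMPLATE")
    if template:
        return ("environment", template)

    # 2. Apache internal redirect: both variables must be present.
    if environ.get("REDIRECT_STATUS") and environ.get("REDIRECT_URL"):
        return ("redirect", environ["REDIRECT_URL"])

    # 3. Fall back to ETC/<script name without extension>.htm
    script = posixpath.basename(environ.get("SCRIPT_NAME", "search.cgi"))
    stem, _extension = posixpath.splitext(script)
    return ("script", posixpath.join(etc_dir, stem + ".htm"))


if __name__ == "__main__":
    print(resolve_template({"SCRIPT_NAME": "/cgi-bin/search1.cgi"}))
    print(resolve_template({"REDIRECT_STATUS": "200",
                            "REDIRECT_URL": "/path/to/template.zhtml",
                            "SCRIPT_NAME": "/cgi-bin/search.cgi"}))
```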

Advanced boolean search

If you want more advanced results, you may use the query language. To do so, select the "bool" mode in the search form.

mnoGoSearch understands the following boolean operators:

& - logical AND. For example, "mysql & odbc". mnoGoSearch will find any URLs that contain both "mysql" and "odbc".

| - logical OR. For example, "mysql|odbc". mnoGoSearch will find any URLs that contain the word "mysql" or the word "odbc".

~ - logical NOT. For example, "mysql & ~odbc". mnoGoSearch will find URLs that contain the word "mysql" and at the same time do not contain the word "odbc". Note that ~ only excludes the given word from the results. The query "~odbc" on its own will find nothing!

() - grouping, used to compose more complex queries. For example, "(mysql | msql) & ~postgres". The query language is simple and powerful at the same time. Just treat a query as an ordinary boolean expression.
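
To make the operators concrete, here is a minimal Python sketch that evaluates such a query against the set of words of a single document. It is not the mnoGoSearch parser. The function names tokenize and matches are hypothetical, the precedence (~ binds tightest, then &, then |) and the case-insensitive comparison are assumptions, and the sketch does not model the index-level behaviour that a query consisting only of "~odbc" finds nothing.

```python
import re

# An operator character, or a run of characters that are neither
# whitespace nor operators (a word).
TOKEN = re.compile(r"\s*([&|~()]|[^\s&|~()]+)")


def tokenize(query):
    return TOKEN.findall(query)


def matches(query, words):
    """Return True if the word set `words` satisfies the boolean `query`."""
    tokens = tokenize(query)
    words = {word.lower() for word in words}
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            right = parse_and()
            value = value or right
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            right = parse_not()
            value = value and right
        return value

    def parse_not():
        token = peek()
        if token == "~":
            take()
            return not parse_not()
        if token == "(":
            take()
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return value
        if token is None or token in ("&", "|", ")"):
            raise ValueError("a word was expected")
        return take().lower() in words

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: %r" % tokens[pos])
    return result


if __name__ == "__main__":
    document = {"mysql", "odbc", "driver"}   # hypothetical document words
    print(matches("mysql & odbc", document))
    print(matches("(mysql | msql) & ~postgres", document))
```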

How search handles expired documents

Expired documents are still searchable with their old content.