mnoGoSearch 3.2 reference manual: Full-featured search engine software
Chapter 8. Searching documents
mnoGoSearch users can customize the search results (the output of search.cgi or search.php) by providing a template file, search.htm, which should be located in the /etc/ directory of the mnoGoSearch installation.
The template file is an ordinary HTML file divided into sections. You can open it in a browser to get an idea of how the search results will look.
Each section begins with a <!--sectionname--> delimiter and ends with a <!--/sectionname--> delimiter; each delimiter should be on a line of its own.
Each section consists of HTML-formatted text with special meta symbols. Every meta symbol is replaced by its corresponding string. You can think of meta symbols as variables that take their values when the search results are displayed.
Variables can be written in the following forms:
$(x) - plain value
$&(x) - HTML-escaped value with search words highlighted
$%(x) - value escaped for use in URLs
$^(x) - value with search words highlighted
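As a rough illustration of where each form might be used, here is a small sketch. The surrounding markup and the link text are made up for this example; only the meta symbols $(self), $(q) and $(DX) come from this page, and $(DX) is available in the 'res' section only.

```html
<!-- In an attribute or in visible text: HTML-escape the query -->
<INPUT TYPE="text" NAME="q" VALUE="$&(q)">

<!-- Inside a URL: URL-escape the query -->
<A HREF="$(self)?q=$%(q)">Repeat this search</A>

<!-- In a 'res' section: highlight the search words in the document text -->
$^(DX)
```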
The following section names are defined:
TOP - This section is included first on every page. You should begin it with <HTML><HEAD> and so on. It is also the natural place to provide a search form. The following special meta symbols may be used in this section:
$(self) - argument for FORM ACTION tag
$(q) - a search query
$(ndocs) - total number of documents in the database
$(cat) - current category value
$(tag) - current tag value
$(rN) - random number (here N is a number)
If you want to include random banners on your pages, use $(rN). You should also place a string like "RN xxxx" in the 'variables' section (see below), which gives a range 0..xxxx for $(rN). You can use as many random numbers as you want.
Example: $(r0), $(r1), $(r45) etc.
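A minimal sketch of a random banner, assuming ten banner images named banner0.gif to banner9.gif under a hypothetical /banners/ directory. The path and file names are invented for illustration; the "R0 9" line follows the "RN xxxx" pattern described above.

```html
<!--variables
...
R0 9
-->

<!--top-->
...
<!-- /banners/ and the file names are hypothetical -->
<IMG SRC="/banners/banner$(r0).gif" ALT="">
...
<!--/top-->
```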
A simple top section might look like this:
<!--top-->
<HTML>
<HEAD>
<TITLE>mnoGoSearch: $(q)</TITLE>
</HEAD>
<BODY>
<FORM METHOD=GET ACTION="$(self)">
<INPUT TYPE="hidden" NAME="ul" VALUE="">
<INPUT TYPE="hidden" NAME="ps" VALUE="20">
Search for: <INPUT TYPE="text" NAME="q" SIZE=30 VALUE="$&(q)">
<INPUT TYPE="submit" VALUE="Search!"><BR>
</FORM>
<!--/top-->
The following variables can be used in the FORM:
lang limits results by language. The value is a two-letter language code.
ul is the URL filter. It allows you to limit results to a particular site, section and so on. For example, you can put the following in the form
Search through:
<SELECT NAME="ul">
<OPTION VALUE="" SELECTED="$(ul)">Entire site
<OPTION VALUE="/manual/" SELECTED="$(ul)">Manual
<OPTION VALUE="/products/" SELECTED="$(ul)">Products
<OPTION VALUE="/support/" SELECTED="$(ul)">Support
</SELECT>
to limit your search to a particular section.
The expression SELECTED="$(ul)" in the example above (and in all the examples below) allows the selected option to be preserved on subsequent pages. When the search front-end finds that expression, it prints the string SELECTED only if the given OPTION VALUE is equal to that variable.
ps is the default page size (i.e. how many documents to display per page).
q is the query itself.
pn is ps*np. This variable is not used by mnoGoSearch itself, but it may be useful, for example, in an <!INCLUDE CONTENT="..."> directive if you want to include results produced by another search engine.
The following variables control advanced search capabilities:
m selects the default search type when the query consists of more than one word. With m=any, the search looks for documents containing at least one of the words; with m=all, the search is more restrictive and all words must be in the document. With m=bool, the query string is treated as a boolean expression.
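For instance, the search type could be offered to the user with a selector like the one below. This is only a sketch: the option labels are made up, while the values any, all and bool and the SELECTED="$(m)" idiom are the ones described on this page.

```html
<SELECT NAME="m">
<OPTION VALUE="all" SELECTED="$(m)">All words
<OPTION VALUE="any" SELECTED="$(m)">Any word
<OPTION VALUE="bool" SELECTED="$(m)">Boolean query
</SELECT>
```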
o is used to specify the output format, so the user can select between different formats. The default search.htm-dist contains three formats for the "res" section; they output document information in "Long", "Short" and "URL only" representations. However, you may define several formats for any section, not only for "res".
dt is the time limiting type. Three types are supported.
If 'dt' is 'back', results are limited to recent pages, and you specify this "recentness" in the variable 'dp' in the form xxxA[yyyB[zzzC]] (spaces are allowed between xxx and A, between yyy and B, and so on). xxx, yyy and zzz are numbers (which can be negative). A, B and C can each be one of the following (the letters are the same as in the strptime/strftime functions):
s - second
M - minute
h - hour
d - day
m - month
y - year
Examples:
4h30M - 4 hours and 30 minutes
1y6m-15d - 1 year and six months minus 15 days
1h-60M+1s - 1 hour minus 60 minutes plus 1 second
If 'dt' is 'er' (short for newer/older), the search is limited to pages newer or older than the given date. The variable 'dx' is the newer/older flag (1 means "newer" or "after", -1 means "older" or "before"). The date is split into the following fields:
'dm' - month (0 - January, 1 - February, .., 11 - December)
'dy' - year (four digits, for example 1999 or 2000)
'dd' - day (1...31)
If 'dt' is 'range', the search is limited to a given range of dates. The variables 'db' and 'de' are used here and stand for the beginning and end dates. Each date is a string in the form dd/mm/yyyy, where dd is the day, mm is the month and yyyy is the four-digit year.
This is an example of the FORM part where you can choose between the different time-limiting options.
<!-- 'search with time limits' options -->
<TR><TD>
<TABLE CELLPADDING=2 CELLSPACING=0 BORDER=0>
<CAPTION>
Limit results to pages published within a specified period of time.<BR>
<FONT SIZE=-1><I>(Please select only one option)</I></FONT>
</CAPTION>
<TR>
<TD VALIGN=center><INPUT TYPE=radio NAME="dt" VALUE="back" CHECKED></TD>
<TD><SELECT NAME="dp">
<OPTION VALUE="0" SELECTED="$(dp)">anytime
<OPTION VALUE="10M" SELECTED="$(dp)">in the last ten minutes
<OPTION VALUE="1h" SELECTED="$(dp)">in the last hour
<OPTION VALUE="7d" SELECTED="$(dp)">in the last week
<OPTION VALUE="14d" SELECTED="$(dp)">in the last 2 weeks
<OPTION VALUE="1m" SELECTED="$(dp)">in the last month
<OPTION VALUE="3m" SELECTED="$(dp)">in the last 3 months
<OPTION VALUE="6m" SELECTED="$(dp)">in the last 6 months
<OPTION VALUE="1y" SELECTED="$(dp)">in the last year
<OPTION VALUE="2y" SELECTED="$(dp)">in the last 2 years
</SELECT>
</TD>
</TR>
<TR>
<TD VALIGN=center><INPUT type=radio NAME="dt" VALUE="er"></TD>
<TD><SELECT NAME="dx">
<OPTION VALUE="1" SELECTED="$(dx)">After
<OPTION VALUE="-1" SELECTED="$(dx)">Before
</SELECT>
or on
<SELECT NAME="dm">
<OPTION VALUE="0" SELECTED="$(dm)">January
<OPTION VALUE="1" SELECTED="$(dm)">February
<OPTION VALUE="2" SELECTED="$(dm)">March
<OPTION VALUE="3" SELECTED="$(dm)">April
<OPTION VALUE="4" SELECTED="$(dm)">May
<OPTION VALUE="5" SELECTED="$(dm)">June
<OPTION VALUE="6" SELECTED="$(dm)">July
<OPTION VALUE="7" SELECTED="$(dm)">August
<OPTION VALUE="8" SELECTED="$(dm)">September
<OPTION VALUE="9" SELECTED="$(dm)">October
<OPTION VALUE="10" SELECTED="$(dm)">November
<OPTION VALUE="11" SELECTED="$(dm)">December
</SELECT>
<INPUT TYPE=text NAME="dd" VALUE="$(dd)" SIZE=2 maxlength=2> ,
<SELECT NAME="dy">
<OPTION VALUE="1990" SELECTED="$(dy)">1990
<OPTION VALUE="1991" SELECTED="$(dy)">1991
<OPTION VALUE="1992" SELECTED="$(dy)">1992
<OPTION VALUE="1993" SELECTED="$(dy)">1993
<OPTION VALUE="1994" SELECTED="$(dy)">1994
<OPTION VALUE="1995" SELECTED="$(dy)">1995
<OPTION VALUE="1996" SELECTED="$(dy)">1996
<OPTION VALUE="1997" SELECTED="$(dy)">1997
<OPTION VALUE="1998" SELECTED="$(dy)">1998
<OPTION VALUE="1999" SELECTED="$(dy)">1999
<OPTION VALUE="2000" SELECTED="$(dy)">2000
<OPTION VALUE="2001" SELECTED="$(dy)">2001
</SELECT>
</TD>
</TR>
<TR>
<TD VALIGN=center><INPUT TYPE=radio NAME="dt" VALUE="range"></TD>
<TD>
Between <INPUT TYPE=text NAME="db" VALUE="$(db)" SIZE=11 MAXLENGTH=11>
and <INPUT TYPE=text NAME="de" VALUE="$(de)" SIZE=11 MAXLENGTH=11>
</TD>
</TR>
</TABLE>
</TD></TR>
<!-- end of stl options -->
BOTTOM - This section is always included last on every page, so it should provide all the closing tags that have counterparts in the top section. It is not obligatory to place this section at the end of the template file, but doing so lets you view the template as an ordinary HTML file in a browser to get an idea of how it will look.
Below is an example of a bottom section:
<!--bottom-->
<P>
<HR>
<DIV ALIGN=right>
<A HREF="http://search.mnogo.ru/">
<IMG SRC="mnogosearch.gif" BORDER=0 ALT="[Powered by mnoGoSearch search engine software]">
</A>
</DIV>
</BODY>
</HTML>
<!--/bottom-->
RESTOP - This section is included just before the search results. It is a good idea to provide some summary information about the search results here. You can do so using the following meta symbols:
$(first) - number of the first document displayed on this page
$(last) - number of the last document displayed on this page
$(total) - total number of found documents
$(WE) - search results with full statistics for every word form searched
$(W) - search results with, for every search word, the number of occurrences of that word form and the number of occurrences of all its word forms, delimited with a "/" sign. For example, if the search result is test: 25/73, the word form "test" was found 25 times, and all its forms ("test", "tests", "testing", etc.) were found 73 times.
Below is an example of 'restop' section:
<!--restop-->
<TABLE BORDER=0 WIDTH=100%>
<TR>
<TD>Search<BR>results:</TD>
<TD><small>$(WE)</small></TD>
<TD><small>$(W)</small></TD>
</TR>
</TABLE>
<HR>
<CENTER>
Displaying documents $(first)-$(last) of total <B>$(total)</B> found.
</CENTER>
<!--/restop-->
RES - This section is used for displaying various information about every found document. The following meta symbols are used:
$(DU) Document URL
$(DT) Document Title
$(DR) Document Rating (as calculated by mnoGoSearch)
$(DX) Document teXt (the first couple of lines to give an idea of what the document is about).
$(DC) Document Content-type (for example, text/html)
$(DM) Document Last-Modified date
$(DS) Document Size (in bytes)
$(DN) Document Number (in order of appearance)
$(DD) Document Description (from META DESCRIPTION tag)
$(DE) $(DD) if it is non-empty, otherwise $(DX)
$(DK) Document Keywords (from META KEYWORDS tag)
$(DY) Document category with links, e.g. /home/computers/software/www/
$(CL) Clone List (see section 'clone' for details)
Note: It is possible to specify the maximum number of characters returned by any of the above variables. For example, $(DU) may return a long URL that breaks the page's table structure. To limit the number of characters in the displayed URL, use $(DU:xx), where xx is the maximum number of characters:
$(DU:40)
will return a URL, and if it is longer than 40 characters, only 40 characters will be displayed, including the trailing dots:
http://very.long.url/path/veery/long/...
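One way this might be used is to keep the full URL in the link target while truncating only the visible text. The fragment below is a sketch of such a 'res' line, not taken from search.htm-dist; the limits 60 and 40 are arbitrary.

```html
<!-- The HREF keeps the complete URL; only the displayed text is shortened -->
<b>$(DT:60)</b><BR>
<A HREF="$(DU)" TARGET="_blank">$(DU:40)</A>
```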
Here is an example of res section:
<!--res-->
<DL><DT>
<b>$(DN).</b><a href="$(DU)" TARGET="_blank">
<b>$(DT)</b></a> [<b>$(DR)</b>]<DD>
$(DX)...<BR>
<b>URL: </b>
<A HREF="$(DU)" TARGET="_blank">$(DU)</A>($(DC))<BR>
$(DM), $(DS) bytes<BR>
<b>Description: </b>$(DD)<br>
<b>Keywords: </b>$(DK)<br>
</DL>
<UL>
$(CL)
</UL>
<!--/res-->
CLONE - The contents of this section are included in the result in place of the $(CL) meta symbol, once for every document clone found. This is used to list all URLs with the same contents (mirrors and so on). You can use the same $(D*) meta symbols here as in the 'res' section. Of course, some information about a clone, such as $(DS), $(DR) and $(DX), will be the same as for the original document, so there is little point in placing it here.
Below is an example of 'clone' section.
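The example itself is missing from this copy of the page. A minimal sketch of what a clone section could look like, assuming it is rendered inside the <UL> that wraps $(CL) in the 'res' example above, is:

```html
<!--clone-->
<LI><A HREF="$(DU)" TARGET="_blank">$(DU)</A> ($(DM))
<!--/clone-->
```

Only the URL and the Last-Modified date are shown here, since the remaining document information is the same as for the original document.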
RESBOT - This section is included just after the last 'res' section. You usually provide a navigation bar here to let the user go to the next or previous results page.
The navigator is a complex thing and is therefore constructed from the following template sections:
navleft, navleft_nop - These are used for printing the link to the previous page. If that page exists, <!--navleft--> is used; on the first page there is no previous page, so <!--navleft_nop--> is used.
<!--navleft-->
<TD><A HREF="$(NH)"><IMG...></A><BR>
<A HREF="$(NH)">Prev</A></TD>
<!--/navleft-->
<!--navleft_nop-->
<TD><IMG...><BR>
<FONT COLOR=gray>Prev</FONT></TD>
<!--/navleft_nop-->
navbar0 - This is used for printing the current page in the page list.
navright, navright_nop - These are used for printing the link to the next page. If that page exists, <!--navright--> is used, and on the last page <!--navright_nop--> is used instead.
<!--navright-->
<TD><A HREF="$(NH)"><IMG...></A><BR>
<A HREF="$(NH)">Next</A></TD>
<!--/navright-->
<!--navright_nop-->
<TD><IMG...><BR>
<FONT COLOR=gray>Next</FONT></TD>
<!--/navright_nop-->
navbar1 - This is used for printing the links to the other pages in the page list.
<!--navbar1-->
<TD><A HREF="$(NH)"><IMG...></A><BR>
<A HREF="$(NH)">$(NN)</A></TD>
<!--/navbar1-->
This is an example of 'resbot' section:
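The example itself is missing from this copy of the page. The sketch below assumes that the assembled navigator parts are available through the meta symbols $(NL) (left), $(NB) (page list) and $(NR) (right). These names are not defined anywhere on this page, so treat them as an assumption and check them against search.htm-dist before use.

```html
<!--resbot-->
<HR>
<CENTER>
Result pages:
<TABLE BORDER=0>
<!-- $(NL), $(NB), $(NR) are assumed names; each nav* section above emits its own <TD> -->
<TR>$(NL)$(NB)$(NR)</TR>
</TABLE>
</CENTER>
<!--/resbot-->
```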
notfound - As its name implies, this section is displayed when no documents are found. You usually give a short message saying so, and maybe some hints on how to make the search less restrictive.
Below is an example of notfound section:
<!--notfound-->
<CENTER>
Sorry, but search hasn't returned results.<P>
<I>Try to compose less restrictive search query or check spelling.</I>
</CENTER>
<HR>
<!--/notfound-->
noquery - This section is displayed when the user gives an empty query. Below is an example of noquery section:
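The example itself is missing from this copy of the page. A minimal sketch, with message text invented for illustration, might be:

```html
<!--noquery-->
<CENTER>
You haven't typed any word(s) to search for.
</CENTER>
<HR>
<!--/noquery-->
```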
error - This section is displayed if an internal error occurs while searching, for example when the database server is not running. You may use the following meta symbol: $(E) - error text.
Example of error section:
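The example itself is missing from this copy of the page. A minimal sketch that prints the error text via $(E), with heading text and colour chosen arbitrarily, might be:

```html
<!--error-->
<CENTER>
<FONT COLOR="#FF0000">An error occurred!</FONT>
<P>
<B>$(E)</B>
</CENTER>
<!--/error-->
```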
There is also a special variables section, in which you can set some values for the search.
The special variables section usually looks like this:
<!--variables
DBAddr mysql://foo:bar@localhost/search/
DBMode single
VarDir /usr/local/mnogosearch/var/
LocalCharset iso-8859-1
BrowserCharset iso-8859-1
TrackQuery no
Cache no
DetectClones yes
HlBeg <font color="blue"><b><i>
HlEnd </i></b></font>
R1 100
R2 256
Synonym synonym/english.syn
-->
Note: The database option DBAddr works only for the SQL back-end and does not matter for the built-in text files support. As in indexer.conf, the host part of the DBAddr argument takes effect for natively supported databases only and does not matter for ODBC databases. In the case of ODBC, use the database name part of DBAddr to specify the ODBC DSN.
The VarDir command specifies a custom path to the directory where indexer stores data when used with the built-in database and in cache mode. By default the /var directory of the mnoGoSearch installation is used.
LocalCharset specifies the charset of the database. It must be the same as LocalCharset in indexer.conf.
BrowserCharset specifies which charset will be used to display results. It may differ from LocalCharset. All template variables that carry data from the search results (such as document title, description and text) will be converted from LocalCharset to BrowserCharset. The contents of the template itself are not converted; they must be in BrowserCharset.
Use "Cache yes/no" to enable/disable search results cache.
Use "DetectClones yes/no" to enable/disable clones detection.
HlBeg and HlEnd commands are used to configure search results highlighting. Found words will be surrounded by those tags.
There is an Alias command in search.htm that is similar to the one in indexer.conf, but it affects only search results and has no effect on indexing. See the Aliases section for details.
R1 and R2 specify ranges for random variables $(R1) and $(R2).
Synonym command is used to load specified synonyms list. Synonyms file name is either absolute or relative to /etc directory of mnoGoSearch installation.
You may use <!INCLUDE Content="http://hostname/path"> to include the content of external URLs in the search results.
WARNING: You can use <!INCLUDE> ONLY in the following template sections:
<!--top-->
<!--bottom-->
<!--restop-->
<!--resbot-->
<!--notfound-->
<!--error-->
This is an example of includes usage:
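The example itself is missing from this copy of the page. The sketch below passes the current query to an external banner script. The host name and the /banner path are placeholders, and it assumes that meta symbols are expanded inside the CONTENT URL, as the description of the pn variable above suggests.

```html
<!--top-->
....
<!-- hostname and /banner are placeholders, not a real service -->
<!INCLUDE CONTENT="http://hostname/banner?query=$%(q)">
....
<!--/top-->
```

The URL-escaped form $%(q) is used because the query is embedded in a URL.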
mnoGoSearch allows several (up to 100) variants of the same template section to be defined. It is often useful, for example, to offer both "Long" and "Short" search result formats. To implement this, write two separate "res" template sections, one after the other, for the "Long" and "Short" output formats. A sample of different formats is given in search.htm-dist. Note that "res" is not the only such section: every template section may be given several times, so it is easy, for example, to prepare multi-language templates.
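A sketch of two result formats selected through the o form variable follows. It assumes that o=0 picks the first "res" section in the file and o=1 the second; this numbering is an assumption not stated on this page, so compare with search.htm-dist.

```html
<!-- In the search form: let the user choose the output format -->
<SELECT NAME="o">
<OPTION VALUE="0" SELECTED="$(o)">Long
<OPTION VALUE="1" SELECTED="$(o)">Short
</SELECT>

<!--res-->
<DL><DT>
<b>$(DN).</b> <A HREF="$(DU)">$(DT)</A> [<b>$(DR)</b>]<DD>
$(DX)...<BR>
$(DM), $(DS) bytes
</DL>
<!--/res-->

<!--res-->
<b>$(DN).</b> <A HREF="$(DU)">$(DT)</A><BR>
<!--/res-->
```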
WARNING: Since the template file contains information such as passwords, it is highly recommended to set file permissions so that it cannot be read by anyone but you and the search program. Otherwise your passwords may leak.