SearchD support

searchd support is available starting with mnoGoSearch version 3.2.

Why use searchd

Starting searchd

To start using searchd:

To suppress output to stderr, use the -l option. Output then goes to syslog only (provided syslog support was not disabled during installation with --disable-syslog). If syslog is disabled, stderr can be redirected to a file:

/usr/local/mnogosearch/sbin/searchd 2>/var/log/searchd.log &

Like indexer, searchd accepts the name of a configuration file as an argument. It can be given as a path relative to the /etc directory of the mnoGoSearch installation:

searchd searchd1.conf

or with absolute path:

searchd /usr/local/mnogosearch/etc/searchd1.conf
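The two forms above can be illustrated with a minimal sketch of how such an argument might be resolved. This is not searchd's actual code: the function name resolve_config and the constant ETC_DIR are hypothetical, and the installation prefix is assumed to be /usr/local/mnogosearch, as in the example above.

```python
import posixpath

# Assumed etc directory of the installation (taken from the example path above).
ETC_DIR = "/usr/local/mnogosearch/etc"


def resolve_config(arg, etc_dir=ETC_DIR):
    """Return the full path for a configuration file argument.

    Hypothetical helper: absolute paths are used as given,
    relative paths are looked up under the etc directory.
    """
    if posixpath.isabs(arg):
        return arg
    return posixpath.join(etc_dir, arg)


relative_form = resolve_config("searchd1.conf")
absolute_form = resolve_config("/usr/local/mnogosearch/etc/searchd1.conf")
```

Under these assumptions both calls yield the same file, which is why the two command lines above are equivalent.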

Merging several databases

Several SearchAddr commands can be specified in search.htm. In this case search.cgi sends queries via TCP/IP to several searchd instances and merges the results. Version 3.2.0 supports up to 256 databases. The DBMode and the database type (SQL or built-in) may differ from one searchd to another.

search.cgi first sends the query to every searchd, so that the searches run in parallel. It then waits for the results, merges them and selects the best matches.
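The send, wait and merge sequence can be sketched as follows. This is only an illustration of the general pattern, not the search.cgi implementation: the back ends are plain Python callables standing in for TCP/IP requests to searchd, and the names merged_search, backend_a and backend_b, as well as the (score, url) hit format, are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def merged_search(backends, query, limit=10):
    """Query all back ends in parallel and return the best hits overall.

    Each back end is a callable returning a list of (score, url) tuples.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(backends))) as pool:
        # Submit every query first, so all back ends work at the same time.
        futures = [pool.submit(backend, query) for backend in backends]
        # Then wait for each partial result.
        partial_results = [future.result() for future in futures]
    all_hits = [hit for hits in partial_results for hit in hits]
    # Keep only the highest-scoring hits, best first.
    return heapq.nlargest(limit, all_hits, key=lambda hit: hit[0])


# Two fake back ends with non-overlapping documents.
def backend_a(query):
    return [(0.9, "http://a.example/1"), (0.4, "http://a.example/2")]


def backend_b(query):
    return [(0.7, "http://b.example/1")]


top = merged_search([backend_a, backend_b], "test", limit=2)
```

The sketch does no duplicate removal: a document present in two back ends would simply appear twice in the merged list.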

This makes it possible to build a database distributed across several machines. Note that the databases must not overlap, i.e. the same document must not be present in more than one of the merged databases. Otherwise the document will be duplicated in the search results.

Distributed indexing

Indexing distribution can be done by means of hostname filtering.

Imagine that a search engine has to be created, e.g. for the .de domain. The search administrator has 26 machines available, named for example:

a.hostname.de
b.hostname.de
...
z.hostname.de

An indexer.conf is created for every machine. E.g. on the machine a.hostname.de:

# For hostnames starting with www:
Realm http://www.a*.de/
# For hostnames without www:
Realm http://a*.de/

Repeat this action for every machine.
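Repeating this by hand for every machine is error-prone, so the fragments could be generated. The following is a minimal sketch, assuming one machine per letter a to z and the .de domain from the example. The function name realm_lines is hypothetical, and only the Realm lines are produced, not a complete indexer.conf.

```python
import string


def realm_lines(letter, tld="de"):
    """Return the Realm fragment for the machine handling one first letter."""
    return "\n".join([
        "# For hostnames starting with www:",
        f"Realm http://www.{letter}*.{tld}/",
        "# For hostnames without www:",
        f"Realm http://{letter}*.{tld}/",
    ])


# One fragment per machine, keyed by the machine's hostname.
fragments = {
    f"{letter}.hostname.de": realm_lines(letter)
    for letter in string.ascii_lowercase
}

print(fragments["b.hostname.de"])
```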

searchd also understands the following commands in searchd.conf. They are similar to those in indexer.conf.

Allow x.x.x.x
Disallow x.x.x.x

These commands specify which hosts may or may not connect to searchd. If neither command is specified, any host can connect. E.g. to allow connections from localhost only:

Allow 127.0.0.1
Disallow *

Or from the 192.168.x.x network only:

Allow 192.168.*.*
Disallow *
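The effect of such rule lists can be illustrated with a small sketch. It assumes that rules are checked in the order given and that the first matching rule decides, which is consistent with the examples above but is an assumption about searchd, not a description of its code. The function name is_allowed and the (action, pattern) rule format are hypothetical, and the shell-style matching from Python's fnmatch module only approximates searchd's own wildcard handling.

```python
from fnmatch import fnmatchcase


def is_allowed(rules, client_ip):
    """Decide whether client_ip may connect.

    rules is an ordered list of (action, pattern) pairs, where action is
    "Allow" or "Disallow". Assumption: the first matching rule wins.
    """
    for action, pattern in rules:
        if fnmatchcase(client_ip, pattern):
            return action == "Allow"
    # No rule matched (or no rules at all): any host can connect.
    return True


local_network_only = [("Allow", "192.168.*.*"), ("Disallow", "*")]

inside = is_allowed(local_network_only, "192.168.1.5")
outside = is_allowed(local_network_only, "10.0.0.1")
no_rules = is_allowed([], "10.0.0.1")
```

Under this assumption the order matters: placing Disallow * first would reject every host, because the later Allow line would never be reached.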

To make searchd reload its configuration file, send it the HUP signal:

kill -HUP xxx

where xxx is the process ID (pid) of searchd.

Then indexer (or several indexers) is run on every machine, each indexing its own area.

search.cgi is installed on every machine, and the following lines are added to each corresponding template:


SearchAddr a.hostname.de
SearchAddr b.hostname.de
....
SearchAddr z.hostname.de

Thus search.cgi will send parallel queries to every machine and return the best results to the user.
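Since the same list of SearchAddr lines goes into every template, it could be generated rather than typed. This is a minimal sketch, assuming the 26 machine names a.hostname.de to z.hostname.de from the example. The function name search_addr_lines is hypothetical.

```python
import string


def search_addr_lines(hosts):
    """Return one SearchAddr line per searchd host."""
    return "\n".join(f"SearchAddr {host}" for host in hosts)


hosts = [f"{letter}.hostname.de" for letter in string.ascii_lowercase]
template_fragment = search_addr_lines(hosts)

print(template_fragment)
```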

In the current version each area is indexed independently. If the server http://a.domain.de/ contains a link to the server http://b.domain.de/, this link is not passed from the machine responsible for a to the machine responsible for b.

Since distribution is by hostname, if one of the machines is not operational, the information from all the web servers indexed on that machine is unavailable.

Future versions are planned to implement communication between "neighbouring" hosts (i.e. the hosts will be able to transfer links to each other), as well as other types of distribution, such as by a hash function of the document's URL. With hash-based distribution, the pages of one site will be spread evenly across all the machines of the cluster, so if one of the machines is unavailable, all the sites will still be available on the other machines.
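The idea of distribution by a hash of the URL can be sketched as follows. This only illustrates the general technique and is not the planned mnoGoSearch implementation: the choice of SHA-256, the function name machine_for_url and the example URLs are all assumptions.

```python
import hashlib
import string


def machine_for_url(url, machines):
    """Pick the machine responsible for a URL from a hash of the URL."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and map it onto the cluster.
    index = int.from_bytes(digest[:8], "big") % len(machines)
    return machines[index]


machines = [f"{letter}.hostname.de" for letter in string.ascii_lowercase]

# Pages of a single site end up on many different machines.
site_pages = [f"http://www.example.de/page{n}.html" for n in range(100)]
used_machines = {machine_for_url(page, machines) for page in site_pages}
```

Because the mapping depends only on the URL and the machine list, every indexer computes the same owner for a given document without any coordination. A plain modulo mapping does, however, reassign most URLs whenever the number of machines changes.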