October 30, 2014
 
 

Advanced Recoll Setup: Indexing Your Data the Convenient Way - page 2

System Indexing Without Bogging Down

  • August 11, 2008
  • By A. Lizard

The recoll indexing scripts are set up to be started manually by a user and to shut down the computer unattended when they finish. So the first step is to make it possible to shut down the computer from a script.

To let a script run the shutdown command /sbin/poweroff from userspace, add both the command and the user who needs the usually root-only shutdown privilege to the /etc/sudoers file. This is necessary because the update scripts must run as the user: the databases they update then remain user-owned files that the user's own recoll can open and search.

To open /etc/sudoers:

# visudo
======================== script begin
/etc/sudoers
#
# This file MUST be edited with the 'visudo' command as root.
#
# See the man page for details on how to write a sudoers file.
#

Defaults env_reset

# Host alias specification
# User alias specification
# Cmnd alias specification
Cmnd_Alias POWEROFF = /sbin/poweroff

# User privilege specification
username ALL = NOPASSWD:POWEROFF
root ALL=(ALL) ALL

======================== script end
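As a rough sketch of how a user-level script might rely on this sudoers entry, the wrapper below runs a command and then powers off without prompting for a password. The function name run_then_poweroff and the DRY_RUN variable are assumptions made for illustration; they are not part of recoll or of the scripts discussed in this article.

```shell
#!/bin/sh
# Hypothetical helper: run the given command, then power off using the
# NOPASSWD rule from /etc/sudoers. With DRY_RUN=1 the shutdown command is
# only printed, which is handy while testing.
run_then_poweroff() {
    # Run the job; report a failure but still continue to the shutdown step.
    "$@" || echo "warning: '$*' exited with status $?" >&2

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: sudo -n /sbin/poweroff"
    else
        # -n makes sudo fail instead of prompting if the rule is missing.
        sudo -n /sbin/poweroff
    fi
}

# Example (prints the shutdown command instead of executing it):
#   DRY_RUN=1 run_then_poweroff recollindex
```

Before trusting the machine to shut itself down unattended, running sudo -l as the user should list /sbin/poweroff among the commands allowed without a password.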

 

Recoll Configuration

Here are the changes one has to make to the /home/username/.recoll/recoll.conf file. This file is intended to index the static content and ignore the dynamic content. Databases for hard-drive indexing are large enough that one wants to avoid duplicating content across the indexes. Also create a directory to hold the second configuration file, the one for the dynamic content. In this example it is called xapiandb-eudora; you may call it anything you want.

$ cd /home/username/.recoll
$ mkdir xapiandb-eudora
$ nano recoll.conf

In the following configuration files, the content has been trimmed to contain only what the reader must change and enough surrounding data so the reader can locate the changes within these files.

[snip] denotes content the reader can ignore unless there is reason to change these defaults. vdi and vmdk files are skipped because, in my machine configuration, no data is kept in any guest VM virtual drive.

The following recoll.conf file is for the static content configuration and is written to exclude the dynamic files. Revise the topdirs and skippedPaths entries in these files to reflect where you actually keep your content. In this configuration, topdirs covers the content you want read only occasionally; skippedPaths lists what the dynamic content configuration will be reading.

======================== start quote
# @(#$Id: recoll.conf.in,v 1.14 2007/01/16 10:58:42 dockes Exp $ (C) 2004 J.F.Dockes
#
# Recoll default configuration file. This should be copied to
# ~/.recoll/recoll.conf

# Space-separated list of directories to index. Next line indexes $HOME

topdirs = /home/username

# Wildcard expressions for names of files and directories that we should
# ignore. If you need index mozilla/thunderbird mail folders, don't put
# ".*" in there (as was the case with an older sample config)
# These are simple names, not paths (must contain no / )
skippedNames = #* bin CVS Cache* cache* caughtspam tmp .thumbnails .svn *~ recollrc .wine dot* VirtualBox* *.vmdk .* *.vdi

# new:
skippedPaths = /home/username/win/Eudora /home/username/win/business /home/username/win/image /home/username/win/data2 /home/username/win/books /home/username/win/pdf

[snip]

# Where to store the database (directory). This may be an absolute path,
# else it is taken as relative to the configuration directory (-c argument
# or $RECOLL_CONFDIR).
# If nothing is specified, the default is then ~/.recoll/xapiandb/
dbdir = xapiandb

[snip]
======================== end quote

Note that the skippedPaths entry must be a single line, regardless of any line wraps your browser shows here. Once you have created the list of directories you don't want your static database to read, I suggest you simply copy and paste it from the skippedPaths entry into the topdirs entry of the recoll.conf that tells your dynamic database what it is supposed to read. When you type the directory list in a text editor, let the line wrap naturally and press Enter only at the very end.
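For readers who would rather not copy and paste by hand, here is a minimal sketch that derives the dynamic configuration's topdirs line from the static configuration's skippedPaths line. The function name topdirs_from_skipped is hypothetical, and the sketch assumes skippedPaths really is written on one line, as described above.

```shell
#!/bin/sh
# Hypothetical helper: print a "topdirs = ..." line built from the
# skippedPaths line of the recoll.conf given as the first argument.
# Assumes the skippedPaths entry occupies a single line.
topdirs_from_skipped() {
    # -n suppresses normal output; the p flag prints only the rewritten line.
    sed -n 's/^skippedPaths[[:space:]]*=[[:space:]]*/topdirs = /p' "$1"
}

# Example:
#   topdirs_from_skipped "$HOME/.recoll/recoll.conf"
```

The printed line can then be pasted over the topdirs entry in the dynamic configuration, so the two directory lists cannot drift apart through retyping.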

Copy the recoll.conf file to xapiandb-eudora and go into that directory.

$ cp recoll.conf xapiandb-eudora/recoll.conf
$ cd xapiandb-eudora

Here are the changes one has to make to the /home/username/.recoll/xapiandb-eudora/recoll.conf file. This file is intended to index the dynamic content and ignore the static content.

Note that the directory paths skipped in .recoll/recoll.conf are exactly the top directories to be indexed by the dynamic content configuration in xapiandb-eudora/recoll.conf.

Note that the topdirs entry is also a single line, regardless of any line wraps shown here.

$ nano recoll.conf

======================== start quote
# @(#$Id: recoll.conf.in,v 1.14 2007/01/16 10:58:42 dockes Exp $ (C) 2004 J.F.Dockes
#
# Recoll default configuration file. This should be copied to
# ~/.recoll/xapiandb-eudora/recoll.conf for the modified dynamic database

# Space-separated list of directories to index. Next line indexes $HOME

topdirs = /home/username/win/Eudora /home/username/win/business /home/username/win/image /home/username/win/data2 /home/username/win/books /home/username/win/pdf

# Wildcard expressions for names of files and directories that we should
# ignore. If you need index mozilla/thunderbird mail folders, don't put
# ".*" in there (as was the case with an older sample config)
# These are simple names, not paths (must contain no / )
skippedNames = #* bin CVS Cache* cache* caughtspam tmp .thumbnails .svn *~ recollrc .wine dot* VirtualBox* *.vmdk .* *.vdi

[snip]

# Where to store the database (directory). This may be an absolute path,
# else it is taken as relative to the configuration directory (-c argument
# or $RECOLL_CONFDIR).
# NOTE: in this case, it's relative to the config directory
# If nothing is specified, the default is then ~/.recoll/xapiandb/
dbdir = xapiandb-eudora

[snip]
======================== end quote
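With both files in place, each configuration directory can be indexed separately by passing it to recollindex with the -c option. The sketch below is only an illustration under that assumption: the helper names resolve_dbdir and index_config and the DRY_RUN variable are hypothetical. The first helper mirrors the rule stated in the configuration comments, that a relative dbdir is taken relative to the configuration directory.

```shell
#!/bin/sh
# Hypothetical helper: show where a dbdir value ends up on disk.
# $1 = configuration directory, $2 = dbdir value from recoll.conf.
resolve_dbdir() {
    case "$2" in
        /*) echo "$2" ;;      # absolute path: used as-is
        *)  echo "$1/$2" ;;   # relative path: placed under the config dir
    esac
}

# Hypothetical helper: run the indexer for one configuration directory.
# With DRY_RUN=1 the command is printed instead of executed.
index_config() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "recollindex -c $1"
    else
        recollindex -c "$1"
    fi
}

# Example:
#   index_config "$HOME/.recoll"                   # static content
#   index_config "$HOME/.recoll/xapiandb-eudora"   # dynamic content
```

Under this rule, dbdir = xapiandb-eudora inside /home/username/.recoll/xapiandb-eudora resolves to /home/username/.recoll/xapiandb-eudora/xapiandb-eudora, which is worth knowing when checking how much disk space each index uses.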

 
