April 22, 2014
 
 

Advanced Recoll Setup: Indexing Your Data the Convenient Way - page 3

System Indexing Without Bogging Down

  • August 11, 2008
  • By A. Lizard

The recollindex program is what actually indexes the hard drive.

The following is the dynamic content script. I'm using time (/usr/bin/time) to find out how long each run takes. I put these scripts in /home/username/.recoll; you can put them wherever you like, as long as they belong to your user and are executable, for instance:

$ chmod u+x recollindex-time.sh recollindex-static-time.sh

The sudo line in each script runs /sbin/poweroff with root privileges, so the machine shuts itself down once indexing is finished.
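Because recollindex can run for a long time, the shutdown at the end only happens unattended if sudo will run /sbin/poweroff without asking for a password, typically through a sudoers entry such as "username ALL = NOPASSWD: /sbin/poweroff" added with visudo. Below is a minimal sketch of a pre-flight check you could paste at the top of either script. The function names poweroff_allowed and check_poweroff_or_exit are hypothetical; the check relies on sudo's -k (ignore cached credentials), -n (never prompt) and -l (check whether a command is permitted) options.

```shell
#!/bin/sh
# Pre-flight check (hypothetical helper): succeed only if sudo will run
# /sbin/poweroff without prompting for a password.
#   -k  ignore any cached sudo credentials, so a recent "sudo" doesn't hide
#       a password prompt that would otherwise appear hours later
#   -n  non-interactive: fail instead of prompting
#   -l  with a command argument: succeed only if that command is permitted
poweroff_allowed() {
    sudo -k -n -l /sbin/poweroff >/dev/null 2>&1
}

# Abort early, before the long recollindex run, if the shutdown would hang.
check_poweroff_or_exit() {
    if ! poweroff_allowed; then
        echo "sudo would ask for a password for /sbin/poweroff; fix sudoers first" >&2
        exit 1
    fi
}
```

Calling check_poweroff_or_exit above the rm line stops the script right away, while you are still at the keyboard, instead of leaving sudo waiting for a password all night.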

======================== script begin
#!/bin/sh
# recollindex-time.sh
# run recollindex for dynamic content once, then shut down
# (could be modified to create and continuously update a log)
rm -f /home/username/.recoll/log-dynamic.txt
# /usr/bin/time prints its report on stderr; 2>&1 puts it in the log too
/usr/bin/time recollindex -c /home/username/.recoll/xapiandb-eudora > /home/username/.recoll/log-dynamic.txt 2>&1
sudo /sbin/poweroff
======================== script end

To run:

$ sh /path-to/.recoll/recollindex-time.sh

The static content script is built the same way:
======================== script begin
#!/bin/sh
# recollindex-static-time.sh
# run recollindex for static content once, then shut down
# (could be modified to create and continuously update a log)
rm -f /home/username/.recoll/log-static.txt
# no -c: recollindex uses its default config directory (~/.recoll unless RECOLL_CONFDIR is set)
/usr/bin/time recollindex > /home/username/.recoll/log-static.txt 2>&1
sudo /sbin/poweroff
======================== script end

To run:

$ sh /path-to/.recoll/recollindex-static-time.sh

Typical messages from a dynamic update:

:../internfile/mh_html.cpp:105:textHtmlToDoc: final transcode had 8 errors for [unknown]
:2:../internfile/mh_html.cpp:105:textHtmlToDoc: final transcode had 129 errors for [unknown]
:2:../internfile/mh_mail.cpp:511:walkmime: transcode failed from cs 'unknown-8bit' to UTF-8
:2:../internfile/mh_mail.cpp:511:walkmime: transcode failed from cs 'DEFAULT_CHARSET' to UTF-8
:3:../rcldb/rcldb.cpp:918:dumb_string: unac failed for [From: solidbusinessopportunity@yahoo.com
To: username@mindspring.com
Date: Sun, 30 Dec 2001 17:22:51
Subject: I COULDN'T BELIEVE IT!

Would you spend $1,000 in order to receive $30,000 in return?

[snip]
================== end log

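These error messages are not significant and can be ignored. They report character-set conversion (transcode) and accent-stripping (unac) problems in individual messages, not a failure of the indexing run.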
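In a long log, the interesting parts are the run time and anything other than the transcode and unac warnings shown above, and a small helper can pull those out. This is a minimal sketch. The function name index_report is hypothetical, and it assumes the log layout the scripts produce: recollindex diagnostics on lines starting with ":", plus the GNU /usr/bin/time summary, whose default format includes a field such as "2:03.00elapsed".

```shell
#!/bin/sh
# index_report (hypothetical helper): summarize a log written by
# recollindex-time.sh or recollindex-static-time.sh.
index_report() {
    log="$1"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi

    # GNU time's default summary line contains e.g. "2:03.00elapsed"
    elapsed=$(sed -n 's/.* \([0-9:.]*\)elapsed.*/\1/p' "$log" | tail -n 1)
    echo "elapsed: ${elapsed:-unknown}"

    # Character-set (transcode) and accent-stripping (unac) warnings: harmless
    noise=$(grep -c -e 'transcode' -e 'unac failed' "$log")
    echo "charset warnings: $noise"

    # Any other recollindex diagnostic line may deserve a closer look
    grep '^:' "$log" | grep -v -e 'transcode' -e 'unac failed'
    return 0
}
```

For example, index_report /home/username/.recoll/log-dynamic.txt prints the elapsed time, the number of ignorable charset warnings, and any remaining diagnostic lines.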

Setting Up the Recoll GUI

Assuming that you've run at least one indexing pass for both your static and dynamic databases (I suggest static first), you can set up the Recoll GUI (see Figure 1). You can use the defaults, except that you'll have to add the dynamic database directory alongside the static database directory that is already in the search path.

Open Recoll from Start > Utilities > Local Text Search (recoll).

From the top menu, choose Preferences > External Index Dialog > External Indexes and click Browse. Follow the path until you find the dynamic database directory, then click OK. The path appears in the External Indexes window with a checkbox next to it. Check it, and you're ready to search.
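If you're not sure which directory to choose in the Browse dialog, a quick search can list the candidates. This is a minimal sketch, assuming Recoll's default dbdir setting, under which each configuration directory keeps its Xapian index in a subdirectory named xapiandb. For the dynamic configuration used here, that would presumably be /home/username/.recoll/xapiandb-eudora/xapiandb. The function name list_recoll_indexes is hypothetical.

```shell
#!/bin/sh
# list_recoll_indexes (hypothetical helper): print every directory named
# "xapiandb" under a Recoll configuration tree (default: ~/.recoll).
list_recoll_indexes() {
    top="${1:-$HOME/.recoll}"
    # -prune stops find from descending into the index files themselves
    find "$top" -type d -name xapiandb -prune -print | sort
}
```

Both the static and the dynamic index directories should show up; the one under xapiandb-eudora is the one to add as an external index.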

Searching

I regard the GUI as fairly self-explanatory. For a simple search, open Recoll. If you expect to use it a lot, you might want to drag and drop the menu item onto the desktop or add it to the KDE taskbar, as I do. Pull down the menu marked Query Language, choose Any term (OR) or All terms (AND), and type in your keywords. For multiple Boolean operators, open Tools and choose Advanced Search instead. The best way to become familiar with the UI is to do some searches and learn from experience what the menus and icons do. There is also a Help menu, and documentation is available.

Database updating

As I said, I run a personal workstation that's usually shut down at night, so I don't use a cron job to run these scripts automatically. Instead, I start the index runs manually before going to bed, when a reminder program tells me to. As a KDE user, I use KAlarm for this.

Set up two alerts in KAlarm, found under Start > Utilities > PIM > Personal Alarm Scheduler (KAlarm) in KDE 3.x (I don't know what the KDE 4 or GNOME equivalent is).

Open a new alert with File > New, then:
Set up your reminder message, including the script's name and path.
Click the Recurrence tab and set the reminder interval; for a weekly cycle, you can select days of the week. I run my dynamic index updates three times a week, on Monday, Wednesday, and Friday.

Then set up a second alert the same way for the static script. I'd run that one every 3-6 months; you can set months by name, too.
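If you'd rather have the machine tell you which script is due instead of keeping two alarms, the same schedule can be written as a small shell function to call from a login script or a single daily reminder. This is a minimal sketch. The function name due_tonight and the 90-day threshold (the short end of the 3-6 month range) are assumptions. It uses the modification time of log-static.txt as the record of the last static run. Since each script powers the machine off when it finishes, run only one of them per night.

```shell
#!/bin/sh
# due_tonight (hypothetical helper): print which index script(s) are due.
#   $1  day of the week as printed by "date +%u" (1 = Monday ... 7 = Sunday)
#   $2  path of the static log, used as a record of the last static run
due_tonight() {
    day="$1"
    static_log="$2"

    # Dynamic content: Monday, Wednesday and Friday
    case "$day" in
        1|3|5) echo "recollindex-time.sh" ;;
    esac

    # Static content: due if there is no log yet, or it is over 90 days old
    # (90 days is an assumed threshold, not a Recoll setting)
    if [ ! -e "$static_log" ] || [ -n "$(find "$static_log" -mtime +90)" ]; then
        echo "recollindex-static-time.sh"
    fi
}
```

For example: due_tonight "$(date +%u)" /home/username/.recoll/log-static.txt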

Documentation and further information

The onboard help is kept at file:///usr/share/recoll/doc/usermanual.htm.
This is the Recoll website: Recoll: A Linux Desktop Search Engine That Works


About the Author: A. Lizard is an Internet consultant who lives in the San Francisco Bay Area. He has been writing technology articles for publication since 1987.
