Advanced Recoll Setup: Indexing Your Data the Convenient Way
System Indexing Without Bogging Down
August 11, 2008
Finding a satisfactory way to index a lot of data in Linux is a lot harder than it sounds. The most popular tools, like Beagle, tend to be limited to single-keyword searches, which are a pretty blunt instrument when looking through hundreds of gigabytes of files. Some tools are a massive pain to set up; I found htree to be an example of this. Search tools are also frequently set up to run as background daemons by default. While this gives you instant indexed access to anything that goes into the indexed filesystem, the price you pay is massive resource usage, to the point where user processes frequently slow down.
The obvious alternative is to run regular index updates when nobody is using the computer.
A cron job is the obvious way to do that, but I don't use one for this. Unless the workstation runs all the time, there's a good chance the computer will be shut down when the index update is scheduled to run.
The other problem is that the database is large and still growing. Sooner or later, an update is going to take longer than overnight to run.
Most people and companies have two classes of data: archival data that isn't going to change frequently, and dynamic data that does. The bulk of the data held by most people and businesses is historical, not current or dynamic.
My e-mail from 20 years ago isn't likely to change, and I rarely need access to it. But if a person or business surfaces from my past, I'll want to know immediately what my interactions with them were like. Just because someone or something I dealt with was bad news doesn't necessarily mean I will remember them by name.
What if you could index the archival data only occasionally, overnight, to pick up any changes, index the dynamic data much more frequently, and still retrieve from both at the same time? What if the indexing could run until it finished and then shut down the computer? It can, with Recoll, a personal full-text search tool for Unix/Linux.
That's how I do it. Follow the procedures below and you will be able to, too.
For "username", substitute your userID, of course.
In most cases you can get Recoll through whichever repository-based package installer your Linux distribution uses, or you can get the source from the Recoll site. It will read just about any document file type that lives on your system.
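As a rough illustration of the idea described above (one index for archival data, one for dynamic data, and an overnight run that powers the machine off when it finishes), here is a minimal Python sketch. It is an outline under assumptions, not the procedure itself: the two configuration directory names are hypothetical, each is assumed to hold a recoll.conf whose topdirs setting lists the directories for that index, and the shutdown step assumes the user is allowed to run shutdown through sudo. The -c option of recollindex selects the configuration directory to use.

```python
import subprocess
from pathlib import Path

# Hypothetical configuration directories, one per index (names are assumptions).
ARCHIVE_CONF = Path.home() / ".recoll-archive"
DYNAMIC_CONF = Path.home() / ".recoll-dynamic"


def build_plan(confdir, shutdown_after=False):
    """Return the commands to run, in order, as argv lists."""
    # recollindex -c <dir> indexes using the configuration found in <dir>.
    plan = [["recollindex", "-c", str(confdir)]]
    if shutdown_after:
        # Assumption: a sudoers rule lets this user power the machine off.
        plan.append(["sudo", "shutdown", "-h", "now"])
    return plan


def run_plan(plan):
    """Run each command to completion, stopping at the first failure.

    Because of check=True, a failed indexing run raises an exception,
    so the shutdown step is skipped and the machine stays up.
    """
    for argv in plan:
        subprocess.run(argv, check=True)


def main(argv):
    """'archive' = slow overnight pass, then power off; otherwise a quick pass."""
    if len(argv) > 1 and argv[1] == "archive":
        run_plan(build_plan(ARCHIVE_CONF, shutdown_after=True))
    else:
        run_plan(build_plan(DYNAMIC_CONF))
```

To try it, save the block as a script, add a final line calling main(sys.argv) after importing sys, and start it with the argument archive before leaving for the night. Since the indexer is started by hand and runs until it is done, it does not matter how long the update takes or whether the machine would have been off at a scheduled cron time.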