Finding Things on Linux and Understanding Regular Expressions
The Shell Built-in Wildcard Provision
September 14, 2009
Regular expressions (regexps) are a very powerful tool, allowing you to look for text strings matching a particular pattern. In this first part of a two-part series, I'm going to look at using them on the command line. The next part will cover regexps in editors and other programs.
The shell built-in wildcard provision
The shell has some basic pattern matching of its own built in. It isn't full regexp syntax, but it works in the same spirit, and the most basic example is the * wildcard. This example will list every file in the current directory which has a .jpg extension:
ls *.jpg
What actually happens here is that the shell expands the * before it passes the file list to ls. So that line is really equivalent to:
ls file1.jpg file2.jpg ...
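One way to see the expansion for yourself is to swap ls for a command that only prints the arguments it is given. The following is a minimal sketch run in a scratch directory; the file names and the demo_dir and expanded variables are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch file1.jpg file2.jpg notes.txt

# printf never reads the directory itself; it just prints whatever
# arguments the shell hands it, one per line. Seeing file names here
# means the shell replaced *.jpg before printf even started.
expanded=$(printf '%s\n' *.jpg)
echo "$expanded"
```

Since printf knows nothing about files, any file names in the output can only have come from the shell.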
In contrast, this command line will produce essentially the same output, but using grep with full regexp syntax (see the next section for more on grep):
ls | grep '.*\.jpg'
This runs ls on the current directory (so listing all files), then passes the output through grep, which uses 'proper' regexps rather than the shell built-in. Here, . means 'any single character', and * means 'zero or more of the preceding item', so .* is 'zero or more of any character'. The \ is used to escape the second period, so it's treated as a real period rather than a stand-in for 'any character'. The result is every file name containing .jpg. Strictly speaking, grep matches anywhere in a line, so to insist that the name ends in .jpg you would anchor the pattern with $, as in '\.jpg$'. Note the difference between this and the shell built-in, where a period is just treated as a real period, and * on its own means 'any string of characters'.
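To make the behaviour of the pattern concrete, here is a small sketch that feeds a few hypothetical file names to grep, one per line, much as ls does in a pipe. It compares the pattern as written with a variant anchored by $. The names and variables are illustrative only.

```shell
# Hypothetical file names, one per line.
names=$(printf '%s\n' holiday.jpg holiday.jpg.bak myjpg.txt notes.txt)

# Unanchored: any line containing a literal ".jpg" somewhere matches.
# myjpg.txt is rejected because the escaped period must be a real period.
unanchored=$(printf '%s\n' "$names" | grep '.*\.jpg')

# Anchored with $: only lines that end in ".jpg" match.
anchored=$(printf '%s\n' "$names" | grep '\.jpg$')

echo "unanchored:"; echo "$unanchored"
echo "anchored:";   echo "$anchored"
```

With these names, the unanchored pattern also picks up holiday.jpg.bak, while the anchored one keeps only holiday.jpg.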
The single quotes are very important! Without them, the shell will try to expand characters such as * itself before running the command, and strange things will result. Always single-quote your regular expressions on the command line.
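A rough way to watch this happen, assuming a scratch directory containing two made-up files, is to print the argument both with and without quotes. printf stands in for grep here, so we can see exactly what grep would have received.

```shell
# Throwaway directory with two hypothetical files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch apple.txt avocado.txt

# Quoted: the pattern arrives exactly as typed.
quoted=$(printf '[%s]' 'a*txt')

# Unquoted: the shell treats a*txt as a wildcard, finds matching files,
# and passes their names instead of the pattern.
unquoted=$(printf '[%s]' a*txt)

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
```

In the unquoted case, grep would end up treating apple.txt as the pattern and avocado.txt as a file to search, which is almost certainly not what was intended.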
The shell wildcard provision can be very useful. The important thing to remember is that the syntax isn't quite the same as for 'proper' regular expressions. Here's another shell built-in example, which will move all of your old logs (which on my system are named like mail.log.0.gz, system.log.1.bz2, etc.) to a subdirectory:
mv *.log.[0-9].* logarchive/
[0-9] will match any single character in the range 0 to 9, i.e. exactly one digit: this works with proper regexps as well as with the shell built-in.
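Before running a command like this on real logs, it may be worth trying it in a scratch directory. The sketch below does that with a few empty, made-up log files, then uses the same [0-9] bracket inside a grep regexp for comparison. The variable names are illustrative only.

```shell
# Throwaway directory with hypothetical log files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir logarchive
touch mail.log mail.log.0.gz system.log.1.bz2

# Shell wildcard: * is any run of characters, [0-9] is exactly one digit.
# mail.log has no ".digit." part, so it is left where it is.
mv *.log.[0-9].* logarchive/

archived=$(ls logarchive)
remaining=$(ls | grep -v '^logarchive$')

# The same bracket in a grep regexp; here the periods need escaping.
digits=$(ls logarchive | grep '\.log\.[0-9]\.')

echo "archived:";  echo "$archived"
echo "remaining:"; echo "$remaining"
```

The bracket expression is written identically in both places. What changes is everything around it: the wildcard needs * and plain periods, while the regexp needs escaped periods.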