Advanced Tips for Search-and-Replace in Linux
Search and Replace Power Tools

Juliet Kemp
Tuesday, September 29, 2009 12:11:39 PM
In my previous article about regular expressions, I gave some examples of
ways in which you can use them on the command line, with various utilities.
Regexps can also be used within many text editors (sometimes with a slightly
different syntax, but the gist is the same). I'll use Vim and Emacs as
examples; for different editors you may need to check the manual for the
syntax details.
Search-and-replace is likely to be the operation you'll most often use
regexps for in an editor. First let's look at a straightforward non-regexp
search-and-replace. Let's say that you've just
decided to rename a variable from foo to fooOne. In Vim,
hit Esc to get back to Normal mode, then type this command:
:%s/foo/fooOne/g
% is the range: it means that the operation should be carried out on every
line of the document. The important part is s/foo/fooOne/, which means
"substitute 'fooOne' for 'foo'". The final g means "global"; without it
you'll replace only the first instance on each line, but with it you replace
every occurrence on the line.
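If it helps to see the effect of the g flag outside the editor, here is a minimal sketch in Python. It only imitates the Vim behaviour with the standard re module (the sample text and variable names are made up for illustration), so treat it as an analogy rather than what Vim does internally.

```python
import re

# Hypothetical sample text: "foo" appears twice on the first line.
text = "foo = foo + 1\nprint(foo)\n"

# Roughly :%s/foo/fooOne/ (no g flag): only the first match on each line.
first_per_line = "\n".join(
    re.sub("foo", "fooOne", line, count=1) for line in text.split("\n")
)

# Roughly :%s/foo/fooOne/g: every match on every line.
every_match = re.sub("foo", "fooOne", text)

print(first_per_line)
print(every_match)
```

In the first result the second foo on the top line is left alone, which is exactly the surprise you get when you forget the g.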
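To use this search-and-replace pattern in Emacs, hit M-x then type
replace-string RET foo RET fooOne RET. Emacs replaces from the cursor
position to the end of the buffer, so jump to the top of the buffer first
(M-<) if you want to cover the whole file.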
However, while this non-regexp operation would replace foo with
fooOne, it would also replace foobar with
fooOnebar, which you probably didn't want. To get around this, use
the word boundary markers \< and \>:
:%s/\<foo\>/fooOne/g
This restricts the replacement to occur only when 'foo' exists as a word on
its own (with a word boundary on each side of it). In Emacs:
M-x replace-regexp RET \<foo\> RET fooOne
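As a rough cross-check, the same idea can be sketched in Python. Note the assumption here: Python's re module has no \< and \>, so this uses \b, which matches a word boundary at either end. The sample line is invented for illustration.

```python
import re

# Hypothetical line mixing the bare word with longer identifiers.
line = "foo = foobar + foo_count + (foo)"

# \b plays the role of both \< and \> in Python's regexp syntax.
renamed = re.sub(r"\bfoo\b", "fooOne", line)

print(renamed)
```

Only the two standalone occurrences change; foobar and foo_count are left alone, because letters, digits and the underscore all count as word characters.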
Backreferences
Backreferences (as used in the previous tutorial) can also be very useful.
For example, say you wanted to change all the date references in a file from
US-style (09/22/09) to UK style, with long year and a dot instead of a
slash (22.09.2009). This regexp would do the trick in Vim:
:%s#\<\(\d\+\)/\(\d\+\)/\(\d\{2\}\)\>#\2.\1.20\3#g
For Emacs, use:
M-x replace-regexp RET \<\([[:digit:]]+\)/\([[:digit:]]+\)/\([[:digit:]]\{2\}\)\> RET \2.\1.20\3
OK, that looks quite complicated! First of all, note that in Vim we
use # rather than / as the delimiter, giving us s###g rather than s///g.
This makes the command easier to read when the pattern itself contains /,
and it also means that you don't need to escape any / characters.
As discussed in the previous article, each pair of escaped parentheses,
\(PATT\), stores whatever PATT matched so that you can refer back to it
later. Here we have three such groups, with a word boundary before the first
and after the last (the \< and \>), and a slash between each group and the
next (as in 09/22/09).
The first group is \d\+: this means at least one digit character (\d), so
it will match 9, 09, 12, etc. Here it captures the month. In
Emacs, this is written [[:digit:]]+ (in Emacs regexp syntax you don't escape
the +, as you must in Vim). You can also use
[[:digit:]] instead of \d in Vim if you prefer.
The second group is the same as the first one, and captures the day. The
third group, \d\{2\}, matches exactly two digit characters (\{n\} matches
exactly n repetitions of the preceding item),
because years aren't usually written as single digits.
The replace string is then straightforward: reorder the three
backreferences so that the day digits come first, then the month, then the
year with 20 in front of it, all separated by a period.
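To see the whole substitution at work outside an editor, here is a small sketch in Python. It is a translation under assumptions, not the Vim or Emacs command itself: Python writes groups as unescaped parentheses, uses \b for both word boundaries, and writes the repeat count as {2}. The function name us_to_uk and the sample sentences are made up for illustration.

```python
import re

# Same shape as the editor pattern: month/day/two-digit-year,
# with a word boundary at each end.
US_DATE = re.compile(r"\b(\d+)/(\d+)/(\d{2})\b")


def us_to_uk(text):
    """Rewrite MM/DD/YY as DD.MM.20YY (assumes every year is 20YY)."""
    # \2 is the day, \1 the month, \3 the two-digit year.
    return US_DATE.sub(r"\2.\1.20\3", text)


print(us_to_uk("Due 09/22/09, shipped 1/5/10."))
print(us_to_uk("Filed 09/22/2009"))
```

The second call shows why the trailing word boundary matters: a date that already has a four-digit year doesn't match, so it is left untouched rather than being mangled.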