Conquering Character Encoding Chaos With GNU Recode
Conversion and IdentificationIn the beginning were C and C++, and hosts of other computer programming languages. These are all based on ASCII (American Standard Code for Information Interchange), which as the name implies is based on the English alphabet. Which wouldn't be an issue except there are lot of other humans in the world, and they don't use the English alphabet.
So along came Unicode to the rescue. Unicode provides a framework for all alphabets of the world to be represented on computers. UTF-8 is the most popular Unicode implementation because it preserves backwards compatibility with ASCII. Which is all fun to know, but what good is it when you're looking at piles of computer files that need to converted from ISO-8859-1 (Latin-1, Western European) into whatever encoding you prefer? Naturally, there are a number of utilities just for this task.
$ recode UTF-8 recode-test.txtCheck out the GNU Recode Manual for instructions.
That's fast and easy enough, but there's one more job- converting the filename. The convmv command is just the tool for this job. This example converts all the ISO-8859-1 filenames in the files/ directory to UTF-8:
$ convmv -f iso-8859-1 -t utf8 --notest files/
convmv run without the --notest option does a dry-run without changing anything, which is probably a wise thing to do first.
Maybe you have a file that you don't know what the encoding is. Upload your file to this online tool and it will tell you. You can even do file conversions here.
ResourcesThe subject of character encoding is huge and bewildering, especially for us dinosaurs from the typewriter era. By golly, when you hit a typewriter key it came out the same way every single time. Wikipedia has a number of excellent introductory articles:
Solid state disks (SSDs) made a splash in consumer technology, and now the technology has its eyes on the enterprise storage market. Download this eBook to see what SSDs can do for your infrastructure and review the pros and cons of this potentially game-changing storage technology.
- 1Linux Top 3: Ubuntu 14.04, Debian Gives Squeeze More Life and Red Hat Goes Atomic
- 2Linux Top 3: CoreOS, Oracle Enterprise Linux 7 and Ubuntu 14.10
- 3Linux Top 3: Debian Dumps SPARC, Ubuntu Takes Over Linux 3.13 and the Core Infrastructure Initiative
- 4Linux Top 3: Fedora, Ubuntu and Gluster Lose Community Leaders
- 5Red Hat Enterprise Linux 7 Finally Hits the Big Time