
If you've ever learned a foreign language, you know that the most common approach is to start by building your vocabulary (almost always including the names of the months, for some reason), and then you learn about sentence construction rules. The UNIX command line is a lot like a language. Now you've learned a lot of UNIX words, so it's time to learn how to put them together as sentences using file redirection, filters, and pipes.
Commands to be added to your vocabulary this hour include wc, sort, nl, and uniq. You also learn about the -n flag to the cat command, which forces cat to add line numbers, and how you can use that to help find information within files.
In this hour, you learn how to redirect input and output, count words and lines, remove extraneous lines, sort information, and number the lines in a file.
This hour begins by focusing on one aspect of constructing powerful custom commands in UNIX: file redirection. An introduction to some filters, programs that are intended to be used as part of command pipes, follows. Next, you learn another aspect of creating your own UNIX commands using pipelines.
Task 8.1: The Secrets of File Redirection
So far, all the commands you've learned while teaching yourself UNIX have required you to enter information at the command line, and all have produced output on the screen. But, as Gershwin wrote in Porgy and Bess, "it ain't necessarily so." In fact, one of the most powerful features of UNIX is that the input can come from a file as easily as it can come from the keyboard, and the output can be saved to a file as easily as it can be displayed on your screen.
The secret is file redirection, the special commands in UNIX that instruct the computer to read from a file, write to a file, or even append information to an existing file. Each of these acts can be accomplished by placing a file-redirection command in a regular command line: < redirects input, > redirects output, and >> redirects output and appends the information to the existing file. A mnemonic for remembering which is which is to remember that, just as in English, UNIX works from left to right, so a character that points to the left (<) changes the input, whereas a character that points right (>) changes the output.
% touch testme
% ls -l testme
-rw-rw-r-- 1 taylor 0 Nov 15 09:11 testme
% ls -l > testme
% ls -l testme
-rw-rw-r-- 1 taylor 120 Nov 15 09:12 testme
Notice that when you redirected the output, nothing was displayed on the screen; there was no visual confirmation that it worked. But it did, as you can see by the increased size of the new file.
% cat < testme
total 127
drwx------ 2 taylor 512 Nov 6 14:20 Archives/
drwx------ 3 taylor 512 Nov 16 21:55 InfoWorld/
drwx------ 2 taylor 1024 Nov 19 14:14 Mail/
drwx------ 2 taylor 512 Oct 6 09:36 News/
drwx------ 3 taylor 512 Nov 11 10:48 OWL/
drwx------ 2 taylor 512 Oct 13 10:45 bin/
-rw-rw---- 1 taylor 57683 Nov 20 20:10 bitnet.lists.Z
-rw-rw---- 1 taylor 46195 Nov 20 06:19 drop.text.hqx
-rw-rw---- 1 taylor 12556 Nov 16 09:49 keylime.pie
drwx------ 2 taylor 512 Oct 13 10:45 src/
drwxrwx--- 2 taylor 512 Nov 8 22:20 temp/
-rw-rw---- 1 taylor 0 Nov 20 20:21 testme
The results are the same as if you had used the ls command, but the output file is saved, too. You now can easily print the file or go back to it later to compare the way it looks with the way your files look in the future.
% ls -FC >> testme
Recall that the -C flag to ls forces the system to list output in multicolumn mode. Try redirecting the output of ls -F to a file to see what happens without the -C flag.
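The three redirection symbols can be tried side by side in a safe place. What follows is only a minimal sketch, assuming a Bourne-style shell (sh) rather than the C shell behind the % prompt; the scratch directory and the file names notes.txt and copy.txt are made up for illustration.

```shell
# Work in a scratch directory so that no real files are overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# > creates notes.txt (or replaces its contents) with the command's output.
echo "first line" > notes.txt

# >> adds to the end of the existing file instead of replacing it.
echo "second line" >> notes.txt

# < feeds notes.txt to cat as input; > saves what cat writes in copy.txt.
cat < notes.txt > copy.txt
cat copy.txt

# Using > on an existing file throws away what was there before.
echo "only line" > notes.txt
cat notes.txt
```

The first cat shows both lines, because >> kept the first one; the second cat shows only the last line written, because > started the file over.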
% cat -v < mystery.file > visible.mystery.file
This command has cat -v take its input from the file mystery.file and save its output in visible.mystery.file. All the nonprinting characters are transformed, and Shala can poke through the file at her leisure.
Find a file on your system that file reports as a data file, and try using the redirection commands to create a version with all characters printable through the use of cat -v.
There is an infinite number of ways that you can combine the various forms of file redirection to create custom commands and to process files in various ways. This hour has really just scratched the surface. Next, you learn about some popular UNIX filters and how they can be combined with file redirection to create new versions of existing files. Also, study the example about Shala's file, which shows the basic steps in all UNIX file-redirection operations: Specify the input to the command, specify the command, and specify where the output should go.
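The three steps (input, command, output) can be sketched with a tiny stand-in for a mystery file. This is an illustration only, assuming a Bourne-style shell (sh); the contents written into mystery.file here are invented, with a single Control-A character playing the part of the nonprinting data.

```shell
# Work in a scratch directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Build a small file containing a nonprinting character (Control-A, octal 001).
printf 'visible\001hidden\n' > mystery.file

# Input comes from mystery.file, the command is cat -v,
# and the output is saved in visible.mystery.file.
cat -v < mystery.file > visible.mystery.file

# The control character now shows up as the two printable characters ^A.
cat visible.mystery.file
```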
Task 8.2: Counting Words and Lines Using wc
Writers generally talk about the length of their work in terms of number of words, rather than number of pages. In fact, most magazines and newspapers are laid out according to formulas based on multiplying an average-length word by the number of words in an article.
These people are obsessed with counting the words in their articles, but how do they do it? You can bet they don't count each word themselves. If they're using UNIX, they simply use the UNIX wc program, which computes a word count for the file. It also can indicate the number of characters (which ls -l indicates, too) and the number of lines in the file.
% wc testme
4 12 121
% wc < testme
4 12 121
% cat testme | wc
4 12 121
All three of these commands offer the same result (which probably seems a bit cryptic now). Why do you need to have three ways of doing the same thing? Later, you learn why this is so helpful. For now, stick to using the first form of the command.
The output is three numbers, which reveal how many lines, words, and characters, respectively, are in the file. You can see that there are 4 lines, 12 words, and 121 characters in testme.
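To see where each of the three numbers comes from, it can help to count a file small enough to check by eye. Here is a minimal sketch, assuming a Bourne-style shell (sh); sample.txt and its contents are hypothetical.

```shell
# Work in a scratch directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# A three-line sample file that is easy to count by hand.
printf 'one two\nthree four five\nsix\n' > sample.txt

# Reading from standard input keeps the filename out of the output.
wc -l < sample.txt    # number of lines
wc -w < sample.txt    # number of words
wc -c < sample.txt    # number of characters (bytes), newlines included
```

The file has 3 lines, 6 words, and 28 characters: each line's letters and spaces, plus one newline at the end of each line.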
% wc -w testme
12 testme
% wc -l testme
4 testme
% wc -wl testme
12 4 testme
% wc -lw testme
4 12 testme
% ls | wc -l
37
The ls command lists each file, one per line (because you didn't use the -C flag). The output of that command is fed to wc, which counts the number of lines it's fed. The result is that you can find out how many files you have (37) in your home directory.
% who | wc -l
12
% cat /etc/passwd | wc -l
3877
The wc command is a great example of how the simplest of commands, when combined in a sophisticated pipeline, can be very powerful.
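One way to think about a pipe is as file redirection without the file in the middle. The following sketch, which assumes a Bourne-style shell (sh) and uses made-up names (files, listing.txt, and three empty files), counts the same directory both ways.

```shell
# Work in a scratch directory with three empty files to count.
workdir=$(mktemp -d)
cd "$workdir"
mkdir files
touch files/alpha files/beta files/gamma

# Two steps: save the listing in a file, then count the lines in that file.
ls files > listing.txt
wc -l < listing.txt

# One step: the pipe hands the output of ls directly to wc.
ls files | wc -l
```

Both commands report 3. The pipeline simply skips the intermediate file.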
Task 8.3: Removing Extraneous Lines Using uniq
Sometimes when you're looking at a file, you'll notice that there are many duplicate entries, either blank lines or, perhaps, lines of repeated information. To clean up these files and shrink their size at the same time, you can use the uniq command, which lists each unique line in the file.
Well, it sort of lists each unique line in the file. What uniq really does is compare each line it reads with the previous line. If the lines are the same, uniq does not list the second line. You can use flags with uniq to get more specific results: -u lists only lines that are not repeated, -d lists only lines that are repeated (the exact opposite of -u), and -c adds a count of how many times each line occurred.
% uniq testme
Archives/ OWL/ keylime.pie
InfoWorld/ bin/ src/
Mail/ bitnet.mailing-lists.Z temp/
News/ drop.text.hqx testme
% cat testme testme testme > newtest
Examine newtest to verify that it contains three copies of testme, one after the other. (Try using wc.)
% wc newtest
12 36 363
% uniq newtest | wc
12 36 363
They're the same. Remember, the uniq command removes duplicate lines only if they're adjacent.
% tail -1 testme > lastline
% cat lastline lastline lastline lastline > newtest2
% cat newtest2
News/ drop.text.hqx testme
News/ drop.text.hqx testme
News/ drop.text.hqx testme
News/ drop.text.hqx testme
Now you can see what uniq does:
% uniq newtest2
News/ drop.text.hqx testme
% uniq -c newtest2
4 News/ drop.text.hqx testme
This shows that this line occurs four times in the file. A line that occurs only once is prefaced with a count of 1.
% uniq -d newtest2
News/ drop.text.hqx testme
% uniq -u newtest2
%
Why did the -u flag list no output? The answer is that the -u flag tells uniq to list only those lines that are not repeated in the file. Because the only line in the file is repeated four times, there's nothing to display.
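The flags are easier to tell apart on a file that has both repeated and unrepeated lines. This is a minimal sketch, assuming a Bourne-style shell (sh); fruit.txt and its contents are invented for the illustration.

```shell
# Work in a scratch directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# apple appears twice in a row, banana once, cherry three times in a row.
printf 'apple\napple\nbanana\ncherry\ncherry\ncherry\n' > fruit.txt

uniq fruit.txt       # one copy of each run: apple, banana, cherry
uniq -c fruit.txt    # the same three lines, prefaced with counts 2, 1, and 3
uniq -d fruit.txt    # only the repeated lines: apple and cherry
uniq -u fruit.txt    # only the line that is never repeated: banana
```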
Given this example, you probably think uniq is of marginal value, but you will find that it's not uncommon for files to have many blank lines scattered willy-nilly throughout the text. The uniq command is a fast, easy, and powerful way to clean up such files.
Task 8.4: Sorting Information in a File Using sort
Whereas wc is useful at the end of a pipeline of commands, uniq is a filter, a program that is really designed to be tucked in the middle of a pipeline. Filters, of course, can be placed anywhere in a line, anywhere that enables them to help direct UNIX to do what you want it to do. The common characteristic of all UNIX filters is that they can read input from standard input, process it in some manner, and list the results in standard output. With file redirection, standard input and output also can be files. To do this, you can either specify the filenames to the command (usually input only) or use the file-redirection symbols you learned earlier in this hour (<, >, and >>).
Standard input and standard output are two very common expressions in UNIX. When a program is run, the default location for receiving input is called standard input. The default location for output is standard output. If you are running UNIX from a terminal, standard input and output are your terminal.
There is a third I/O location, standard error. By default, this is the same as standard output, but you can redirect standard error to a different location than standard output. You learn more about I/O redirection later in the book.
One of the most useful filters is sort, a program that reads information and sorts it alphabetically. You can customize the behavior of this program, like all UNIX programs, to ignore the case of words (for example, to sort Big between apple and cat rather than before them; most sorts put all uppercase letters before the lowercase letters) and to reverse the order of a sort (z to a). The program sort also enables you to sort lists of numbers.
Few flags are available for sort, but they are powerful, as shown in Table 8.1.
Flag    Function
-b      Ignore leading blanks.
-d      Sort in dictionary order (only letters, digits, and blanks are significant).
-f      Fold uppercase into lowercase; that is, ignore the case of words.
-n      Sort in numerical order.
-r      Reverse order of the sort.
Table 8.1. Flags for the sort command.
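The effect of -f and -r is easiest to see on the three words mentioned earlier. Here is a small sketch, assuming a Bourne-style shell (sh); words.txt is a made-up file, and setting LC_ALL=C is an assumption added here to request plain ASCII ordering, so that the results do not depend on the language settings of your system.

```shell
# Work in a scratch directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

printf 'cat\nBig\napple\n' > words.txt

# Plain sort: uppercase letters come before lowercase ones.
LC_ALL=C sort words.txt        # Big, apple, cat

# -f ignores case, so Big lands between apple and cat.
LC_ALL=C sort -f words.txt     # apple, Big, cat

# Flags can be combined: ignore case and reverse the order.
LC_ALL=C sort -fr words.txt    # cat, Big, apple
```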
% ls -1F
Archives/
InfoWorld/
Mail/
News/
OWL/
bin/
bitnet.mailing-lists.Z
drop.text.hqx
keylime.pie
src/
temp/
testme
To force ls to list output one file per line, you can use the -1 flag (that's the number one, not a lowercase L).
To sort filenames alphabetically regardless of case, you can use sort -f:
% ls -1 | sort -f
Archives/
bin/
bitnet.mailing-lists.Z
drop.text.hqx
InfoWorld/
keylime.pie
Mail/
News/
OWL/
src/
temp/
testme
% sort < testme
Archives/ OWL/ keylime.pie
InfoWorld/ bin/ src/
Mail/ bitnet.mailing-lists.Z temp/
News/ drop.text.hqx testme
% ls -s | sort -n
total 127
1 Archives/
1 InfoWorld/
1 Mail/
1 News/
1 OWL/
1 bin/
1 src/
1 temp/
1 testme
13 keylime.pie
46 drop.text.hqx
64 bitnet.mailing-lists.Z
It would be more convenient if the largest files were listed first in the output. That's where the -r flag to reverse the sort order can be useful:
% ls -s | sort -nr
64 bitnet.mailing-lists.Z
46 drop.text.hqx
13 keylime.pie
1 testme
1 temp/
1 src/
1 bin/
1 OWL/
1 News/
1 Mail/
1 InfoWorld/
1 Archives/
total 127
% ls -s | sort -nr | head -5
64 bitnet.mailing-lists.Z
46 drop.text.hqx
13 keylime.pie
1 testme
1 temp/
That's a powerful and complex UNIX command, yet it is composed of simple and easy-to-understand components.
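Why the -n flag matters becomes clear when the numbers have different lengths. The following is a rough sketch, assuming a Bourne-style shell (sh); sizes.txt and its three lines are hypothetical, and head -n 2 is the same idea as the head -5 form used above.

```shell
# Work in a scratch directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

printf '9 small\n100 huge\n25 medium\n' > sizes.txt

# Without -n the lines are compared as text, character by character,
# so 100 sorts before 25, which sorts before 9.
sort sizes.txt

# With -n the leading numbers are compared by value: 9, 25, 100.
sort -n sizes.txt

# Largest first, and keep only the top two lines.
sort -nr sizes.txt | head -n 2
```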
Like many of the filters, sort isn't too exciting by itself. As you explore UNIX further and learn more about how to combine these simple commands to build sophisticated instructions, you will begin to see their true value.
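One combination worth trying is sort followed by uniq. Because uniq compares each line only with the line before it, sorting first brings identical lines together so that uniq can collapse them. This is a sketch under assumptions: a Bourne-style shell (sh) and an invented file named log.txt.

```shell
# Work in a scratch directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# pear and apple alternate, so no two identical lines are adjacent.
printf 'pear\napple\npear\napple\npear\n' > log.txt

# uniq alone removes nothing: all five lines survive.
uniq log.txt | wc -l

# Sorting first puts the duplicates side by side, leaving two distinct lines.
sort log.txt | uniq | wc -l

# Count each distinct line, then list the most frequent one first.
sort log.txt | uniq -c | sort -nr | head -n 1
```

The last pipeline reports pear, with a count of 3.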
Task 8.5: Numbering Lines in Files Using cat -n and nl
It often can be helpful to have a line number listed next to each line of a file. It's quite simple to do with the cat program by specifying the -n flag to number lines in the file displayed.
On many UNIX systems, there's a considerably better command for numbering lines in a file and for many other tasks. The command nl, for number lines, is an AT&T System V command. A system that doesn't have the nl command will complain nl: command not found. If you have this result, experiment with cat -n instead.
% ls -l > testme
To see line numbers now, cat -n will work fine:
% cat -n testme
1 total 60
2 -rw-r--r-- 1 taylor 1861 Jun 2 1992 Global.Software
3 -rw------- 1 taylor 22194 Oct 1 1992 Interactive.Unix
4 drwx------ 4 taylor 4096 Nov 13 11:09 Mail/
5 drwxr-xr-x 2 taylor 4096 Nov 13 11:09 News/
6 drwxr-xr-x 2 taylor 4096 Nov 13 11:09 Src/
7 drwxr-xr-x 2 taylor 4096 Nov 13 11:09 bin/
8 -rw-r--r-- 1 taylor 12445 Sep 17 14:56 history.usenet.Z
9 -rw-r--r-- 1 taylor 0 Nov 20 18:16 testme
% nl testme
1 total 60
2 -rw-r--r-- 1 taylor 1861 Jun 2 1992 Global.Software
3 -rw------- 1 taylor 22194 Oct 1 1992 Interactive.Unix
4 drwx------ 4 taylor 4096 Nov 13 11:09 Mail/
5 drwxr-xr-x 2 taylor 4096 Nov 13 11:09 News/
6 drwxr-xr-x 2 taylor 4096 Nov 13 11:09 Src/
7 drwxr-xr-x 2 taylor 4096 Nov 13 11:09 bin/
8 -rw-r--r-- 1 taylor 12445 Sep 17 14:56 history.usenet.Z
9 -rw-r--r-- 1 taylor 0 Nov 20 18:16 testme
% ls -CF | cat -n
1 Global.Software News/ history.usenet.Z
2 Interactive.Unix Src/ testme
3 Mail/ bin/
% ls -CF | nl
1 Global.Software News/ history.usenet.Z
2 Interactive.Unix Src/ testme
3 Mail/ bin/
Like many other UNIX tools, nl and its doppelganger cat -n aren't very thrilling by themselves. As additional members in the set of powerful UNIX tools, however, they can prove tremendously helpful in certain situations. As you soon will see, nl also has some powerful options that can make it a bit more fun.
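Both numbering commands work as filters, so they can number the output of another command as easily as the contents of a file. Here is a minimal sketch, assuming a Bourne-style shell (sh) and a system that has nl; names.txt is a made-up two-line file.

```shell
# Work in a scratch directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

printf 'alpha\nbeta\n' > names.txt

# Number the lines of a file, either way.
cat -n names.txt
nl names.txt

# Number the output of another command: here, the file in reverse order,
# so beta is line 1 and alpha is line 2.
sort -r names.txt | nl
```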
Task 8.6: Cool nl Tricks and Capabilities
A program that prefaces each line with a line number isn't much of an addition to the UNIX command toolbox, so the person who wrote the nl program added some further capabilities. With different command flags, nl can either number all lines (by default it numbers only lines that are not blank) or skip line numbering (which means it's an additional way to display the contents of a file). The best option, though, is that nl can selectively number just those lines that contain a specified pattern.
If you don't have the nl command on your system, I'm afraid you're out of luck in this section. Later in the book, you learn other ways to accomplish these tasks. For now, though, if you don't have nl, skip to the next hour and start to learn about the grep command.
The command flag format for nl is a bit more esoteric than you've seen up to this point. The different approaches to numbering lines with nl are all modifications of the -b flag (for body numbering options). The four flags are -ba, which numbers all lines; -bt, which numbers printable text only; -bn, which results in no numbering; and -bp pattern, for numbering lines that contain the specified pattern.
One final option is to insert a different separator between the line number and the line by telling nl to use -s, the separator flag.
% rm testme
% ls -CF > testme
% echo "" >> testme
% echo "" >> testme
% ls -CF >> testme
% cat testme
Global.Software News/ history.usenet.Z
Interactive.Unix Src/ testme
Mail/ bin/


Global.Software News/ history.usenet.Z
Interactive.Unix Src/ testme
Mail/ bin/
Parts of UNIX are rather poorly designed, as you have already learned. For example, on some systems, if you use the echo command without arguments, you get no output (many other versions of echo print a blank line in that case). However, if you add an empty argument (a set of quotation marks with nothing between them), echo outputs a blank line everywhere. It doesn't make much sense, but it works.
% nl testme
1 Global.Software News/ history.usenet.Z
2 Interactive.Unix Src/ testme
3 Mail/ bin/
4 Global.Software News/ history.usenet.Z
5 Interactive.Unix Src/ testme
6 Mail/ bin/
You can accomplish the same thing by specifying nl -bt testme. Try this to verify that your system gives the same results.
% nl -ba testme
1 Global.Software News/ history.usenet.Z
2 Interactive.Unix Src/ testme
3 Mail/ bin/
4
5
6 Global.Software News/ history.usenet.Z
7 Interactive.Unix Src/ testme
8 Mail/ bin/
% nl -bphistory testme
1 Global.Software News/ history.usenet.Z
Interactive.Unix Src/ testme
Mail/ bin/
2 Global.Software News/ history.usenet.Z
Interactive.Unix Src/ testme
Mail/ bin/
Notice that numbering the two lines has caused the rest of the lines to fall out of alignment on the display.
% nl -bphistory -s: testme
1:Global.Software News/ history.usenet.Z
Interactive.Unix Src/ testme
Mail/ bin/
2:Global.Software News/ history.usenet.Z
Interactive.Unix Src/ testme
Mail/ bin/
In this case, I specified that instead of using a tab, which is the default separator between the number and line, nl should use a colon. As you can see, the output now lines up again.
Just about anything can be specified as the separator, as sensible or weird as it might be:
% nl -s', line is: ' testme
1, line is: Global.Software News/ history.usenet.Z
2, line is: Interactive.Unix Src/ testme
3, line is: Mail/ bin/
4, line is: Global.Software News/ history.usenet.Z
5, line is: Interactive.Unix Src/ testme
6, line is: Mail/ bin/
Notice the use of single quotation marks (') in this example. I want to include spaces as part of my separator, so I need to ensure that the program knows this. If I didn't use the quotation marks, nl would use a comma as the separator and then tell me that it couldn't open a file called line or is:.
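The numbering options can be compared quickly on a file with just one blank line. This is a sketch only, assuming a Bourne-style shell (sh) and an nl that accepts the -b and -s flags described in this task; lines.txt and the pattern sec are invented for the illustration.

```shell
# Work in a scratch directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Four lines, the second of which is blank.
printf 'first\n\nsecond\nthird\n' > lines.txt

nl lines.txt                # blank line skipped: first, second, third are 1 to 3
nl -ba lines.txt            # every line numbered, so third becomes line 4
nl -bpsec lines.txt         # only the line containing "sec" gets a number
nl -ba -s': ' lines.txt     # quoted separator: a colon followed by a space
```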
The nl command demonstrates that there are plenty of variations on simple commands. When you read earlier that you would learn how to number lines in a file, did you think that this many subtleties were involved?
You have learned quite a bit in this hour and are continuing down the road to UNIX expertise. You learned about file redirection, and you can't go wrong by spending time studying the redirection symbols closely. The concept of using filters and building complex commands by combining simple commands with pipes has been more fully demonstrated here, too. This higher level of UNIX command language is what makes UNIX so powerful and easy to mold.
This hour hasn't skimped on commands, either. It introduced wc for counting lines, words, and characters in a file (or more than one file: try wc * in your home directory). You also learned to use the uniq and sort commands. You learned about using nl for numbering lines in a file, in a variety of ways, and cat -n as an alternative "poor person's" line-numbering strategy. You also were introduced to the echo command.
By the way, the echo command also can tell you about specific environment variables, just like env or printenv do. Try echo $HOME or echo $PATH to see what happens, and compare the output with printenv HOME and printenv PATH.
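The difference between the two approaches is who looks up the variable: with echo, the shell substitutes the value before echo ever runs, whereas printenv looks the name up in the environment itself. A small sketch, assuming a Bourne-style shell (sh) and a system that provides printenv; GREETING is a hypothetical variable created just for this example.

```shell
# Create a variable and export it so that other programs can see it.
GREETING="hello from the environment"
export GREETING

# The shell replaces $GREETING with its value, and echo prints the result.
echo "$GREETING"

# printenv is handed the bare name (no dollar sign) and looks it up itself.
printenv GREETING
```

Both commands print the same text.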
The Workshop summarizes the key terms you learned and poses some questions about the topics presented in this chapter. It also provides you with a preview of what you will learn in the next hour.
< file wc
wc file <
wc < file
cat file | wc
cat < file | wc
wc | cat
Now try them and see if you're correct.
The next hour introduces wildcards and regular expressions, and tools to use those powerful concepts. You learn how these commands can help you extract data from even the most unwieldy files. You learn one of the secret UNIX commands for those really in the know, the secret-society, pattern-matching program grep. Better yet, you learn how it got its weird and confusing name! You also learn about the tee command and the curious-but-helpful << file-redirection command.