LPIC-1 Linux Professional Institute Certification Study Guide, by Richard Blum
Part I
Exam 101-400
Chapter 1
Exploring Linux Command-Line Tools
Processing Text Using Filters
In keeping with Linux's philosophy of providing small tools that can be tied together via pipes and redirection to accomplish more complex tasks, many simple commands to manipulate text are available. These commands accomplish tasks of various types, such as combining files, transforming the data in files, formatting text, displaying text, and summarizing data.
Many of the following descriptions include input-file specifications. In most cases, you can omit these input-file specifications, in which case the utility reads from standard input instead.
File-Combining Commands
The first text-filtering commands are those used to combine two or more files into one file. Three important commands in this category are cat, join, and paste; they join files end to end, join files based on fields in the file, or merge files on a line-by-line basis, respectively.
Combining Files with cat
The cat command's name is short for concatenate, and this tool does just that: it links together an arbitrary number of files end to end and sends the result to standard output. By combining cat with output redirection, you can quickly combine two files into one:
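The book's example is omitted from this excerpt; a minimal sketch with invented filenames (first.txt, second.txt, and combined.txt are stand-ins) looks like this:

```shell
# Create two small stand-in files, then concatenate them into one.
printf 'line from first\n' > first.txt
printf 'line from second\n' > second.txt
cat first.txt second.txt > combined.txt
cat combined.txt
```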
Although cat is officially a tool for combining files, it's also commonly used to display the contents of a short file to STDOUT. If you type only one filename as an option, cat displays that file. This is a great way to review short files; but for long files, you're better off using a full-fledged pager command, such as more or less.
You can add options to have cat perform minor modifications to the files as it combines them:

Display Line Ends: If you want to see where lines end, add the -E or --show-ends option. The result is a dollar sign ($) at the end of each line.

Number Lines: The -n or --number option adds line numbers to the beginning of every line. The -b or --number-nonblank option is similar, but it numbers only lines that contain text.

Minimize Blank Lines: The -s or --squeeze-blank option compresses groups of blank lines down to a single blank line.

Display Special Characters: The -T or --show-tabs option displays tab characters as ^I. The -v or --show-nonprinting option displays most control and other special characters using caret (^) and M- notations.
The tac command is similar to cat, but it reverses the order of lines in the output:
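The example itself is missing from this excerpt; a quick illustration with an invented file:

```shell
# tac prints the file's lines in reverse order.
printf 'first\nsecond\nthird\n' > demo.txt
tac demo.txt
# third
# second
# first
```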
Joining Files by Field with join
The join command combines two files by matching the contents of specified fields within the files. Fields are typically space-separated entries on a line. However, you can specify another character as the field separator with the -t char option, where char is the character you want to use. You can cause join to ignore case when performing comparisons by using the -i option.
The effect of join may best be understood through a demonstration. Consider Listing 1.1 and Listing 1.2, which contain data on telephone numbers. Listing 1.1 shows the names associated with those numbers, and Listing 1.2 shows whether the numbers are listed or unlisted.
Listing 1.1: Demonstration file containing telephone numbers and names
Listing 1.2: Demonstration file containing telephone number listing status
You can display the contents of both files using join:
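Listings 1.1 and 1.2 aren't reproduced in this excerpt, so the sketch below builds small stand-in files (the phone numbers and names are invented) and joins them on the shared first field:

```shell
# Stand-ins for Listing 1.1 (number name) and Listing 1.2 (number status).
printf '555-0123 Jones\n555-0174 Smith\n' > listing1.1.txt
printf '555-0123 unlisted\n555-0174 listed\n' > listing1.2.txt

# join matches lines on the first field of each file by default.
join listing1.1.txt listing1.2.txt
# 555-0123 Jones unlisted
# 555-0174 Smith listed
```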
By default, join uses the first field as the one to match across files. Because Listing 1.1 and Listing 1.2 both place the phone number in this field, it's the key field in the output. You can specify another field by using the -1 or -2 option to indicate the join field for the first or second file, respectively. For example, type join -1 3 -2 2 cameras.txt lenses.txt to join using the third field in cameras.txt and the second field in lenses.txt. The -o FORMAT option enables more complex specifications for the output file's format. You can consult the man page for join for even more details.
The join command can be used at the core of a set of simple customized database-manipulation tools using Linux text-manipulation commands. It's very limited by itself, though. For instance, it requires its two files to have the same ordering of lines. (You can use the sort command to ensure this is so.)
Merging Lines with paste
The paste command merges files line by line, separating the lines from each file with tabs, as shown in the following example, using Listings 1.1 and 1.2 again:
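A sketch with invented stand-in files, since the listings aren't included in this excerpt:

```shell
# Two small stand-in files with matching line counts.
printf '555-0123 Jones\n555-0174 Smith\n' > listing1.1.txt
printf '555-0123 unlisted\n555-0174 listed\n' > listing1.2.txt

# paste glues line 1 to line 1, line 2 to line 2, separated by tabs.
paste listing1.1.txt listing1.2.txt
```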
You can use paste to combine data from files that aren't keyed with fields suitable for use by join. Of course, to be meaningful, the files' line numbers must be exactly equivalent. Alternatively, you can use paste as a quick way to create a two-column output of textual data; however, the alignment of the second column may not be exact if the first column's line lengths aren't exactly even.
File-Transforming Commands
Many of Linux's text-manipulation commands are aimed at transforming the contents of files. These commands don't actually change files' contents but instead send the changed files' contents to standard output. You can then pipe this output to another command or redirect it into a new file.
An important file-transforming command is sed. This command is very complex and is covered later in this chapter in “Using sed.”
Converting Tabs to Spaces with expand
Sometimes text files contain tabs, but programs that need to process the files don't cope well with tabs. In such a case, you may want to convert tabs to spaces. The expand command does this.

By default, expand assumes a tab stop every eight characters. You can change this spacing with the -t num or --tabs=num option, where num is the tab spacing value.
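A quick sketch (the file name is invented):

```shell
# Hypothetical file with a leading tab; expand rewrites tabs as spaces.
printf '\tindented\n' > tabbed.txt
expand -t 4 tabbed.txt
# The leading tab becomes four spaces.
```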
Displaying Files in Octal with od
Some files aren't easily displayed in ASCII. For example, most graphics files, audio files, and so on use non-ASCII characters that look like gibberish. Worse, these characters can do strange things to your display if you try to view such a file with cat or a similar tool. For instance, your font may change, or your console may begin beeping uncontrollably. Nonetheless, you may sometimes want to display such files, particularly if you want to investigate the structure of a data file.

In such a case, od (whose name stands for octal dump) can help. It displays a file in an unambiguous format: octal (base 8) numbers by default. For instance, consider Listing 1.2 as parsed by od:
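Listing 1.2 isn't included in this excerpt, so here's a stand-in. This sketch uses the -b option to show each byte individually in octal, which sidesteps the word-size grouping of od's two-byte default output:

```shell
# Dump a tiny file byte by byte in octal.
printf 'Hi\n' > sample.bin
od -b sample.bin
# 0000000 110 151 012
# 0000003
```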
The first field on each line is an index into the file in octal. For instance, the second line begins at octal 20 (16 in base 10) bytes into the file. The remaining numbers on each line represent the bytes in the file. This type of output can be difficult to interpret unless you're well versed in octal notation and perhaps in the ASCII code.
Although od is nominally a tool for generating octal output, it can generate many other output formats, such as hexadecimal (base 16), decimal (base 10), and even ASCII with escaped control characters. Consult the man page for od for details on creating these variants.
Sorting Files with sort
Sometimes you'll create an output file that you want sorted. To do so, you can use a command that's called, appropriately enough, sort. This command can sort in several ways, including the following:
Ignore Case: Ordinarily, sort sorts by ASCII value, which differentiates between uppercase and lowercase letters. The -f or --ignore-case option causes sort to ignore case.

Month Sort: The -M or --month-sort option causes the program to sort by three-letter month abbreviation (JAN through DEC).

Numeric Sort: You can sort by number by using the -n or --numeric-sort option.

Reverse Sort Order: The -r or --reverse option sorts in reverse order.

Sort Field: By default, sort uses the first field as its sort field. You can specify another field with the -k field or --key=field option. (The field can be two numbered fields separated by commas, to sort on multiple fields.)
As an example, suppose you wanted to sort Listing 1.1 by first name. You could do so like this:
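The original example is omitted here; in this sketch the invented stand-in file puts the first name in the second field, so -k 2 selects it:

```shell
# Invented stand-in for Listing 1.1: number, first name, last name.
printf '555-0123 Sam Jones\n555-0174 Alice Smith\n' > listing1.1.txt
sort -k 2 listing1.1.txt
# 555-0174 Alice Smith
# 555-0123 Sam Jones
```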
The sort command supports a large number of additional options, many of them quite exotic. Consult sort's man page for details.
Breaking a File into Pieces with split
The split command can split a file into two or more files. Unlike most of the text-manipulation commands described in this chapter, this command requires you to enter an output filename or, more precisely, an output filename prefix, to which is added an alphabetic code. You must also normally specify how large you want the individual files to be:
Split by Bytes: The -b size or --bytes=size option breaks the input file into pieces of size bytes. This option can have the usually undesirable consequence of splitting the file mid-line.

Split by Bytes in Line-Sized Chunks: You can break a file into files of no more than a specified size without breaking lines across files by using the -C size or --line-bytes=size option. (Lines will still be broken across files if the line length is greater than size.)

Split by Number of Lines: The -l lines or --lines=lines option splits the file into chunks with no more than the specified number of lines.
As an example, consider breaking Listing 1.1 into two parts by number of lines:
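A sketch with an invented four-line stand-in for listing1.1.txt; two lines per output file yields the numbersaa and numbersab files mentioned next:

```shell
# Four stand-in lines, split into chunks of two lines each.
printf '555-0123\n555-0174\n555-0220\n555-0311\n' > listing1.1.txt
split -l 2 listing1.1.txt numbers
ls numbers*
# numbersaa
# numbersab
```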
The result is two files, numbersaa and numbersab, which together hold the original contents of listing1.1.txt.

If you don't specify any defaults (as in split listing1.1.txt), the result is output files split into 1,000-line chunks, with names beginning with x (xaa, xab, and so on). If you don't specify an input filename, split uses standard input.
Translating Characters with tr
The tr command changes individual characters from standard input. Its syntax is as follows:

You specify the characters you want replaced in a group (SET1) and the characters with which you want them to be replaced as a second group (SET2). Each character in SET1 is replaced with the one at the equivalent position in SET2. Here's an example using Listing 1.1:
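The book's exact command isn't in this excerpt; this sketch shows the general form as a comment and maps the uppercase letters B, C, and J onto the shorter set bc, using an invented stand-in file:

```shell
# General form: tr [options] SET1 [SET2]; tr reads standard input.
printf '555-0123 Jones\n' > listing1.1.txt
tr BCJ bc < listing1.1.txt
# J has no partner in SET2, so it takes SET2's last letter, c:
# 555-0123 cones
```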
The tr command relies on standard input, which is the reason for the input redirection (<) in this example. This is the only way to pass the command a file.
This example translates some, but not all, of the uppercase characters to lowercase. Note that SET2 in this example was shorter than SET1. The result is that tr substitutes the last available letter from SET2 for the missing letters. In this example, the J in Jones became a c. The -t or --truncate-set1 option causes tr to truncate SET1 to the size of SET2 instead.
Another tr option is -d, which causes the program to delete the characters from SET1. When using -d, you omit SET2 entirely.
The tr command also accepts a number of shortcuts, such as [:alnum:] (all numbers and letters), [:upper:] (all uppercase letters), [:lower:] (all lowercase letters), and [:digit:] (all digits). You can specify a range of characters by separating them with dashes (-), as in A-M for characters between A and M, inclusive. Consult tr's man page for a complete list of these shortcuts.
Converting Spaces to Tabs with unexpand
The unexpand command is the logical opposite of expand; it converts multiple spaces to tabs. This can help compress the size of files that contain many spaces and can be helpful if a file is to be processed by a utility that expects tabs in certain locations.

Like expand, unexpand accepts the -t num or --tabs=num option, which sets the tab spacing to once every num characters. If you omit this option, unexpand assumes a tab stop every eight characters.
Deleting Duplicate Lines with uniq
The uniq command removes duplicate lines. It's most likely to be useful if you've sorted a file and don't want duplicate items. For instance, suppose you want to summarize Shakespeare's vocabulary. You might create a file with all of the Bard's works, one word per line. You can then sort this file using sort and pass it through uniq. Using a shorter example file containing the text to be or not to be, that is the question (one word per line), the result looks like this:
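A sketch of that pipeline:

```shell
# One word per line, sorted, then deduplicated.
printf 'to\nbe\nor\nnot\nto\nbe\nthat\nis\nthe\nquestion\n' > words.txt
sort words.txt | uniq
# be
# is
# not
# or
# question
# that
# the
# to
```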
Note that the words to and be, which appeared in the original file twice, appear only once in the uniq-processed version.
File-Formatting Commands
The next three commands, fmt, nl, and pr, reformat the text in a file. The first of these is designed to reformat text files, such as when a program's README documentation file uses lines that are too long for your display. The nl command numbers the lines of a file, which can be helpful in referring to lines in documentation or correspondence. Finally, pr is a print-processing tool; it formats a document in pages suitable for printing.
Reformatting Paragraphs with fmt
Sometimes text files arrive with outrageously long line lengths, irregular line lengths, or other problems. Depending on the difficulty, you may be able to cope simply by using an appropriate text editor or viewer to read the file. If you want to clean up the file a bit, though, you can do so with fmt. If called with no options (other than the input filename, if you're not having it work on standard input), the program attempts to clean up paragraphs, which it assumes are delimited by two or more blank lines or by changes in indentation. The new paragraph formatting defaults to paragraphs that are no more than 75 characters wide. You can change this with the -width, -w width, or --width=width option, which sets the line length to width characters.
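A quick illustration of reflowing an overlong line (the file name is invented):

```shell
# One long 'paragraph' line; fmt rewraps it at 40 columns.
printf 'The quick brown fox jumps over the lazy dog again and again and again.\n' > long.txt
fmt -w 40 long.txt
```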
Numbering Lines with nl
As described earlier, in “Combining Files with cat,” you can number the lines of a file with that command. The cat line-numbering options are limited, though, if you need to do complex line numbering. The nl command is the tool to use in this case. In its simplest form, you can use nl alone to accomplish much the same goal as cat -b achieves: numbering all the non-blank lines in a file. You can add many options to nl to achieve various special effects:
Body Numbering Style: You can set the numbering style for the bulk of the lines with the -b style or --body-numbering=style option, where style is a style format code, described shortly.

Header and Footer Numbering Style: If the text is formatted for printing and has headers or footers, you can set the style for these elements with the -h style or --header-numbering=style option for the header and the -f style or --footer-numbering=style option for the footer.

Page Separator: Some numbering schemes reset the line numbers for each page. You can tell nl how to identify a new page with the -d code or --section-delimiter=code option, where code is a code for the character that identifies the new page.

Line-Number Options for New Pages: Ordinarily, nl begins numbering each new page with line 1. If you pass the -p or --no-renumber option, though, it doesn't reset the line number with a new page.

Number Format: You can specify the numbering format with the -n format or --number-format=format option, where format is ln (left justified, no leading zeros), rn (right justified, no leading zeros), or rz (right justified with leading zeros).
The body, header, and footer options enable you to specify a numbering style for each of these page elements, as described in Table 1.3.
Table 1.3 Styles used by nl
As an example, suppose you've created a script, buggy, but you find that it's not working as you expect. When you run it, you get error messages that refer to line numbers, so you want to create a version of the script with lines that are numbered for easy reference. You can do so by calling nl with the option to number all lines, including blank lines (-b a):
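Based on the surrounding text, the command was presumably along these lines; the sketch creates a tiny stand-in for the buggy script so it can run on its own:

```shell
# Stand-in script with a blank line in the middle.
printf '#!/bin/sh\n\necho hello\n' > buggy

# -b a numbers every line, blank lines included.
nl -b a buggy > numbered-buggy.txt
cat numbered-buggy.txt
# Lines 1, 2, and 3 are numbered, including the blank line 2.
```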
Because the input file doesn't have any explicit page delimiters, the output will be numbered in a single sequence. The nl command doesn't try to impose its own page-length limits.
The numbered-buggy.txt file created by this command isn't useful as a script because of the line numbers that begin each line. You can, however, load it into a text editor or display it with a pager such as less to view the text and see the line numbers along with the commands they contain.
Preparing a File for Printing with pr
If you want to print a plain-text file, you may want to prepare it with headers, footers, page breaks, and so on. The pr command was designed to do this. In its most basic form, you pass the command a file:
The result is text formatted for printing on a line printer; that is, pr assumes an 80-character line length in a monospaced font. Of course, you can also use pr in a pipe, either to accept input piped from another program or to pipe its output to another program. (The recipient program might be lpr, which is used to print files, as described in Chapter 6, “Configuring the X Window System, Localization, and Printing.”)

By default, pr creates output that includes the original text with headers, which list the current date and time, the original filename, and the page number. You can tweak the output format in a variety of ways, including the following:
Generate Multicolumn Output: Passing the -numcols or --columns=numcols option creates output with numcols columns. For example, if you typed pr -3 myfile.txt, the output would be displayed in three columns. Note that pr doesn't reformat text; if lines are too long, they're truncated or run over into multiple columns.

Generate Double-Spaced Output: The -d or --double-space option causes double-spaced output from a single-spaced file.

Use Form Feeds: Ordinarily, pr separates pages by using a fixed number of blank lines. This works fine if your printer uses the same number of lines that pr expects. If you have problems with this issue, you can pass the -F, -f, or --form-feed option, which causes pr to output a form-feed character between pages. This works better with some printers.

Set Page Length: The -l lines or --length=lines option sets the length of the page in lines.

Set the Header Text: The -h text or --header=text option sets the text to be displayed in the header, replacing the filename. To specify a multi-word string, enclose it in quotes, as in --header="My File". The -t or --omit-header option omits the header entirely.

Set Left Margin and Page Width: The -o chars or --indent=chars option sets the left margin to chars characters. This margin size is added to the page width, which defaults to 72 characters and can be explicitly set with the -w chars or --width=chars option.
These options are just the beginning; pr supports many more options, which are described in its man page. As an example of pr in action, consider printing a double-spaced and numbered version of a configuration file (say, /etc/profile) for your reference. You can do this by piping together cat and its -n option to generate a numbered output, pr and its -d option to double-space the result, and lpr to print the file:
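A sketch of that pipeline; it uses an invented stand-in copy rather than /etc/profile itself, and the final stage is redirected to a file so it can run without a print queue (pipe to lpr instead to actually print):

```shell
# Stand-in for a configuration file.
printf 'PATH=/usr/bin\numask 022\n' > profile-copy.txt

# Number the lines, double-space the pages, capture what would go to lpr.
cat -n profile-copy.txt | pr -d > formatted.txt
```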
The result should be a printout that might be handy for taking notes on the configuration file. One caveat, though: if the file contains lines that approach or exceed 80 characters in length, the result can be single lines that spill across two lines, disrupting page boundaries. As a workaround, you can set a somewhat short page length with -l and use -f to ensure that the printer receives form feeds after each page:
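Presumably something along these lines (the page length of 50 is an illustrative guess, and the output is again captured to a file rather than piped to lpr):

```shell
# -l 50 shortens the page; -f separates pages with form feeds.
printf 'PATH=/usr/bin\numask 022\n' > profile-copy.txt
cat -n profile-copy.txt | pr -dfl 50 > formatted.txt
```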
The pr command is built around assumptions about printer capabilities that were reasonable in the early 1980s. It's still useful today, but you might prefer to look into GNU Enscript (www.codento.com/people/mtr/genscript/). This program has many of the same features as pr, but it generates PostScript output that can take better advantage of modern printer features.
File-Viewing Commands
Sometimes you just want to view a file or part of a file. A few commands can help you accomplish this goal without loading the file into a full-fledged editor.
As described earlier, the cat command is also handy for viewing short files.
Viewing the Starts of Files with head
Sometimes all you need to do is see the first few lines of a file. This may be enough to identify what a mystery file is, for instance; or you may want to see the first few entries of a log file to determine when that file was started. You can accomplish this goal with the head command, which echoes the first 10 lines of one or more files to standard output. (If you specify multiple filenames, each one's output is preceded by a header to identify it.) You can modify the amount of information displayed by head in two ways:
Specify the Number of Bytes: The -c num or --bytes=num option tells head to display num bytes from the file rather than the default 10 lines.

Specify the Number of Lines: You can change the number of lines displayed with the -n num or --lines=num option.
Viewing the Ends of Files with tail
The tail command works just like head, except that tail displays the last 10 lines of a file. (You can use the -c or --bytes and -n or --lines options to change the amount of data displayed, just as with head.) This command is useful for examining recent activity in log files or other files to which data may be appended.
The tail command supports several options that aren't present in head and that enable the program to handle additional duties, including the following:
Track a File: The -f or --follow option tells tail to keep the file open and to display new lines as they're added. This feature is helpful for tracking log files because it enables you to see changes as they're made to the file.

Stop Tracking on Program Termination: The --pid=pid option tells tail to terminate tracking (as initiated by -f or --follow) once the process with a process ID (PID) of pid terminates. (PIDs are described in more detail in Chapter 2, “Managing Software.”)
Some additional options provide more obscure capabilities. Consult tail's man page for details.
You can combine head with tail to display or extract portions of a file. For instance, suppose you want to display lines 11 through 15 of a file, sample.txt. You can extract the first 15 lines of the file with head and then display the last five lines of that extraction with tail. The final command would be head -n 15 sample.txt | tail -n 5.
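That combination can be checked with a quick sketch (sample.txt is generated here rather than taken from the book):

```shell
# Make a 20-line sample file, then pull out lines 11 through 15.
seq 1 20 > sample.txt
head -n 15 sample.txt | tail -n 5
# 11
# 12
# 13
# 14
# 15
```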
Paging through Files with less
The less command's name is a joke; it's a reference to the more command, which was an early file pager. The idea was to create a better version of more, so the developers called it less (“less is more”).
The idea behind less (and more, for that matter) is to enable you to read a file a screen at a time. When you type less filename, the program displays the first few lines of filename. You can then page back and forth through the file:
● Pressing the spacebar moves forward through the file a screen at a time.
● Pressing Esc followed by V moves backward through the file a screen at a time.
● The Up and Down arrow keys move up or down through the file a line at a time.
● You can search the file's contents by pressing the slash (/) key followed by the search term. For instance, typing /portable finds the first occurrence of the string portable after the current position. Typing a slash followed by the Enter key moves to the next occurrence of the search term. Typing n alone repeats the search forward, while typing N alone repeats the search backward.
● You can search backward in the file by using the question mark (?) key rather than the slash key.
● You can move to a specific line by typing the line number followed by g, as in 50g to go to line 50.
● When you're done, type q to exit from the program.
Unlike most of the programs described here, less can't be readily used in a pipe, except as the final command in the pipe. In that role, though, less is very useful because it enables you to examine lengthy output conveniently.
Although less is quite common on Linux systems and is typically configured as the default text pager, some Unix-like systems use more in this role. Many of less's features, such as the ability to page backward in a file, don't work in more.
One additional less feature can be handy: typing h displays less's internal help system. This display summarizes the commands you may use, but it's long enough that you must use the usual less paging features to view it all! When you're done with the help screens, just type q as if you were exiting from viewing a help document with less. This action will return you to your original document.
File-Summarizing Commands
The final text-filtering commands described here are used to summarize text in one way or another. The cut command takes segments of an input file and sends them to standard output, while the wc command displays some basic statistics on the file.
Extracting Text with cut
The cut command extracts portions of input lines and displays them on standard output. You can specify what to cut from input lines in several ways:
By Byte: The -b list or --bytes=list option cuts the specified list of bytes from the input file. (The format of list is described shortly.)

By Character: The -c list or --characters=list option cuts the specified list of characters from the input file. In practice, this method and the by-byte method usually produce identical results. (If the input file uses a multibyte encoding system, though, the results won't be identical.)

By Field: The -f list or --fields=list option cuts the specified list of fields from the input file. By default, a field is a tab-delimited section of a line, but you can change the delimiting character with the -d char, --delim=char, or --delimiter=char option, where char is the character you want to use to delimit fields. Ordinarily, cut echoes lines that don't contain delimiters. Including the -s or --only-delimited option changes this behavior so that the program doesn't echo lines that don't contain the delimiter character.
Many of these options take a list option, which is a way to specify multiple bytes, characters, or fields. You make this specification by number. It can be a single number (such as 4), a closed range of numbers (such as 2-4), or an open range of numbers (such as -4 or 4-). In this final case, all bytes, characters, or fields from the beginning of the line to the specified number (or from the specified number to the end of the line) are included in the list.
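A quick sketch of the field syntax using an invented colon-delimited line:

```shell
# Extract the first and third colon-separated fields.
printf 'alpha:beta:gamma\n' | cut -d : -f 1,3
# alpha:gamma
```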