
Part I
Exam 101-400
Chapter 1
Exploring Linux Command-Line Tools
Processing Text Using Filters


In keeping with Linux's philosophy of providing small tools that can be tied together via pipes and redirection to accomplish more complex tasks, many simple commands to manipulate text are available. These commands accomplish tasks of various types, such as combining files, transforming the data in files, formatting text, displaying text, and summarizing data.


Many of the following descriptions include input-file specifications. In most cases, you can omit these input-file specifications, in which case the utility reads from standard input instead.

File-Combining Commands

The first text-filtering commands are those used to combine two or more files into one file. Three important commands in this category are cat, join, and paste, which combine files end to end, join them by matching fields, or merge them on a line-by-line basis, respectively.

Combining Files with cat

The cat command's name is short for concatenate, and this tool does just that: It links together an arbitrary number of files end to end and sends the result to standard output. By combining cat with output redirection, you can quickly combine two files into one:
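For instance, with two placeholder filenames (any two text files will do):

$ cat first.txt second.txt > combined.txt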


Although cat is officially a tool for combining files, it's also commonly used to display the contents of a short file to STDOUT. If you pass it just one filename, cat displays that file. This is a great way to review short files; but for long files, you're better off using a full-fledged pager command, such as more or less.

You can add options to have cat perform minor modifications to the files as it combines them:

Display Line Ends

If you want to see where lines end, add the -E or --show-ends option. The result is a dollar sign ($) at the end of each line.

Number Lines

The -n or --number option adds line numbers to the beginning of every line. The -b or --number-nonblank option is similar, but it numbers only lines that contain text.

Minimize Blank Lines

The -s or --squeeze-blank option compresses groups of blank lines down to a single blank line.

Display Special Characters

The -T or --show-tabs option displays tab characters as ^I. The -v or --show-nonprinting option displays most control and other special characters using caret (^) and M- notations.

The tac command is similar to cat, but it reverses the order of lines in the output:
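For instance, reusing the placeholder file from above:

$ tac combined.txt     # same lines as cat combined.txt, but last line first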


Joining Files by Field with join

The join command combines two files by matching the contents of specified fields within the files. Fields are typically space-separated entries on a line. However, you can specify another character as the field separator with the -t char option, where char is the character you want to use. You can cause join to ignore case when performing comparisons by using the -i option.

The effect of join may best be understood through a demonstration. Consider Listing 1.1 and Listing 1.2, which contain data on telephone numbers. Listing 1.1 shows the names associated with those numbers, and Listing 1.2 shows whether the numbers are listed or unlisted.


Listing 1.1: Demonstration file containing telephone numbers and names


Listing 1.2: Demonstration file containing telephone number listing status


You can display the contents of both files using join:
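Assuming the two listings are stored as listing1.1.txt and listing1.2.txt (the second filename is an assumption based on the first), the command might look like this:

$ join listing1.1.txt listing1.2.txt
# each output line holds the shared phone number, followed by the remaining
# fields from listing1.1.txt and then those from listing1.2.txt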


By default, join uses the first field as the one to match across files. Because Listing 1.1 and Listing 1.2 both place the phone number in this field, it's the key field in the output. You can specify another field by using the -1 or -2 option to indicate the join field for the first or second file, respectively. For example, type join -1 3 -2 2 cameras.txt lenses.txt to join using the third field in cameras.txt and the second field in lenses.txt. The -o FORMAT option enables more complex specifications for the output file's format. You can consult the man page for join for even more details.

The join command can be used at the core of a set of simple customized database-manipulation tools using Linux text-manipulation commands. It's very limited by itself, though. For instance, it requires its two files to have the same ordering of lines. (You can use the sort command to ensure this is so.)

Merging Lines with paste

The paste command merges files line by line, separating the lines from each file with tabs, as shown in the following example, using Listings 1.1 and 1.2 again:
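Assuming the same filenames as before, the command might be:

$ paste listing1.1.txt listing1.2.txt
# line 1 of listing1.1.txt, a tab, line 1 of listing1.2.txt, and so on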


You can use paste to combine data from files that aren't keyed with fields suitable for use by join. Of course, to be meaningful, the lines in the two files must correspond exactly (line 1 of one file must pair with line 1 of the other, and so on). Alternatively, you can use paste as a quick way to create a two-column output of textual data; however, the alignment of the second column may not be exact if the first column's line lengths aren't exactly even.

File-Transforming Commands

Many of Linux's text-manipulation commands are aimed at transforming the contents of files. These commands don't actually change files' contents but instead send the changed files' contents to standard output. You can then pipe this output to another command or redirect it into a new file.


An important file-transforming command is sed. This command is very complex and is covered later in this chapter in “Using sed.”

Converting Tabs to Spaces with expand

Sometimes text files contain tabs but programs that need to process the files don't cope well with tabs. In such a case, you may want to convert tabs to spaces. The expand command does this.

By default, expand assumes a tab stop every eight characters. You can change this spacing with the -t num or --tabs=num option, where num is the tab spacing value.
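A minimal sketch, using a hypothetical tab-indented file, might look like this:

$ expand -t 4 tabbed.txt > spaced.txt   # assumes tab stops every 4 characters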

Displaying Files in Octal with od

Some files aren't easily displayed in ASCII. For example, most graphics files, audio files, and so on use non-ASCII characters that look like gibberish. Worse, these characters can do strange things to your display if you try to view such a file with cat or a similar tool. For instance, your font may change, or your console may begin beeping uncontrollably. Nonetheless, you may sometimes want to display such files, particularly if you want to investigate the structure of a data file.

In such a case, od (whose name stands for octal dump) can help. It displays a file in an unambiguous format – octal (base 8) numbers by default. For instance, consider Listing 1.2 as parsed by od:
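Assuming the file is stored as listing1.2.txt (an assumed name), the command would take this form:

$ od listing1.2.txt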


The first field on each line is an index into the file in octal. For instance, the second line begins at octal 20 (16 in base 10) bytes into the file. The remaining numbers on each line represent the bytes in the file. This type of output can be difficult to interpret unless you're well versed in octal notation and perhaps in the ASCII code.

Although od is nominally a tool for generating octal output, it can generate many other output formats, such as hexadecimal (base 16), decimal (base 10), and even ASCII with escaped control characters. Consult the man page for od for details on creating these variants.

Sorting Files with sort

Sometimes you'll create an output file that you want sorted. To do so, you can use a command that's called, appropriately enough, sort. This command can sort in several ways, including the following:

Ignore Case

Ordinarily, sort sorts by ASCII value, which differentiates between uppercase and lowercase letters. The -f or --ignore-case option causes sort to ignore case.

Month Sort

The -M or --month-sort option causes the program to sort by three-letter month abbreviation (JAN through DEC).

Numeric Sort

You can sort by number by using the -n or --numeric-sort option.

Reverse Sort Order

The -r or --reverse option sorts in reverse order.

Sort Field

By default, sort uses the first field as its sort field. You can specify another field with the -k field or --key=field option. (The field can be two numbered fields separated by commas, to sort on multiple fields.)

As an example, suppose you wanted to sort Listing 1.1 by first name. You could do so like this:
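Assuming each line of listing1.1.txt holds a phone number, a last name, and then a first name (so the first name is the third whitespace-separated field), the command would be:

$ sort -k 3 listing1.1.txt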


The sort command supports a large number of additional options, many of them quite exotic. Consult sort's man page for details.

Breaking a File into Pieces with split

The split command can split a file into two or more files. Unlike most of the text-manipulation commands described in this chapter, this command requires you to enter an output filename or, more precisely, an output filename prefix, to which is added an alphabetic code. You must also normally specify how large you want the individual files to be:

Split by Bytes

The -b size or --bytes=size option breaks the input file into pieces of size bytes. This option can have the usually undesirable consequence of splitting the file mid-line.

Split by Bytes in Line-Sized Chunks

You can break a file into files of no more than a specified size without breaking lines across files by using the -C size or --line-bytes=size option. (Lines will still be broken across files if the line length is greater than size.)

Split by Number of Lines

The -l lines or --lines=lines option splits the file into chunks with no more than the specified number of lines.

As an example, consider breaking Listing 1.1 into two parts by number of lines:
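Assuming two lines per output file (the line count is an assumption for illustration) and the output prefix numbers, the command might be:

$ split -l 2 listing1.1.txt numbers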


The result is two files, numbersaa and numbersab, which together hold the original contents of listing1.1.txt.

If you don't specify a size or line count (as in split listing1.1.txt), the result is output files split into 1,000-line chunks, with names beginning with x (xaa, xab, and so on). If you don't specify an input filename, split uses standard input.

Translating Characters with tr

The tr command changes individual characters from standard input. Its syntax is as follows:
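tr [OPTION]... SET1 [SET2]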


You specify the characters you want replaced in a group (SET1) and the characters with which you want them to be replaced as a second group (SET2). Each character in SET1 is replaced with the one at the equivalent position in SET2. Here's an example using Listing 1.1:
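One command consistent with the description that follows (the exact sets are an assumption; SET2 is deliberately shorter than SET1) is:

$ tr BCJ bc < listing1.1.txt   # B becomes b, C becomes c, and J also becomes c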


The tr command relies on standard input, which is the reason for the input redirection (<) in this example. This is the only way to pass the command a file.

This example translates some, but not all, of the uppercase characters to lowercase. Note that SET2 in this example was shorter than SET1. The result is that tr substitutes the last available letter from SET2 for the missing letters. In this example, the J in Jones became a c. The -t or --truncate-set1 option causes tr to truncate SET1 to the size of SET2 instead.

Another tr option is -d, which causes the program to delete the characters from SET1. When using -d, you omit SET2 entirely.

The tr command also accepts a number of shortcuts, such as [:alnum:] (all numbers and letters), [:upper:] (all uppercase letters), [:lower:] (all lowercase letters), and [:digit:] (all digits). You can specify a range of characters by separating them with dashes (-), as in A-M for characters between A and M, inclusive. Consult tr's man page for a complete list of these shortcuts.
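As a quick illustration of -d combined with one of these shortcuts, the following deletes every digit from the demonstration file's output:

$ tr -d '[:digit:]' < listing1.1.txt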

Converting Spaces to Tabs with unexpand

The unexpand command is the logical opposite of expand; it converts multiple spaces to tabs. This can help compress the size of files that contain many spaces and can be helpful if a file is to be processed by a utility that expects tabs in certain locations.

Like expand, unexpand accepts the -t num or --tabs=num option, which sets the tab spacing to once every num characters. If you omit this option, unexpand assumes a tab stop every eight characters.

Deleting Duplicate Lines with uniq

The uniq command removes duplicate lines. It's most likely to be useful if you've sorted a file and don't want duplicate items. For instance, suppose you want to summarize Shakespeare's vocabulary. You might create a file with all of the Bard's works, one word per line. You can then sort this file using sort and pass it through uniq. Using a shorter example file containing the text to be or not to be, that is the question (one word per line), the result looks like this:
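Assuming the words are stored one per line in a file named shakespeare.txt (a hypothetical name), the pipeline would be:

$ sort shakespeare.txt | uniq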


Note that the words to and be, which appeared in the original file twice, appear only once in the uniq-processed version.

File-Formatting Commands

The next three commands, fmt, nl, and pr, reformat the text in a file. The first of these is designed to reformat text files, such as when a program's README documentation file uses lines that are too long for your display. The nl command numbers the lines of a file, which can be helpful in referring to lines in documentation or correspondence. Finally, pr is a print-processing tool; it formats a document in pages suitable for printing.

Reformatting Paragraphs with fmt

Sometimes text files arrive with outrageously long line lengths, irregular line lengths, or other problems. Depending on the difficulty, you may be able to cope simply by using an appropriate text editor or viewer to read the file. If you want to clean up the file a bit, though, you can do so with fmt. If called with no options (other than the input filename, if you're not having it work on standard input), the program attempts to clean up paragraphs, which it assumes are delimited by two or more blank lines or by changes in indentation. The new paragraph formatting defaults to paragraphs that are no more than 75 characters wide. You can change this with the -width, -w width, or --width=width options, which set the line length to width characters.
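For instance, a minimal sketch that rewraps a hypothetical README file to lines no longer than 60 characters:

$ fmt -w 60 README.txt > README-wrapped.txt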

Numbering Lines with nl

As described earlier, in “Combining Files with cat,” you can number the lines of a file with that command. The cat line-numbering options are limited, though, if you need to do complex line numbering. The nl command is the tool to use in this case. In its simplest form, you can use nl alone to accomplish much the same goal as cat -b achieves: numbering all the non-blank lines in a file. You can add many options to nl to achieve various special effects:

Body Numbering Style

You can set the numbering style for the bulk of the lines with the -b style or --body-numbering=style option, where style is a style format code, described shortly.

Header and Footer Numbering Style

If the text is formatted for printing and has headers or footers, you can set the style for these elements with the -h style or --header-numbering=style option for the header and the -f style or --footer-numbering=style option for the footer.

Page Separator

Some numbering schemes reset the line numbers for each page. You can tell nl how to identify a new page with the -d code or --section-delimiter=code option, where code is a code for the character that identifies the new page.

Line-Number Options for New Pages

Ordinarily, nl begins numbering each new page with line 1. If you pass the -p or --no-renumber option, though, it doesn't reset the line number with a new page.

Number Format

You can specify the numbering format with the -n format or --number-format=format option, where format is ln (left justified, no leading zeros), rn (right justified, no leading zeros), or rz (right justified with leading zeros).

The body, header, and footer options enable you to specify a numbering style for each of these page elements, as described in Table 1.3.


Table 1.3 Styles used by nl


As an example, suppose you've created a script, buggy, but you find that it's not working as you expect. When you run it, you get error messages that refer to line numbers, so you want to create a version of the script with lines that are numbered for easy reference. You can do so by calling nl with the option to number all lines, including blank lines (-b a):
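Given that description and the output filename used below, the command would be along these lines:

$ nl -b a buggy > numbered-buggy.txt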


Because the input file doesn't have any explicit page delimiters, the output will be numbered in a single sequence. The nl command doesn't try to impose its own page-length limits.

The numbered-buggy.txt file created by this command isn't useful as a script because of the line numbers that begin each line. You can, however, load it into a text editor or display it with a pager such as less to view the text and see the line numbers along with the commands they contain.

Preparing a File for Printing with pr

If you want to print a plain-text file, you may want to prepare it with headers, footers, page breaks, and so on. The pr command was designed to do this. In its most basic form, you pass the command a file:
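For instance, with a placeholder filename:

$ pr myfile.txt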


The result is text formatted for printing on a line printer – that is, pr assumes an 80-character line length in a monospaced font. Of course, you can also use pr in a pipe, either to accept input piped from another program or to pipe its output to another program. (The recipient program might be lpr, which is used to print files, as described in Chapter 6, “Configuring the X Window System, Localization, and Printing.”)

By default, pr creates output that includes the original text with a header on each page listing the current date and time, the original filename, and the page number. You can tweak the output format in a variety of ways, including the following:

Generate Multicolumn Output

Passing the -numcols or --columns=numcols option creates output with numcols columns. For example, if you typed pr -3 myfile.txt, the output would be displayed in three columns. Note that pr doesn't reformat text; if lines are too long, they're truncated or run over into multiple columns.

Generate Double-Spaced Output

The -d or --double-space option causes double-spaced output from a single-spaced file.

Use Form Feeds

Ordinarily, pr separates pages by using a fixed number of blank lines. This works fine if your printer uses the same number of lines that pr expects. If you have problems with this issue, you can pass the -F, -f, or --form-feed option, which causes pr to output a form-feed character between pages. This works better with some printers.

Set Page Length

The -l lines or --length=lines option sets the length of the page in lines.

Set the Header Text

The -h text or --header=text option sets the text to be displayed in the header, replacing the filename. To specify a multi-word string, enclose it in quotes, as in --header="My File". The -t or --omit-header option omits the header entirely.

Set Left Margin and Page Width

The -o chars or --indent=chars option sets the left margin to chars characters. This margin size is added to the page width, which defaults to 72 characters and can be explicitly set with the -w chars or --width=chars option.

These options are just the beginning; pr supports many more options, which are described in its man page. As an example of pr in action, consider printing a double-spaced and numbered version of a configuration file (say, /etc/profile) for your reference. You can do this by piping together cat and its -n option to generate a numbered output, pr and its -d option to double-space the result, and lpr to print the file:
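Following that description, the pipeline would be:

$ cat -n /etc/profile | pr -d | lpr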


The result should be a printout that might be handy for taking notes on the configuration file. One caveat, though: if the file contains lines that approach or exceed 80 characters in length, single source lines can wrap onto two printed lines, which disrupts the page boundaries. As a workaround, you can set a somewhat short page length with -l and use -f to ensure that the printer receives form feeds after each page:
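For instance (the page length of 50 lines is an arbitrary choice for illustration):

$ cat -n /etc/profile | pr -d -f -l 50 | lpr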


The pr command is built around assumptions about printer capabilities that were reasonable in the early 1980s. It's still useful today, but you might prefer to look into GNU Enscript (www.codento.com/people/mtr/genscript/). This program has many of the same features as pr, but it generates PostScript output that can take better advantage of modern printer features.

File-Viewing Commands

Sometimes you just want to view a file or part of a file. A few commands can help you accomplish this goal without loading the file into a full-fledged editor.


As described earlier, the cat command is also handy for viewing short files.

Viewing the Starts of Files with head

Sometimes all you need to do is see the first few lines of a file. This may be enough to identify what a mystery file is, for instance; or you may want to see the first few entries of a log file to determine when that file was started. You can accomplish this goal with the head command, which echoes the first 10 lines of one or more files to standard output. (If you specify multiple filenames, each one's output is preceded by a header to identify it.) You can modify the amount of information displayed by head in two ways:

Specify the Number of Bytes

The -c num or --bytes=num option tells head to display num bytes from the file rather than the default 10 lines.

Specify the Number of Lines

You can change the number of lines displayed with the -n num or --lines=num option.

Viewing the Ends of Files with tail

The tail command works just like head, except that tail displays the last 10 lines of a file. (You can use the -c or --bytes and -n or --lines options to change the amount of data displayed, just as with head.) This command is useful for examining recent activity in log files or other files to which data may be appended.
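For instance, a quick sketch (the log file path is an assumption; substitute any file on your system):

$ tail -n 20 /var/log/syslog   # show the last 20 lines instead of the default 10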

The tail command supports several options that aren't present in head and that enable the program to handle additional duties, including the following:

Track a File

The -f or --follow option tells tail to keep the file open and to display new lines as they're added. This feature is helpful for tracking log files because it enables you to see changes as they're made to the file.

Stop Tracking on Program Termination

The --pid=pid option tells tail to terminate tracking (as initiated by -f or --follow) once the process with a process ID (PID) of pid terminates. (PIDs are described in more detail in Chapter 2, “Managing Software.”)

Some additional options provide more obscure capabilities. Consult tail's man page for details.


You can combine head with tail to display or extract portions of a file. For instance, suppose you want to display lines 11 through 15 of a file, sample.txt. You can extract the first 15 lines of the file with head and then display the last five lines of that extraction with tail. The final command would be head -n 15 sample.txt | tail -n 5.

Paging through Files with less

The less command's name is a joke; it's a reference to the more command, which was an early file pager. The idea was to create a better version of more, so the developers called it less (“less is more”).

The idea behind less (and more, for that matter) is to enable you to read a file a screen at a time. When you type less filename, the program displays the first few lines of filename. You can then page back and forth through the file:

● Pressing the spacebar moves forward through the file a screen at a time.

● Pressing Esc followed by V moves backward through the file a screen at a time.

● The Up and Down arrow keys move up or down through the file a line at a time.

● You can search the file's contents by pressing the slash (/) key followed by the search term. For instance, typing /portable finds the first occurrence of the string portable after the current position. Typing a slash followed by the Enter key moves to the next occurrence of the search term. Typing n alone repeats the search forward, while typing N alone repeats the search backward.

● You can search backward in the file by using the question mark (?) key rather than the slash key.

● You can move to a specific line by typing the line number followed by g, as in 50g to go to line 50.

● When you're done, type q to exit from the program.

Unlike most of the programs described here, less can't be readily used in a pipe, except as the final command in the pipe. In that role, though, less is very useful because it enables you to examine lengthy output conveniently.


Although less is quite common on Linux systems and is typically configured as the default text pager, some Unix-like systems use more in this role. Many of less's features, such as the ability to page backward in a file, don't work in more.

One additional less feature can be handy: Typing h displays less's internal help system. This display summarizes the commands you may use, but it's long enough that you must use the usual less paging features to view it all! When you're done with the help screens, just type q as if you were exiting from viewing a help document with less. This action will return you to your original document.

File-Summarizing Commands

The final text-filtering commands described here are used to summarize text in one way or another. The cut command takes segments of an input file and sends them to standard output, while the wc command displays some basic statistics on the file.

Extracting Text with cut

The cut command extracts portions of input lines and displays them on standard output. You can specify what to cut from input lines in several ways:

By Byte

The -b list or --bytes=list option cuts the specified list of bytes from the input file. (The format of list is described shortly.)

By Character

The -c list or --characters=list option cuts the specified list of characters from the input file. In practice, this method and the by-byte method usually produce identical results. (If the input file uses a multibyte encoding system, though, the results won't be identical.)

By Field

The -f list or --fields=list option cuts the specified list of fields from the input file. By default, a field is a tab-delimited section of a line, but you can change the delimiting character with the -d char, --delim=char, or --delimiter=char option, where char is the character you want to use to delimit fields. Ordinarily, cut echoes lines that don't contain delimiters. Including the -s or --only-delimited option changes this behavior so that the program doesn't echo lines that don't contain the delimiter character.

Many of these options take a list option, which is a way to specify multiple bytes, characters, or fields. You make this specification by number. It can be a single number (such as 4), a closed range of numbers (such as 2-4), or an open range of numbers (such as -4 or 4-). In this final case, all bytes, characters, or fields from the beginning of the line to the specified number (or from the specified number to the end of the line) are included in the list.
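As a concrete sketch of these list specifications, the following extracts the first and seventh colon-separated fields (the user name and default shell) from /etc/passwd:

$ cut -d : -f 1,7 /etc/passwd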

