There's a standard format for sending an instruction to Unix. In this book, we'll refer to commands and to the command line. Each of Unix's many native commands has a tangible existence as an executable program, and to issue the command is to tell Unix to execute that program. In this section and those that follow, we move fairly quickly through concepts and commands. While we can give you a brief overview of the Unix features we find most useful, this book isn't designed to replace a comprehensive Unix reference book. If you're new to Unix, we strongly recommend that you review the basics of Unix with the help of books such as Learning the Unix Operating System, Running Linux, or Unix for the Impatient. We've provided a list of recommended reading in the Bibliography.
5.2.1 The Command-Line Format
The command line consists of the command itself, options that modify how the command works (some of which take arguments of their own), and operands such as files upon which the command operates. For example, the chsh (change shell) command, which we just discussed briefly, has several possible options. The first is the -s option, which must be followed by the name of a shell program as its argument. The second is the -l option, which needs no argument, and which lists the shells that are available on your system. The operand for the chsh command is the username of the user whose shell is being changed. So, to change your default shell program, you might first type:
% chsh -l
which gives you a list of the shell programs available on the system:
/bin/bsh
/bin/bash2
/bin/tcsh
/bin/csh
/bin/ksh
/bin/zsh
Then, to actually change your shell to tcsh, you can type:
% chsh -s /bin/tcsh yourusername
Options can simply be single-letter codes, or they can have their own arguments. Options that take no arguments can be given as a group, while each option that takes an argument must either be specified separately or come last in a group. Each option group and each separate option must be preceded by a hyphen (-). The last option in a group, or an option given separately, can be followed by its option argument. The operands follow the final option in the list.
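The grouping rules can be tried out with the sort command. What follows is a minimal sketch written for a Bourne-style shell (sh or bash) rather than the C shell, and it assumes the mktemp utility is available. The filenames are made up for illustration. The -n (numeric) and -r (reverse) options take no argument, while -o takes an output filename as its argument.

```shell
# Work in a throwaway directory so none of your real files are touched.
scratch=$(mktemp -d) && cd "$scratch"
printf '10\n9\n100\n' > numbers.txt

# Two no-argument options given separately, each with its own hyphen.
sort -n -r numbers.txt > separate.txt

# The same two options given as one group behind a single hyphen.
sort -nr numbers.txt > grouped.txt

# -o takes an argument, so it goes last in the group and its argument
# (with_o.txt) follows it. The operand (numbers.txt) comes after the options.
sort -nro with_o.txt numbers.txt

# cmp is silent when two files are identical.
cmp separate.txt grouped.txt && cmp grouped.txt with_o.txt && echo "all three forms agree"
```

All three invocations sort the numbers from largest to smallest; only the way the options are written differs.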
Many Unix commands have options that, frankly, you'll never use. And we're not going to talk about them. But there are ways of finding out more.
5.2.2 Unix Information Commands
Unix has its own built-in reference manual, which is quite comprehensive and informative, and which will give you the correct information about the commands and options available on the particular system you're using.
The man command is one of the most useful Unix commands; it allows you to view Unix manual pages. While some Unix systems have implemented a web browser-like interface to the Unix manpages, you can't always count on this option being available. The man command is available on all types of Unix systems.
Usage: man name
where name can be a Unix command, such as grep, or a system file, such as the password file /etc/passwd.
If you're not sure of the command you're looking for, you can sometimes find the right information using man's slightly smarter cousin, apropos. The apropos command locates commands by keyword lookup.
Usage: apropos name
For instance, if you're concerned about disk usage on your system, you can enter apropos usage. The output of this command on our PC running Red Hat Linux is:
du (1) - summarize disk usage
getrlimit, getrusage, setrlimit (2) - get/set resource limits and usage
quota (1) - display disk usage and limits
apropos doesn't always produce such brief and informative output. Entering a smart combination of keywords is (as always with such searches) the key to getting the output you want. If you want a predictable listing of Unix commands, it's probably best to pick up a comprehensive Unix book.
What should you do if you find the following text in a manpage?
This documentation is no longer being maintained and may be inaccurate or incomplete. The Texinfo documentation is now the authoritative source.
The GNU[1] set of Unix tools is adopting a documentation system, called texinfo, that is different from the traditional man system. If you come across this message, you should be able to read the up-to-date documentation on the program by typing in the command info progname. For instance, info info gives you a complete set of documentation on the use of info and even provides instructions for creating your own info documentation when you start writing your own programs.
[1] GNU tools are distributed and maintained by the GNU Project at the Free Software Foundation. GNU stands for "GNU's Not Unix" and refers to a complete, Unix-like operating system that's built and maintained by the GNU Project (http://www.Gnu.org).
5.2.3 Standard Input and Output
By default, many Unix commands read from standard input and send their output to standard output. Standard input and output are file descriptors associated with your terminal. A program reading from standard input will simply hang out and wait for you to type something on your keyboard and press the Enter key. A program writing to standard output spews its output to your terminal, sometimes far faster than you can read it.
Some Unix commands read a hyphen (-) surrounded by whitespace on either side as "data from standard input." This construct can then be used in place of a filename in the command line. Absence of an output filename is sufficient to cause the program to write to standard output.
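The cat command is one that treats a lone hyphen this way. The sketch below is written for a Bourne-style shell (sh or bash), assumes mktemp is available, and uses made-up filenames. So that it runs without waiting for the keyboard, standard input is supplied from a file with the < redirector described in the next section.

```shell
scratch=$(mktemp -d) && cd "$scratch"
echo "first line, from header.txt"  > header.txt
echo "middle line, from standard input" > middle.txt
echo "last line, from footer.txt"   > footer.txt

# cat reads header.txt, then standard input (the lone -), then footer.txt.
# No output filename is given, so the result goes to standard output.
cat header.txt - footer.txt < middle.txt
```

If you leave off the < middle.txt part, cat pauses at the hyphen and waits for you to type the middle section yourself, ending with Ctrl-D.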
5.2.4 Redirection of Command Input and Output
The standard input and output descriptors are useful because you can redirect both standard input and output, associating them with filenames, with no effects on the functioning of the program. Here are the most common redirection constructs used by the C shell:
<
This redirector preceding a filename associates that filename with standard input, i.e., the contents of the file are presented to the program as if they are standard input.
>
This redirector associates a filename with standard output, so that the file is created when the command executes or, if a file of that name already exists, its contents are overwritten by the output of the command.
>>
This redirector also associates a filename with standard output. It differs from > in that the output of the command is appended to the end of the existing file.
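A short experiment shows the difference between overwriting and appending. This is a sketch for a Bourne-style shell (sh or bash), where these three redirectors behave the same way as in the C shell. It assumes mktemp is available, and log.txt is a hypothetical filename.

```shell
scratch=$(mktemp -d) && cd "$scratch"

# > creates log.txt, or overwrites it if it already exists.
echo "run 1" > log.txt
echo "run 2" > log.txt      # log.txt now holds only "run 2"

# >> appends to the end of the existing file instead of replacing it.
echo "run 3" >> log.txt     # log.txt now holds "run 2" and "run 3"

# < presents the contents of log.txt to wc as standard input.
wc -l < log.txt
```

The final command counts the lines it receives on standard input, which confirms that the first write was lost and the last one was added.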
The cat command reads the contents of a file and writes them to standard output. If you want to use the cat command to combine the contents of three files into one new file, you can use a redirector like this:
% cat file1 file2 file3 > file4
This construct with cat would be useful if, for example, you'd just downloaded a bunch of individual sequence files from the NCBI web site and want to collect them into one large file that can be read by another program. (This is an example of something that seems like it should be simple, but is actually time-consuming and annoying to do with a standard PC word-processing program. Unix provides a neat solution that doesn't even require you to open any files.)
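Here is what that might look like in practice. The sketch assumes the sequence files are in a FASTA-style layout (a header line beginning with >, followed by the sequence), which the text above does not specify; the filenames and the short sequences are placeholders invented for the example. It is written for a Bourne-style shell and assumes mktemp is available.

```shell
scratch=$(mktemp -d) && cd "$scratch"

# Three tiny stand-in sequence files (placeholder data, not real records).
printf '>seq1\nMKTAYIAK\n' > seq1.fasta
printf '>seq2\nGSHMLEDP\n' > seq2.fasta
printf '>seq3\nMVLSPADK\n' > seq3.fasta

# Combine them, in the order listed, into one new file.
cat seq1.fasta seq2.fasta seq3.fasta > all_seqs.fasta

# Count the header lines to confirm that all three records arrived.
# The quotes keep the shell from treating > as a redirector.
grep -c '^>' all_seqs.fasta    # prints 3
```

The original files are left untouched, so you can rerun the command with a different order or a different subset of files.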
You can also use redirectors to direct the contents of a file into a program at run-time, as standard input (useful if you are running a program that prompts you for input from the keyboard) or to capture output from a program that is normally written to standard output:
program < inputfile
program > outputfile
For example, let's say you've just finished an extensive BLAST search, and you want to send the results to your colleague. You can use the redirector < ("less than"), to scoop the file huge_blast_report out of your directory and mail it directly to your colleague:
% mail [email protected] < huge_blast_report
If you want to increase the chances of your colleague opening the message, you can add a subject header to the mail message using the mail option -s. The command reads:
% mail -s "surprise!" [email protected] < huge_blast_report
The reverse operation, sending the results of standard output (or text that's displayed on your screen) to a file, can be accomplished using > ("greater than"). Perhaps your colleague wants to write a quick reminder to herself to reply to your mail. She could do it using the cat command to take input from the keyboard and redirect it to a file, like this:
% cat > reminder_to_self
Ha! Send fifteen BLAST reports to colleague on Monday.
^D
%
Ctrl-D (^D) signals that you have finished entering text. Your colleague now has a file called reminder_to_self in her current working directory.
5.2.5 Operators
Operators are similar to redirectors in that they are ways of directing standard input and output. However, they direct input and output to and from other commands rather than to filenames.
The most commonly used operator is the pipe (|). The pipe directs standard output of one command into standard input for the next command. This allows you to chain together several different filtering commands or programs without creating input or output files each time.
You can use the cat command to direct the contents of a file into a program that reads information from standard input:
% cat inputfile | program
This command construct does the same thing as the example we showed earlier (program < inputfile). Both present the contents of inputfile to program as standard input; the pipe version simply uses cat to do the reading. If you want to do a lot of runs of the same program using slightly different input, you can create multiple input files and then write a script that cats each of those input files in turn and pipes their contents to program.
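A script of that kind might look like the following sketch. It uses Bourne-style shell syntax (sh or bash); the loop and function syntax of the C shell is different. The input filenames are hypothetical, mktemp is assumed to be available, and a small function that counts lines stands in for program, since any command that reads standard input would do.

```shell
scratch=$(mktemp -d) && cd "$scratch"
printf 'a\nb\n'    > run1.in
printf 'a\nb\nc\n' > run2.in

# Stand-in for "program": it reads standard input and counts the lines.
program() { wc -l; }

# The two forms discussed in the text give the same result.
cat run1.in | program
program < run1.in

# Feed each input file to the program in turn, saving one output file per run
# (run1.in produces run1.out, and so on).
for input in run1.in run2.in; do
    cat "$input" | program > "${input%.in}.out"
done
```

Replacing the function with the name of a real program, and the list after "in" with your own input files, is all it takes to adapt the loop.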
Pipes can carry out a complete set of file-processing operations without writing intermediate results to disk. For instance, imagine that you have a datafile consisting of multiple tables concatenated together. The first table in the file takes up the first 67 lines, the second table takes up the next 100 lines, and the rest of the file is taken up by a third table.[2] You want the information that's contained in the second column of the middle table, which stretches from characters 30-39 in the row. Using filters and pipes, you can construct the following command to crop out the data you need:
[2] This isn't an imaginary format at all. It's pretty close to the format of the output file from a calculation that we do frequently: computing the pKa values of individual amino acids in a protein.
% head -167 protein1.pka | tail -100 | cut -c30-39 > protein1.pka.data
In this example, head sends the top 167 lines of a specified file or files (in this case protein1.pka) to standard output; tail takes the last 100 lines of the output of head; and cut takes the correct column of characters out of the results of head and tail and then stores it in protein1.pka.data.
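To check the line arithmetic, you can run the same pipeline on a synthetic file whose contents are easy to recognize. The sketch below is for a Bourne-style shell and assumes mktemp and awk are available. The 200-line file it builds is only a stand-in for protein1.pka, not the real format, and it writes head -n 167 and tail -n 100, the more portable spelling of the -167 and -100 options used above.

```shell
scratch=$(mktemp -d) && cd "$scratch"

# Build a 200-line stand-in file: characters 1-29 hold a row label,
# characters 30-39 hold the value we want, and the rest is filler.
awk 'BEGIN { for (i = 1; i <= 200; i++)
                 printf "%-29s%-10s%s\n", "row" i, "val" i, "filler" }' > protein1.pka

# Keep lines 1-167, then the last 100 of those (lines 68-167),
# then characters 30-39 of each remaining line.
head -n 167 protein1.pka | tail -n 100 | cut -c30-39 > protein1.pka.data

wc -l < protein1.pka.data      # 100 lines survive
head -n 1 protein1.pka.data    # val68, padded with spaces to 10 characters
tail -n 1 protein1.pka.data    # val167, padded with spaces to 10 characters
```

The first and last values confirm that the pipeline selected exactly the rows of the middle table.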
5.2.6 Wildcard Characters
A useful construct Unix shells recognize is the presence of wildcard characters in filenames. The shell locates matches for any wildcards before passing filenames on to the program. The two most commonly used wildcards are the asterisk (*) and the question mark (?). * means "any sequence of zero or more characters, except for the / character." ? means "any single character." Thus, "every file in this directory" can be denoted by a lone *, which is a useful shortcut.
The shell recognizes other wildcards as well. The construct [cset] matches any single character in the specified set. If you want to move all files beginning with letters a through m to a new directory, you can structure the command as mv [a-m]* ../newdir. If you want to move all files beginning with a number to a new directory, enter mv [0-9]* ../newdir.
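Using echo is a safe way to see what a wildcard pattern expands to before handing it to a command such as mv. The following sketch is for a Bourne-style shell, assumes mktemp is available, and uses invented filenames (all lowercase, since how letter ranges treat uppercase names can depend on your locale settings).

```shell
scratch=$(mktemp -d) && cd "$scratch"
mkdir work newdir && cd work
touch apple.txt mango.txt zebra.txt 1ake.pdb 4hhb.pdb x

echo *        # every file in this directory
echo ?        # names exactly one character long: x
echo *.pdb    # names ending in .pdb

# Move files whose names begin with a through m, then those beginning with a digit.
mv [a-m]* ../newdir
mv [0-9]* ../newdir

ls            # what is left behind: x and zebra.txt
ls ../newdir  # the four files that were moved
```

Because the shell expands the pattern before mv runs, mv itself only ever sees an ordinary list of filenames.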
5.2.7 Running X Commands
On Unix systems running the X Window System, there are many commands available that initiate programs with functions that aren't command line-based. Once these programs, which can include anything from graphics viewers to complicated scientific applications, are called from the command line, they use the X Window System to open their own windows, which generally contain a complete,