uniq
The uniq
command, by default (with no options) will look for consecutive lines in a file, and will only output 1 copy of a row of the file for each batch of consecutive lines that agree. For instance, if there are five copies of the same row of data in a file, uniq
will only print that row one time.
The uniq
command will not, however, sort the file first. For this reason, it is common to use the uniq
command in a pipeline, after the sort
command.
There are several helpful options available with the uniq
utility. For instance, the -c
option prints the number of copies of the line (at the far left of the line).
The -u
option indicates that we only want to print lines that are isolated, in other words, which are not in a batch of consecutive repeated lines. In contrast, the -d
option tells uniq
that we only want to lines that do occur in a batch of two or more consecutive repeated lines.
The -n
option (where n
is an integer) signifies that we want to skip the first n
lines of each row of data, when comparing whether lines are equal. FIXTHIS page 419 in Section 21.20 of the Unix Power Tools book has an example with -4
that demonstrates how this is useful. In that example, the first four columns of a file are timestamps that should not be considered, when comparing whether lines are identical. (In other words, in that example, the uniq
command only pays attention to the 5th field and beyond, when comparing the rows to see if they are unique.)