Tuesday, May 29, 2012

Shell: How To Remove Duplicate Text Lines

Q. I need to sort data from a log file but there are too many duplicate lines. How do I remove all duplicate lines from a text file under GNU/Linux?

A. You need to use shell pipes along with the following two utilities:
a] sort command - sort lines of text files
b] uniq command - report or omit repeated lines
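
Before removing anything, it can help to see how heavy the duplication actually is. One way (using the file.log name from the syntax example below) is to let uniq -c prefix each line with its repeat count and then sort by that count:
$ sort file.log | uniq -c | sort -rn | head
The first column is the number of times each line occurs; any line with a count greater than 1 is a duplicate.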

Removing Duplicate Lines With Sort, Uniq and Shell Pipes

Use the following syntax:
sort {file-name} | uniq -u
sort file.log | uniq -u

Here is a sample test file called garbage.txt:
this is a test
food that are killing you
wings of fire
we hope that the labor spent in creating this software
this is a test
unix ips as well as enjoy our blog
Type the following command to remove every line that appears more than once:
$ sort garbage.txt | uniq -u
Sample output:
food that are killing you
unix ips as well as enjoy our blog
we hope that the labor spent in creating this software
wings of fire
Where,
  • -u : print only unique lines; every line that occurs more than once in the input is discarded entirely.
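
Note that -u drops every copy of a duplicated line, which is why "this is a test" does not appear in the output at all. If you instead want to collapse duplicates down to a single copy, pipe through uniq without options, or use sort -u:
$ sort garbage.txt | uniq
$ sort -u garbage.txt
Sample output:
food that are killing you
this is a test
unix ips as well as enjoy our blog
we hope that the labor spent in creating this software
wings of fire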
