Every Linux Geek Needs To Know Sed and Awk. Here’s Why…

Datetime: 2016-08-23 01:05:31 | Topic: AWK

Two of the most criminally under-appreciated Linux utilities are Sed and Awk. Although admittedly they can seem a bit arcane, if you ever have to make repetitive changes to large pieces of code or text, or if you ever have to analyze some text, Sed and Awk are invaluable.

So, what are they? How are they used? And how, when combined together, do they make it easier to process text?

What Is Sed?

Sed was developed between 1973 and 1974 at Bell Labs by legendary computing pioneer Lee E. McMahon.

The name stands for stream editor, and that’s kinda what it does. It allows you to edit bodies or streams of text programmatically, through a compact and simple, yet Turing-complete, programming language.

The way it works is simple: it reads its input one line at a time into a buffer (called the pattern space). For each line, it performs whichever of the script’s instructions apply, then prints the result.

For example, if someone were to write a Sed script that replaced the word “Beer” with “Soda”, and then passed in a text file containing the entire lyrics to “99 Bottles of Beer on the Wall”, it would go through that file line by line and print out “99 Bottles of Soda on the Wall”, and so on.
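Here’s a minimal sketch of that idea, using two sample lines of the song typed in with printf rather than a full lyrics file. Keep in mind that Sed’s matching is case-sensitive, and that the trailing g flag replaces every match on a line rather than only the first.

```shell
# Feed two lines to Sed; it processes them one at a time,
# replacing every "beer" on each line with "soda".
printf '%s\n' "99 bottles of beer on the wall, 99 bottles of beer." \
  "Take one down, pass it around, 98 bottles of beer on the wall." |
  sed 's/beer/soda/g'
```

Without the g, only the first “beer” on each line would change.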

The most basic Sed script is a Hello World one. Here, we use the Unix Echo utility, which simply outputs strings, to print “Hello World”. But we pipe this to Sed and tell it to replace “World” with “Dave”. Self-explanatory stuff.

echo "Hello World" | sed 's/World/Dave/'

You can also combine Sed instructions into files, if you need to do some more complicated editing. Inspired by this hilarious Reddit thread , I’m going to take the lyrics to A-Ha’s Take On Me , and replace each instance of “I”, “Me”, and “My”, with Greg.

First, I’ll put the lyrics to the song in a text file called tom.txt. Then I’ll open up my preferred text editor (my favorite is Vim, but Nano and Gedit are both excellent choices) and add the following lines. Save the file as greg.sed (the .sed extension is a convention; Sed itself doesn’t require it).
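The script itself isn’t reproduced in this version of the article, so here’s a hypothetical reconstruction based on the instructions described below (s/me/Greg/, s/Me/Greg/, and so on). It writes greg.sed with a shell heredoc, one Sed instruction per line. The g flag on each instruction, which replaces every match on a line, is an assumption.

```shell
# Hypothetical reconstruction of greg.sed (not the author's exact file).
# "I" is always capitalized, so it needs only one instruction;
# "me" and "my" each need a lowercase and a capitalized version.
cat > greg.sed <<'EOF'
s/I/Greg/g
s/me/Greg/g
s/Me/Greg/g
s/my/Greg/g
s/My/Greg/g
EOF
```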

You might notice that in the example above, I’ve repeated myself (e.g. s/me/Greg/ and s/Me/Greg/). That’s because some versions of Sed, like the one that ships with Mac OS X, do not support case-insensitive matching. As a result, we have to write two Sed instructions for each word, so that it matches both the capitalized and lowercase versions.
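There are two ways around that duplication. A bracket expression like [Mm] matches either letter and works in any Sed. GNU Sed, the default on most Linux distributions, also accepts an I modifier on the s command for case-insensitive matching; other Seds, such as the one on Mac OS X, may not accept it.

```shell
# Portable: [Mm] matches "M" or "m", so one instruction
# covers both "me" and "Me".
echo "Me, me, me" | sed 's/[Mm]e/Greg/g'

# GNU Sed extension: the I modifier makes the match case-insensitive.
echo "Me, me, me" | sed 's/me/Greg/gI'
```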

This won’t be as accurate as replacing each instance of “I”, “Me”, and “My” by hand; a rule like s/me/Greg/ will also change the “me” inside a word like “time”. Remember, we’re just using this as an exercise to demonstrate how you can group Sed instructions into one script and then execute them with a single command.

Next, we need to run the script against the lyrics. To do that, we run this command.

cat tom.txt | sed -f greg.sed

Let’s slow down and look at what this does. Eagle-eyed readers will have noticed that we’re not using Echo here. We’re using Cat. That’s because while Cat will print out the entire contents of the file, Echo would only print out the file name. You’ll have also noticed that we’re running Sed with the “-f” flag. This tells it to read its instructions from a script file (here, greg.sed). Sed can also read the input file directly, so sed -f greg.sed tom.txt does the same job without Cat.
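For just a couple of instructions, a separate script file isn’t strictly necessary. Sed also accepts several -e options on the command line and applies them, in order, to each line, exactly as it would the lines of a script file. The sample sentence here is made up for illustration.

```shell
# Same effect as a two-line script containing s/me/Greg/g and s/my/Greg/g.
echo "Take on me, my love" | sed -e 's/me/Greg/g' -e 's/my/Greg/g'
# prints: Take on Greg, Greg love
```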

The end result is this.

It’s also worth noting that Sed supports regular expressions (regex). These allow you to define patterns in text using a special, compact, and admittedly complicated syntax.

Here’s an example of how that might work. We’re going to take the aforementioned song lyrics, but use regex to print out every line that doesn’t start with “Take”.

cat tom.txt | sed '/^Take/d'
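To see the regex at work without the lyrics file, here’s the same command on three sample lines. The ^ anchors the match to the start of the line, so only lines beginning with “Take” are deleted. The second command does the opposite: -n turns off Sed’s automatic printing, and p prints only the lines that match.

```shell
# d deletes matching lines; every other line is printed as usual.
printf '%s\n' "Take on me" "I'll be gone" "Take me on" | sed '/^Take/d'
# prints: I'll be gone

# -n suppresses automatic printing; p prints just the matching lines.
printf '%s\n' "Take on me" "I'll be gone" "Take me on" | sed -n '/^Take/p'
```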

Sed is, of course, incredibly useful. But it’s even more powerful when combined with Awk.

What Is Awk?

Awk, like Sed, is a programming language designed for dealing with large bodies of text. But while Sed is used to process and modify text, Awk is mostly used as a tool for analysis and reporting.

Like Sed, Awk was first developed at Bell Labs in the 1970s. Its name doesn’t come from what the program does, but rather from the surnames of its three authors: Alfred Aho, Peter Weinberger, and Brian Kernighan.

Awk works by reading a text file or input stream one line at a time. Each line is scanned to see if it matches a predefined pattern. If a match is found, an action is performed.
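In practice, an Awk program is a list of pattern { action } pairs. In this small sketch, the pattern is a regex, and the action prints the line number (NR, a built-in variable) and the first whitespace-separated field ($1) of each matching line. Lines that don’t match are simply skipped.

```shell
# Pattern: lines containing "me". Action: print line number and first field.
printf '%s\n' "Take on me" "I'll be gone" "Take me on" |
  awk '/me/ { print NR ": " $1 }'
# prints:
# 1: Take
# 3: Take
```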

But while Sed and Awk may share similar purposes, they’re two completely different languages, with two completely different design philosophies. Awk more closely resembles general-purpose languages like C, Python, and Bash. It has functions, and it takes a more C-like approach to iteration and variables (James Bruce explained how iteration works). Put simply, it feels more like a programming language.
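To give a flavor of that, here’s a sketch with a user-defined function, reverse_words (a hypothetical name, saved in a hypothetical reverse.awk), that uses a C-style for loop to print each line’s words in reverse order. Awk has no local-variable declarations, so by convention extra function parameters (here i and out) act as locals.

```shell
# reverse.awk: print the words of each line in reverse order.
cat > reverse.awk <<'EOF'
# Extra parameters (i, out) are the conventional way to get local variables.
function reverse_words(    i, out) {
    out = $NF                       # start with the last field
    for (i = NF - 1; i >= 1; i--)   # walk backwards, C-style
        out = out " " $i
    return out
}
{ print reverse_words() }
EOF

printf '%s\n' "take on me" | awk -f reverse.awk
# prints: me on take
```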

So, let’s try it out. Using the lyrics to Take On Me, we’re going to print all the lines that are longer than 20 characters.

awk 'length($0) > 20' tom.txt
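Here’s the same filter on a few sample lines. length($0) is the number of characters in the whole line ($0), and a pattern with no action prints each matching line. Adding an action, as in the second command, lets you print the length alongside the line.

```shell
# Only the middle line is longer than 20 characters.
printf '%s\n' "Take on me" "Today's another day to find you" "Take me on" |
  awk 'length($0) > 20'

# Same filter, but print each matching line's length in front of it.
printf '%s\n' "Take on me" "Today's another day to find you" "Take me on" |
  awk 'length($0) > 20 { print length($0) ": " $0 }'
# prints: 31: Today's another day to find you
```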


The next example I’ve shamelessly cribbed from the official Awk documentation. But it’s a great example of the potential of this powerful, yet tiny language. It’s also a great demonstration of how iteration and variables work in it. First, create a file called “WordCount.awk” and add the following lines.

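# For every input line, count each whitespace-separated field (word).
{
    for (i = 1; i <= NF; i++)
        freq[$i]++
}
# After the last line, print each word and how often it appeared.
END {
    for (word in freq)
        printf "%s\t%d\n", word, freq[word]
}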

Save it, and then run it with the following command.

awk -f WordCount.awk tom.txt
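Because the script splits lines on whitespace and compares words exactly, “Take” and “take” are counted separately. A hypothetical variant, WordCountLower.awk, lowercases each line first with Awk’s built-in tolower(); the word-frequency example in the GNU Awk manual goes further and strips punctuation too.

```shell
# WordCountLower.awk: like WordCount.awk, but case-insensitive.
cat > WordCountLower.awk <<'EOF'
{
    $0 = tolower($0)              # reassigning $0 also re-splits the fields
    for (i = 1; i <= NF; i++)
        freq[$i]++
}
END {
    for (word in freq)
        printf "%s\t%d\n", word, freq[word]
}
EOF

printf '%s\n' "Take on me" "take me on" | awk -f WordCountLower.awk
```

Each of the three words should now be counted twice, though the order of the output lines isn’t fixed.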

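Cool, right? You’ll probably notice that the words aren’t in any particular order; that’s because Awk doesn’t guarantee an order when it loops over an array with for (word in freq). You can sort the results using the Unix sort utility, but we’ll leave that for another day. We’re going to keep it simple.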

Combining The Two

Awk and Sed are even more powerful when combined. You can do this by using Unix pipes. Those are the “|” bits between commands, and each one sends the output of the command on its left into the input of the command on its right.

Let’s try this: we’re going to list all the lines in Take On Me that have more than 20 characters, using Awk. Then we’re going to strip all the lines that begin with “Take”. Together, it all looks like this:

awk 'length($0) > 20' tom.txt | sed '/^Take/d'
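To watch that pipeline work without the lyrics file, here it is on three sample lines. Awk keeps the two lines longer than 20 characters, and Sed then drops the one of those that starts with “Take”.

```shell
printf '%s\n' "Take on me" "Take me on, I'll be gone" "Today's another day to find you" |
  awk 'length($0) > 20' |   # keeps lines 2 and 3
  sed '/^Take/d'            # drops line 2, which starts with "Take"
# prints: Today's another day to find you
```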

And produces this:

Now let’s flip that around. We’re going to start by removing all the lines that start with Take, and then pipe them to Awk, where we’ll count how many times each word appears. It looks a bit like this:

cat tom.txt | sed '/^Take/d' | awk -f WordCount.awk

The Power Of Sed and Awk

There’s only so much you can explain in a single article. But I hope I’ve illustrated how immeasurably powerful Sed and Awk are. Simply put, they’re a text-processing powerhouse.

So, why should you care? Well, besides the fact that you never know when you’ll need to make predictable, repetitive changes to a text document, Sed and Awk are great for parsing log files. This is especially handy when you’re trying to debug a problem in your LAMP server, or looking through your access logs to see whether your server has been hacked.
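As a taste of that, here’s a sketch that assumes Apache’s common log format, where (splitting on whitespace) field 1 is the client IP, field 7 is the requested path, and field 9 is the HTTP status code. The three log lines are invented for illustration, using IP addresses from the ranges reserved for documentation. The first command lists every request that returned a 404; the second uses Sed to strip query strings so that Awk counts requests per page rather than per full URL.

```shell
# Three invented lines in Apache's common log format.
log='192.0.2.10 - - [23/Aug/2016:01:05:31 +0000] "GET /index.php?id=1 HTTP/1.1" 200 512
192.0.2.10 - - [23/Aug/2016:01:05:32 +0000] "GET /wp-login.php HTTP/1.1" 404 128
198.51.100.7 - - [23/Aug/2016:01:05:33 +0000] "GET /index.php?id=2 HTTP/1.1" 200 498'

# Awk alone: client IP and path of every request that returned 404.
printf '%s\n' "$log" | awk '$9 == 404 { print $1, $7 }'
# prints: 192.0.2.10 /wp-login.php

# Sed strips "?..." query strings, then Awk counts requests per path.
printf '%s\n' "$log" |
  sed 's/?[^ ]*//' |
  awk '{ hits[$7]++ } END { for (path in hits) print path, hits[path] }'
```

On a real server you’d point the same commands at your access log file instead; its location and exact format depend on your configuration.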

Have you found an interesting use for Sed and Awk? Are there any other Linux utilities you feel are under-appreciated? Let me know in the comments below, and we’ll chat.
