We’ve covered getting up and running with Terminal, and addressed some basic Terminal commands. Now, we can start digging into more useful Terminal techniques.
What is grep?
grep is a command line utility that searches plain text. It takes whatever input you provide and searches for a specific search term, or “string.” And because it supports regular expressions, those strings can get extremely complicated—and extremely powerful. When used correctly, it can search faster than Spotlight, with more control and more exact results.
The utility's somewhat unusual name comes from its origin. In an old command line text editor called ed, the command g/re/p (short for "globally search for a regular expression and print") would print all lines matching a given search string. That functionality was then built into its own utility, but retained the same cryptic name. Today, the command is so popular that grep is often used as a verb, as in the phrase, “You can’t grep dead trees.”
Using grep
A grep command has three parts: the command, the search string, and the search target. If no search target is specified, grep will search the “standard input,” which is whatever you type at the keyboard or pipe in from another command.
By default, grep will print every line that contains a match within the specified file.
grep also matches partial strings by default. For example, searching for "Au" instead of "Austin" will return Austin, as well as all the other cities with "Au" in the name. And because grep is case-sensitive by default, I’ll only see results that include a capital A followed by a lower-case u.
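The default behavior described above can be sketched as follows. This is only an illustration: the file cities.txt and its contents are made up for the example.

```shell
# Create a small sample file (hypothetical data, for illustration only).
workdir=$(mktemp -d)
printf '%s\n' "Austin" "Aurora" "Boston" "augusta" "Saint Paul" > "$workdir/cities.txt"

# Partial, case-sensitive match: prints Austin and Aurora, but not augusta.
grep "Au" "$workdir/cities.txt"

# With no file argument, grep reads standard input instead.
printf '%s\n' "Austin" "Boston" | grep "Bos"
```

Note that "Saint Paul" is skipped too: it contains "au", but not with a capital A.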
Sending Input to grep with Pipes
You can send input to grep using the pipe character ( | ), found above the Enter key on your keyboard. As we covered earlier, this character sends the output of one command to the input of another command.
For example, if I want to search a directory for a specific file, I could pipe the output of ls to grep using the command below:
ls -l | grep filename
Of course, I could accomplish this task with grep alone, but this way might seem more fluid to some users.
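Here is a minimal sketch of piping into grep, assuming a throwaway directory with a few made-up file names. It uses plain ls rather than ls -l so that only the names are printed.

```shell
# Build a throwaway directory with a few hypothetical files.
workdir=$(mktemp -d)
touch "$workdir/report.txt" "$workdir/notes.txt" "$workdir/report-final.pdf"

# Pipe the directory listing into grep; only names containing "report" are printed.
ls "$workdir" | grep "report"

# Pipes can be chained, so a second grep narrows the results further.
ls "$workdir" | grep "report" | grep "pdf"
```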
grep and Regular Expressions
Properly formatting your search terms is a major part of using grep successfully. The command uses regular expressions, also called regex, to format search terms. Regex is a methodology for defining a sequence of characters as a search term, using special characters called “metacharacters” to make searching more powerful. If you're familiar with Boolean search terms, it’s a similar concept, but much more advanced.
Regular expressions can take months to master, but here’s a guide for the most useful metacharacters:
- .: the period character is a wildcard, meaning that any character (except for a newline) will match it.
- ^: the caret indicates a match must occur at the start of a line. Use this at the start of your search string.
- $: the dollar sign indicates a match must occur at the end of a line. Use this at the end of your search string.
- *: repeat the previous character zero or more times. For example, 0* would match any run of zeros, including none at all.
- \d: match any digit. Not every version of grep supports this shorthand; [0-9] works everywhere.
- \w: match any alphanumeric character or underscore.
- {n}: repeat the previous character exactly n times. For example, w{3} would match the string www but not ww. grep only understands this syntax when run with the -E flag; otherwise the braces must be written as \{n\}.
If you want to dig into everything regular expressions can do, Princeton has a thorough guide. You can also get a quick regular expressions introduction here.
There are a few quirks to using regular expressions on the command line, however. For example, regular expressions can contain spaces, but the shell splits an unquoted command at each space, so grep would treat the second word as a file name. If you need to search for two words separated by a space, for example, put the whole search string in quotes.
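The sketch below tries out several of these metacharacters on a small sample file. The file name and its contents are hypothetical, and the patterns are in single quotes so the shell passes them to grep unchanged.

```shell
# Sample data (hypothetical, for illustration only).
workdir=$(mktemp -d)
printf '%s\n' "www.example.com" "ww.example.com" "New York" "York New" "1000" "1" > "$workdir/sample.txt"

# ^ anchors the pattern to the start of the line: matches "New York" only.
grep '^New' "$workdir/sample.txt"

# $ anchors the pattern to the end of the line: matches "York New" only.
grep 'New$' "$workdir/sample.txt"

# * means zero or more of the previous character, so this matches both "1000" and "1".
grep '^10*$' "$workdir/sample.txt"

# {n} needs extended syntax (-E): matches "www.example.com" but not "ww.example.com".
grep -E '^w{3}\.' "$workdir/sample.txt"

# Quotes keep a pattern containing a space together as a single argument.
grep 'New York' "$workdir/sample.txt"
```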
grep Flags
grep also includes a number of essential flags. The following are the most important:
- -i: ignore case when matching. For example, grep -i unix would return Unix, UNIX, unix, uNIX and more.
- -r: search recursively, examining each file in the provided directory.
- -w: use “whole word” matches. Words, in grep’s definition, are runs of letters, digits, and underscores, so a match must be bordered by characters outside that set or by the start or end of the line. For example, if you use grep -w book foo.txt, “booklet” would not be a valid match.
- -x: like word matches, but for entire lines instead of words.
- -l: return only the names of files containing a valid match.
- -v: return all lines that don’t match the search string.
- -n: list the line number for each match.
- -e: indicates that the following text is a search term formatted as a regular expression. It is most useful for specifying more than one simultaneous search string, or for regular expressions that start with a dash.
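To see how the flags change the results, here is a short sketch that runs each one against the same sample file. The file foo.txt and its four lines are invented for this example.

```shell
# Sample data (hypothetical, for illustration only).
workdir=$(mktemp -d)
printf '%s\n' "I read a book" "The booklet is short" "BOOK sale today" "book" > "$workdir/foo.txt"

# -i: ignore case, so "BOOK sale today" matches too (all four lines print).
grep -i "book" "$workdir/foo.txt"

# -w: whole words only, so the "booklet" line is skipped.
grep -w "book" "$workdir/foo.txt"

# -x: the entire line must match; only the line that is exactly "book" prints.
grep -x "book" "$workdir/foo.txt"

# -v: invert the match, printing lines that do NOT contain "book".
grep -v "book" "$workdir/foo.txt"

# -n: prefix each matching line with its line number.
grep -n "booklet" "$workdir/foo.txt"

# -r with -l: list only the names of files under the directory that contain a match.
grep -r -l "booklet" "$workdir"

# -e twice: print lines matching either pattern.
grep -e "sale" -e "short" "$workdir/foo.txt"
```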
grep Code Examples
$ grep boo pass.txt
Search pass.txt file for the search term boo
$ grep -r boo /etc/
Search each file in directory /etc/ for boo
$ grep -w "Star" movienames.txt
Search movienames.txt for the string “Star” as a whole word, matching “Star Wars” and “Star Trek” but ignoring “Stargate.”
$ grep -l "main" *.c
List only the names of files with the extension .c that contain the search term “main.”
$ grep -w -e "Saint" -e "St\." cities.txt
Search cities.txt for entries that include the full word “Saint” or “St.” in the name. The backslash makes the period literal rather than a wildcard. If I want to search for two terms simultaneously, I can use the -e flag to indicate that the following string should be considered a regular expression. By doing that twice, I can get grep to use two search terms at once.
$ grep -E "\([0-9]{3}\) [0-9]{3}-[0-9]{4}" contactlist.csv
Search contactlist.csv for any properly formatted phone number. This search looks for three digits inside parentheses, a space, three more digits, a dash, and finally four digits. The -E flag enables the {n} syntax, and the backslashes make grep treat the parentheses as literal characters. This matches the standard formatting for U.S. phone numbers: (123) 456-7890
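$ grep -E -o "\w+@\w+\.\w{3}" contactlist.csv > emails.txt
Search contactlist.csv for all email addresses with three-letter top-level domains, such as .com or .org, and send them to a new text file named emails.txt. The -o flag prints only the matching text rather than the whole line, and the pattern is quoted so the shell leaves the backslashes alone.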
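The last two searches can be tried out on sample data, as in the sketch below. The contact list contents are made up, and the patterns are deliberately simplified; real phone numbers and email addresses come in more shapes than these match. The sketch uses the classes [0-9] and [[:alnum:]] in place of \d and \w, since those work across grep versions.

```shell
# Sample contact list (hypothetical data, for illustration only).
workdir=$(mktemp -d)
printf '%s\n' \
  "Ann Lee,(512) 555-0142,ann@example.com" \
  "Bob Ray,512-555-0199,bob@example.org" \
  "Cy Dunn,(737) 555-0110,no email on file" > "$workdir/contactlist.csv"

# -E enables {n} repetition; \( and \) match literal parentheses.
# Prints the lines for Ann and Cy, whose numbers use the (123) 456-7890 format.
grep -E '\([0-9]{3}\) [0-9]{3}-[0-9]{4}' "$workdir/contactlist.csv"

# -o prints only the matched text, one address per line, into emails.txt.
grep -E -o '[[:alnum:]._-]+@[[:alnum:].-]+\.[[:alpha:]]{3}' "$workdir/contactlist.csv" > "$workdir/emails.txt"
cat "$workdir/emails.txt"
```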
Conclusion
Even at the most basic usage, grep is a powerful and useful command. But as your grepping skills grow, you’ll see more and more uses for this flexible utility. It’s a great example of the power of the command line.