grep (process text line by line)

Posted by 舞夕之 on 2020-07-11 03:56

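On Unix-like operating systems, the grep command processes text line by line and prints any lines that match a specified pattern. This article covers the GNU/Linux version of grep.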

Contents

1 grep system environment

2 grep syntax

3 grep examples

4 grep options

5 grep examples

grep system environment

Unix & Linux

grep syntax

grep [OPTIONS] PATTERN [FILE...]

Overview

Grep, which stands for "global regular expression print," is a powerful tool for matching a regular expression against text in a file, multiple files, or a stream of input. It searches for the PATTERN of text that you specify on the command line, and outputs the results for you.

grep examples

Let's say you want to quickly locate the phrase "our products" in HTML files on your machine. Let's start by searching a single file. Here, our PATTERN is "our products" and our FILE is product-listing.html.
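The command itself does not appear in the text, so here is a minimal sketch of what it would look like. The file contents are made up for illustration and stand in for the real product-listing.html; GNU grep is assumed.

```shell
# Hypothetical sample file standing in for product-listing.html.
workdir=$(mktemp -d)
cat > "$workdir/product-listing.html" <<'EOF'
<html>
<h1>Welcome</h1>
<p>You will find our products listed below.</p>
</html>
EOF

# Search one file for the phrase; grep prints every line that contains it.
result=$(grep "our products" "$workdir/product-listing.html")
printf '%s\n' "$result"
```

Quoting the pattern keeps the shell from splitting it at the space into two separate arguments.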

A single line was found containing our pattern, and grep outputs the entire matching line to the terminal. The line is longer than our terminal width so the text wraps around to the following lines, but this output corresponds to exactly one line in our FILE.

Note: The PATTERN is interpreted by grep as a regular expression. In the above example, all the characters we used (letters and a space) are interpreted literally in regular expressions, so only the exact phrase will be matched. Other characters have special meanings, however — some punctuation marks, for example. For more information, see our Regular Expression Quick Reference.

Viewing grep output in color

If we use the --color option, our successful matches will be highlighted for us:

Viewing line numbers of successful matches

It will be even more useful if we know where the matching line appears in our file. If we specify the -n option, grep will prefix each matching line with the line number:

Our matching line is prefixed with "18:" which tells us this corresponds to line 18 in our file.
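As a self-contained sketch of the -n option, the following uses a made-up four-line file, so the match lands on line 3 rather than line 18.

```shell
# Hypothetical sample file; the matching text is on its third line.
workdir=$(mktemp -d)
printf '%s\n' '<html>' '<h1>Welcome</h1>' '<p>See our products below.</p>' '</html>' \
  > "$workdir/product-listing.html"

# -n prefixes each matching line with its 1-based line number and a colon.
numbered=$(grep -n "our products" "$workdir/product-listing.html")
printf '%s\n' "$numbered"
```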

Performing case-insensitive grep searches

What if "our products" appears at the beginning of a sentence, or appears in all uppercase? We can specify the -i option to perform a case-insensitive match:

Using the -i option, grep finds a match on line 23 as well.
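The effect of -i can be sketched by counting matches with and without it. The sample lines below are invented for the example.

```shell
# Hypothetical sample file with the phrase in three different capitalizations.
workdir=$(mktemp -d)
printf '%s\n' 'Browse our products.' 'OUR PRODUCTS ARE GREAT' 'Our Products page' 'nothing here' \
  > "$workdir/product-listing.html"

# -c prints the number of matching lines instead of the lines themselves.
sensitive=$(grep -c "our products" "$workdir/product-listing.html")
insensitive=$(grep -ci "our products" "$workdir/product-listing.html")
echo "case-sensitive: $sensitive, case-insensitive: $insensitive"
```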

Searching multiple files using a wildcard

If we have multiple files to search, we can search them all using a wildcard in our FILE name. Instead of specifying product-listing.html, we can use an asterisk ("*") and the .html extension. When the command is executed, the shell will expand the asterisk to the name of any file it finds (within the current directory) which ends in ".html".

Notice that each line starts with the specific file where that match occurs.
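A minimal sketch of the wildcard search, assuming a scratch directory with two HTML files and one text file (all names and contents are hypothetical):

```shell
workdir=$(mktemp -d)
printf 'our products\n' > "$workdir/product-listing.html"
printf 'about our products\n' > "$workdir/about.html"
printf 'our products in text\n' > "$workdir/notes.txt"

# The shell expands *.html before grep runs, so grep receives two file names
# and prefixes each matching line with the file it came from.
matches=$(cd "$workdir" && grep "our products" *.html)
printf '%s\n' "$matches"
```

Because the expansion is done by the shell, notes.txt is never passed to grep at all.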

Recursively searching subdirectories

We can extend our search to subdirectories and any files they contain using the -r option, which tells grep to perform its search recursively. Let's change our FILE name to just an asterisk ("*"), so that it will match any file or directory name, and not just HTML files:

This gives us three additional matches. Notice that the directory name is included for any matching files that are not in the current directory.
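The recursive search can be sketched as follows, again with a made-up directory layout (one file at the top level and one in a subdirectory named archive):

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/archive"
printf 'our products\n' > "$workdir/index.html"
printf 'old list of our products\n' > "$workdir/archive/old.html"

# -r descends into every directory named on the command line; here * expands
# to both "archive" and "index.html". The output is sorted only to make the
# order predictable.
recursive=$(cd "$workdir" && grep -r "our products" * | sort)
printf '%s\n' "$recursive"
```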

Using regular expressions to perform more powerful searches

The true power of grep is that it can be used to match regular expressions. (That's what the "re" in "grep" stands for). Regular expressions use special characters in the PATTERN string to match a wider array of strings. Let's look at a simple example.

Let's say you want to find every occurrence of a phrase similar to "our products" in your HTML files, but the phrase should always start with "our" and end with "products". We can specify this PATTERN instead: "our.*products".

In regular expressions, the period (".") is interpreted as a single-character wildcard. It means "any character that appears in this place will match." The asterisk ("*") means "the preceding character, appearing zero or more times, will match." So the combination ".*" will match any number of any character. For instance, "our amazing products", "ours, the best-ever products", and even "ourproducts" will match. And because we're specifying the -i option, "OUR PRODUCTS" and "OuRpRoDuCtS" will match as well. Let's run the command with this regular expression, and see what additional matches we can get:

Here, we also got a match from the phrase "our fine products".
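A short sketch of the regular-expression search described above. The four sample lines are invented; three of them contain "our" followed later by "products".

```shell
workdir=$(mktemp -d)
printf '%s\n' 'our products' 'Our fine products' 'ourproducts' 'their products' \
  > "$workdir/product-listing.html"

# "." matches any single character and "*" repeats it zero or more times,
# so "our.*products" allows anything (or nothing) between the two words.
regex_hits=$(grep -i "our.*products" "$workdir/product-listing.html")
printf '%s\n' "$regex_hits"
```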

Grep is a powerful tool that can help you work with text files, and it gets even more powerful when you become comfortable using regular expressions.

Technical Description

grep searches the named input FILEs (or standard input if no files are named, or if a single dash ("-") is given as the file name) for lines containing a match to the given PATTERN. By default, grep prints the matching lines.

Also, three variant programs egrep, fgrep and rgrep are available:

  • egrep is the same as running grep -E. In this mode, grep evaluates your PATTERN string as an extended regular expression (ERE). Nowadays, ERE does not "extend" very far beyond basic regular expressions, but they can still be very useful. 
  • fgrep is the same as running grep -F. In this mode, grep evaluates your PATTERN string as a "fixed string" — every character in your string is treated literally. For example, if your string contains an asterisk ("*"), grep will try to match it with an actual asterisk rather than interpreting this as a wildcard. If your string contains multiple lines (if it contains newlines), each line will be considered a fixed string, and any of them can trigger a match.
  • rgrep is the same as running grep -r. In this mode, grep will perform its search recursively. If it encounters a directory, it will traverse into that directory and continue searching. (Symbolic links are ignored; if you want to search directories that are symbolically linked, you should use the -R option instead).

In older operating systems, egrep, fgrep and rgrep were distinct programs with their own executables. In modern systems, these special command names are shortcuts to grep with the appropriate flags enabled. They are functionally equivalent.
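To illustrate the difference between the modes, this sketch runs the same small, made-up file through grep -F, plain grep, and grep -E. It uses the option forms rather than the egrep and fgrep command names.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'a*b' 'aab' 'ab' 'b' > "$workdir/f.txt"

# -F: the asterisk is an ordinary character, so only the literal text a*b matches.
fixed=$(grep -F 'a*b' "$workdir/f.txt")

# Default (basic regex): "a*" means zero or more a's, so every line containing b matches.
basic=$(grep -c 'a*b' "$workdir/f.txt")

# -E (extended regex): "a+" means one or more a's immediately followed by b.
extended=$(grep -cE 'a+b' "$workdir/f.txt")

echo "fixed: $fixed, basic count: $basic, extended count: $extended"
```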

grep options

General Options
--help Print a help message briefly summarizing command-line options, and exit.
-V, --version Print the version number of grep, and exit.
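Match Selection Options
-E, --extended-regexp Interpret PATTERN as an extended regular expression (see Basic vs. Extended Regular Expressions).
-F, --fixed-strings Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched.
-G, --basic-regexp Interpret PATTERN as a basic regular expression (see Basic vs. Extended Regular Expressions). This is the default option when running grep.
-P, --perl-regexp Interpret PATTERN as a Perl regular expression. This functionality is still experimental, and may produce warning messages.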
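Matching Control Options
-e PATTERN, --regexp=PATTERN Use PATTERN as the pattern to match. This can be used to specify multiple search patterns, or to protect a pattern beginning with a dash (-).
-f FILE, --file=FILE Obtain patterns from FILE, one per line.
-i, --ignore-case Ignore case distinctions in both the PATTERN and the input files.
-v, --invert-match Invert the sense of matching, to select non-matching lines.
-w, --word-regexp Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and underscores.
-x, --line-regexp Select only matches that exactly match the whole line.
-y The same as -i.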
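The matching-control options are easiest to compare side by side. The sketch below runs several of them against one small, made-up file and counts the selected lines.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'cat' 'concat' 'cat food' '-dash' 'dog' > "$workdir/f.txt"

# Two -e options: a line is selected if it matches either pattern.
multi=$(grep -c -e 'cat' -e 'dog' "$workdir/f.txt")

# -e also protects a pattern that begins with a dash from being read as an option.
dash=$(grep -e '-dash' "$workdir/f.txt")

# -v selects the lines that do NOT match.
inverted=$(grep -cv 'cat' "$workdir/f.txt")

# -w requires the match to be a whole word, so "concat" is skipped.
words=$(grep -cw 'cat' "$workdir/f.txt")

# -x requires the match to cover the entire line.
whole=$(grep -cx 'cat' "$workdir/f.txt")

echo "multi=$multi dash=$dash inverted=$inverted words=$words whole=$whole"
```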
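General Output Control
-c, --count Instead of the normal output, print a count of matching lines for each input file. With the -v, --invert-match option (see above), count non-matching lines.
--color[=WHEN], --colour[=WHEN] Surround the matched (non-empty) strings, matching lines, context lines, file names, line numbers, byte offsets, and separators (for fields and groups of context lines) with escape sequences to display them in color on the terminal. The colors are defined by the environment variable GREP_COLORS. The older environment variable GREP_COLOR is still supported, but its setting does not have priority. WHEN is never, always, or auto.
-L, --files-without-match Instead of the normal output, print the name of each input file from which no output would normally have been printed. The scanning will stop on the first match.
-l, --files-with-matches Instead of the normal output, print the name of each input file from which output would normally have been printed. The scanning will stop on the first match.
-m NUM, --max-count=NUM Stop reading a file after NUM matching lines. If the input is standard input from a regular file, and NUM matching lines are output, grep ensures that the standard input is positioned to just after the last matching line before exiting, regardless of the presence of trailing context lines. This enables a calling process to resume a search. When grep stops after NUM matching lines, it outputs any trailing context lines. When the -c or --count option is also used, grep does not output a count greater than NUM. When the -v or --invert-match option is also used, grep stops after outputting NUM non-matching lines.
-o, --only-matching Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
-q, --quiet, --silent Quiet; do not write anything to standard output. Exit immediately with zero status if any match is found, even if an error was detected. Also see the -s or --no-messages option.
-s, --no-messages Suppress error messages about nonexistent or unreadable files.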
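Here is a rough sketch of the output-control options, using two invented log files. Note the difference between -c, which counts lines, and -o, which emits one output line per occurrence.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'error: disk' 'ok' 'error: net error' > "$workdir/a.log"
printf '%s\n' 'ok' 'ok' > "$workdir/b.log"

count=$(grep -c 'error' "$workdir/a.log")                        # matching lines
occurrences=$(grep -o 'error' "$workdir/a.log" | wc -l | tr -d ' ')  # individual matches
with=$(cd "$workdir" && grep -l 'error' a.log b.log)             # files that match
without=$(cd "$workdir" && grep -L 'error' a.log b.log)          # files that do not
first=$(grep -m 1 'error' "$workdir/a.log")                      # stop after one match

# -q prints nothing; only the exit status says whether a match was found.
if grep -q 'error' "$workdir/a.log"; then quiet=found; else quiet=missing; fi

echo "count=$count occurrences=$occurrences with=$with without=$without quiet=$quiet"
```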
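Output Line Prefix Control
-b, --byte-offset Print the 0-based byte offset within the input file before each line of output. If -o (--only-matching) is specified, print the offset of the matching part itself.
-H, --with-filename Print the file name for each match. This is the default when there is more than one file to search.
-h, --no-filename Suppress the prefixing of file names on output. This is the default when there is only one file (or only standard input) to search.
--label=LABEL Display input actually coming from standard input as input coming from file LABEL. This is especially useful when implementing tools like zgrep, e.g., gzip -cd foo.gz | grep --label=foo -H something. See also the -H option.
-n, --line-number Prefix each line of output with the 1-based line number within its input file.
-T, --initial-tab Make sure that the first character of actual line content lies on a tab stop, so that the alignment of tabs looks normal. This is useful with options that prefix their output to the actual content: -H, -n, and -b. To improve the probability that lines from a single file will all start at the same column, this also causes the line number and byte offset (if present) to be printed in a minimum size field width.
-u, --unix-byte-offsets Report Unix-style byte offsets. This switch causes grep to report byte offsets as if the file were a Unix-style text file, i.e., with CR characters stripped off. This will produce results identical to running grep on a Unix machine. This option has no effect unless the -b option is also used; it has no effect on platforms other than MS-DOS and MS-Windows.
-Z, --null Output a zero byte (the ASCII NUL character) instead of the character that normally follows a file name. For example, grep -lZ outputs a zero byte after each file name instead of the usual newline. This option makes the output unambiguous, even in the presence of file names containing unusual characters like newlines. This option can be used with commands like find -print0, perl -0, sort -z, and xargs -0 to process arbitrary file names, even those that contain newline characters.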
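As a sketch of the prefix options, the following combines -H, -n and -b on one file and then pairs -lZ with xargs -0 to handle a file name containing a space. The file names are hypothetical, and basename is used only as a harmless command to run on each name.

```shell
workdir=$(mktemp -d)
printf 'needle\n' > "$workdir/has space.txt"
printf 'needle\n' > "$workdir/plain.txt"
printf 'hay\n' > "$workdir/other.txt"

# File name, 1-based line number and 0-based byte offset, each followed by a colon.
prefixed=$(cd "$workdir" && grep -Hnb needle plain.txt)

# -lZ ends each file name with a NUL byte; xargs -0 splits on NUL, so the
# embedded space does not break the name into two arguments.
names=$(cd "$workdir" && grep -lZ needle ./*.txt | xargs -0 -n 1 basename | sort)

printf '%s\n' "$prefixed" "$names"
```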
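Context Line Control
-A NUM, --after-context=NUM Print NUM lines of trailing context after matching lines. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.
-B NUM, --before-context=NUM Print NUM lines of leading context before matching lines. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.
-C NUM, -NUM, --context=NUM Print NUM lines of output context. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.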
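The three context options can be compared on a five-line made-up file in which only the middle line matches:

```shell
workdir=$(mktemp -d)
printf '%s\n' one two MATCH four five > "$workdir/f.txt"

after=$(grep -A 1 MATCH "$workdir/f.txt")    # the match plus one line after it
before=$(grep -B 1 MATCH "$workdir/f.txt")   # one line before it plus the match
around=$(grep -C 1 MATCH "$workdir/f.txt")   # one line on each side

printf '%s\n' "-A 1:" "$after" "-B 1:" "$before" "-C 1:" "$around"
```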
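File and Directory Selection
-a, --text Process a binary file as if it were text; this is equivalent to the --binary-files=text option.
--binary-files=TYPE If the first few bytes of a file indicate that the file contains binary data, assume that the file is of type TYPE. By default, TYPE is binary, and grep normally outputs either a one-line message saying that a binary file matches, or no message if there is no match. If TYPE is without-match, grep assumes that a binary file does not match; this is equivalent to the -I option. If TYPE is text, grep processes a binary file as if it were text; this is equivalent to the -a option. Warning: grep --binary-files=text might output binary garbage, which can have nasty side effects if the output is a terminal and if the terminal driver interprets some of it as commands.
-D ACTION, --devices=ACTION If an input file is a device, FIFO or socket, use ACTION to process it. By default, ACTION is read, which means that devices are read just as if they were ordinary files. If ACTION is skip, devices are silently skipped.
-d ACTION, --directories=ACTION If an input file is a directory, use ACTION to process it. By default, ACTION is read, i.e., read directories just as if they were ordinary files. If ACTION is skip, silently skip directories. If ACTION is recurse, read all files under each directory, recursively, following symbolic links only if they are on the command line. This is equivalent to the -r option.
--exclude=GLOB Skip files whose base name matches GLOB (using wildcard matching). A file-name glob can use *, ?, and [...] as wildcards, and \ to quote a wildcard or backslash character literally.
--exclude-from=FILE Skip files whose base name matches any of the file-name globs read from FILE (using wildcard matching as described under --exclude).
--exclude-dir=DIR Exclude directories matching the pattern DIR from recursive searches.
-I Process a binary file as if it did not contain matching data; this is equivalent to the --binary-files=without-match option.
--include=GLOB Search only files whose base name matches GLOB (using wildcard matching as described under --exclude).
-r, --recursive Read all files under each directory, recursively, following symbolic links only if they are on the command line. This is equivalent to the -d recurse option.
-R, --dereference-recursive Read all files under each directory, recursively. Follow all symbolic links, unlike -r.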
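A small sketch of narrowing a recursive search with --include and --exclude-dir. The directory tree (src and build) and the search word TODO are made up for the example.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/src" "$workdir/build"
printf 'TODO fix\n' > "$workdir/src/main.c"
printf 'TODO doc\n' > "$workdir/src/notes.txt"
printf 'TODO gen\n' > "$workdir/build/main.c"

# Only files whose base name matches *.c are searched.
only_c=$(cd "$workdir" && grep -rl --include='*.c' TODO . | sort)

# The whole build directory is skipped during recursion.
no_build=$(cd "$workdir" && grep -rl --exclude-dir=build TODO . | sort)

printf '%s\n' "--include:" "$only_c" "--exclude-dir:" "$no_build"
```

The globs are quoted so that grep, not the shell, performs the wildcard matching.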
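Other Options
--line-buffered Use line buffering on output. This can cause a performance penalty.
--mmap If possible, use the mmap system call to read input, instead of the default read system call. In some situations, --mmap yields better performance. However, --mmap can cause undefined behavior (including core dumps) if an input file shrinks while grep is operating, or if an I/O error occurs.
-U, --binary Treat the file(s) as binary. By default, under MS-DOS and MS-Windows, grep guesses the file type by looking at the contents of the first 32 KB read from the file. If grep decides the file is a text file, it strips the CR characters from the original file contents (to make regular expressions with ^ and $ work correctly). Specifying -U overrules this guesswork, causing all files to be read and passed to the matching mechanism verbatim; if the file is a text file with CR/LF pairs at the end of each line, this will cause some regular expressions to fail. This option has no effect on platforms other than MS-DOS and MS-Windows.
-z, --null-data Treat the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline. Like the -Z or --null option, this option can be used with commands like sort -z to process arbitrary file names.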
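Regular Expressions

A regular expression is a pattern that describes a set of strings. Regular expressions are constructed analogously to arithmetic expressions, by using various operators to combine smaller expressions.

grep understands three different versions of regular expression syntax: "basic" (BRE), "extended" (ERE) and "perl" (PCRE). In GNU grep, there is no difference in available functionality between basic and extended syntaxes. In other implementations, basic regular expressions are less powerful. The following description applies to extended regular expressions; differences for basic regular expressions are summarized afterwards. Perl regular expressions give additional functionality.

The fundamental building blocks are the regular expressions that match a single character. Most characters, including all letters and digits, are regular expressions that match themselves. Any meta-character with special meaning may be quoted by preceding it with a backslash.

The period (.) matches any single character.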

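Character Classes and Bracket Expressions

A bracket expression is a list of characters enclosed by [ and ]. It matches any single character in that list; if the first character of the list is the caret ^ then it matches any character not in the list. For example, the regular expression [0123456789] matches any single digit.

Within a bracket expression, a range expression consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, inclusive, using the locale's collating sequence and character set. For example, in the default C locale, [a-d] is equivalent to [abcd]. Many locales sort characters in dictionary order, and in these locales [a-d] is typically not equivalent to [abcd]; it might be equivalent to [aBbCcDd], for example. To obtain the traditional interpretation of bracket expressions, you can use the C locale by setting the LC_ALL environment variable to the value C.

Finally, certain named classes of characters are predefined within bracket expressions, as follows. Their names are self-explanatory, and they are [:alnum:], [:alpha:], [:cntrl:], [:digit:], [:graph:], [:lower:], [:print:], [:punct:], [:space:], [:upper:], and [:xdigit:]. For example, [[:alnum:]] means the character class of numbers and letters in the current locale. In the C locale and ASCII character set encoding, this is the same as [0-9A-Za-z]. (Note that the brackets in these class names are part of the symbolic names, and must be included in addition to the brackets delimiting the bracket expression.) Most meta-characters lose their special meaning inside bracket expressions. To include a literal ] place it first in the list. Similarly, to include a literal ^ place it anywhere but first. Finally, to include a literal -, place it last.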
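The bracket-expression rules above can be tried out with a short sketch. LC_ALL=C is set on each command so that ranges get the traditional interpretation; the sample lines are invented.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'abc123' 'ABC' '!!!' 'x-y' > "$workdir/f.txt"

digits=$(LC_ALL=C grep -c '[0-9]' "$workdir/f.txt")          # lines containing a digit
alnum=$(LC_ALL=C grep -c '[[:alnum:]]' "$workdir/f.txt")     # lines containing a letter or digit
lower=$(LC_ALL=C grep -c '[a-d]' "$workdir/f.txt")           # in the C locale, same as [abcd]

# A leading ^ negates the list; with -x the whole line must consist of
# non-alphanumeric characters.
symbols=$(LC_ALL=C grep -cx '[^[:alnum:]]*' "$workdir/f.txt")

# A hyphen placed last in the list is literal.
dash=$(LC_ALL=C grep -c '[xy-]' "$workdir/f.txt")

echo "digits=$digits alnum=$alnum lower=$lower symbols=$symbols dash=$dash"
```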

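Anchoring

The caret ^ and the dollar sign $ are meta-characters that respectively match the empty string at the beginning and end of a line.

The Backslash Character and Special Expressions

The symbols \< and \> respectively match the empty string at the beginning and end of a word. The symbol \b matches the empty string at the edge of a word, and \B matches the empty string provided it's not at the edge of a word. The symbol \w is a synonym for [_[:alnum:]] and \W is a synonym for [^_[:alnum:]].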

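Repetition

A regular expression may be followed by one of several repetition operators:

? The preceding item is optional and matched at most once.
* The preceding item will be matched zero or more times.
+ The preceding item will be matched one or more times.
{n} The preceding item is matched exactly n times.
{n,} The preceding item is matched n or more times.
{n,m} The preceding item is matched at least n times, but not more than m times.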
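Each repetition operator can be checked against the same four made-up words. The sketch uses grep -E so the operators need no backslashes, and -x so the pattern must cover the whole line.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'color' 'colour' 'colouur' 'clr' > "$workdir/f.txt"

optional=$(grep -cxE 'colou?r' "$workdir/f.txt")     # zero or one u
star=$(grep -cxE 'colou*r' "$workdir/f.txt")         # zero or more u
plus=$(grep -cxE 'colou+r' "$workdir/f.txt")         # one or more u
exactly=$(grep -cxE 'colou{2}r' "$workdir/f.txt")    # exactly two u
range=$(grep -cxE 'colou{0,1}r' "$workdir/f.txt")    # between zero and one u

echo "optional=$optional star=$star plus=$plus exactly=$exactly range=$range"
```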
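Concatenation

Two regular expressions may be concatenated; the resulting regular expression matches any string formed by concatenating two substrings that respectively match the concatenated expressions.

Alternation

Two regular expressions may be joined by the infix operator |; the resulting regular expression matches any string matching either alternate expression.

Precedence

Repetition takes precedence over concatenation, which in turn takes precedence over alternation. A whole expression may be enclosed in parentheses to override these precedence rules and form a subexpression.

Back References and Subexpressions

The back-reference \n, where n is a single digit, matches the substring previously matched by the nth parenthesized subexpression of the regular expression.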
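Alternation, precedence and back-references can be sketched together. The sample words are invented, and the behavior shown assumes GNU grep, which accepts back-references in extended patterns.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'abab' 'abba' 'gray' 'grey' 'griy' > "$workdir/f.txt"

# Parentheses limit the alternation to the single letter.
grouped=$(grep -cxE 'gr(a|e)y' "$workdir/f.txt")

# Without parentheses, concatenation binds tighter than |, so this means
# "gra" or "ey"; no whole line equals either. The "|| true" only keeps the
# non-match exit status from stopping a script that runs under set -e.
ungrouped=$(grep -cxE 'gra|ey' "$workdir/f.txt" || true)

# \1 must repeat exactly what the first group matched.
repeated=$(grep -xE '(ab)\1' "$workdir/f.txt")

echo "grouped=$grouped ungrouped=$ungrouped repeated=$repeated"
```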

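Basic vs Extended Regular Expressions

In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).

Traditional versions of egrep did not support the { meta-character, and some egrep implementations support \{ instead, so portable scripts should avoid { in grep -E patterns and should use [{] to match a literal {.

GNU grep -E attempts to support traditional usage by assuming that { is not special if it would be the start of an invalid interval specification. For example, the command grep -E '{1' searches for the two-character string {1 instead of reporting a syntax error in the regular expression. POSIX allows this behavior as an extension, but portable scripts should avoid it.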
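The swap in meaning between the plain and backslashed forms can be seen with the + operator. This is a sketch for GNU grep; the backslashed \+ in basic patterns is a GNU extension.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'aaa' 'a+' 'b' > "$workdir/f.txt"

bre_plain=$(grep -x 'a+' "$workdir/f.txt")      # basic: + is literal
bre_escaped=$(grep -x 'a\+' "$workdir/f.txt")   # basic: \+ means one or more
ere_plain=$(grep -xE 'a+' "$workdir/f.txt")     # extended: + means one or more
ere_escaped=$(grep -xE 'a\+' "$workdir/f.txt")  # extended: \+ is literal

echo "bre_plain=$bre_plain bre_escaped=$bre_escaped ere_plain=$ere_plain ere_escaped=$ere_escaped"
```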

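Environment Variables

The behavior of grep is affected by the following environment variables.

The locale for category LC_foo is specified by examining the three environment variables LC_ALL, LC_foo, and LANG, in that order. The first of these variables that is set specifies the locale. For example, if LC_ALL is not set, but LC_MESSAGES is set to pt_BR, then the Brazilian Portuguese locale is used for the LC_MESSAGES category. The C locale is used if none of these environment variables are set, if the locale catalog is not installed, or if grep was not compiled with national language support (NLS).

Other variables of note:

GREP_OPTIONS This variable specifies default options to be placed in front of any explicit options. For example, if GREP_OPTIONS is '--binary-files=without-match --directories=skip', grep behaves as if the two options --binary-files=without-match and --directories=skip had been specified before any explicit options. Option specifications are separated by whitespace. A backslash escapes the next character, so it can be used to specify an option containing whitespace or a backslash.
GREP_COLOR This variable specifies the color used to highlight matched (non-empty) text. It is deprecated in favor of GREP_COLORS, but still supported. The mt, ms, and mc capabilities of GREP_COLORS have priority over it. It can only specify the color used to highlight the matching non-empty text in any matching line (a selected line when the -v command-line option is omitted, or a context line when -v is specified). The default is 01;31, which means a bold red foreground text on the terminal's default background.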
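GREP_COLORS Specifies the colors and other attributes used to highlight various parts of the output. Its value is a colon-separated list of capabilities that defaults to ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36 with the rv and ne boolean capabilities omitted (i.e., false). Supported capabilities are as follows:
sl= SGR substring for whole selected lines (i.e., matching lines when the -v command-line option is omitted, or non-matching lines when -v is specified). If, however, the boolean rv capability and the -v command-line option are both specified, it applies to context matching lines instead. The default is empty (i.e., the terminal's default color pair).
cx= SGR substring for whole context lines (i.e., non-matching lines when the -v command-line option is omitted, or matching lines when -v is specified). If, however, the boolean rv capability and the -v command-line option are both specified, it applies to selected non-matching lines instead. The default is empty (i.e., the terminal's default color pair).
rv Boolean value that reverses (swaps) the meanings of the sl= and cx= capabilities when the -v command-line option is specified. The default is false (i.e., the capability is omitted).
mt=01;31 SGR substring for matching non-empty text in any matching line (i.e., a selected line when the -v command-line option is omitted, or a context line when -v is specified). Setting this is equivalent to setting both ms= and mc= at once to the same value. The default is a bold red text foreground over the current line background.
ms=01;31 SGR substring for matching non-empty text in a selected line. (This is only used when the -v command-line option is omitted.) The effect of the sl= (or cx= if rv) capability remains active when this kicks in. The default is a bold red text foreground over the current line background.
mc=01;31 SGR substring for matching non-empty text in a context line. (This is only used when the -v command-line option is specified.) The effect of the cx= (or sl= if rv) capability remains active when this kicks in. The default is a bold red text foreground over the current line background.
fn=35 SGR substring for file names prefixing any content line. The default is a magenta text foreground over the terminal's default background.
ln=32 SGR substring for line numbers prefixing any content line. The default is a green text foreground over the terminal's default background.
bn=32 SGR substring for byte offsets prefixing any content line. The default is a green text foreground over the terminal's default background.
se=36 SGR substring for separators that are inserted between selected line fields (:), between context line fields (-), and between groups of adjacent lines when nonzero context is specified (--). The default is a cyan text foreground over the terminal's default background.
ne Boolean value that prevents clearing to the end of line using Erase in Line (EL) to Right (\33[K) each time a colorized item ends. This is needed on terminals on which EL is not supported. It is otherwise useful on terminals for which the back_color_erase (bce) boolean terminfo capability does not apply, when the chosen highlight colors do not affect the background, or when EL is too slow or causes too much flicker. The default is false (i.e., the capability is omitted).
Note that boolean capabilities have no =... part. They are omitted (i.e., false) by default and become true when specified.
See the Select Graphic Rendition (SGR) section in the documentation of the text terminal that is used for permitted values and their meaning as character attributes. These substring values are integers in decimal representation and can be concatenated with semicolons. grep takes care of assembling the result into a complete SGR sequence (\33[...m). Common values to concatenate include 1 for bold, 4 for underline, 5 for blink, 7 for inverse, 39 for default foreground color, 30 to 37 for foreground colors, 90 to 97 for 16-color mode foreground colors, 38;5;0 to 38;5;255 for 88-color and 256-color modes foreground colors, 49 for default background color, 40 to 47 for background colors, 100 to 107 for 16-color mode background colors, and 48;5;0 to 48;5;255 for 88-color and 256-color modes background colors.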
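LC_ALL, LC_COLLATE, LANG These variables specify the locale for the LC_COLLATE category, which determines the collating sequence used to interpret range expressions like [a-z].
LC_ALL, LC_CTYPE, LANG These variables specify the locale for the LC_CTYPE category, which determines the type of characters, e.g., which characters are whitespace.
LC_ALL, LC_MESSAGES, LANG These variables specify the locale for the LC_MESSAGES category, which determines the language that grep uses for messages. The default C locale uses American English messages.
POSIXLY_CORRECT If set, grep behaves as POSIX requires; otherwise, grep behaves more like other GNU programs. POSIX requires that options that follow file names must be treated as file names; by default, such options are permuted to the front of the operand list and are treated as options. Also, POSIX requires that unrecognized options be diagnosed as "illegal", but since they are not really against the law the default is to diagnose them as "invalid". POSIXLY_CORRECT also disables _N_GNU_nonoption_argv_flags_, described below.
_N_GNU_nonoption_argv_flags_ (Here N is grep's numeric process ID.) If the ith character of this environment variable's value is 1, do not consider the ith operand of grep to be an option, even if it appears to be one. A shell can put this variable in the environment for each command it runs, specifying which operands are the results of file name wildcard expansion and therefore should not be treated as options. This behavior is available only with the GNU C library, and only when POSIXLY_CORRECT is not set.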
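As a sketch of how GREP_COLORS takes effect, the following sets the mt capability to bold green (01;32) for a single command. The sample file is made up; --color=always is used so the escape sequences are emitted even though the output is captured rather than shown on a terminal.

```shell
workdir=$(mktemp -d)
printf 'see our products here\n' > "$workdir/f.txt"

# The variable is set for this one command only, not exported.
colored=$(GREP_COLORS='mt=01;32' grep --color=always 'our products' "$workdir/f.txt")
plain=$(grep --color=never 'our products' "$workdir/f.txt")

# od -c makes the otherwise invisible escape bytes readable.
printf '%s\n' "$colored" | od -c | head -n 3
printf '%s\n' "$plain"
```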
Exit Status

The exit status is 0 if selected lines are found, and 1 if not found. If an error occurred, the exit status is 2.
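The three exit statuses can be observed with a short sketch. The file and the missing path are hypothetical; -q suppresses normal output and 2>/dev/null hides the error message for the nonexistent file.

```shell
workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/f.txt"

found=0;   grep -q hello  "$workdir/f.txt" || found=$?
missing=0; grep -q absent "$workdir/f.txt" || missing=$?
failed=0;  grep -q hello  "$workdir/no-such-file" 2>/dev/null || failed=$?

echo "match: $found, no match: $missing, error: $failed"
```

In scripts this is what makes `if grep -q PATTERN FILE; then ...` work as a test.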

General Options
--help Print a help message briefly summarizing command-line options, and exit.
-V, --version Print the version number of grep, and exit.
Match Selection Options
-E, --extended-regexp Interpret PATTERN as an extended regular expression (see Basic vs. Extended Regular Expressions).
-F, --fixed-strings Interpret PATTERN as a list of fixed strings, separated by newlines, that is to be matched.
-G, --basic-regexp Interpret PATTERN as a basic regular expression (see Basic vs. Extended Regular Expressions). This is the default option when running grep.
-P, --perl-regexp Interpret PATTERN as a Perl regular expression. This functionality is still experimental, and may produce warning messages.
Matching Control Options
-e PATTERN, --regexp=PATTERN Use PATTERN as the pattern to match. This can be used to specify multiple search patterns, or to protect a pattern beginning with a dash (-).
-f FILE, --file=FILE Obtain patterns from FILE, one per line.
-i, --ignore-case Ignore case distinctions in both the PATTERN and the input files.
-v, --invert-match Invert the sense of matching, to select non-matching lines.
-w, --word-regexp Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Or, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and underscores.
-x, --line-regexp Select only matches that exactly match the whole line.
-y The same as -i.
General Output Control
-c, --count Instead of the normal output, print a count of matching lines for each input file. With the -v, --invert-match option (see below), count non-matching lines.
--color[=WHEN], --colour[=WHEN] Surround the matched (non-empty) strings, matching lines, context lines, file names, line numbers, byte offsets, and separators (for fields and groups of context lines) with escape sequences to display them in color on the terminal. The colors are defined by the environment variable GREP_COLORS. The older environment variable GREP_COLOR is still supported, but its setting does not have priority. WHEN is never, always, or auto.
-L, --files-without-match Instead of the normal output, print the name of each input file from which no output would normally have been printed. The scanning will stop on the first match.
-l, --files-with-matches Instead of the normal output, print the name of each input file from which output would normally have been printed. The scanning will stop on the first match.
-m NUM, --max-count=NUM Stop reading a file after NUM matching lines. If the input is standard input from a regular file, and NUM matching lines are output, grep ensures that the standard input is positioned to just after the last matching line before exiting, regardless of the presence of trailing context lines. This enables a calling process to resume a search. When grep stops after NUM matching lines, it outputs any trailing context lines. When the -c or --count option is also used, grep does not output a count greater than NUM. When the -v or --invert-match option is also used, grep stops after outputting NUM non-matching lines.
-o, --only-matching Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
-q, --quiet, --silent Quiet; do not write anything to standard output. Exit immediately with zero status if any match is found, even if an error was detected. Also see the -s or --no-messages option.
-s, --no-messages Suppress error messages about nonexistent or unreadable files.
Output Line Prefix Control
-b, --byte-offset Print the 0-based byte offset within the input file before each line of output. If -o (--only-matching) is specified, print the offset of the matching part itself.
-H, --with-filename Print the file name for each match. This is the default when there is more than one file to search.
-h, --no-filename Suppress the prefixing of file names on output. This is the default when there is only one file (or only standard input) to search.
--label=LABEL Display input actually coming from standard input as input coming from file LABEL. This is especially useful when implementing tools like zgrep, e.g., gzip -cd foo.gz | grep --label=foo -H something. See also the -Hoption.
-n, --line-number Prefix each line of output with the 1-based line number within its input file.
-T, --initial-tab Make sure that the first character of actual line content lies on a tab stop, so that the alignment of tabs looks normal. This is useful with options that prefix their output to the actual content: -H, -n, and -b. To improve the probability that lines from a single file will all start at the same column, this also causes the line number and byte offset (if present) to be printed in a minimum size field width.
-u, --unix-byte-offsets Report Unix-style byte offsets. This switch causes grep to report byte offsets as if the file were a Unix-style text file, i.e., with CR characters stripped off. This will produce results identical to running grep on a Unix machine. This option has no effect unless -b option is also used; it has no effect on platforms other than MS-DOS and MS-Windows.
-Z, --null Output a zero byte (the ASCII NUL character) instead of the character that normally follows a file name. For example, grep -lZ outputs a zero byte after each file name instead of the usual newline. This option makes the output unambiguous, even in the presence of file names containing unusual characters like newlines. This option can be used with commands like find -print0, perl -0, sort -z, and xargs -0 to process arbitrary file names, even those that contain newline characters.
Context Line Control
-A NUM, --after-context=NUM Print NUM lines of trailing context after matching lines. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.
-B NUM, --before-context=NUM Print NUM lines of leading context before matching lines. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.
-C NUM, -NUM, --context=NUM Print NUM lines of output context. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.
File and Directory Selection
-a, --text Process a binary file as if it were text; this is equivalent to the --binary-files=text option.
--binary-files=TYPE If the first few bytes of a file indicate that the file contains binary data, assume that the file is of type TYPE. By default, TYPE is binary, and grep normally outputs either a one-line message saying that a binary file matches, or no message if there is no match. If TYPE is without-match, grep assumes that a binary file does not match; this is equivalent to the -I option. If TYPE is text, grep processes a binary file as if it were text; this is equivalent to the -a option. Warning: grep --binary-files=text might output binary garbage, which can have nasty side effects if the output is a terminal and if the terminal driver interprets some of it as commands.
-D ACTION, --devices=ACTION If an input file is a device, FIFO or socket, use ACTION to process it. By default, ACTION is read, which means that devices are read just as if they were ordinary files. If ACTION is skip, devices are silently skipped.
-d ACTION, --directories=ACTION If an input file is a directory, use ACTION to process it. By default, ACTION is read, i.e., read directories just as if they were ordinary files. If ACTION is skip, silently skip directories. If ACTION is recurse, read all files under each directory, recursively, following symbolic linksonly if they are on the command line. This is equivalent to the -roption.
--exclude=GLOB Skip files whose base name matches GLOB (using wildcard matching). A file-name glob can use *, ?, and [...] as wildcards, and \ to quote a wildcard or backslash character literally.
--exclude-from=FILE Skip files whose base name matches any of the file-name globs read from FILE (using wildcard matching as described under --exclude).
--exclude-dir=DIR Exclude directories matching the pattern DIR from recursive searches.
-I Process a binary file as if it did not contain matching data; this is equivalent to the --binary-files=without-match option.
--include=GLOB Search only files whose base name matches GLOB (using wildcard matching as described under --exclude).
-r, --recursive Read all files under each directory, recursively, following symbolic links only if they are on the command line. This is equivalent to the -d recurse option.
-R, --dereference-recursive Read all files under each directory, recursively. Follow all symbolic links, unlike -r.
Other Options
--line-buffered Use line buffering on output. This can cause a performance penalty.
--mmap If possible, use the mmap system call to read input, instead of the default readsystem call. In some situations, --mmap yields better performance. However, --mmap can cause undefined behavior (including core dumps) if an input file shrinks while grep is operating, or if an I/O error occurs.
-U, --binary Treat the file(s) as binary. By default, under MS-DOS and MS-Windows, grepguesses the file type by looking at the contents of the first 32 KB read from the file. If grep decides the file is a text file, it strips the CR characters from the original file contents (to make regular expressions with ^ and $ work correctly). Specifying -U overrules this guesswork, causing all files to be read and passed to the matching mechanism verbatim; if the file is a text file with CR/LF pairs at the end of each line, this will cause some regular expressions to fail. This option has no effect on platforms other than MS-DOS and MS-Windows.
-z, --null-data Treat the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline. Like the -Z or --null option, this option can be used with commands like sort -z to process arbitrary file names.
Regular Expressions

A regular expression is a pattern that describes a set of strings. Regular expressions are constructed analogously to arithmetic expressions, by using various operators to combine smaller expressions.

grep understands three different versions of regular expression syntax: "basic" (BRE), "extended" (ERE) and "perl" (PRCE). In GNU grep, there is no difference in available functionality between basic and extended syntaxes. In other implementations, basic regular expressions are less powerful. The following description applies to extended regular expressions; differences for basic regular expressions are summarized afterwards. Perl regular expressions give additional functionality.

The fundamental building blocks are the regular expressions that match a single character. Most characters, including all letters and digits, are regular expressions that match themselves. Any meta-character with special meaning may be quoted by preceding it with a backslash.

The period (.) matches any single character.

Character Classes and Bracket Expressions

A bracket expression is a list of characters enclosed by [ and ]. It matches any single character in that list; if the first character of the list is the caret ^ then it matches any character not in the list. For example, the regular expression [0123456789] matches any single digit.

Within a bracket expression, a range expression consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, inclusive, using the locale's collating sequence and character set. For example, in the default C locale, [a-d]is equivalent to [abcd]. Many locales sort characters in dictionary order, and in these locales [a-d] is typically not equivalent to [abcd]; it might be equivalent to [aBbCcDd], for example. To obtain the traditional interpretation of bracket expressions, you can use the C locale by setting the LC_ALL environment variable to the value C.

Finally, certain named classes of characters are predefined within bracket expressions, as follows. Their names are self explanatory, and they are [:alnum:], [:alpha:], [:cntrl:], [:digit:], [:graph:], [:lower:], [:print:], [:punct:], [:space:], [:upper:], and [:xdigit:]. For example, [[:alnum:]] means the character class of numbers and letters in the current locale. In the C locale and ASCII character set encoding, this is the same as [0-9A-Za-z]. (Note that the brackets in these class names are part of the symbolic names, and must be included in addition to the brackets delimiting the bracket expression.) Most meta-characters lose their special meaning inside bracket expressions. To include a literal ] place it first in the list. Similarly, to include a literal ^ place it anywhere but first. Finally, to include a literal -, place it last.

Anchoring

The caret ^ and the dollar sign $ are meta-characters that respectively match the empty string at the beginning and end of a line.

The Backslash Character and Special Expressions

The symbols \< and \> respectively match the empty string at the beginning and end of a word. The symbol \b matches the empty string at the edge of a word, and \B matches the empty string provided it's not at the edge of a word. The symbol \w is a synonym for [_[:alnum:]] and \W is a synonym for [^_[:alnum:]].

Repetition

A regular expression may be followed by one of several repetition operators:

? The preceding item is optional and matched at most once.
* The preceding item will be matched zero or more times.
+ The preceding item will be matched one or more times.
{n} The preceding item is matched exactly n times.
{n,} The preceding item is matched n or more times.
{n,m} The preceding item is matched at least n times, but not more than m times.
Concatenation

Two regular expressions may be concatenated; the resulting regular expression matches any string formed by concatenating two substringsthat respectively match the concatenated expressions.

Alternation

Two regular expressions may be joined by the infix operator |; the resulting regular expression matches any string matching either alternate expression.

Precedence

Repetition takes precedence over concatenation, which in turn takes precedence over alternation. A whole expression may be enclosed in parentheses to override these precedence rules and form a subexpression.

Back References and Subexpressions

The back-reference \n, where n is a single digit, matches the substring previously matched by the nth parenthesized subexpression of the regular expression.

Basic vs Extended Regular Expressions

In basic regular expressions the meta-characters ?, +, {, |, (, and )lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).

Traditional versions of egrep did not support the { meta-character, and some egrep implementations support \{ instead, so portable scripts should avoid { in grep -E patterns and should use [{] to match a literal {.

GNU grep -E attempts to support traditional usage by assuming that { is not special if it would be the start of an invalid interval specification. For example, the command grep -E '{1' searches for the two-character string {1 instead of reporting a syntax error in the regular expression. POSIX allows this behavior as an extension, but portable scripts should avoid it.
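A small sketch of the portable spelling recommended above: putting the brace inside a bracket expression makes it literal regardless of how a given grep -E treats a bare {. The sample lines are made up.

```shell
# Hypothetical sample input.
sample=$(printf '%s\n' 'a{1' 'a1' '{}')

# [{] matches a literal left brace; the 1 must follow it directly.
literal_brace=$(printf '%s\n' "$sample" | grep -E '[{]1')

printf '%s\n' "$literal_brace"
```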

Environment Variables

The behavior of grep is affected by the following environment variables.

The locale for category LC_foo is specified by examining the three environment variables LC_ALL, LC_foo, and LANG, in that order. The first of these variables that is set specifies the locale. For example, if LC_ALL is not set, but LC_MESSAGES is set to pt_BR, then the Brazilian Portuguese locale is used for the LC_MESSAGES category. The C locale is used if none of these environment variables are set, if the locale catalog is not installed, or if grep was not compiled with national language support (NLS).
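The lookup order can be sketched with a small helper. resolve_locale is a hypothetical function written for this illustration, not part of grep, and it treats an empty value the same as an unset one, which is an assumption. The last command shows the common practical use: forcing the C locale for a single grep invocation.

```shell
# Hypothetical helper: $1 = LC_ALL value, $2 = LC_foo value, $3 = LANG value.
# Prints the first non-empty one, or C when none is given.
resolve_locale() {
    for candidate in "$1" "$2" "$3"; do
        if [ -n "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    printf 'C\n'
}

# LC_ALL unset, LC_MESSAGES=pt_BR, LANG=en_US.UTF-8.
resolved=$(resolve_locale "" "pt_BR" "en_US.UTF-8")
fallback=$(resolve_locale "" "" "")

# Force the C locale for one command so [a-z] means ASCII lowercase only.
c_match=$(printf '%s\n' 'abc' 'ABC' | LC_ALL=C grep '[a-z]')

printf '%s %s %s\n' "$resolved" "$fallback" "$c_match"
```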

Other variables of note:

GREP_OPTIONS This variable specifies default options to be placed in front of any explicit options. For example, if GREP_OPTIONS is '--binary-files=without-match --directories=skip', grep behaves as if the two options --binary-files=without-match and --directories=skip had been specified before any explicit options. Option specifications are separated by whitespace. A backslash escapes the next character, so it can be used to specify an option containing whitespace or a backslash.
GREP_COLOR This variable specifies the color used to highlight matched (non-empty) text. It is deprecated in favor of GREP_COLORS, but still supported. The mt, ms, and mc capabilities of GREP_COLORS have priority over it. It can only specify the color used to highlight the matching non-empty text in any matching line (a selected line when the -v command-line option is omitted, or a context line when -v is specified). The default is 01;31, which means a bold red foreground text on the terminal's default background.
GREP_COLORS Specifies the colors and other attributes used to highlight various parts of the output. Its value is a colon-separated list of capabilities that defaults to ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36 with the rv and ne boolean capabilities omitted (i.e., false). Supported capabilities are as follows:
sl= SGR substring for whole selected lines (i.e., matching lines when the -v command-line option is omitted, or non-matching lines when -v is specified). However, if the boolean rv capability and the -v command-line option are both specified, it applies to context matching lines instead. The default is empty (i.e., the terminal's default color pair).
cx= SGR substring for whole context lines (i.e., non-matching lines when the -v command-line option is omitted, or matching lines when -v is specified). However, if the boolean rv capability and the -v command-line option are both specified, it applies to selected non-matching lines instead. The default is empty (i.e., the terminal's default color pair).
rv Boolean value that reverses (swaps) the meanings of the sl= and cx= capabilities when the -v command-line option is specified. The default is false (i.e., the capability is omitted).
mt=01;31 SGR substring for matching non-empty text in any matching line (i.e., a selected line when the -v command-line option is omitted, or a context line when -v is specified). Setting this is equivalent to setting both ms= and mc= at once to the same value. The default is a bold red text foreground over the current line background.
ms=01;31 SGR substring for matching non-empty text in a selected line. (This is only used when the -v command-line option is omitted.) The effect of the sl= (or cx= if rv) capability remains active when this kicks in. The default is a bold red text foreground over the current line background.
mc=01;31 SGR substring for matching non-empty text in a context line. (This is only used when the -v command-line option is specified.) The effect of the cx= (or sl= if rv) capability remains active when this kicks in. The default is a bold red text foreground over the current line background.
fn=35 SGR substring for file names prefixing any content line. The default is a magenta text foreground over the terminal's default background.
ln=32 SGR substring for line numbers prefixing any content line. The default is a green text foreground over the terminal's default background.
bn=32 SGR substring for byte offsets prefixing any content line. The default is a green text foreground over the terminal's default background.
se=36 SGR substring for separators that are inserted between selected line fields (:), between context line fields (-), and between groups of adjacent lines when nonzero context is specified (--). The default is a cyan text foreground over the terminal's default background.
ne Boolean value that prevents clearing to the end of line using Erase in Line (EL) to Right (\33[K) each time a colorized item ends. This is needed on terminals on which EL is not supported. It is otherwise useful on terminals for which the back_color_erase (bce) boolean terminfo capability does not apply, when the chosen highlight colors do not affect the background, or when EL is too slow or causes too much flicker. The default is false (i.e., the capability is omitted).

Note that boolean capabilities have no =... part. They are omitted (i.e., false) by default and become true when specified.
See the Select Graphic Rendition (SGR) section in the documentation of the text terminal that is used for permitted values and their meaning as character attributes. These substring values are integers in decimal representation and can be concatenated with semicolons. grep takes care of assembling the result into a complete SGR sequence (\33[...m). Common values to concatenate include 1 for bold, 4 for underline, 5 for blink, 7 for inverse, 39 for default foreground color, 30 to 37 for foreground colors, 90 to 97 for 16-color mode foreground colors, 38;5;0 to 38;5;255 for 88-color and 256-color modes foreground colors, 49 for default background color, 40 to 47 for background colors, 100 to 107 for 16-color mode background colors, and 48;5;0 to 48;5;255 for 88-color and 256-color modes background colors.
LC_ALL, LC_COLLATE, LANG These variables specify the locale for the LC_COLLATE category, which determines the collating sequence used to interpret range expressions like [a-z].
LC_ALL, LC_CTYPE, LANG These variables specify the locale for the LC_CTYPE category, which determines the type of characters, e.g., which characters are whitespace.
LC_ALL, LC_MESSAGES, LANG These variables specify the locale for the LC_MESSAGES category, which determines the language that grep uses for messages. The default C locale uses American English messages.
POSIXLY_CORRECT If set, grep behaves as POSIX requires; otherwise, grep behaves more like other GNU programs. POSIX requires that options that follow file names must be treated as file names; by default, such options are permuted to the front of the operand list and are treated as options. Also, POSIX requires that unrecognized options be diagnosed as "illegal", but since they are not really against the law the default is to diagnose them as "invalid". POSIXLY_CORRECT also disables _N_GNU_nonoption_argv_flags_, described below.
_N_GNU_nonoption_argv_flags_ (Here N is grep's numeric process ID.) If the ith character of this environment variable's value is 1, do not consider the ith operand of grep to be an option, even if it appears to be one. A shell can put this variable in the environment for each command it runs, specifying which operands are the results of file name wildcard expansion and therefore should not be treated as options. This behavior is available only with the GNU C library, and only when POSIXLY_CORRECT is not set.
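
As a sketch of how GREP_COLORS is typically used, the commands below override the ms capability for a single invocation so that matches are highlighted in bold green instead of bold red. This assumes GNU grep; --color=always is used only so that the escape sequences are emitted even though the output is captured rather than sent to a terminal.

```shell
# Override the match color for selected lines: 01 = bold, 32 = green.
colored=$(printf 'our products\n' | GREP_COLORS='ms=01;32' grep --color=always 'products')

# Same override plus the boolean ne capability (no =... part).
colored_ne=$(printf 'our products\n' | GREP_COLORS='ms=01;32:ne' grep --color=always 'products')

# For comparison: the same search with coloring disabled.
plain=$(printf 'our products\n' | grep --color=never 'products')

printf '%s\n' "$colored" "$colored_ne" "$plain"
```

To make such a setting persistent, the variable would normally be exported from a shell startup file rather than set per command.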

Exit Status

The exit status is 0 if selected lines are found, and 1 if not found. If an error occurred the exit status is 2.
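The three exit statuses can be observed as in the sketch below. The file path is a deliberately nonexistent, hypothetical name, and the status is captured with || so the script keeps running even under set -e.

```shell
# 0: at least one line was selected.
found=0
printf 'hope\n' | grep -q 'hope' || found=$?

# 1: no line was selected.
missing=0
printf 'hope\n' | grep -q 'nope' || missing=$?

# 2: an error occurred (here, the input file does not exist).
failed=0
grep -q 'hope' /nonexistent/grep-demo-file 2>/dev/null || failed=$?

printf 'found=%s missing=%s failed=%s\n' "$found" "$missing" "$failed"
```

Because a successful match returns 0, grep -q fits directly into shell conditionals such as if grep -q PATTERN FILE; then ...; fi.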


grep Examples

grep chope /etc/passwd

Search /etc/passwd for the user chope.

grep "May 31 03" /etc/httpd/logs/error_log

Search the Apache error_log file for any error entries that occurred on May 31 at 3 a.m. Putting quotes around the string allows the search pattern to contain spaces.

grep -r "computerhope" /www/

Recursively search the directory /www/ and all of its subdirectories for any lines of any files that contain the string "computerhope".

grep -w "hope" myfile.txt

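Search the file myfile.txt for lines containing the word "hope". Only lines in which "hope" appears as a distinct word are matched; lines where "hope" is merely part of a longer word are not.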

grep -cw "hope" myfile.txt

Same as the previous command, but displays a count of the matching lines rather than the matching lines themselves.

grep -cvw "hope" myfile.txt

The inverse of the previous command: displays the number of lines in myfile.txt that do not contain the word "hope".

grep -l "hope" /www/*

Display the file names (but not the matching lines themselves) of any files in /www/ (but not its subdirectories) whose contents include the string "hope".

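Related commands

ed: a simple text editor.
egrep: filter text that matches an extended regular expression.
sed: a utility for filtering and transforming text.
sh: the Bourne shell command interpreter.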

If you have not yet read the example usage section, we suggest reviewing it first.

