agrep (a version of the grep utility)

Posted by 小猪老师 on 2020-07-06 13:51

agrep is a version of the grep utility that also supports approximate pattern matching.


Contents

1 agrep operating systems

2 agrep description

3 agrep syntax

4 agrep options

5 agrep patterns

6 agrep examples

agrep operating systems

Unix and Linux

agrep description

agrep searches the input file names (standard input is the default) for records containing strings which either exactly or approximately match a pattern.

A record is by default a single line, but it can be defined differently using the -d option (see below). Normally, each record found is copied to the standard output. Approximate matching allows finding records that contain the pattern with several errors, including substitutions, insertions, and deletions.

For example, "Massechusets" matches "Massachusetts" with two errors (one substitution and one insertion). Running agrep -2 Massechusets foo outputs all lines in the file foo containing any string with (at most) 2 errors from "Massechusets".
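The error counting described above can be illustrated with a small Python sketch. It is not agrep's implementation (agrep uses a much faster bit-parallel method); it is a plain dynamic-programming version of approximate substring matching, and the function names `min_errors` and `matches` are made up for this illustration.

```python
def min_errors(pattern: str, text: str) -> int:
    """Smallest number of errors between pattern and any substring of text."""
    # prev[j] = fewest errors needed to match the pattern prefix processed so far
    # against some substring of text that ends at position j.
    # The first row is all zeros because a match may start anywhere in the text.
    prev = [0] * (len(text) + 1)
    for i, p in enumerate(pattern, start=1):
        curr = [i] + [0] * len(text)
        for j, t in enumerate(text, start=1):
            curr[j] = min(
                prev[j - 1] + (p != t),  # exact match or substitution
                prev[j] + 1,             # pattern character missing from the text
                curr[j - 1] + 1,         # extra character in the text
            )
        prev = curr
    return min(prev)


def matches(pattern: str, line: str, max_errors: int) -> bool:
    """Rough analogue of `agrep -<max_errors> pattern` applied to one line."""
    return min_errors(pattern, line) <= max_errors


line = "Boston is the capital of Massachusetts."
print(min_errors("Massechusets", line))  # 2
print(matches("Massechusets", line, 2))  # True
print(matches("Massechusets", line, 1))  # False
```

The line is reported with a budget of 2 errors but not with a budget of 1, which mirrors the `agrep -2 Massechusets foo` example.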

agrep supports many kinds of queries including arbitrary wildcards, sets of patterns, and in general, all regular expressions. It supports most of the options supported by the grep family plus several more (but it is not 100% compatible with grep).

As with the rest of the grep family, the characters $, ^, *, [, ], |, (, ), !, and \ can cause unexpected results when included in the pattern, as these special characters are also meaningful to the shell. To avoid these problems, one should always enclose the entire pattern argument in single quotes, i.e., 'pattern'. Do not use double quotes (").
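When the command line is built by a program rather than typed by hand, the same single-quoting can be produced automatically. The following Python sketch uses the standard `shlex` module; the pattern and the file name `foo` are only placeholders taken from the examples in this page.

```python
import shlex

pattern = "a(b|c)*d"
argv = ["agrep", "-1", pattern, "foo"]

# shlex.join quotes every argument that contains shell-special characters,
# so a POSIX shell passes the pattern to agrep unchanged.
command = shlex.join(argv)
print(command)  # agrep -1 'a(b|c)*d' foo

# Splitting the command again the way a shell would recovers the original arguments.
print(shlex.split(command) == argv)  # True
```

If the command is started with `subprocess.run(argv)` (a list, without `shell=True`), no shell is involved and no quoting is needed at all.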

When agrep is applied to more than one input file, the name of the file is displayed at the beginning of each line that matches the pattern. (The file name is not displayed when processing a single file. To make it appear in that case, add /dev/null as a second file in the list.)


agrep syntax

agrep [ -#cdehiklnpstvwxBDGIS ] pattern [ -f patternfile ] [ filename... ]


agrep options

-#

# is a non-negative integer (at most 8) specifying the maximum number of errors permitted in finding the approximate matches. It defaults to zero. Generally, each insertion, deletion, or substitution counts as one error. It is possible to adjust the relative cost of insertions, deletions, and substitutions; see -I -D and -S options.

-c

Display only the count (number of occurrences) of matching records.

-d 'delim'

Define delim to be the separator between two records. The default value is '$', which matches the end of a line; therefore, by default, a record is a single line. The delim can be a string of up to eight characters (with possible use of ^ and $).

Each of the following is treated as one record: the text between two delims, the text before the first delim, and the text after the last delim.

For example, -d '$$' defines paragraphs as records (if a paragraph is represented by two newlines) and -d '^From ' defines mail messages as records. agrep matches each record separately.

This option does work with regular expressions, but delim itself cannot be a regular expression.
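To make the notion of a record concrete, here is a Python sketch of the two delimiters mentioned above. It only approximates what -d describes; exactly where agrep places the delimiter within a record (see -t) is not modelled, and the function names are made up for this illustration.

```python
import re


def split_paragraphs(text: str) -> list:
    """Records separated by an empty line, roughly what -d '$$' describes."""
    return [record for record in text.split("\n\n") if record]


def split_messages(text: str) -> list:
    """Records that each start at a line beginning with 'From ', roughly -d '^From '."""
    # The lookahead keeps the 'From ' line at the start of each record.
    return [record for record in re.split(r"(?m)^(?=From )", text) if record]


text = "first line\nsecond line\n\nnext paragraph\n"
print(split_paragraphs(text))

mbox = "From alice\nhello\nFrom bob\ngoodbye\n"
# Matching is then done record by record instead of line by line.
print([m for m in split_messages(mbox) if "goodbye" in m])
```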

-e pattern

Same as providing a simple pattern argument, but using -e is useful when the pattern begins with a '-'.

-f patternfile

Match the patterns in patternfile. The output is all lines that match at least one of the patterns in patternfile.

Currently, the -f option works only for exact match and for simple patterns (any meta symbol is interpreted as a regular character).

It is compatible only with -c, -h, -i, -l, -s, -v, -w, and -x options.

-h

Do not display file names.

-i

Case-insensitive search; e.g., "A" and "a" are considered equivalent.

-k

Use simple pattern matching, i.e., treat no symbols in the pattern as a meta character.

For example, agrep -k 'a(b|c)*d' foo will find the occurrences of the literal string "a(b|c)*d" in foo, whereas agrep 'a(b|c)*d' foo will find substrings in foo that match the regular expression 'a(b|c)*d'.

-l

List only the names of the files that contain a match. For example, agrep -l 'wonderful' * will list the names of those files in current directory that contain the word wonderful.

-n

Each line that is printed is prefixed by its record number in the file.

-p

Find records in the text that contain a supersequence of the pattern. For example, agrep -p DCS foo will match "Department of Computer Science".
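A record contains a supersequence of the pattern when the pattern's characters appear in the record in the same order, not necessarily next to each other. A minimal Python sketch of that test (the function name is made up for this illustration, and the check is case-sensitive):

```python
def contains_in_order(pattern: str, record: str) -> bool:
    """True if the characters of pattern occur in record in order, gaps allowed."""
    remaining = iter(record)
    # `c in remaining` advances the iterator, so each pattern character
    # must be found after the previous one.
    return all(c in remaining for c in pattern)


print(contains_in_order("DCS", "Department of Computer Science"))  # True
print(contains_in_order("DCS", "Computer Science Department"))     # False
```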

-s

Work silently; that is, display nothing except error messages.

-t

Output the record starting from the end of delim to (and including) the next delim. This is useful for cases where delim should come at the end of the record.

-v

Inverse mode — display only those records that do not contain the pattern.

-w

Search for the pattern as a word only; that is, match the pattern only if it is surrounded by non-alphanumeric characters, such as a space or a dash. The non-alphanumeric characters must surround the match; they cannot be counted as errors. For example, agrep -w -1 car will match "cars", but not "characters".

-x

The pattern must match the whole line.

-y

Used with -B option. When -y is on, agrep will always output the best matches without giving a prompt.

-B

Best match mode. When -B is specified and no exact matches are found, agrep will continue to search until the closest matches (i.e., the ones with minimum number of errors) are found, at which point the following message will be shown: "the best match contains x errors, there are y matches, output them? (y/n)".

The best match mode is not supported for standard input, e.g., pipeline input. When the -#, -c, or -l options are specified, the -B option is ignored. In general, -B may be slower than -#, but not by very much.

-Dk

Set the cost of a deletion to k (k is a positive integer). This option does not currently work with regular expressions.

-G

Output the files that contain a match.

-Ik

Set the cost of an insertion to k (k is a positive integer). This option does not currently work with regular expressions.

-Sk

Set the cost of a substitution to k (k is a positive integer). This option does not currently work with regular expressions.
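The effect of the -I, -D and -S costs can be sketched in Python as a weighted variant of approximate substring matching. This is an illustration only, not agrep's algorithm. It assumes that an insertion means an extra character in the text and a deletion means a pattern character missing from the text; the function and parameter names are made up.

```python
def min_cost(pattern: str, text: str, ins: int = 1, dele: int = 1, sub: int = 1) -> int:
    """Cheapest way to match pattern against any substring of text."""
    # prev[j] = lowest cost of matching the pattern prefix processed so far
    # against a substring of text ending at position j (a match may start anywhere).
    prev = [0] * (len(text) + 1)
    for p in pattern:
        curr = [prev[0] + dele] + [0] * len(text)
        for j, t in enumerate(text, start=1):
            curr[j] = min(
                prev[j - 1] + (0 if p == t else sub),  # match or substitution
                prev[j] + dele,                        # pattern character missing
                curr[j - 1] + ins,                     # extra character in the text
            )
        prev = curr
    return min(prev)


# With costs like -D2 -S2 and a budget of 1, only a single insertion is affordable.
print(min_cost("ABCD", "xxABzCDxx", ins=1, dele=2, sub=2))  # 1, within budget
print(min_cost("ABCD", "xxABDxx", ins=1, dele=2, sub=2))    # 2, over budget
print(min_cost("ABCD", "xxABDxx"))                          # 1 with default costs
```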


agrep patterns

agrep supports a large variety of patterns, including simple strings, strings with classes of characters, sets of strings, wildcards, and regular expressions.

Strings

A string is any sequence of characters, including the special symbols ^ for beginning of line and $ for end of line. The special characters listed above ( $, ^, *, [, ], |, (, ), !, and \ ) should be preceded by \ if they are to be matched as regular characters. For example, \^abc\\ corresponds to the string "^abc\", whereas ^abc corresponds to the string "abc" at the beginning of a line.

Character classes

A class of characters is a list of characters inside "[]" (in order). It corresponds to any single character from the list, where a dash represents the range between two characters. For example, [a-ho-z] is any character between a and h or between o and z. The symbol ^ inside [] denotes which characters not to match (it "complements" the list). For example, [^i-n] denotes any character except characters i through n. The symbol ^ thus has two meanings, but this is consistent with egrep. The symbol . stands for any character except for the newline character.

Boolean operations

agrep supports an AND operation ';' and an OR operation ',', but not a combination of both. For example, fast;network searches for all records containing both "fast" and "network".
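The semantics of the two operators can be sketched in Python as follows. The sketch handles exact substrings only (no errors, no other metacharacters), and the function name is made up for this illustration.

```python
def matches_boolean(pattern: str, record: str) -> bool:
    """Evaluate a ';' (AND) or ',' (OR) pattern against one record."""
    if ";" in pattern and "," in pattern:
        # Mirrors the restriction above: the two operators cannot be mixed.
        raise ValueError("AND and OR cannot be combined in one pattern")
    if ";" in pattern:
        return all(term in record for term in pattern.split(";"))
    return any(term in record for term in pattern.split(","))


print(matches_boolean("fast;network", "a fast local network"))  # True
print(matches_boolean("fast;network", "a fast car"))            # False
print(matches_boolean("fast,network", "a fast car"))            # True
```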

Wildcards

The symbol '#' is used to denote a wildcard. # matches zero, or any number of, arbitrary characters. For example, ex#e matches "example". The symbol # is equivalent to .* in egrep. In fact, .* will work too, because it is a valid regular expression, but unless this is part of an actual regular expression, # will work faster.
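The equivalence between # and .* can be shown with Python's `re` module. This sketch treats everything other than # as literal text, which is a simplification of agrep's pattern syntax; the function name is made up for this illustration.

```python
import re


def wildcard_to_regex(pattern: str):
    """Translate an agrep-style '#' wildcard pattern into a compiled regex."""
    # Escape the literal pieces and join them with '.*', the egrep equivalent of '#'.
    literal_parts = [re.escape(part) for part in pattern.split("#")]
    return re.compile(".*".join(literal_parts))


print(bool(wildcard_to_regex("ex#e").search("example")))            # True
print(bool(wildcard_to_regex("ABCD#YZ").search("ABCD then YZ")))    # True
print(bool(wildcard_to_regex("ABCD#YZ").search("YZ before ABCD")))  # False
```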

Combination of Exact and Approximate Matching

Any pattern inside angle brackets <> must match the text exactly even if the match is with errors. For example, <mathemat>ics matches mathematical with one error (replacing the last s with an a), but mathe<matics> does not match mathematical no matter how many errors we allow.

Regular Expressions

The syntax of regular expressions in agrep is in general the same as that for egrep. The union operation '|', Kleene closure '*', and parentheses () are all supported. Currently '+' is not supported. Regular expressions are currently limited to approximately 30 characters (excluding meta characters). Some options (-d, -w, -f, -t, -x, -D, -I, -S) do not currently work with regular expressions. The maximal number of errors for regular expressions that use '*' or '|' is 4.


agrep examples

agrep -2 -c ABCDEFG foo

Gives the number of lines in file foo that contain "ABCDEFG" within two errors.

agrep -1 -D2 -S2 'ABCD#YZ' foo

Outputs the lines containing "ABCD" followed within arbitrary distance by "YZ", with up to one additional insertion (-D2 and -S2 make deletions and substitutions too "expensive").

agrep -5 -p abcdefghij /path/to/dictionary/words

Outputs the list of all words in the dictionary located at /path/to/dictionary/words containing at least 5 of the first 10 letters of the alphabet in order.
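The condition "at least 5 of the first 10 letters of the alphabet in order" can be read as a requirement on the longest common subsequence of the pattern and the word. The Python sketch below uses that reading as an assumption about the combined effect of -5 and -p; it is not taken from agrep's source, and the function name is made up.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


letters = "abcdefghij"
for word in ["absconding", "abdicate"]:
    # A word qualifies when at least 5 of the 10 letters appear in it in order.
    print(word, lcs_length(letters, word) >= 5)
# absconding True
# abdicate False
```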

agrep -1 'abc[0-9](de|fg)*[x-z]' foo

Outputs the lines containing, within up to one error, the string that starts with "abc" followed by one digit, followed by zero or more repetitions of either "de" or "fg", followed by either "x", "y", or "z".

agrep -d '^From ' 'breakdown;internet' mbox

Outputs all mail messages (the pattern "^From " separates mail messages in a mail file) that contain keywords "breakdown" and "internet".

agrep -d '$$' -1 '<word1> <word2>' foo

Finds all paragraphs that contain word1 followed by word2 with one error in place of the blank. In particular, if word1 is the last word in a line and word2 is the first word in the next line, then the space will be substituted by a newline symbol and it will match. Thus, this is a way to overcome separation by a newline. Note that -d '$$' (or another delim which spans more than one line) is necessary, because otherwise agrep searches only one line at a time.
