cut (remove, or "cut out", sections of each line of a file)

Posted by rose1 on 2020-08-18 10:24

On Unix-like operating systems, the cut command removes (or "cuts out") sections of each line of a file. This document covers the GNU/Linux version of cut.

Contents

1 cut operating environment

2 cut syntax

3 cut examples

cut operating environment

Linux

cut syntax

cut OPTION... [FILE]...

Options

-b, --bytes=LIST  Select only the bytes from each line as specified in LIST. LIST specifies a byte, a set of bytes, or a range of bytes; see Specifying LIST below.
-c, --characters=LIST  Select only the characters from each line as specified in LIST. LIST specifies a character, a set of characters, or a range of characters; see Specifying LIST below.
-d, --delimiter=DELIM  Use character DELIM instead of a tab for the field delimiter.
-f, --fields=LIST  Select only these fields on each line; also print any line that contains no delimiter character, unless the -s option is specified. LIST specifies a field, a set of fields, or a range of fields; see Specifying LIST below.
-n  This option is ignored, but is included for compatibility reasons.
--complement  Complement the set of selected bytes, characters, or fields.
-s, --only-delimited  Do not print lines not containing delimiters.
--output-delimiter=STRING  Use STRING as the output delimiter string. The default is to use the input delimiter.
--help  Display a help message and exit.
--version  Output version information and exit.
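
The interaction between -f, -s, and --complement is easier to see on a small input. The following is a minimal sketch, assuming GNU cut; the three-line sample text is hypothetical and is not part of the original examples.

```shell
# Hypothetical sample input: two tab-delimited lines and one line with no tab.
sample=$(printf 'one\ttwo\tthree\nno delimiter here\nalpha\tbeta\tgamma')

# -f alone: the line without a delimiter is passed through unchanged.
printf '%s\n' "$sample" | cut -f 2

# Adding -s suppresses lines that contain no delimiter.
printf '%s\n' "$sample" | cut -s -f 2

# --complement inverts the selection: every field except the second.
printf '%s\n' "$sample" | cut -s --complement -f 2
```

The first command prints three lines (two, the undelimited line, beta); the second prints only two and beta; the third prints the first and third fields of the two delimited lines.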

Usage Notes

When invoking cut, use exactly one of the -b, -c, or -f options.

If no FILE is specified, cut reads from the standard input.
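
Both usage notes can be tried directly from a shell. This is a small sketch assuming GNU cut; the colon-separated color string is a made-up input.

```shell
# With no FILE argument, cut reads standard input.
printf 'red:green:blue\n' | cut -d ':' -f 2

# A FILE of "-" also stands for standard input in GNU cut.
printf 'red:green:blue\n' | cut -d ':' -f 3 -

# Combining two list types is rejected; cut exits with a non-zero status.
if ! printf 'abc\n' | cut -c 1 -f 1 2>/dev/null; then
    echo "cut refused -c together with -f"
fi
```

The first two commands print green and blue; the third prints the message from the echo, because cut reports an error instead of producing output.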

Specifying LIST

Each LIST is made up of an integer, a range of integers, or multiple integer ranges separated by commas. Selected input is written in the same order that it is read, and is written to output exactly once. A range consists of:

N     the Nth byte, character, or field, counted from 1.
N-    from the Nth byte, character, or field to the end of the line.
N-M   from the Nth to the Mth byte, character, or field (inclusive).
-M    from the first to the Mth byte, character, or field.

For example, let's say you have a file named data.txt which contains the following text:

one	two	three	four	five
alpha	beta	gamma	delta	epsilon

In this example, each of these words is separated by a tab character, not spaces. The tab character is the default delimiter of cut, so it will by default consider a field to be anything delimited by a tab.
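
Because tabs are easy to lose when copying text, one way to recreate data.txt with real tab characters is to let printf generate them. This is only a sketch; it writes data.txt into the current directory.

```shell
# Recreate the sample file; \t in the printf format becomes a literal tab.
printf 'one\ttwo\tthree\tfour\tfive\n' > data.txt
printf 'alpha\tbeta\tgamma\tdelta\tepsilon\n' >> data.txt

# Optional check: keep only tabs and newlines, then count the bytes.
# Four tabs per line plus one newline per line gives 10 bytes in total.
tr -cd '\t\n' < data.txt | wc -c
```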

To "cut" only the third field of each line, use the command:

cut -f 3 data.txt

...which will output the following:

three
gamma

If instead you want to "cut" only the second through fourth fields of each line, use the command:

cut -f 2-4 data.txt

...which will output the following:

two	three	four
beta	gamma	delta

If you want to "cut" only the first through second and fourth through fifth fields of each line (omitting the third field), use the command:

cut -f 1-2,4-5 data.txt

...which will output the following:

one	two	four	five
alpha	beta	delta	epsilon

Or, let's say you want the third field and every field after it, omitting the first two fields. In this case, you could use the command:

cut -f 3- data.txt

...which will output the following:

three	four	five
gamma	delta	epsilon
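
The rule that selected input is written in the order it is read, and only once, has two practical consequences that the following sketch illustrates (assuming GNU cut, with the first line of data.txt supplied inline):

```shell
line=$(printf 'one\ttwo\tthree\tfour\tfive')

# Listing field 3 before field 1 does not reorder the output.
printf '%s\n' "$line" | cut -f 3,1

# Overlapping ranges select fields 1 through 4, each written once.
printf '%s\n' "$line" | cut -f 1-3,2-4
```

The first command prints one and three in their original order, and the second prints one through four without repeating two or three. Reordering fields therefore needs a different tool, such as awk.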

Specifying a range with LIST also applies to cutting characters (-c) or bytes (-b) from a line. For example, to output only the third through twelfth characters of every line of data.txt, use the command:

cut -c 3-12 data.txt

...which will output the following:

e	two	thre
pha	beta	g

Remember that the "space" between words is actually a single tab character, so each line of output contains ten characters: eight letters and two tabs. In other words, cut omits the first two characters of each line, outputs characters three through twelve (counting each tab as one character), and omits everything after the twelfth.

Counting bytes instead of characters will result in the same output in this case, because in an ASCII-encoded text file, each character is represented by a single byte (eight bits) of data. So the command:

cut -b 3-12 data.txt

...will, for our file data.txt, produce exactly the same output:

e	two	thre
pha	beta	g
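
Byte counting only matches character counting while every character occupies one byte. The sketch below uses a hypothetical UTF-8 word, written with octal escapes so it does not depend on the terminal encoding, to show -b splitting a two-byte character. How -c treats such input depends on the cut build, so only -b is shown.

```shell
# "café" encoded as UTF-8: the final "é" is two bytes (octal 303 251).
word=$(printf 'caf\303\251')

# The whole word is 5 bytes, plus 1 for the newline added by printf below.
printf '%s\n' "$word" | wc -c

# -b 1-4 keeps "caf" plus only the first byte of "é", splitting the character.
printf '%s\n' "$word" | cut -b 1-4 | wc -c

# -b 1-5 keeps both bytes of "é", so the word survives intact.
printf '%s\n' "$word" | cut -b 1-5
```

The two byte counts are 6 and 5: the second is one byte short of the full line because half of the last character was cut away.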

Specifying A Delimiter Other Than Tab

The tab character is the default delimiter that cut uses to determine what constitutes a field. So, if your file's fields are already delimited by tabs, you don't need to specify a different delimiter character.

You can, however, specify any single character as the delimiter. For instance, the file /etc/passwd contains information about each user on the system, one user per line, with the fields delimited by a colon (":"). For example, the line of /etc/passwd for the root user may look like this:

root:x:0:0:root:/root:/bin/bash

These fields contain the following information, in the following order, separated by a colon character:

  1. Username
  2. Password (an x means the hashed password is stored elsewhere, typically in /etc/shadow)
  3. User ID number (UID)
  4. Group ID number (GID)
  5. Comment field (used by the finger command)
  6. Home Directory
  7. Shell
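
The field numbers above map directly onto -f values. As a quick sketch, the sample root entry can be fed to cut from a variable, without touching the real /etc/passwd:

```shell
# The sample root entry shown above, held in a shell variable.
entry='root:x:0:0:root:/root:/bin/bash'

# Field 1 is the username.
printf '%s\n' "$entry" | cut -d ':' -f 1

# Fields 6 and 7 are the home directory and the shell.
printf '%s\n' "$entry" | cut -d ':' -f 6-7
```

The first command prints root and the second prints /root:/bin/bash, still joined by the input delimiter.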

The username is the first field on the line, so to display each username on the system, use the command:

cut -f 1 -d ':' /etc/passwd

...which will output, for example:

root
daemon
bin
sys
chope

(There are many more user accounts on a typical system, including many accounts specific to system services, but for this example we will pretend there are only five users.)

The third field of each line in the /etc/passwd file is the UID (user ID number), so to display each username and user ID number, use the command:

cut -f 1,3 -d ':' /etc/passwd

...which will output the following, for example:

root:0
daemon:1
bin:2
sys:3
chope:1000

As you can see, the output will be delimited, by default, using the same delimiter character specified for the input. In this case, that's the colon character (":"). You can specify a different delimiter for the input and output, however. So, if you wanted to run the previous command, but have the output delimited by a space, you could use the command:

cut -f 1,3 -d ':' --output-delimiter=' ' /etc/passwd

...which will output the following, for example:

root 0
daemon 1
bin 2
sys 3
chope 1000
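
Unlike the input delimiter, the output delimiter is a string, so it can be longer than one character. A short sketch, assuming GNU cut and a hypothetical passwd-style line for the chope user:

```shell
# Hypothetical passwd-style entry; only the username and UID come from the text above.
entry='chope:x:1000:1000:Chope:/home/chope:/bin/bash'

# A multi-character output delimiter between two selected fields.
printf '%s\n' "$entry" | cut -d ':' -f 1,3 --output-delimiter=' => '

# The delimiter is also placed between the fields of a range.
printf '%s\n' "$entry" | cut -d ':' -f 1,6-7 --output-delimiter=', '
```

The first command prints "chope => 1000" and the second prints "chope, /home/chope, /bin/bash".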

But what if you want the output to be delimited by a tab? Specifying a tab character on the command line is a bit more complicated, because it is a non-printing character that the shell would otherwise treat as whitespace. To pass it to cut, you must "protect" it from the shell. How this is done depends on the shell you are using, but in bash, the default shell on many Linux distributions, you can write the tab character as $'\t'. So the command:

cut -f 1,3 -d ':' --output-delimiter=$'\t' /etc/passwd

...will output the following, for example:

root	0
daemon	1
bin	2
sys	3
chope	1000
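
In shells that may not support the $'\t' form, one possible alternative is to capture a tab with printf and pass it in a quoted variable. This is a sketch, assuming GNU cut; the variable name tab and the sample input lines are illustrative only.

```shell
# Capture a literal tab; command substitution strips only trailing newlines.
tab=$(printf '\t')

# Hypothetical passwd-style line standing in for /etc/passwd.
printf 'chope:x:1000:1000\n' | cut -d ':' -f 1,3 --output-delimiter="$tab"

# The same variable works as an explicit input delimiter.
printf 'one\ttwo\tthree\n' | cut -d "$tab" -f 2
```

Quoting "$tab" matters: unquoted, the shell would discard the tab as whitespace before cut ever saw it.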

cut examples

cut -c 3 file.txt

Output the third character of every line of the file file.txt, omitting the others.

cut -c 1-3 file.txt

Output the first three characters of every line of the file file.txt, omitting the rest.

cut -c -3 file.txt

Same as the above command. Output the first three characters of every line of file.txt.

cut -c 3- file.txt

Output the third through the last characters of each line of the file file.txt, omitting the first two characters.
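
The four -c forms above can be compared side by side on a single line of input. The string abcdef is a stand-in for a line of file.txt:

```shell
printf 'abcdef\n' | cut -c 3     # third character only: c
printf 'abcdef\n' | cut -c 1-3   # first three characters: abc
printf 'abcdef\n' | cut -c -3    # same selection as 1-3: abc
printf 'abcdef\n' | cut -c 3-    # third character to end of line: cdef
```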

cut -d ':' -f 1 /etc/passwd

Output the first field of each line of the file /etc/passwd, where fields are delimited by a colon (':'). The first field of /etc/passwd is the username, so this command will output every username in the passwd file.

grep '/bin/bash' /etc/passwd | cut -d ':' -f 1,6

Output the first and sixth fields, delimited by a colon, of any entry in the /etc/passwd file which specifies /bin/bash as the login shell. This command will output the username and home directory of any user whose login shell is /bin/bash.
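
To try this pipeline without depending on the contents of a real /etc/passwd, the same commands can be run against a few hypothetical entries held in a variable. The sketch also anchors the grep pattern to the end of the line, an optional tightening so that /bin/bash is matched only as the shell field.

```shell
# Hypothetical passwd-style entries used instead of the real /etc/passwd.
entries='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
chope:x:1000:1000:Chope:/home/chope:/bin/bash'

# Keep entries whose last field is /bin/bash, then print username and home directory.
printf '%s\n' "$entries" | grep ':/bin/bash$' | cut -d ':' -f 1,6
```

This prints root:/root and chope:/home/chope; the daemon entry is filtered out by grep before cut runs.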

Other commands

cu | csplit | crontab | cpio | continue | compress | col | cmp | cksum | chsh | chroot | chkey | cd | chmod | cp | comm | chown | cal | calendar | clear | chfn | cancel | cat | cc | cfdisk | checkeq | checknr | chgrp
