comm (compare two sorted files line by line)

Posted by 瑞兹 on 2021-01-06 09:08

On Unix-like operating systems, the comm command compares two sorted files line by line.

Contents

1 comm operating environment

2 comm description

3 comm syntax

4 comm examples

comm operating environment

Unix & Linux

comm description

With no options, comm produces three-column output. Column one contains lines unique to FILE1, column two contains lines unique to FILE2, and column three contains lines common to both files. Each of these columns can be suppressed individually with options.

comm syntax

comm [OPTION]... FILE1 FILE2

Options

-1

suppress column 1 (lines unique to FILE1)

-2

suppress column 2 (lines unique to FILE2)

-3

suppress column 3 (lines that appear in both files)

--check-order

check that the input is correctly sorted, even if all input lines are pairable

--nocheck-order

do not check that the input is correctly sorted

--output-delimiter=STR

separate columns with string STR

--help

display a help message, and exit.

--version

output version information, and exit.
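
Because comm expects both inputs to be sorted in the same collating order, unsorted files are usually passed through sort first. The following is a minimal sketch under that assumption; the file names a.txt and b.txt and their contents are hypothetical, and LC_ALL=C is set only so that sort and comm agree on the ordering.

```shell
# Hypothetical unsorted inputs, created in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' pear apple fig > "$tmpdir/a.txt"
printf '%s\n' fig banana apple > "$tmpdir/b.txt"

# Sort both files under the same locale so comm sees a consistent order.
LC_ALL=C sort "$tmpdir/a.txt" > "$tmpdir/a.sorted"
LC_ALL=C sort "$tmpdir/b.txt" > "$tmpdir/b.sorted"

# Print only the lines common to both files (columns 1 and 2 suppressed).
common=$(LC_ALL=C comm -12 "$tmpdir/a.sorted" "$tmpdir/b.sorted")
printf '%s\n' "$common"

# Clean up the temporary files.
rm -rf "$tmpdir"
```

Sorting into separate files keeps the sketch usable in any POSIX shell; shells that support process substitution could feed the sorted streams to comm directly.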


comm examples

Let's say you have two text files, recipe.txt and shopping-list.txt.

recipe.txt contains these lines:

All-Purpose Flour
Baking Soda
Bread
Brown Sugar
Chocolate Chips
Eggs
Milk
Salt
Vanilla Extract
White Sugar

And shopping-list.txt contains these lines:

All-Purpose Flour
Bread
Brown Sugar
Chicken Salad
Chocolate Chips
Eggs
Milk
Onions
Pickles
Potato Chips
Soda Pop
Tomatoes
White Sugar

As you can see, the two files are different, but many of the lines are the same. Not all of the recipe ingredients are on the shopping list, and not everything on the shopping list is part of the recipe.

If we run the comm command on the two files, it will read both files and give us three columns of output:

comm recipe.txt shopping-list.txt
                All-Purpose Flour
Baking Soda
                Bread
                Brown Sugar
        Chicken Salad
                Chocolate Chips
                Eggs
                Milk
        Onions
        Pickles
        Potato Chips
Salt
        Soda Pop
        Tomatoes
Vanilla Extract
                White Sugar

Here, each line of output has either zero, one, or two tabs at the beginning, separating the output into three columns:

  1. The first column (zero tabs) contains lines that appear only in the first file.
  2. The second column (one tab) contains lines that appear only in the second file.
  3. The third column (two tabs) contains lines that appear in both files.

(The columns overlap visually because in this case, our terminal prints a tab as eight spaces. It might look different on your screen.)
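
Since tabs are hard to tell apart from spaces on screen, one way to confirm the column layout is to replace each tab with a visible character. The sketch below recreates the two files in a temporary directory and uses tr for the substitution; the choice of '>' as the marker is arbitrary, and LC_ALL=C is an assumption made so that the ordering is predictable.

```shell
tmpdir=$(mktemp -d)

# Recreate the two sorted example files.
cat > "$tmpdir/recipe.txt" <<'EOF'
All-Purpose Flour
Baking Soda
Bread
Brown Sugar
Chocolate Chips
Eggs
Milk
Salt
Vanilla Extract
White Sugar
EOF

cat > "$tmpdir/shopping-list.txt" <<'EOF'
All-Purpose Flour
Bread
Brown Sugar
Chicken Salad
Chocolate Chips
Eggs
Milk
Onions
Pickles
Potato Chips
Soda Pop
Tomatoes
White Sugar
EOF

# Replace each tab with '>' so the zero, one, or two leading tabs are visible.
visible=$(LC_ALL=C comm "$tmpdir/recipe.txt" "$tmpdir/shopping-list.txt" | tr '\t' '>')
printf '%s\n' "$visible"

rm -rf "$tmpdir"
```

Lines with no marker belong to column 1, lines starting with one '>' to column 2, and lines starting with '>>' to column 3.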

Next, let's look at how we can bring our separated data into a spreadsheet.

Creating a CSV file for spreadsheets

One useful way to use comm is to output to a CSV file, which can then be read by a spreadsheet program. CSV files are just text files that use a certain character, usually a comma, tab, or semicolon, to delimit data in a way that can be read as a spreadsheet. By convention, CSV file names have the extension .csv.

For instance, let's run the same command, but this time let's redirect the output to a file called output.csv by using the > operator:

comm recipe.txt shopping-list.txt > output.csv

This time there is no output on the screen. Instead, output is sent to a file called output.csv. To check that it worked correctly, we can cat the contents of output.csv:

cat output.csv
                All-Purpose Flour
Baking Soda
                Bread
                Brown Sugar
        Chicken Salad
                Chocolate Chips
                Eggs
                Milk
        Onions
        Pickles
        Potato Chips
Salt
        Soda Pop
        Tomatoes
Vanilla Extract
                White Sugar

To bring this data into a spreadsheet, we can open it in LibreOffice Calc.

Before it opens the file, LibreOffice asks us how to interpret the file data.

We want the column delimiter to be tab characters, which is already checked by default. (There are no commas or semicolons in our data, so we don't have to worry about the other checkboxes.) It also gives us a preview of how the data will look, given the options we selected.

Everything looks good, so we can click OK, and LibreOffice will import our data into a spreadsheet.

Now if we wanted to, we could save the spreadsheet in another format such as a Microsoft Excel file, or an XML file, or even HTML.
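
If a spreadsheet program expects commas rather than tabs, the tab prefixes can be converted before saving. This is a minimal sketch, assuming the data itself contains no commas or tabs; the shortened file contents are hypothetical stand-ins for recipe.txt and shopping-list.txt.

```shell
tmpdir=$(mktemp -d)

# Shortened, already sorted stand-ins for the two example files.
printf '%s\n' Bread Eggs Salt > "$tmpdir/recipe.txt"
printf '%s\n' Bread Onions > "$tmpdir/shopping-list.txt"

# Turn every tab that comm emits into a comma, then save the result.
LC_ALL=C comm "$tmpdir/recipe.txt" "$tmpdir/shopping-list.txt" \
  | tr '\t' ',' > "$tmpdir/output.csv"

# Show what was written.
csv=$(cat "$tmpdir/output.csv")
printf '%s\n' "$csv"

rm -rf "$tmpdir"
```

Each line then carries zero, one, or two leading commas, so a comma-delimited import places it in the first, second, or third spreadsheet column. The --output-delimiter=STR option listed in the syntax section is another way to choose the separator where comm supports it.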

Suppressing columns

If you only want to output specific columns, you can specify the column numbers to suppress in the command, preceded by a dash. For instance, this command will suppress columns 1 and 2, displaying only column 3 — lines shared by both files. This isolates the items on the shopping list that are also part of the recipe:

comm -12 recipe.txt shopping-list.txt
All-Purpose Flour
Bread
Brown Sugar
Chocolate Chips
Eggs
Milk
White Sugar

The next command will suppress columns 2 and 3, displaying only column 1 — lines in the recipe that are not in the shopping list. This shows us what ingredients we already have in our cupboard:

comm -23 recipe.txt shopping-list.txt
Baking Soda
Salt
Vanilla Extract

And the next command will suppress column 3, displaying only columns 1 and 2 — the items in the recipe that are not on the shopping list, and the items on the shopping list that are not in the recipe, each in their own column.

comm -3 recipe.txt shopping-list.txt
Baking Soda
        Chicken Salad
        Onions
        Pickles
        Potato Chips
Salt
        Soda Pop
        Tomatoes
Vanilla Extract
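
The same pattern covers a combination not shown above: suppressing columns 1 and 3 leaves only the lines unique to the second file. A short sketch with hypothetical, shortened versions of the two lists:

```shell
tmpdir=$(mktemp -d)

# Shortened, already sorted stand-ins for the two example files.
printf '%s\n' Bread Eggs Salt > "$tmpdir/recipe.txt"
printf '%s\n' Bread Onions Pickles > "$tmpdir/shopping-list.txt"

# -13 hides columns 1 and 3, leaving items that are only on the shopping list.
only_shopping=$(LC_ALL=C comm -13 "$tmpdir/recipe.txt" "$tmpdir/shopping-list.txt")
printf '%s\n' "$only_shopping"

rm -rf "$tmpdir"
```

Because the earlier columns are suppressed, the remaining lines are printed without leading tabs, which makes the output easy to pass to other commands.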
