JP2017194903A - Data comparison program, data comparison device and data comparison method

Info

Publication number: JP2017194903A
Application number: JP2016086079A
Authority: JP
Inventors: 慧杉山; Satoshi Sugiyama; 光樹蓬田; Mitsuki Yomogida; 斉大脇; Hitoshi Owaki; 将佐藤; Susumu Sato; 宏己住田; Hiromi Sumita
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2016-04-22
Filing date: 2016-04-22
Publication date: 2017-10-26

Abstract

PROBLEM TO BE SOLVED: To provide a data comparison program, a data comparison device, and a data comparison method capable of improving the accuracy of data comparison.

SOLUTION: The data comparison device receives a plurality of pieces of data, each of which includes character strings of a plurality of lines. When any of the received pieces of data includes break information indicating a page break, the data comparison device generates data from which the break information is deleted. After generating a plurality of blocks by dividing the generated data, the data comparison device determines whether each of the generated blocks is included in another of the received pieces of data.

SELECTED DRAWING: Figure 7

Description

The present invention relates to a data comparison program, a data comparison device, and a data comparison method.

A provider that provides services to users (hereinafter also simply referred to as a provider) constructs a business system in order, for example, to provide various services to users. Specifically, the provider constructs a business system having a function that, when document data such as a manual is revised, compares the revised document data (hereinafter also referred to as document data after the revision) with the document data as it was before the revision (hereinafter also referred to as document data before the revision).

The user then inputs the document data before the revision and the document data after the revision into the business system and identifies the differences between their contents. This allows the user to efficiently grasp which of the contents of the document data after the revision have been revised (see, for example, Patent Documents 1 to 3).

Patent Document 1: JP 2013-45437 A
Patent Document 2: JP 2008-77581 A
Patent Document 3: JP 2001-297080 A

Some document data that is revised as described above consists of a plurality of pages. In this case, the document data before the revision and the document data after the revision are compared, for example, on the content included in each page.

However, when document data is revised, the page on which content appears may change even for content that has not actually been revised. For example, when part of the document data before the revision is deleted, part of the unrevised content may move up to the preceding page. In this case, the business system also identifies the content whose page has changed as a difference between the content of the document data before the revision and the content of the document data after the revision. For this reason, the differences identified by the business system may include content that has not actually been revised.

In view of this, an object of one aspect of the present invention is to provide a data comparison program, a data comparison device, and a data comparison method that can increase the accuracy of data comparison.

In one aspect of the embodiment, a computer is caused to execute a process including: receiving a plurality of pieces of data each including character strings of a plurality of lines; when any piece of data among the received pieces of data includes break information indicating a page break, generating data in which the break information is deleted from that piece of data; dividing the generated data to generate a plurality of blocks; and determining whether each of the generated blocks is included in another piece of data among the received pieces of data.

According to one aspect, the accuracy of data comparison can be improved.

FIG. 1 is a diagram illustrating a configuration of the information processing system 10.
FIGS. 2 and 3 are diagrams illustrating specific examples of comparing document data.
FIG. 4 is a diagram for explaining the outline of the data comparison processing in the present embodiment.
FIG. 5 is a diagram illustrating a hardware configuration of the information processing apparatus 1.
FIG. 6 is a functional block diagram of the information processing apparatus 1.
FIG. 7 is a flowchart for explaining the outline of the data comparison processing in the first embodiment.
FIGS. 8 to 15 are flowcharts for explaining the details of the data comparison processing in the first embodiment.
FIGS. 16 to 27 are diagrams illustrating specific examples of the processing from S12 to S25.
FIG. 28 is a diagram illustrating a specific example of the block management information 133.
FIG. 29 is a diagram for describing the same blocks identified in the process of S36.
FIG. 30 is a diagram illustrating a specific example of the output of the result of the data comparison processing.

[Configuration of information processing system]
FIG. 1 is a diagram illustrating a configuration of the information processing system 10. The information processing system 10 illustrated in FIG. 1 includes an information processing apparatus 1 (hereinafter also referred to as a data comparison apparatus 1 or a computer 1) and a storage unit 2. A user terminal 11 can access the information processing apparatus 1 via a network NW such as the Internet or an intranet.

A business system is constructed in the information processing apparatus 1. When document data such as a manual is revised, for example, this business system performs processing (hereinafter also referred to as data comparison processing) that compares the document data before the revision (hereinafter also referred to as comparison source data) with the document data after the revision (hereinafter also referred to as comparison destination data).

For example, when the user inputs the document data before the revision and the document data after the revision via the user terminal 11, the information processing apparatus 1 identifies the differences between the contents of the two and transmits information indicating the identified differences to the user terminal 11. Specifically, the information processing apparatus 1 identifies the differences by using, for example, an O(ND) algorithm.
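As an illustration only, the O(ND) approach is commonly associated with Myers' greedy difference algorithm, whose running time is proportional to the input size N times the number of differences D. The following is a minimal sketch of its core loop, which computes the length of the shortest edit script between two sequences of lines. The function name `shortest_edit_distance` is hypothetical, and this description does not state how the information processing apparatus 1 actually implements the algorithm.

```python
def shortest_edit_distance(a, b):
    """Return the minimum number of insertions and deletions turning a into b.

    Sketch of the greedy O(ND) search: for each edit count d, track the
    furthest point reached on every diagonal k = x - y.
    """
    n, m = len(a), len(b)
    furthest = {1: 0}  # furthest[k] = largest x reached on diagonal k
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]        # move down: take an element of b
            else:
                x = furthest[k - 1] + 1    # move right: drop an element of a
            y = x - k
            # Follow the diagonal while the elements match (no edit cost).
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d
    return n + m
```

For example, `shortest_edit_distance(list("ABCABBA"), list("CBABAC"))` evaluates to 5.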

The storage unit 2 stores the document data before the revision, the document data after the revision, and the like input via the user terminal 11.

[Specific example when comparing data]
Next, a specific example of comparing data will be described. FIGS. 2 and 3 are diagrams for explaining specific examples of comparing document data. Specifically, FIG. 2 illustrates a comparison between document data before the revision and document data after the revision, each consisting of one page. In the following description, the document data before the revision is PG1-1 and the document data after the revision is PG2-1. The document data described with reference to FIG. 2 and the other figures may include image data.

In the example shown in FIG. 2, PG2-1 includes DF2-1, which is document data surrounded by a dash-dotted line. PG1-1, on the other hand, does not include document data with the same content as DF2-1. Therefore, when the information processing apparatus 1 compares PG1-1 and PG2-1 shown in FIG. 2, it detects DF2-1 as a difference.

Next, a specific example of comparing document data before the revision and document data after the revision, each consisting of a plurality of pages, will be described. FIG. 3 is a diagram for explaining this case. In the following description, the document data before the revision consists of PG1-1 and PG1-2, and the document data after the revision consists of PG2-1 and PG2-2.

In the example shown in FIG. 3, PG2-1 includes DF2-1, as in the case described with reference to FIG. 2. PG1-1, on the other hand, does not include DF2-1, again as in the case described with reference to FIG. 2. Therefore, when the information processing apparatus 1 compares PG1-1 and PG2-1 shown in FIG. 3, it detects DF2-1 as a difference.

In the example shown in FIG. 3, PG1-1 also includes DF1-1 (document data consisting of one paragraph), which is surrounded by a dash-dotted line. PG2-1, on the other hand, includes only DF2-2, which is part of the document data with the same content as DF1-1. PG2-2 includes DF2-3, the remaining portion of that document data that is not included in PG2-1. That is, in the example shown in FIG. 3, DF2-3 has moved from PG2-1 to PG2-2 as a result of DF2-1 being added to PG2-1.

When comparing the document data before the revision shown in FIG. 3 with the document data after the revision, the information processing apparatus 1 performs the comparison, for example, on the content included in each page. Here, PG1-1 includes all of DF1-1, whereas PG2-1 includes only DF2-2, which is part of the document data with the same content as DF1-1. Therefore, when comparing PG1-1 and PG2-1, the information processing apparatus 1 detects, for example, DF1-1 in PG1-1 and DF2-2 in PG2-1 as differences. In addition, PG1-2 does not include DF2-3, whereas PG2-2 does. Therefore, when comparing PG1-2 and PG2-2, the information processing apparatus 1 detects, for example, DF2-3 as a difference.

Consequently, the differences detected by the information processing apparatus 1 may include content that has not actually been revised.
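The false difference described above can be reproduced with a small sketch. The page contents and the function name `page_differences` are hypothetical, and the simple per-page line lookup is a simplification chosen for illustration, not the comparison the apparatus actually performs.

```python
def page_differences(old_pages, new_pages):
    """Compare page i of the old document with page i of the new document.

    Returns, for each page, the lines that appear only on the new page.
    """
    result = []
    for old_page, new_page in zip(old_pages, new_pages):
        old_lines = set(old_page)
        result.append([line for line in new_page if line not in old_lines])
    return result


# Hypothetical two-page documents: "added note" is the only real revision.
old_pages = [["intro", "para line 1", "para line 2"], ["appendix"]]
new_pages = [["intro", "added note", "para line 1"], ["para line 2", "appendix"]]

diffs = page_differences(old_pages, new_pages)
# diffs == [["added note"], ["para line 2"]]
```

Here "para line 2" is unchanged text that merely moved to the second page, yet the page-by-page comparison reports it alongside the genuine addition.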

Therefore, the information processing apparatus 1 in the present embodiment receives a plurality of pieces of data each including character strings of a plurality of lines. When any of the received pieces of data includes break information indicating a page break, the information processing apparatus 1 generates data in which the break information is deleted from that piece of data, and divides the generated data to generate a plurality of blocks. The information processing apparatus 1 then determines whether each of the generated blocks is included in another of the received pieces of data.

That is, when the information processing apparatus 1 receives data consisting of a plurality of pages, it deletes the break information before generating the blocks, which are the units in which document data is compared. This prevents blocks from being split more than necessary by page breaks. As a result, even when document data that is common to the comparison source data and the comparison destination data (document data that has not been revised) is broken across pages at different positions, the information processing apparatus 1 can determine that the blocks containing that data have the same content. The information processing apparatus 1 can therefore increase the accuracy of the comparison between the comparison source data and the comparison destination data. An outline of the data comparison processing executed by the information processing apparatus 1 in the present embodiment is described below.

[Outline of data comparison processing in this embodiment]
FIG. 4 is a diagram for explaining the outline of the data comparison processing in the present embodiment. DT1-1 shown in FIG. 4 is the data obtained by deleting the page break information from PG1-1 and PG1-2 described with reference to FIG. 3. DT2-1 shown in FIG. 4 is the data obtained by deleting the page break information from PG2-1 and PG2-2 described with reference to FIG. 3.

In the example shown in FIG. 4, DF2-4 corresponds to DF2-2 and DF2-3 described with reference to FIG. 3. That is, when the information processing apparatus 1 deletes the page break information from PG2-1 and PG2-2, DT2-1 comes to include DF2-4, in which DF2-2 and DF2-3 are joined. Therefore, when dividing DT2-1 into a plurality of blocks, the information processing apparatus 1 can divide it so that DF2-4 is contained in a single block. As a result, when comparing DT1-1 and DT2-1, the information processing apparatus 1 can determine that the block including DF1-1 and the block including DF2-4 have the same content.
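The effect of joining the pages before dividing them into blocks can be sketched as follows. The function names `merge_pages` and `split_blocks`, the sample lines, and the use of an empty line as the block separator are all assumptions made for illustration. This description does not define the block boundary rule at this point, and the apparatus determines whether a block is included in the other data, which need not be the exact block-to-block match used here.

```python
def merge_pages(pages):
    """Concatenate the lines of all pages, discarding the page boundaries."""
    return [line for page in pages for line in page]


def split_blocks(lines):
    """Split lines into blocks at empty lines (assumed block separator)."""
    blocks, current = [], []
    for line in lines:
        if line == "":
            if current:
                blocks.append(tuple(current))
                current = []
        else:
            current.append(line)
    if current:
        blocks.append(tuple(current))
    return blocks


# Hypothetical documents: the two-line paragraph is split across pages
# only in the revised document.
old_pages = [["intro", "", "para line 1", "para line 2"], ["", "appendix"]]
new_pages = [["intro", "", "added note", "", "para line 1"],
             ["para line 2", "", "appendix"]]

old_blocks = set(split_blocks(merge_pages(old_pages)))
new_blocks = split_blocks(merge_pages(new_pages))
only_in_new = [block for block in new_blocks if block not in old_blocks]
# only_in_new == [("added note",)]
```

Because the paragraph is reassembled into one block before the comparison, only the genuinely added block remains as a difference.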

[Hardware configuration of information processing device]
Next, the hardware configuration of the information processing apparatus 1 will be described. FIG. 5 is a diagram illustrating the hardware configuration of the information processing apparatus 1.

The information processing apparatus 1 includes a CPU 101 that is a processor, a memory 102, an external interface (I/O unit) 103, and a storage medium 104. These units are connected to one another via a bus 105.

The storage medium 104 stores, for example, a program 110 for performing the data comparison processing and the like in a program storage area (not shown) in the storage medium 104. The storage medium 104 also has, for example, an information storage area 130 (hereinafter also referred to as a storage unit 130) that stores information used when performing the data comparison processing and the like.

As shown in FIG. 5, when executing the program 110, the CPU 101 loads the program 110 from the storage medium 104 into the memory 102 and performs the data comparison processing and the like in cooperation with the program 110. The external interface 103 communicates with the user terminal 11.

[Functions of information processing device]
Next, the functions of the information processing apparatus 1 will be described. FIG. 6 is a functional block diagram of the information processing apparatus 1.

By cooperating with the program 110, the CPU 101 of the information processing apparatus 1 operates as, for example, a data receiving unit 111 (hereinafter also simply referred to as a receiving unit 111), a data generation unit 112, and a block generation unit 113. By cooperating with the program 110, the CPU 101 also operates as, for example, a block determination unit 114 (hereinafter also simply referred to as a determination unit 114) and a result output unit 115.

The information storage area 130 stores comparison source data 131, comparison destination data 132, and block management information 133. The information storage area 130 corresponds to, for example, the storage unit 2 described with reference to FIG. 1.

For example, when the user transmits a plurality of pieces of data via the user terminal 11, the data receiving unit 111 receives the transmitted pieces of data. Each piece of data includes character strings of a plurality of lines. In the following description, the plurality of pieces of data include the comparison source data 131 and the comparison destination data 132 whose contents the user asks the information processing apparatus 1 to compare.

When any of the pieces of data received by the data receiving unit 111 (for example, the comparison destination data 132) includes break information indicating a page break, the data generation unit 112 generates data in which the break information is deleted from that piece of data. The break information is, for example, information including header information indicating the top of a page and footer information indicating the end of a page.
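A minimal sketch of detecting and deleting such break information is shown below. The header and footer formats, the patterns `HEADER` and `FOOTER`, and the function names are purely hypothetical. This description only states that the break information includes header information and footer information, not how they are encoded.

```python
import re

# Assumed formats for illustration: a header such as "== Manual ==" and a
# footer such as "- 3 -". Real documents would need their own patterns.
HEADER = re.compile(r"^== .* ==$")
FOOTER = re.compile(r"^- \d+ -$")


def is_break_info(line):
    """Return True if the line looks like a page header or a page footer."""
    return bool(HEADER.match(line) or FOOTER.match(line))


def has_break_info(lines):
    """Return True if any line is break information."""
    return any(is_break_info(line) for line in lines)


def delete_break_info(lines):
    """Return a copy of the lines with headers and footers removed."""
    return [line for line in lines if not is_break_info(line)]


lines = ["== Manual ==", "para line 1", "- 1 -",
         "== Manual ==", "para line 2", "- 2 -"]
processed = delete_break_info(lines)
# processed == ["para line 1", "para line 2"]
```

After the deletion, the two lines of the paragraph are adjacent and can be placed in the same block.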

The block generation unit 113 generates a plurality of blocks by dividing the data generated by the data generation unit 112.

The block determination unit 114 determines whether each of the blocks generated by the block generation unit 113 is included in another of the pieces of data received by the data receiving unit 111 (for example, the comparison source data 131).

The result output unit 115 outputs information indicating the result of the determination by the block determination unit 114 to an output device (for example, the user terminal 11).

When the data receiving unit 111 receives, for example, the comparison source data 131 or the comparison destination data 132, it stores the received data in the information storage area 130. The block generation unit 113, for example, creates block management information 133, which is information indicating the generated blocks, and stores the created block management information 133 in the information storage area 130.

[First Embodiment]
Next, a first embodiment will be described. FIG. 7 is a flowchart for explaining the outline of the data comparison processing in the first embodiment.

The information processing apparatus 1 waits until the data comparison timing arrives (NO in S1). The data comparison timing may be, for example, the time at which the comparison source data 131 and the comparison destination data 132 are transmitted from the user terminal 11. Alternatively, when the comparison source data 131 and the comparison destination data 132 are stored in the information storage area 130 in advance, the data comparison timing may be, for example, the time at which the user instructs the information processing apparatus 1 to start comparing the data.

When the data comparison timing arrives (YES in S1), the information processing apparatus 1 determines whether the comparison source data 131 and the comparison destination data 132 include page break information (S2). When it determines that page break information is included (YES in S2), the information processing apparatus 1 deletes the page break information included in the comparison source data 131 and the comparison destination data 132 (S3). When it determines that page break information is not included (NO in S2), the information processing apparatus 1 does not perform the process of S3.

Subsequently, the information processing apparatus 1 divides the data created in the process of S3 to generate a plurality of blocks (S4). The information processing apparatus 1 then determines whether each of the blocks generated in the process of S4 is included in another of the pieces of data received in the process of S1 (S5).
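The flow from S2 to S5 can be sketched as a single function. This is a simplified illustration under several assumptions that are not in this description: each document is one string, a form feed character stands in for the break information, blocks are separated by blank lines, and "included" is tested as plain substring containment. The name `compare_documents` is hypothetical.

```python
PAGE_BREAK = "\f"  # assumed stand-in for the break information


def compare_documents(source, target):
    """Return (block, found) pairs for each block of the target document."""
    # S2: check whether either document contains break information.
    if PAGE_BREAK in source or PAGE_BREAK in target:
        # S3: delete the break information from both documents.
        source = source.replace(PAGE_BREAK, "")
        target = target.replace(PAGE_BREAK, "")
    # S4: divide the target into blocks (blank-line separated, an assumption).
    blocks = [block for block in target.split("\n\n") if block.strip()]
    # S5: determine whether each block is included in the other document.
    return [(block, block in source) for block in blocks]


source = "intro\n\npara line 1\npara line 2\n\nappendix"
target = "intro\n\nadded note\n\npara line 1\n\fpara line 2\n\nappendix"
result = compare_documents(source, target)
```

With these inputs, only the block "added note" is reported as not included in the source, even though the page break falls in the middle of the unchanged paragraph.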

That is, when the information processing apparatus 1 receives data consisting of a plurality of pages in the process of S1, it deletes the break information before generating the blocks. This prevents document data that should be contained in the same block from being split across a plurality of blocks.

In this way, the information processing apparatus 1 receives a plurality of pieces of data each including character strings of a plurality of lines. When any of the received pieces of data includes break information indicating a page break, the information processing apparatus 1 generates data in which the break information is deleted from that piece of data, and divides the generated data to generate a plurality of blocks. The information processing apparatus 1 then determines whether each of the generated blocks is included in another of the received pieces of data.

As a result, even when document data that is common to the comparison source data and the comparison destination data (document data that has not been revised) is broken across pages at different positions, the information processing apparatus 1 can determine that the blocks containing that data have the same content. The information processing apparatus 1 can therefore increase the accuracy of the comparison between the comparison source data and the comparison destination data.

[Details of First Embodiment]
Next, details of the first embodiment will be described. FIGS. 8 to 15 are flowcharts for explaining the details of the data comparison processing in the first embodiment. FIGS. 16 to 30 are diagrams for explaining the details of the data comparison processing in the first embodiment. The details of the data comparison processing of FIGS. 8 to 15 are described below with reference to FIGS. 16 to 30.

[Details of processing from S1 to S4]
First, of S1 to S5 described with reference to FIG. 7, the details of the processing from S1 to S4 will be described. As shown in FIG. 8, the data receiving unit 111 waits until it receives the comparison source data 131 and the comparison destination data 132 from the user terminal 11 (NO in S11). When the comparison source data 131 and the comparison destination data 132 are received (YES in S11), the data generation unit 112 acquires one of the pieces of document data received in the process of S11 (S12). Specifically, the data generation unit 112 acquires either the comparison source data 131 or the comparison destination data 132 received in the process of S11. In this case, the data receiving unit 111 stores the received document data in the information storage area 130.

The data generation unit 112 then determines whether the document data acquired in the process of S12 (for example, the comparison source data) includes break information, that is, whether the document data acquired in the process of S12 consists of a plurality of pages (S13). When it determines that break information is included (YES in S13), the data generation unit 112 generates processed data, which is document data obtained by deleting the break information from the document data acquired in the process of S12 (S14). That is, when the document data acquired in the process of S12 consists of a plurality of pages, the data generation unit 112 generates the processed data by combining that document data into a single page.

その後、データ生成部１１２は、Ｓ１４の処理で生成した加工データから、Ｓ１２の処理で取得した文書データの１ページ分に相当するデータ（以下、処理対象データとも呼ぶ）を取得する（Ｓ１５）。そして、データ生成部１１２は、Ｓ１５の処理で取得された処理対象データに対応するページの前ページに含まれる文書データにおいて、含まれるブロックが決定していない文書データ（以下、残データとも呼ぶ）が存在するか否かを判定する（Ｓ１６）。そして、残データが存在すると判定した場合（Ｓ１６のＹＥＳ）、データ生成部１１２は、Ｓ１５の処理で取得した処理対象データの前に、Ｓ１６の処理で存在した残データを連結する（Ｓ１７）。 Thereafter, the data generation unit 112 acquires, from the processed data generated in S14, the data corresponding to one page of the document data acquired in S12 (hereinafter also referred to as processing target data) (S15). The data generation unit 112 then determines whether the page preceding the page that corresponds to the processing target data acquired in S15 contains document data that has not yet been assigned to a block (hereinafter also referred to as remaining data) (S16). If it determines that remaining data exists (YES in S16), the data generation unit 112 concatenates the remaining data found in S16 in front of the processing target data acquired in S15 (S17).

すなわち、各ページに含まれる文書データには、次のページに含まれる文書データと単一のブロックに含まれるべき文書データが含まれる場合がある。そのため、ブロック生成部１１３は、後述するように、各ページに含まれる文書データのうち、次のページに含まれる文書データと単一のブロックに含まれる可能性がある文書データ（残データ）について、次のページに含まれる文書データとともにブロックの生成を行うか否かの判定を行う。 That is, the document data on a page may include document data that should belong to the same block as document data on the next page. Therefore, as described later, for the document data on each page that may belong to the same block as document data on the next page (the remaining data), the block generation unit 113 determines whether to generate a block from it together with the document data on the next page.

これにより、データ生成部１１２は、単一のブロックに含まれるべき文書データが複数のページに分かれている場合であっても、これらの文書データを単一のブロックに含めることが可能になる。 As a result, the data generation unit 112 can include these document data in a single block even when the document data to be included in the single block is divided into a plurality of pages.

その後、データ生成部１１２は、Ｓ１７の処理で連結した残データの最終行の文字列の最後が文章の終わりを示しているか否か（文字列の最後が句点や終止符等であるか否か）を判定する（Ｓ１８）。そして、残データの最終行の文字列の最後が文章の終わりを示していないと判定した場合（Ｓ１８のＮＯ）、データ生成部１１２は、残データの最終行と、Ｓ１５の処理で取得したページの先頭行との間の間隔を、文章が複数の行からなる場合における各行の間隔と同じ間隔に変更する（Ｓ１９）。一方、残データの最終行の文字列の最後が文章の終わりを示していると判定した場合（Ｓ１８のＹＥＳ）、データ生成部１１２は、Ｓ１９の処理を行わない。 Thereafter, the data generation unit 112 determines whether the end of the character string in the last line of the remaining data concatenated in S17 indicates the end of a sentence (that is, whether the character string ends with a Japanese full stop, a period, or the like) (S18). If it determines that the end of the character string in the last line of the remaining data does not indicate the end of a sentence (NO in S18), the data generation unit 112 changes the spacing between the last line of the remaining data and the first line of the page acquired in S15 to the same spacing as that between the lines of a sentence that spans a plurality of lines (S19). On the other hand, if it determines that the end of the character string in the last line of the remaining data indicates the end of a sentence (YES in S18), the data generation unit 112 does not perform S19.

すなわち、残データの最終行の文字列の最後が文章の終わりを示していない場合、残データの最終行の文字列と、Ｓ１５の処理で取得したページの先頭行の文字列とは、同一の文章に含まれる文字列である可能性がある。そして、ブロック生成部１１３は、Ｓ１１の処理で受け付けた複数の文書データの比較精度を向上させる観点から、これらの文字列が同一の文章に含まれる場合、単一のブロックに含めるようにブロックの生成を行うことが好ましい。 That is, when the end of the character string in the last line of the remaining data does not indicate the end of a sentence, the character string in the last line of the remaining data and the character string in the first line of the page acquired in S15 may belong to the same sentence. From the viewpoint of improving the accuracy of comparing the plurality of pieces of document data received in S11, it is preferable that the block generation unit 113 generate blocks so that these character strings are included in a single block when they belong to the same sentence.

そこで、データ生成部１１２は、Ｓ１９の処理において、残データの最終行の文字列の最後が文章の終わりを示していない場合、残データの最終行と、Ｓ１５の処理で取得したページの先頭行との間の間隔を、Ｓ１１の処理で受け付けた文書データにおける通常の行間と同じ間隔に変更する。これにより、ブロック生成部１１３は、この場合、後述するように、残データの最終行とＳ１５の処理で取得したページの先頭行との間に、ブロックの分割を行うことを示す情報（以下、分割情報とも呼ぶ）が存在しないものと判定することが可能になる。そのため、ブロック生成部１１３は、後述するように、残データの最終行の文字列とＳ１５の処理で取得したページの先頭行の文字列とが単一のブロックに含まれるようにブロックの生成を行うことが可能になる。 Therefore, in S19, when the end of the character string in the last line of the remaining data does not indicate the end of a sentence, the data generation unit 112 changes the spacing between the last line of the remaining data and the first line of the page acquired in S15 to the same spacing as the normal line spacing in the document data received in S11. In this case, as described later, the block generation unit 113 can thus determine that no information indicating that a block should be divided (hereinafter also referred to as division information) exists between the last line of the remaining data and the first line of the page acquired in S15. As a result, as described later, the block generation unit 113 can generate blocks so that the character string in the last line of the remaining data and the character string in the first line of the page acquired in S15 are included in a single block.

なお、データ生成部１１２は、Ｓ１８の処理において、残データの最終行に文字が含まれている場合（空白のみを含む空白行でない場合）に限り、残データの最終行とＳ１５の処理で取得したページの先頭行との間の間隔を変更するものであってもよい。また、データ生成部１１２は、Ｓ１８の処理において、Ｓ１５の処理で取得した文書データの先頭行に文字が含まれている場合（空白のみを含む空白行でない場合）に限り、残データの最終行とＳ１５の処理で取得したページの先頭行との間の間隔を変更するものであってもよい。 Note that, in S18, the data generation unit 112 may change the spacing between the last line of the remaining data and the first line of the page acquired in S15 only when the last line of the remaining data contains characters (that is, when it is not a blank line containing only blanks). Likewise, in S18, the data generation unit 112 may change that spacing only when the first line of the document data acquired in S15 contains characters (that is, when it is not a blank line containing only blanks).

これにより、データ生成部１１２は、残データの最終行とＳ１５の処理で取得したページの先頭行との間の間隔の変更を効率的に行うことが可能になる。 Thereby, the data generation unit 112 can efficiently change the interval between the last line of the remaining data and the first line of the page acquired in the process of S15.
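
The decision made in S18, together with the optional non-blank checks just described, can be sketched as follows. This is a minimal sketch under stated assumptions: the set of sentence-ending marks is an assumption (the description only says a full stop, a period, or the like), and the function names are hypothetical.

```python
# Assumed set of sentence-ending marks; the description does not enumerate them.
SENTENCE_END = ("。", "．", ".", "！", "？", "!", "?")


def ends_sentence(line: str) -> bool:
    """S18: does the last line of the remaining data end a sentence?"""
    return line.rstrip().endswith(SENTENCE_END)


def should_close_gap(last_remaining_line: str, first_page_line: str) -> bool:
    """True when S19 would reduce the page-boundary gap to normal line spacing."""
    # Optional refinement from the description: skip the change when either
    # line is a blank line containing only blanks.
    if not last_remaining_line.strip() or not first_page_line.strip():
        return False
    return not ends_sentence(last_remaining_line)
```

For example, `should_close_gap("...あああ", "いいい")` is true because the first argument does not end a sentence, whereas a last line such as `"... AAA."` leaves the gap unchanged.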

一方、Ｓ１６の処理において、前ページの残データが存在しないと判定した場合（Ｓ１６のＮＯ）、データ生成部１１２は、Ｓ１７からＳ１９の処理を行わない。 On the other hand, in the process of S16, when it is determined that there is no remaining data of the previous page (NO in S16), the data generation unit 112 does not perform the process of S17 to S19.

その後、ブロック生成部１１３は、図９に示すように、Ｓ１７の処理で残データを連結した処理対象データにブロックの分割情報が存在するか否かを判定する（Ｓ２１）。そして、Ｓ１７の処理で残データを連結した処理対象データにブロックの分割情報が存在すると判定した場合（Ｓ２１のＹＥＳ）、ブロック生成部１１３は、Ｓ１７の処理で残データを連結した処理対象データを、分割情報が存在すると判定された位置毎に分割し、ブロックを生成する（Ｓ２２）。一方、ブロックの分割情報が存在しないと判定した場合（Ｓ２１のＮＯ）、ブロック生成部１１３は、Ｓ２２の処理を行わない。 Thereafter, as shown in FIG. 9, the block generation unit 113 determines whether block division information exists in the processing target data to which the remaining data was concatenated in S17 (S21). If it determines that block division information exists in that processing target data (YES in S21), the block generation unit 113 divides the processing target data at each position where division information was determined to exist, thereby generating blocks (S22). On the other hand, if it determines that no block division information exists (NO in S21), the block generation unit 113 does not perform S22.

具体的に、ブロック生成部１１３は、例えば、文字列の途中から最後までが空白またはインデントである行が存在する場合、その行と次の行との間にブロックの分割情報が存在すると判定する。また、ブロック生成部１１３は、例えば、文字列に含まれる文字のフォントが次の行の文字列のフォントと異なる行が存在する場合に、その行と次の行との間にブロックの分割情報が存在すると判定する。 Specifically, for example, when there is a line in which the portion from partway through the character string to the end of the line is blank or indentation, the block generation unit 113 determines that block division information exists between that line and the next line. Also, for example, when there is a line whose character string is in a font different from that of the character string in the next line, the block generation unit 113 determines that block division information exists between that line and the next line.

さらに、ブロック生成部１１３は、例えば、Ｓ１７の処理で残データを連結した処理対象データに画像データが含まれている場合、その画像データが含まれる行のうちの先頭行と、その行の前の行との間にブロックの分割情報が存在すると判定する。 Furthermore, for example, when the processing target data to which the remaining data was concatenated in S17 contains image data, the block generation unit 113 determines that block division information exists between the first of the lines containing the image data and the line preceding it.

また、ブロック生成部１１３は、例えば、空白のみを含む空白行が存在する場合、その行と次の行との間にブロックの分割情報が存在すると判定する。なお、この場合、ブロック生成部１１３は、例えば、空白行の直前の行と、空白行と、空白行の直後の行とがそれぞれ異なるブロックに含まれるようにブロックの生成を行う。すなわち、ブロック生成部１１３は、１行以上の空白行が存在する場合、その１行以上の空白行のみを含むブロック（以下、空白ブロックとも呼ぶ）を生成する。 Also, for example, when there is a blank line containing only blanks, the block generation unit 113 determines that block division information exists between that line and the next line. In this case, the block generation unit 113 generates blocks so that, for example, the line immediately before the blank line, the blank line itself, and the line immediately after the blank line are included in different blocks. That is, when one or more consecutive blank lines exist, the block generation unit 113 generates a block containing only those blank lines (hereinafter also referred to as a blank block).

これにより、ブロック判定部１１４は、後述するように、内容の区切れにおいて分割されたブロック毎に文書データの比較を行うことが可能になる。そのため、ブロック判定部１１４は、Ｓ１１の処理において受け付けた比較元データ１３１及び比較先データ１３２の比較を効率的に行うことが可能になる。 As a result, the block determination unit 114 can compare the document data for each of the blocks divided in the content division, as will be described later. Therefore, the block determination unit 114 can efficiently compare the comparison source data 131 and the comparison destination data 132 received in the process of S11.
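
Of the division rules above, the blank-line rule is the simplest to illustrate. The following Python sketch groups consecutive lines so that each run of blank lines becomes its own blank block. It deliberately ignores the font, image, and trailing-blank rules, and the function name `split_into_blocks` is hypothetical.

```python
from itertools import groupby


def is_blank(line: str) -> bool:
    """A blank line contains nothing but blanks."""
    return line.strip() == ""


def split_into_blocks(lines: list[str]) -> list[list[str]]:
    """Split lines at blank lines; each run of blank lines is a blank block."""
    return [list(group) for _, group in groupby(lines, key=is_blank)]


blocks = split_into_blocks(["a", "b", "", "c", "", "", "d"])
```

Here `blocks` alternates text blocks and blank blocks, so the lines before a blank line, the blank lines themselves, and the lines after them end up in different blocks.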

その後、ブロック生成部１１３は、Ｓ１７の処理で残データを連結した処理対象データのうち、最後の分割情報よりも後に存在する文書データを新たな残データとして特定する（Ｓ２３）。すなわち、Ｓ１７の処理で残データを連結した処理対象データのうち、最後の分割情報よりも後に存在する文書データは、次のページに含まれる文書データと単一のブロックに含まれる文書データである可能性がある。そのため、データ生成部１１２は、Ｓ２３の処理において、Ｓ１７の処理で残データを連結した処理対象データのうち、最後の分割情報よりも後に存在する文書データを新たな残データとして特定する。 Thereafter, the block generation unit 113 identifies, as new remaining data, the document data that follows the last division information in the processing target data to which the remaining data was concatenated in S17 (S23). That is, within that processing target data, the document data following the last division information may belong to the same block as document data on the next page. Therefore, in S23, the data generation unit 112 identifies the document data that follows the last division information in that processing target data as new remaining data.

そして、Ｓ１２の処理で取得した文書データの全ページが取得済でない場合（Ｓ２４のＮＯ）、データ生成部１１２は、Ｓ１５以降の処理を再度行う。一方、Ｓ１２の処理で取得した文書データの全ページが取得済である場合（Ｓ２４のＹＥＳ）、ブロック生成部１１３は、Ｓ２３の処理で残データとして特定された文書データを含むブロックを生成する（Ｓ２５）。 If all the pages of the document data acquired in the process of S12 have not been acquired (NO in S24), the data generation unit 112 performs the processes after S15 again. On the other hand, if all pages of the document data acquired in the process of S12 have been acquired (YES in S24), the block generation unit 113 generates a block including the document data identified as the remaining data in the process of S23 ( S25).

その後、ブロック生成部１１３は、Ｓ１１の処理で受け付けた全文書データが取得済であるか否かの判定を行う（Ｓ２６）。その結果、Ｓ１１の処理で受け付けた全文書データが取得済でない場合（Ｓ２６のＮＯ）、データ生成部１１２は、Ｓ１２以降の処理を再度行う。一方、Ｓ１１の処理で受け付けた全文書データが取得済である場合（Ｓ２６のＹＥＳ）、ブロック生成部１１３は、Ｓ２２及びＳ２５の処理で分割された各ブロックを示す情報を、ブロック管理情報１３３として情報格納領域１３０に記憶する（Ｓ２７）。以下、Ｓ１２からＳ２７の処理の具体例について説明を行う。 Thereafter, the block generation unit 113 determines whether or not all document data received in the process of S11 has been acquired (S26). As a result, when all the document data received in the process of S11 has not been acquired (NO in S26), the data generation unit 112 performs the processes after S12 again. On the other hand, when all the document data received in the process of S11 has been acquired (YES in S26), the block generation unit 113 uses the information indicating each block divided in the processes of S22 and S25 as the block management information 133. The information is stored in the information storage area 130 (S27). Hereinafter, a specific example of the processing from S12 to S27 will be described.
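
The page loop from S15 to S25 can be summarized in the following rough Python sketch. It makes several simplifying assumptions that are not in the description: pages are lists of text lines, blank lines are the only division information, and a page boundary after a sentence-ending line is treated as a division (standing in for the unchanged gap of S18). All names are hypothetical.

```python
from itertools import groupby

SENTENCE_END = ("。", "．", ".")  # assumed sentence-ending marks


def group_lines(lines: list[str]) -> list[list[str]]:
    """S21/S22 (simplified): divide at blank lines, keeping blank blocks."""
    return [list(g) for _, g in groupby(lines, key=lambda s: s.strip() == "")]


def blocks_with_carry_over(pages: list[list[str]]) -> list[list[str]]:
    blocks: list[list[str]] = []
    remaining: list[str] = []              # remaining data of the previous page
    for page in pages:                     # S15, repeated until YES in S24
        if remaining and remaining[-1].rstrip().endswith(SENTENCE_END):
            # YES in S18: the boundary gap is kept, so it acts as a division.
            blocks.append(remaining)
            remaining = []
        target = remaining + page          # S16/S17: prepend remaining data
        groups = group_lines(target)
        # S23: text after the last division becomes the new remaining data.
        if groups and groups[-1][0].strip():
            remaining = groups.pop()
        else:
            remaining = []
        blocks.extend(groups)
    if remaining:                          # S25: flush the final remaining data
        blocks.append(remaining)
    return blocks
```

With this sketch, a sentence that starts on the last line of one page and continues on the first line of the next page ends up in a single block, while a page that ends with a completed sentence does not merge with the following page.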

［Ｓ１２からＳ２７の処理の具体例（１）］ [Specific Example of Processing from S12 to S27 (1)]
図１６から図２７は、Ｓ１２からＳ２７の処理の具体例を説明する図である。データ生成部１１２は、図１６に示すように、ＰＧ１−１及びＰＧ１−２の２ページからなる文書データ（例えば、比較元データ１３１）を取得した場合、切れ目情報を削除した加工データを生成する（Ｓ１２、Ｓ１３のＹＥＳ、Ｓ１４）。具体的に、データ生成部１１２は、例えば、図１７に示すように、切れ目情報を削除し、ＰＧ１−１とＰＧ１−２とを連結した加工データであるＤＴ１−１を生成する。 FIGS. 16 to 27 are diagrams illustrating specific examples of the processing from S12 to S27. As illustrated in FIG. 16, when the data generation unit 112 acquires document data consisting of the two pages PG1-1 and PG1-2 (for example, the comparison source data 131), it generates processed data from which the break information has been deleted (S12, YES in S13, S14). Specifically, for example, as illustrated in FIG. 17, the data generation unit 112 deletes the break information and generates DT1-1, which is processed data obtained by concatenating PG1-1 and PG1-2.

そして、ブロック生成部１１３は、ＤＴ１−１のうち、ＰＧ１−１に含まれていた文書データを処理対象データとして取得する（Ｓ１５）。なお、ＰＧ１−１に含まれていた文書データは、ＤＴ１−１から最初に取得された文書データである。そのため、この場合、残データは存在していない（Ｓ１６のＮＯ）。 Then, the block generation unit 113 acquires the document data included in PG1-1 out of DT1-1 as processing target data (S15). The document data included in PG1-1 is the document data first acquired from DT1-1. Therefore, in this case, there is no remaining data (NO in S16).

続いて、ブロック生成部１１３は、ＰＧ１−１に含まれていた文書データを分割してブロックを生成する（Ｓ２１のＹＥＳ、Ｓ２２）。具体的に、ブロック生成部１１３は、図１８に示すように、ＰＧ１−１に含まれていた文書データを空白行が存在する位置において分割することにより、各ブロック（ＢＬ１−１、ＢＬ１−２、ＢＬ１−３及びＢＬ１−４）を生成する。なお、図１８に示す例において、ＢＬ１−１とＢＬ１−２との間、ＢＬ１−２とＢＬ１−３との間及びＢＬ１−３とＢＬ１−４との間には、それぞれ空白ブロックが生成される。以下、空白ブロックについての説明は省略する。 Subsequently, the block generation unit 113 divides the document data that was included in PG1-1 to generate blocks (YES in S21, S22). Specifically, as illustrated in FIG. 18, the block generation unit 113 divides the document data that was included in PG1-1 at the positions where blank lines exist, thereby generating the blocks BL1-1, BL1-2, BL1-3, and BL1-4. In the example shown in FIG. 18, a blank block is generated between BL1-1 and BL1-2, between BL1-2 and BL1-3, and between BL1-3 and BL1-4. Description of blank blocks is omitted below.

そして、ブロック生成部１１３は、ＢＬ１−４の後の文書データ（ＰＧ１−１に含まれていた最後の文書データ）である「・・・ＡＡＡ．」を、ＰＧ１−２に含まれる文書データの一部と単一のブロックに含まれる可能性がある文書データであると判定する。そのため、ブロック生成部１１３は、「・・・ＡＡＡ．」を残データとして特定する（Ｓ２３）。 The block generation unit 113 then determines that “...AAA.”, the document data following BL1-4 (the last document data that was included in PG1-1), is document data that may belong to the same block as part of the document data included in PG1-2. The block generation unit 113 therefore identifies “...AAA.” as the remaining data (S23).

その後、データ生成部１１２は、ＤＴ１−１のうち、ＰＧ１−２に含まれていた文書データを処理対象データとして取得する（Ｓ２４のＮＯ、Ｓ１５）。そして、データ生成部１１２は、残データである「・・・ＡＡＡ．」が存在するため、ＰＧ１−２に含まれていたデータの前に残データを連結した処理対象データを生成する（Ｓ１６のＹＥＳ、Ｓ１７）。 Thereafter, the data generation unit 112 acquires the document data included in PG1-2 in DT1-1 as the processing target data (NO in S24, S15). The data generation unit 112 generates the processing target data in which the remaining data is connected before the data included in PG1-2 because the remaining data “... AAA.” Exists (in S16). YES, S17).

ここで、残データである「・・・ＡＡＡ．」の最後には、終止符が付加されている。すなわち、前ページであるＰＧ１−１の最終行の文字列の最後は、文章の終わりを示している。そのため、データ生成部１１２は、残データの最終行に含まれる文字列とＳ１５の処理で取得したページの先頭行に含まれる文字列とが、それぞれ異なるブロックに含まれる文字列であると判定する。したがって、データ生成部１１２は、図１９に示すように、残データの最終行とＳ１５の処理で取得したページの先頭行との間の間隔の変更を行わない（Ｓ１８のＹＥＳ）。 Here, a period is appended to the end of the remaining data “...AAA.”. That is, the end of the character string in the last line of the preceding page, PG1-1, indicates the end of a sentence. The data generation unit 112 therefore determines that the character string in the last line of the remaining data and the character string in the first line of the page acquired in S15 belong to different blocks. Accordingly, as shown in FIG. 19, the data generation unit 112 does not change the spacing between the last line of the remaining data and the first line of the page acquired in S15 (YES in S18).

その後、ブロック生成部１１３は、ＰＧ１−２に含まれていた文書データ（残データを含む）を分割してブロックを生成する（Ｓ２１のＹＥＳ、Ｓ２２）。具体的に、ブロック生成部１１３は、図２０に示すように、残データ及びＰＧ１−２に含まれていたデータのうち、空白行が存在する位置において分割することにより、ＢＬ１−５、ＢＬ１−６、ＢＬ１−７、ＢＬ１−８及びＢＬ１−９を生成する。 Thereafter, the block generation unit 113 divides the document data (including remaining data) included in PG1-2 to generate a block (YES in S21, S22). Specifically, as illustrated in FIG. 20, the block generation unit 113 divides the remaining data and the data included in PG1-2 at a position where a blank line is present, so that BL1-5, BL1- 6, BL1-7, BL1-8 and BL1-9 are generated.

［Ｓ１２からＳ２７の処理の具体例（２）］ [Specific Example of Processing from S12 to S27 (2)]
次に、Ｓ１２からＳ２７の処理の別の具体例について説明を行う。図２１に示す例において、ＰＧ１−１の最終行の文字列の最後は、文章の終わりを示していない。そして、ＰＧ１−１の最終行及びＰＧ１−２の先頭行には、それぞれ文字が含まれている。 Next, another specific example of the processing from S12 to S27 is described. In the example shown in FIG. 21, the end of the character string in the last line of PG1-1 does not indicate the end of a sentence. The last line of PG1-1 and the first line of PG1-2 each contain characters.

この場合、データ生成部１１２は、図２２に示すように、ＰＧ１−１の最終行とＰＧ１−２の先頭行との間の間隔を、文章が複数の行からなる場合における各行の間隔と同じ間隔に変更する（Ｓ１８のＮＯ、Ｓ１９）。すなわち、データ生成部１１２は、ＰＧ１−１の最終行とＰＧ１−２の先頭行との間の間隔が、ブロック生成部１１３によってブロックの分割情報が存在すると判定されない間隔に変更する。 In this case, as shown in FIG. 22, the data generation unit 112 has the same interval between the last line of PG1-1 and the first line of PG1-2 as the interval between lines when the sentence is composed of a plurality of lines. The interval is changed (NO in S18, S19). That is, the data generation unit 112 changes the interval between the last row of PG1-1 and the first row of PG1-2 to an interval at which the block generation unit 113 does not determine that block division information exists.

これにより、ブロック生成部１１３は、図２３に示すように、ＰＧ１−１の最終行を含む文書データとＰＧ１−２の先頭行を含む文書データとが単一のブロックであるＢＬ−１０に含まれるように、ブロックの生成を行うことが可能になる（Ｓ２１のＹＥＳ、Ｓ２２）。 As a result, the block generation unit 113 includes document data including the last line of PG1-1 and document data including the first line of PG1-2 in a single block BL-10 as shown in FIG. As described above, the block can be generated (YES in S21, S22).

［Ｓ１２からＳ２７の処理の具体例（３）］ [Specific Example of Processing from S12 to S27 (3)]
次に、Ｓ１２からＳ２７の処理のさらに別の具体例について説明を行う。図２４に示す例において、ＰＧ１−１の最終行の文字列の最後は、図２１で説明した場合と同様に、文章の終わりを示していない。そして、ＰＧ１−１の最終行及びＰＧ１−２の先頭行には、図２１で説明した場合と同様に、それぞれ文字が含まれている。一方、図２４に示すＰＧ１−１の最終行は、図２１で説明したＰＧ１−１と異なり、途中（文字列「・・・あああ」の直後）から最後までが空白である行である。 Next, yet another specific example of the processing from S12 to S27 is described. In the example shown in FIG. 24, as in the case described with reference to FIG. 21, the end of the character string in the last line of PG1-1 does not indicate the end of a sentence. Also as in FIG. 21, the last line of PG1-1 and the first line of PG1-2 each contain characters. Unlike the PG1-1 described with reference to FIG. 21, however, the last line of PG1-1 shown in FIG. 24 is blank from partway through (immediately after the character string “...あああ”) to the end of the line.

この場合、データ生成部１１２は、図２２で説明した場合と同様に、ＰＧ１−１の最終行とＰＧ１−２の先頭行との間の間隔を、文章が複数の行からなる場合における各行の間隔と同じ間隔に変更する（Ｓ１８のＮＯ、Ｓ１９）。その後、ブロック生成部１１３は、図２５に示すように、ＰＧ１−１の最終行を含む文書データとＰＧ１−２の先頭行を含む文書データとがそれぞれ異なるブロックになるように、ブロックの生成を行う（Ｓ２１のＹＥＳ、Ｓ２２）。 In this case, as in the case described with reference to FIG. 22, the data generation unit 112 sets the interval between the last line of PG1-1 and the first line of PG1-2 for each line when the sentence is composed of a plurality of lines. The interval is changed to the same interval (NO in S18, S19). Thereafter, as shown in FIG. 25, the block generation unit 113 generates blocks so that the document data including the last line of PG1-1 and the document data including the first line of PG1-2 are different blocks. It performs (YES of S21, S22).

すなわち、途中から最後までが空白である行が存在する場合、ブロック生成部１１３は、存在した空白が文書データの作成者によって意図的に含まれたものであると判定する。そのため、ブロック生成部１１３は、この場合、途中から最後までが空白である行と、その次の行とが異なるブロックに含まれるように、ブロックの生成を行う。 That is, when there is a line that is blank from partway through to the end, the block generation unit 113 determines that the blank was included intentionally by the creator of the document data. In this case, the block generation unit 113 therefore generates blocks so that the line that is blank from partway through to the end and the line that follows it are included in different blocks.

また、図２６に示す例において、ＰＧ１−１の最終行は、途中（文字列「・・・ＡＡＡ」の直後）から最後までが空白である行である。そのため、データ生成部１１２は、この場合、図２４で説明した場合と同様に、ＰＧ１−１の最終行とＰＧ１−２の先頭行との間の間隔を、文章が複数の行からなる場合における各行の間隔と同じ間隔に変更する（Ｓ１８のＮＯ、Ｓ１９）。 In the example shown in FIG. 26, the last line of PG1-1 is a line that is blank from the middle (immediately after the character string “... AAA”) to the end. Therefore, in this case, as in the case described with reference to FIG. 24, the data generation unit 112 determines the interval between the last line of PG1-1 and the first line of PG1-2 when the sentence is composed of a plurality of lines. The interval is changed to the same interval as each row (NO in S18, S19).

ここで、図２６に示すＰＧ１−１の最終行は、図２４で説明したＰＧ１−１の最終行と異なり、最後が英単語（半角文字）になっている。そのため、ブロック生成部１１３は、図２７に示すように、図２５で説明した場合と異なり、ＰＧ１−１の最終行を含む文書データとＰＧ１−２の先頭行を含む文書データとが単一のブロックであるＢＬ−１０に含まれるようにブロックの生成を行う。 Here, unlike the last line of PG1-1 described with reference to FIG. 24, the last line of PG1-1 shown in FIG. 26 ends with an English word (half-width characters). Therefore, as shown in FIG. 27 and unlike the case described with reference to FIG. 25, the block generation unit 113 generates blocks so that the document data containing the last line of PG1-1 and the document data containing the first line of PG1-2 are included in the single block BL-10.

すなわち、残データの最終行の最後が英単語である場合、その英単語の次の英単語は、残データの最終行に入らないために次の行（次のページの先頭行）の先頭に位置している場合がある。そのため、ブロック生成部１１３は、残データの最終行の最後が英単語等の半角文字である場合、残データの最終行の途中から最後までが空白である場合であっても、その空白が文書データの作成者によって意図的に含まれたものではないと判定する。 That is, when the last line of the remaining data ends with an English word, the English word that follows it may have been placed at the beginning of the next line (the first line of the next page) because it did not fit on the last line of the remaining data. Therefore, when the last line of the remaining data ends with half-width characters such as an English word, the block generation unit 113 determines that the blank was not included intentionally by the creator of the document data, even if the last line of the remaining data is blank from partway through to the end.

これにより、ブロック生成部１１３は、Ｓ１１の処理で受け付けた複数の文書データの比較精度をさらに高めることが可能になる。 Thereby, the block generation unit 113 can further improve the comparison accuracy of the plurality of document data received in the process of S11.
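
The half-width exception described above can be sketched with the Unicode East Asian Width property. This is only one possible reading: treating the categories `Na` (narrow) and `H` (half-width) as "half-width characters" is an assumption, and the function names are hypothetical.

```python
import unicodedata


def ends_with_halfwidth(line: str) -> bool:
    """True if the last non-blank character of the line is a half-width one."""
    text = line.rstrip()
    # Assumption: East Asian Width "Na" (narrow) and "H" (half-width)
    # count as half-width characters.
    return bool(text) and unicodedata.east_asian_width(text[-1]) in ("Na", "H")


def trailing_blank_is_intentional(line: str) -> bool:
    """For a line that is blank from partway through to the end: treat the
    blank as intentional unless the text before it ends in half-width
    characters such as an English word."""
    return not ends_with_halfwidth(line)
```

Under this sketch a last line ending in `あああ` followed by blanks keeps its division, whereas a last line ending in `AAA` followed by blanks is joined with the first line of the next page.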

［ブロック管理情報の具体例］ [Specific example of block management information]
次に、ブロック管理情報１３３の具体例について説明を行う。図２８は、ブロック管理情報１３３の具体例を説明する図である。 Next, a specific example of the block management information 133 is described. FIG. 28 is a diagram illustrating a specific example of the block management information 133.

図２８に示すブロック管理情報１３３は、ブロック管理情報１３３に含まれる各情報を識別する「項番」と、Ｓ１２の処理で取得したデータを識別する「データ識別情報」とを項目として有する。また、図２８に示すブロック管理情報１３３は、Ｓ２２の処理で分割したブロックを識別する「ブロック識別情報」と、Ｓ１１の処理で受け付けた文書データにおいて各ブロックが含まれるページ数が設定される「ページ情報」とを項目として有する。さらに、図２８に示すブロック管理情報１３３は、各ページにおける各ブロックの位置情報が設定される「位置情報」を項目として有する。 The block management information 133 illustrated in FIG. 28 has, as items, an “item number” that identifies each piece of information included in the block management information 133 and “data identification information” that identifies the data acquired in S12. It also has “block identification information” that identifies a block produced by the division in S22, and “page information” in which the number of the page containing each block in the document data received in S11 is set. In addition, it has “position information” in which the position of each block within its page is set.

具体的に、図２８に示すブロック管理情報１３３において、「項番」が「１」である情報には、「データ識別情報」として「ＤＴ１−１」が設定され、「ブロック識別情報」として「ＢＬ１−１」が設定されている。また、図２８に示すブロック管理情報１３３において、「項番」が「１」である情報には、「ページ情報」として「１（ページ）」が設定され、「位置情報」として「（１，１）」が設定されている。図２８に含まれる他の情報については説明を省略する。 Specifically, in the block management information 133 illustrated in FIG. 28, the information whose “item number” is “1” has “DT1-1” set as the “data identification information” and “BL1-1” set as the “block identification information”. The same information has “1 (page)” set as the “page information” and “(1, 1)” set as the “position information”. Description of the other information included in FIG. 28 is omitted.
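
One way to picture a row of the block management information 133 is as a small record type. The sketch below is illustrative only: the field names are English renderings chosen here, and the meaning of the two numbers in the position information is not specified in this section, so it is kept as an opaque pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockRecord:
    item_number: int           # "item number"
    data_id: str               # "data identification information"
    block_id: str              # "block identification information"
    page: int                  # "page information"
    position: tuple[int, int]  # "position information" (semantics not specified here)


# The row whose item number is 1 in FIG. 28, as described in the text.
first_row = BlockRecord(1, "DT1-1", "BL1-1", 1, (1, 1))
```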

［Ｓ５の処理の詳細］ [Details of processing in S5]
次に、図７で説明したＳ１からＳ５のうち、Ｓ５の処理の詳細について説明を行う。図１０及び図１１は、Ｓ５の処理の詳細について説明する図である。なお、以下、Ｓ１からＳ４の処理において、比較元データ１３１及び比較先データ１３２についてのブロックの生成が行われたものとして説明を行う。 Next, of S1 to S5 described with reference to FIG. 7, the processing of S5 is described in detail. FIGS. 10 and 11 are diagrams illustrating the details of the processing of S5. The following description assumes that blocks have already been generated for the comparison source data 131 and the comparison destination data 132 in the processing from S1 to S4.

ブロック判定部１１４は、図１０に示すように、比較元データ１３１に含まれるブロックを１つ取得する（Ｓ３１）。また、ブロック判定部１１４は、比較先データ１３２に含まれるブロックを１つ取得する（Ｓ３２）。そして、ブロック判定部１１４は、Ｓ３１の処理で取得したブロックに含まれる文書データの内容と、Ｓ３２の処理で取得したブロックに含まれる文書データの内容とが同一であるか否かを判定する（Ｓ３３）。 As shown in FIG. 10, the block determination unit 114 acquires one block included in the comparison source data 131 (S31). In addition, the block determination unit 114 acquires one block included in the comparison target data 132 (S32). Then, the block determination unit 114 determines whether the content of the document data included in the block acquired in the process of S31 is the same as the content of the document data included in the block acquired in the process of S32 ( S33).

その結果、取得した各ブロックに含まれる文書データの内容が同一であると判定した場合（Ｓ３３のＹＥＳ）、ブロック判定部１１４は、Ｓ３４以降の処理を行う。一方、取得した各ブロックに含まれる文書データの内容が同一でないと判定した場合（Ｓ３３のＮＯ）、ブロック判定部１１４は、Ｓ３２以降の処理を再度行う。 As a result, when it is determined that the contents of the document data included in each acquired block are the same (YES in S33), the block determination unit 114 performs the processing from S34 onward. On the other hand, when it is determined that the contents of the document data included in each acquired block are not the same (NO in S33), the block determination unit 114 performs the processes subsequent to S32 again.

その後、Ｓ３３の処理において、取得した各ブロックに含まれる文書データの内容が同一であると判定した場合（Ｓ３３のＹＥＳ）、ブロック判定部１１４は、Ｓ３２の処理で取得したブロックが比較先データ１３２に含まれる最後のブロックであるか否かを判定する（Ｓ３４）。また、ブロック判定部１１４は、この場合、Ｓ３１の処理で取得したブロックが比較元データ１３１に含まれる最後のブロックであるか否かを判定する（Ｓ３５）。そして、Ｓ３４及びＳ３５の処理において、Ｓ３１及びＳ３２の処理で取得したブロックがそれぞれ最後のブロックであると判定された場合(Ｓ３４のＹＥＳ、Ｓ３５のＹＥＳ)、ブロック判定部１１４は、Ｓ３６以降の処理を実行する。 Thereafter, when it is determined in S33 that the contents of the document data included in the acquired blocks are the same (YES in S33), the block determination unit 114 determines whether the block acquired in S32 is the last block included in the comparison destination data 132 (S34). In this case, the block determination unit 114 also determines whether the block acquired in S31 is the last block included in the comparison source data 131 (S35). If it is determined in S34 and S35 that the blocks acquired in S31 and S32 are each the last block (YES in S34, YES in S35), the block determination unit 114 executes the processing from S36 onward.

一方、Ｓ３４の処理において、Ｓ３２の処理で取得したブロックが最後のブロックでないと判定された場合(Ｓ３４のＮＯ)、ブロック判定部１１４は、Ｓ３２以降の処理を再度実行する。また、Ｓ３５の処理において、Ｓ３１の処理で取得したブロックが最後のブロックでないと判定された場合(Ｓ３５のＮＯ)、ブロック判定部１１４は、Ｓ３１以降の処理を再度実行する。 On the other hand, in the process of S34, when it is determined that the block acquired in the process of S32 is not the last block (NO in S34), the block determination unit 114 executes the processes after S32 again. Further, in the process of S35, when it is determined that the block acquired in the process of S31 is not the last block (NO in S35), the block determination unit 114 executes the processes after S31 again.

すなわち、ブロック判定部１１４は、Ｓ３１からＳ３５の処理において、比較元データ１３１に含まれるブロックと比較先データ１３２に含まれるブロックとの間で、同一の内容の文書データをそれぞれ含むブロックの組み合わせを特定する。 That is, in the processing from S31 to S35, the block determination unit 114 determines combinations of blocks each including document data having the same content between the blocks included in the comparison source data 131 and the blocks included in the comparison destination data 132. Identify.
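
The net effect of the loop from S31 to S35 can be sketched as an exhaustive pairing of blocks with equal contents. The sketch assumes each block is represented by its text as a single string and does not reproduce the exact control flow of FIG. 10; the function name is hypothetical.

```python
def equal_content_pairs(source_blocks: list[str],
                        dest_blocks: list[str]) -> list[tuple[int, int]]:
    """Return (source index, destination index) for every pair of blocks
    whose document data has the same content."""
    return [
        (i, j)
        for i, src in enumerate(source_blocks)   # S31: one source block at a time
        for j, dst in enumerate(dest_blocks)     # S32: one destination block at a time
        if src == dst                            # S33: same content?
    ]


pairs = equal_content_pairs(["A", "B"], ["B", "A", "B"])
```

In this example the source block `"B"` pairs with two destination blocks, which is the ambiguity that the subsequent identification of identical blocks has to resolve.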

そして、Ｓ３５の処理において、Ｓ３１の処理で取得したブロックが最後のブロックであると判定された場合（Ｓ３５のＹＥＳ）、ブロック判定部１１４は、比較元データ１３１に含まれるブロックと、比較先データ１３２に含まれるブロックとの間における同一ブロックを特定する（Ｓ３６）。 Then, in the process of S35, when it is determined that the block acquired in the process of S31 is the last block (YES in S35), the block determination unit 114 determines whether the block included in the comparison source data 131 and the comparison destination data The same block with the block included in 132 is specified (S36).

すなわち、例えば、比較元データ１３１に含まれる１つのブロックと同じ内容の文書データを含むブロックが、比較先データ１３２において複数存在する場合がある。そして、この場合、比較先データ１３２に含まれる複数のブロックのうちの１つ以外のブロックは、文書データの改版によって追加されたブロックである。そのため、ブロック判定部１１４は、この場合、比較先データ１３２に含まれる複数のブロックのうち、比較元データ１３１に元々存在していたブロック（同一ブロック）の特定を行う。 That is, for example, there may be a plurality of blocks in the comparison destination data 132 that include document data having the same contents as one block included in the comparison source data 131. In this case, a block other than one of the plurality of blocks included in the comparison target data 132 is a block added by revision of the document data. Therefore, in this case, the block determination unit 114 identifies a block (same block) that originally existed in the comparison source data 131 among a plurality of blocks included in the comparison target data 132.

具体的に、ブロック判定部１１４は、例えば、Ｏ（ＮＤ）アルゴリズムを用いることにより、比較元データ１３１及び比較先データ１３２にそれぞれ含まれる同一ブロックの組み合わせを特定する。この場合、ブロック判定部１１４は、Ｓ３１からＳ３５の処理で特定した同一の内容の文書データが含まれるブロックの組み合わせを入力とし、比較元データ１３１及び比較先データ１３２にそれぞれ含まれる同一ブロックの特定を行う。 Specifically, the block determination unit 114 specifies combinations of the same blocks respectively included in the comparison source data 131 and the comparison destination data 132 by using, for example, an O (ND) algorithm. In this case, the block determination unit 114 receives a combination of blocks including the document data having the same content specified in the processing from S31 to S35, and specifies the same block included in the comparison source data 131 and the comparison destination data 132, respectively. I do.

これにより、ブロック判定部１１４は、比較先データ１３２に含まれるブロックにおいて、比較元データ１３１に元々存在していた同一ブロックと、同一ブロック以外のブロックとを分類することが可能になる。以下、Ｓ３６の処理で特定された同一ブロックについて説明を行う。 Thereby, the block determination unit 114 can classify the same block originally existing in the comparison source data 131 and the block other than the same block in the blocks included in the comparison destination data 132. Hereinafter, the same block specified by the process of S36 is demonstrated.
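
The identification in S36 can be approximated with a standard sequence alignment over block contents. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in; it is not the O(ND) algorithm named in the description, it takes the block lists directly rather than the pairs found in S31 to S35, and it may choose a different alignment in ambiguous cases. Blocks are assumed to be represented as strings, and the function name is hypothetical.

```python
from difflib import SequenceMatcher


def identical_block_pairs(source_blocks: list[str],
                          dest_blocks: list[str]) -> list[tuple[int, int]]:
    """Align the two block sequences and return index pairs of blocks that
    are treated as the same block in both documents."""
    matcher = SequenceMatcher(a=source_blocks, b=dest_blocks, autojunk=False)
    pairs = []
    for a_start, b_start, size in matcher.get_matching_blocks():
        # The final entry returned by get_matching_blocks() has size 0
        # and contributes nothing.
        pairs.extend((a_start + k, b_start + k) for k in range(size))
    return pairs


matched = identical_block_pairs(["A", "B", "C"], ["A", "X", "B", "B", "C"])
unmatched_dest = sorted(set(range(5)) - {j for _, j in matched})
```

In this example the destination blocks at indices 1 and 2 are left unmatched, so they would go on to the later classification step rather than being treated as identical blocks.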

[Identical blocks identified in the process of S36]
FIG. 29 is a diagram illustrating the identical blocks identified in the process of S36. In the example shown in FIG. 29, DT1-1 corresponds to the comparison source data 131 after the processes of S1 to S4 have been performed, and DT2-1 corresponds to the comparison destination data 132 after the processes of S1 to S4 have been performed. DT1-1 includes BL1-1 to BL1-9, and DT2-1 includes BL2-1 to BL2-9.

Specifically, as shown in FIG. 29, by performing the processes of S31 to S36, the block determination unit 114 determines, for example, that BL1-1 and BL2-1 are identical blocks and that BL1-4 and BL2-5 are identical blocks. It likewise determines that BL1-5 and BL2-6, BL1-6 and BL2-7, BL1-7 and BL2-8, and BL1-9 and BL2-9 are identical blocks.

Returning to FIG. 11, after the process of S36, the block determination unit 114 classifies the blocks in the comparison source data 131 and the comparison destination data 132 that were not identified as identical blocks in the process of S36 into similar blocks, deleted blocks, and added blocks (S41).

A similar block is a block for which the document data being compared against (the comparison source data 131 or the comparison destination data 132) contains a block whose document data match rate is equal to or greater than a threshold. A deleted block is a block in the comparison source data 131 for which the comparison destination data 132 contains no block whose match rate is equal to or greater than the threshold. In other words, a deleted block is a block determined to have been deleted from the comparison source data 131 by the revision of the document data (a block not included in the comparison destination data 132). An added block is a block in the comparison destination data 132 for which the comparison source data 131 contains no block whose match rate is equal to or greater than the threshold. In other words, an added block is a block determined to have been added to the comparison source data 131 by the revision of the document data (a block not included in the comparison source data 131). The details of the process of S41 are described below.

[Details of the process of S41]
FIGS. 12 to 15 are flowcharts illustrating the details of the process of S41. As shown in FIG. 12, the block determination unit 114 identifies, in each of the comparison source data 131 and the comparison destination data 132, the blocks lying between the identical blocks identified in the process of S36 as block groups (S51).

Specifically, in the example shown in FIG. 29, BL1-2 and BL1-3 lie between BL1-1 and BL1-4, which are consecutive identical blocks in DT1-1. The block determination unit 114 therefore identifies a block group containing BL1-2 and BL1-3 (hereinafter also referred to as BG1-1). Likewise, BL1-8 lies between BL1-7 and BL1-9, which are identical blocks in DT1-1. The block determination unit 114 therefore identifies a block group containing BL1-8 (hereinafter also referred to as BG1-2).

Furthermore, in the example shown in FIG. 29, BL2-2, BL2-3, and BL2-4 lie between BL2-1 and BL2-5, which are identical blocks in DT2-1. The block determination unit 114 therefore identifies a block group containing BL2-2, BL2-3, and BL2-4 (hereinafter also referred to as BG2-1).

That is, in the process of S36, the block determination unit 114 has determined that BL1-1 and BL2-1 are identical blocks and that BL1-4 and BL2-5 are identical blocks. It can therefore conclude that BL1-2 and BL1-3, which lie between BL1-1 and BL1-4, and BL2-2, BL2-3, and BL2-4, which lie between BL2-1 and BL2-5, are not identical blocks. It can also conclude that, if DT2-1 contains a block similar to BL1-2 or BL1-3, that block is one of BL2-2, BL2-3, and BL2-4. For this reason, in the process of S51, the block determination unit 114 identifies the blocks lying between the identical blocks identified in the process of S36 as block groups.

Although no block exists between, for example, BL1-4 and BL1-5, the block determination unit 114 may treat the gap between BL1-4 and BL1-5 as a block group that contains no blocks.
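The grouping in S51 can be sketched as follows, assuming the identical-block pairs from S36 are available as 1-based block numbers. The function name `block_groups` is hypothetical, and the sketch covers only the gaps between consecutive identical blocks, including empty gaps as empty groups.

```python
from typing import List, Tuple

Pair = Tuple[int, int]                 # (source block number, destination block number)
Group = Tuple[List[int], List[int]]    # (source block numbers, destination block numbers)


def block_groups(pairs: List[Pair]) -> List[Group]:
    """Return the corresponding block groups between consecutive identical pairs."""
    groups = []
    for (s0, d0), (s1, d1) in zip(pairs, pairs[1:]):
        # Blocks strictly between two consecutive identical blocks form one group.
        groups.append((list(range(s0 + 1, s1)), list(range(d0 + 1, d1))))
    return groups


# Identical-block pairs from the FIG. 29 example (1-based block numbers).
fig29_pairs = [(1, 1), (4, 5), (5, 6), (6, 7), (7, 8), (9, 9)]
for source_group, destination_group in block_groups(fig29_pairs):
    print(source_group, destination_group)
```

The first group corresponds to BG1-1 and BG2-1, the last to BG1-2 and an empty destination group, and the groups in between are empty on both sides.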

The block determination unit 114 then acquires, one from the block groups of the comparison source data 131 and one from the block groups of the comparison destination data 132, a pair of block groups whose blocks lie between the same pair of identical blocks (hereinafter also referred to as corresponding block groups) (S52). Specifically, the block determination unit 114 acquires, for example, BG1-1 and BG2-1. In other words, in the process of S52, the block determination unit 114 acquires, from the block groups of the comparison source data 131 and of the comparison destination data 132, the block groups that may contain blocks similar to each other.

The block determination unit 114 then acquires one block from the block group of the comparison source data 131 acquired in the process of S52 (S53), and one block from the block group of the comparison destination data 132 acquired in the process of S52 (S54). Specifically, it acquires, for example, BL1-2 from BG1-1 and BL2-2 from BG2-1.

Next, the block determination unit 114 calculates the match rate between the document data in the block acquired in the process of S53 and the document data in the block acquired in the process of S54 (S55). If the block acquired in the process of S54 is not the last block in the block group acquired in the process of S52 (NO in S56), the block determination unit 114 repeats the processes from S54 onward. If it is the last block in that block group (YES in S56), the block determination unit 114 performs the processes from S61 onward.

That is, to determine whether a block similar to the block acquired in the process of S53 exists, the block determination unit 114 calculates the match rate between that block and each block acquired in the process of S54. Specifically, when the block acquired in the process of S53 is BL1-2, the block determination unit 114 calculates its match rate with each of BL2-2, BL2-3, and BL2-4, the blocks in BG2-1. The following description assumes that the match rates between BL1-2 and BL2-2, BL2-3, and BL2-4 are 10%, 0%, and 20%, respectively.
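This passage does not state how the match rate in S55 is computed. As one possible illustration only, the sketch below assumes a character-level similarity ratio from Python's standard `difflib` module, expressed as a percentage. The function names `match_rate` and `rates_against_group` are hypothetical.

```python
import difflib
from typing import List


def match_rate(src_block: str, dst_block: str) -> float:
    """Return a character-level match rate in percent (0.0 to 100.0).

    Assumption: difflib's ratio, 2 * matches / total length, stands in for
    the match rate, whose exact definition is not given in this passage.
    """
    matcher = difflib.SequenceMatcher(None, src_block, dst_block, autojunk=False)
    return 100.0 * matcher.ratio()


def rates_against_group(src_block: str, dst_group: List[str]) -> List[float]:
    """Compare one source block with every block in the corresponding group (S54 to S56)."""
    return [match_rate(src_block, dst_block) for dst_block in dst_group]


print(rates_against_group("abcd", ["abcd", "abcf", "wxyz"]))
```

Any other similarity measure that yields a percentage could be substituted without changing the surrounding flow.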

When the block acquired in the process of S54 is the last block in the block group acquired in the process of S52 (YES in S56), the block determination unit 114 determines, as shown in FIG. 13, whether any block has a match rate equal to or greater than the threshold (S61). If such a block exists (YES in S61), the block determination unit 114 identifies, among the blocks found in the process of S61, the block with the highest match rate as the similar block (S62).

If no block has a match rate equal to or greater than the threshold (NO in S61) and the block acquired in the process of S53 is determined not to be a blank block (YES in S63), the block determination unit 114 identifies the block acquired in the process of S53 as a deleted block (S64). In this case, the block acquired in the process of S53 is neither an identical block nor a similar block, so the block determination unit 114 determines that it is a deleted block.

Specifically, when the threshold in the process of S61 is 70%, the match rates between BL1-2 and BL2-2, BL2-3, and BL2-4 (10%, 0%, and 20%) are all below the threshold. The block determination unit 114 therefore identifies BL1-2, the block acquired in the process of S53, as a deleted block.

If the block acquired in the process of S53 is determined to be a blank block (NO in S63), the block determination unit 114 does not perform the process of S64.

After the process of S62, the block determination unit 114 determines, as shown in FIG. 14, whether a block exists before the similar block identified in the process of S62 in the comparison destination data 132 (S71). If a block exists before the similar block (YES in S71) and that block is determined not to be a blank block (YES in S74), the block determination unit 114 identifies the block before the similar block as an added block (S75). In this case, the block before the similar block identified in the process of S62 is neither an identical block nor a similar block, so the block determination unit 114 determines that it is an added block. If the block found in the process of S71 is determined to be a blank block (NO in S74), the block determination unit 114 does not perform the process of S75.

If no block exists before the similar block identified in the process of S62 (NO in S71), the block determination unit 114 determines whether the block acquired in the process of S53 is the last block in the block group of the comparison source data 131 acquired in the process of S52 (S72). If it is not the last block (NO in S72), the block determination unit 114 repeats the processes from S53 onward.

Specifically, in the example shown in FIG. 29, BG1-1 contains BL1-2 and BL1-3. Thus, when BG1-1 has been acquired in the process of S52 and BL1-2 in the process of S53, the block determination unit 114 determines that not all blocks in BG1-1 have been acquired yet (NO in S72), and acquires BL1-3 (S53).

The block determination unit 114 then calculates the match rate between BL1-3 and each of BL2-2, BL2-3, and BL2-4 (S54, S55, S56). The following description assumes that the match rates between BL1-3 and BL2-2, BL2-3, and BL2-4 are 75%, 80%, and 20%, respectively.

When the threshold in the process of S61 is 70%, the block determination unit 114 identifies BL2-2 (match rate 75%) and BL2-3 (match rate 80%) as blocks whose match rate is equal to or greater than the threshold (YES in S61). It then identifies BL2-3, the block with the highest match rate, as the block similar to BL1-3 (S62). After that, it identifies BL2-2, the block before BL2-3, as an added block (YES in S71, YES in S74, S75).

The block determination unit 114 also performs the process of S72 after the processes of S64 and S75.

If, in the process of S72, the block acquired in the process of S53 is the last block (YES in S72), the block determination unit 114 determines whether a block exists after the similar block identified in the process of S62 in the comparison destination data 132 (S73). If a block exists after the similar block (YES in S73) and that block is determined not to be a blank block (YES in S76), the block determination unit 114 identifies the block after the similar block as an added block (S77). In this case, the block after the similar block identified in the process of S62 is neither an identical block nor a similar block, so the block determination unit 114 determines that it is an added block.

Specifically, the block determination unit 114 identifies BL2-4, the block after BL2-3, as an added block (YES in S73, YES in S76, S77).

If the block found in the process of S73 is determined to be a blank block (NO in S76), the block determination unit 114 does not perform the process of S77.

If no block exists after the similar block identified in the process of S62 (NO in S73), the block determination unit 114 determines, as shown in FIG. 15, whether either of the block groups acquired in the process of S52 is the last block group (S81). If neither is determined to be the last block group (NO in S81), the block determination unit 114 repeats the processes from S52 onward.

If either of the block groups acquired in the process of S52 is determined to be the last block group (YES in S81), the block determination unit 114 identifies as added blocks the blocks in any block group of the comparison destination data 132 that has not been acquired in the process of S52. It likewise identifies as deleted blocks the blocks in any block group of the comparison source data 131 that has not been acquired in the process of S52 (S82).

Specifically, the block determination unit 114 has so far acquired BG1-1 and BG2-1 in the process of S52. It therefore next acquires, for example, the block group between BL1-4 and BL1-5 and the block group between BL2-5 and BL2-6 (NO in S81, S52). Neither of these block groups contains any blocks, so no block classification is performed for them. The same applies to the block group between BL1-5 and BL1-6, the block group between BL2-6 and BL2-7, and so on.

The block determination unit 114 then acquires BG1-2 and the block group between BL2-8 and BL2-9 (NO in S81, S52). The block group between BL2-8 and BL2-9 contains no blocks. The block determination unit 114 therefore identifies BL1-8, which is contained in BG1-2, as a deleted block (NO in S61, YES in S63, S64).

After the process of S82, the block determination unit 114 ends the process of S5. The block determination unit 114 also performs the processes from S81 onward after the process of S77.
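The classification of one pair of corresponding block groups can be condensed into the following sketch. It is a simplification of the flow in FIGS. 12 to 15 rather than a step-by-step transcription: the match rates are supplied as a lookup table, every destination block in the group that is not chosen as a similar block is treated as added, and the blank-block checks (S63, S74, S76) are omitted. The names `classify_group` and `rate` are hypothetical.

```python
from typing import Dict, List, Tuple


def classify_group(
    src_ids: List[str],
    dst_ids: List[str],
    rate: Dict[Tuple[str, str], float],
    threshold: float = 70.0,
) -> Tuple[List[Tuple[str, str]], List[str], List[str]]:
    """Classify one pair of corresponding block groups.

    Returns (similar pairs, deleted source blocks, added destination blocks).
    """
    similar, deleted = [], []
    matched_dst = set()
    for s in src_ids:
        # Destination blocks whose match rate reaches the threshold (S61).
        candidates = [(rate[(s, d)], d) for d in dst_ids if rate[(s, d)] >= threshold]
        if candidates:
            # The candidate with the highest match rate is the similar block (S62).
            best = max(candidates, key=lambda c: c[0])[1]
            similar.append((s, best))
            matched_dst.add(best)
        else:
            # No candidate reaches the threshold: the source block was deleted (S64).
            deleted.append(s)
    # Simplification: remaining destination blocks in the group count as added.
    added = [d for d in dst_ids if d not in matched_dst]
    return similar, deleted, added


# Match rates assumed in the text for BG1-1 and BG2-1 (percent).
rates = {
    ("BL1-2", "BL2-2"): 10.0, ("BL1-2", "BL2-3"): 0.0, ("BL1-2", "BL2-4"): 20.0,
    ("BL1-3", "BL2-2"): 75.0, ("BL1-3", "BL2-3"): 80.0, ("BL1-3", "BL2-4"): 20.0,
}
print(classify_group(["BL1-2", "BL1-3"], ["BL2-2", "BL2-3", "BL2-4"], rates))
print(classify_group(["BL1-8"], [], {}))
```

For the example values, BL1-3 and BL2-3 come out as similar blocks, BL1-2 and BL1-8 as deleted blocks, and BL2-2 and BL2-4 as added blocks, which agrees with the walkthrough above.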

Returning to FIG. 11, the block determination unit 114 classifies the document data contained in the similar blocks of the comparison source data 131 and the comparison destination data 132 classified in the process of S41 (the document data in the similar blocks other than image data) into identical document data, deleted document data, and added document data (S42). Identical document data is document data that was not changed by the revision of the document data. Deleted document data is document data contained in a similar block of the comparison source data 131 but not in the similar block of the comparison destination data 132. Added document data is document data contained in a similar block of the comparison destination data 132 but not in the similar block of the comparison source data 131.

Specifically, the block determination unit 114 classifies the document data contained in the similar blocks of the comparison source data 131 and the comparison destination data 132 into identical document data, deleted document data, and added document data by using, for example, the O(ND) algorithm. In this case, the block determination unit 114 takes as input, for example, the document data contained in the similar blocks identified in the process of S41, and performs the classification.
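The line-level classification in S42 can be sketched with Python's standard `difflib` module. This is an illustration under assumptions: `difflib.SequenceMatcher` is used in place of the O(ND) algorithm that the text gives as an example, the function name `classify_lines` is hypothetical, and the line contents other than CCC and DDD are invented for the example.

```python
import difflib
from typing import List, Tuple

NumberedLine = Tuple[int, str]  # (1-based line number within the block, text)


def classify_lines(
    src_lines: List[str], dst_lines: List[str]
) -> Tuple[List[str], List[NumberedLine], List[NumberedLine]]:
    """Split the lines of two similar blocks into identical, deleted, and added lines."""
    identical, deleted, added = [], [], []
    matcher = difflib.SequenceMatcher(None, src_lines, dst_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            identical.extend(src_lines[i1:i2])
        else:
            # "delete", "insert", and "replace" contribute source-only lines
            # as deleted and destination-only lines as added.
            deleted.extend((i + 1, src_lines[i]) for i in range(i1, i2))
            added.extend((j + 1, dst_lines[j]) for j in range(j1, j2))
    return identical, deleted, added


# Hypothetical contents for the similar blocks BL1-3 and BL2-3.
bl1_3 = ["AAA", "CCC", "EEE"]
bl2_3 = ["AAA", "DDD", "EEE"]
print(classify_lines(bl1_3, bl2_3))
```

With these contents, CCC on line 2 is reported as deleted and DDD on line 2 as added, mirroring the kind of result shown for FIG. 30.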

The result output unit 115 then outputs information indicating the similar blocks, deleted blocks, and added blocks classified in the process of S41, and the deleted document data and added document data classified in the process of S42 (S43). A specific example of the output of the data comparison process is described below.

[Specific example of the output of the data comparison process]
FIG. 30 is a diagram illustrating a specific example of the output of the data comparison process. As shown in FIG. 30, the result output unit 116 outputs, as the "block comparison result", information indicating, for example, that the "deleted blocks" are "BL1-2, BL1-8", the "added blocks" are "BL2-2, BL2-4", and the "similar blocks" are "BL1-3" and "BL2-3".

As shown in FIG. 30, for "BL1-3" and "BL2-3", which were output as "similar blocks", the result output unit 116 also outputs, as the "document data comparison result", "CCC (line 2)", which indicates that the "deleted document data" is CCC on line 2, and "DDD (line 2)", which indicates that the "added document data" is DDD on line 2.

This allows the result output unit 116 to output a highly accurate comparison result between the comparison source data 131 and the comparison destination data 132. The user can therefore grasp the differences between the comparison source data 131 and the comparison destination data 132 accurately.

The embodiments described above can be summarized in the following appendices.

(Appendix 1)
A data comparison program that causes a computer to execute a process comprising:
receiving a plurality of pieces of data, each including character strings on a plurality of lines;
when any piece of data among the received plurality of pieces of data includes break information indicating a page break, generating data in which the break information has been deleted from that piece of data;
dividing the generated data to generate a plurality of blocks; and
determining whether each of the generated blocks is included in another piece of data among the received plurality of pieces of data.
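The overall flow of Appendix 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the break information is assumed to be a form-feed character, blocks are assumed to be runs of non-blank lines, and the names `PAGE_BREAK`, `remove_page_breaks`, `split_blocks`, and `blocks_found_in` are hypothetical.

```python
from typing import List

PAGE_BREAK = "\f"  # assumption: the break information is a form-feed character


def remove_page_breaks(text: str) -> str:
    """Generate data from which the break information has been deleted."""
    return text.replace(PAGE_BREAK, "")


def split_blocks(text: str) -> List[str]:
    """Divide data into blocks (simplification: runs of non-blank lines)."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            blocks.append("\n".join(current))
            current = []
    if current:
        blocks.append("\n".join(current))
    return blocks


def blocks_found_in(data: str, other: str) -> List[bool]:
    """For each block of `data`, report whether the same block occurs in `other`."""
    other_blocks = set(split_blocks(remove_page_breaks(other)))
    return [block in other_blocks for block in split_blocks(remove_page_breaks(data))]


old = "Intro line\n\nStep 1\n\fStep 2\n"          # a page break splits one paragraph
new = "Intro line\n\nStep 1\nStep 2\n\nNew note\n"
print(blocks_found_in(new, old))
```

Because the page break is removed before the data is divided, the paragraph that spans two pages in `old` forms the same block as the unbroken paragraph in `new`, which is the accuracy gain this appendix targets.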

(Appendix 2)
The data comparison program according to Appendix 1, wherein,
in the process of generating the data, when the end of the character string on the last line of the page immediately before the break information does not indicate the end of a sentence, the spacing between the last line of the immediately preceding page and the first line of the page immediately after the break information is changed to the same spacing as that between the lines of a sentence spanning a plurality of lines.

(Appendix 3)
The data comparison program according to Appendix 2, wherein,
in the process of generating the data, the end of the character string on the last line of the page immediately before the break information is determined to indicate the end of a sentence when it is a Japanese full stop (kuten) or a period.
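The sentence-end test of Appendix 3 and the joining behavior of Appendix 2 can be sketched together. The sketch makes assumptions that the text does not state: pages are modeled as lists of lines, "the same spacing as within a sentence" is modeled as placing the two lines directly next to each other, and a blank separator line is kept otherwise. The names `ends_sentence` and `join_across_page_break` are hypothetical.

```python
from typing import List

SENTENCE_END = ("。", ".")  # Japanese full stop (kuten) or period


def ends_sentence(line: str) -> bool:
    """Return True when the line's last non-space character ends a sentence."""
    return line.rstrip().endswith(SENTENCE_END)


def join_across_page_break(prev_page: List[str], next_page: List[str]) -> List[str]:
    """Join two pages after the break information has been removed."""
    continues = (
        bool(prev_page)
        and bool(next_page)
        and prev_page[-1].strip() != ""
        and not ends_sentence(prev_page[-1])
    )
    if continues:
        # The sentence runs on: keep the lines adjacent, as within a paragraph.
        return prev_page + next_page
    # Otherwise keep a blank separator line (an assumption of this sketch).
    return prev_page + [""] + next_page


print(join_across_page_break(["The procedure continues"], ["on the next page."]))
print(join_across_page_break(["Done."], ["Next section"]))
```

Keeping a run-on sentence contiguous means a later block division does not cut it in two at the former page boundary.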

(Appendix 4)
The data comparison program according to Appendix 2, wherein,
in the process of generating the data, the process of changing the spacing is performed when the last line of the immediately preceding page contains characters.

(Appendix 5)
The data comparison program according to Appendix 2, wherein,
in the process of generating the data, the process of changing the spacing is performed when the first line of the immediately following page contains characters.

(Appendix 6)
The data comparison program according to Appendix 1, wherein,
in the process of generating the plurality of blocks, when the generated data contains a blank line consisting only of whitespace, the plurality of blocks are generated so that the line immediately before the blank line, the blank line, and the line immediately after the blank line are included in different blocks.
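The division rule of Appendix 6 can be sketched as follows, assuming the data is available as a list of lines. The function name `split_on_blank_lines` is hypothetical, and the other division rules (Appendices 7 to 9) are not modeled here.

```python
from typing import List


def split_on_blank_lines(lines: List[str]) -> List[List[str]]:
    """Divide lines into blocks so that each blank line forms its own block."""
    blocks: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        if line.strip() == "":
            # Close the block that precedes the blank line, if any.
            if current:
                blocks.append(current)
                current = []
            # The blank line itself becomes a separate (blank) block.
            blocks.append([line])
        else:
            current.append(line)
    if current:
        blocks.append(current)
    return blocks


print(split_on_blank_lines(["AAA", "BBB", "", "CCC"]))
```

Keeping the blank line as its own block is what lets the later steps recognize and skip blank blocks when classifying deleted and added blocks.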

(Appendix 7)
The data comparison program according to Appendix 1, wherein,
in the process of generating the plurality of blocks, when there is a specific line that is blank or indentation from partway through to its end, the plurality of blocks are generated so that the specific line and the line following the specific line are included in different blocks.

(Appendix 8)
The data comparison program according to Appendix 1, wherein,
in the process of generating the plurality of blocks, when there is a specific line whose character string does not end with the end of a sentence and the character string on the line following the specific line begins with a half-width character, the plurality of blocks are generated so that the specific line and the line following the specific line are included in the same block.

(Appendix 9)
The data comparison program according to Appendix 1, wherein,
in the process of generating the plurality of blocks, when there is a specific line whose characters are in a font different from the font of the characters on the following line, the plurality of blocks are generated so that the specific line and the line following the specific line are included in different blocks.

(Appendix 10)
The data comparison program according to Appendix 1, the process further comprising:
when it is determined that the plurality of blocks include a specific block for which the other data contains an identical block, identifying the specific block and, among the blocks included in the other data, the block identical to the specific block; and
outputting information indicating that the identified blocks are identical blocks.

(Appendix 11)
In Appendix 10, further,
Identifying, among the plurality of blocks, a first block for which no identical block is included in the other data;
Identifying, among the blocks included in the other data, a second block for which no identical block is included in the plurality of generated blocks;
When the match rate between the characters included in the identified first block and the characters included in the identified second block is equal to or higher than a predetermined threshold, outputting information indicating that the first block and the second block are similar blocks;
A data comparison program characterized by that.
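
The similarity determination of Appendix 11 can be sketched as follows. This note does not define how the match rate is computed, so the sketch substitutes the ratio from Python's standard `difflib.SequenceMatcher` as one possible measure; the threshold value 0.8 and the names `match_rate` and `find_similar_blocks` are likewise assumptions for illustration.

```python
from difflib import SequenceMatcher


def match_rate(first, second):
    """Character-level similarity in [0, 1] (an assumed definition)."""
    return SequenceMatcher(None, first, second).ratio()


def find_similar_blocks(first_blocks, second_blocks, threshold=0.8):
    """Pair each unmatched first block with its best unmatched second block.

    A pair is reported only when its match rate reaches the threshold.
    """
    pairs = []
    used = set()
    for i, first in enumerate(first_blocks):
        best_j, best_rate = None, threshold
        for j, second in enumerate(second_blocks):
            if j in used:
                continue
            rate = match_rate(first, second)
            if rate >= best_rate:
                best_j, best_rate = j, rate
        if best_j is not None:
            used.add(best_j)
            pairs.append((i, best_j))
    return pairs


first_blocks = ["Install the tool first."]
second_blocks = ["Install the tool.", "Contact support."]
print(find_similar_blocks(first_blocks, second_blocks))
```

Here `first_blocks` and `second_blocks` stand for the blocks left over after identical blocks have been identified on each side.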

(Appendix 12)
In Appendix 11,
In the process of outputting the information indicating that the blocks are similar, it is determined whether the match rate between the characters included in those first blocks that lie between consecutive identical blocks and the characters included in those second blocks that lie between the same consecutive identical blocks is equal to or higher than the predetermined threshold;
A data comparison program characterized by that.

(Appendix 13)
In Appendix 11, further,
Identifying, among the plurality of blocks, another block that is neither one of the identified identical blocks nor one of the identified similar blocks;
Outputting information indicating that the identified other block is a block not included in the other data;
A data comparison program characterized by that.

(Appendix 14)
In Appendix 11, further,
Identifying, among the blocks included in the other data, another block that is neither one of the identified identical blocks nor one of the identified similar blocks;
Outputting information indicating that the identified other block is a block not included in the plurality of blocks;
A data comparison program characterized by that.

(Appendix 15)
In Appendix 1,
The break information includes header information indicating the top of the page and footer information indicating the end of the page.
A data comparison program characterized by that.

(Appendix 16)
A receiving unit that receives a plurality of data, each including character strings on a plurality of lines;
A data generation unit that, when any of the received plurality of data includes break information indicating a page break, generates data in which the break information is deleted from that data;
A block generation unit that divides the generated data to generate a plurality of blocks;
A determination unit that determines whether each of the generated blocks is included in other data among the received plurality of data;
A data comparison apparatus characterized by that.

(Appendix 17)
Receiving a plurality of data, each including character strings on a plurality of lines,
When any of the received plurality of data includes break information indicating a page break, generating data in which the break information is deleted from that data,
Dividing the generated data to generate a plurality of blocks,
Determining whether each of the generated blocks is included in other data among the received plurality of data;
A data comparison method characterized by that.
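
The four steps of the method in Appendix 17 can be sketched end to end. This is a simplified illustration under stated assumptions: break information is modeled as whole lines recognized by a caller-supplied predicate (the header text "User Manual" and the footer pattern "- N -" are invented for the example), blocks are formed by splitting on blank lines only, and the other data is divided into blocks in the same way before the inclusion check. All function names are hypothetical.

```python
import re


def remove_break_lines(lines, is_break):
    """Step 2: delete lines that carry page-break information."""
    return [line for line in lines if not is_break(line)]


def split_into_blocks(lines):
    """Step 3: divide the data into blocks at blank lines (simplified)."""
    blocks, current = [], []
    for line in lines:
        if line.strip():
            current.append(line)
        elif current:
            blocks.append("\n".join(current))
            current = []
    if current:
        blocks.append("\n".join(current))
    return blocks


def compare(data, other, is_break):
    """Step 4: report, per block of `data`, whether `other` contains it."""
    blocks = split_into_blocks(remove_break_lines(data, is_break))
    other_blocks = set(split_into_blocks(remove_break_lines(other, is_break)))
    return [(block, block in other_blocks) for block in blocks]


def is_break(line):
    # Hypothetical header ("User Manual") and footer ("- 1 -") lines.
    return re.fullmatch(r"User Manual|- \d+ -", line.strip()) is not None


revised = ["User Manual", "Install the tool.", "", "Run the", "- 1 -",
           "User Manual", "tool daily.", "- 2 -"]
original = ["User Manual", "Install the tool.", "", "Run the",
            "tool daily.", "- 1 -"]
print(compare(revised, original, is_break))
```

In this example the page break falls in the middle of a sentence in the revised data only. Because the header and footer lines are deleted before the blocks are generated, both blocks are still found in the other data.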

1: Information processing device
2: Storage unit
11: User terminal
NW: Network

Claims

Receiving a plurality of data, each including character strings on a plurality of lines,
When any of the received plurality of data includes break information indicating a page break, generating data in which the break information is deleted from that data,
Dividing the generated data to generate a plurality of blocks,
Determining whether each of the generated blocks is included in other data among the received plurality of data;
A data comparison program for causing a computer to execute the above processing.

In claim 1,
In the process of generating the data, when the end of the character string on the last line of the page immediately before the break information does not indicate the end of a sentence, the interval between the last line of the immediately preceding page and the first line of the page immediately after the break information is changed to the same interval as that between lines when a sentence spans a plurality of lines.
A data comparison program characterized by that.
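
The interval adjustment in this claim can be illustrated with a rough text-only sketch. It assumes that each page is a list of lines from which the break information has already been removed, that a paragraph-sized gap is represented by an empty line, and that a sentence end is recognized by a small set of terminators; `SENTENCE_END`, `ends_sentence`, and `merge_pages` are hypothetical names. The checks that both boundary lines contain characters follow the dependent claims below.

```python
# Assumed sentence terminators (Japanese full stop and ASCII/full-width periods).
SENTENCE_END = ("。", ".", "．")


def ends_sentence(line):
    """True when the line's last non-blank character ends a sentence."""
    return line.rstrip().endswith(SENTENCE_END)


def merge_pages(pages):
    """Concatenate pages, keeping a gap only where a sentence has ended.

    When the last line of the preceding page does not end a sentence, the
    first line of the next page follows it directly, as an ordinary
    continuation line would. Otherwise an empty line marks the gap.
    """
    merged = []
    for page in pages:
        if merged and page:
            continues = (
                merged[-1].strip() != ""
                and page[0].strip() != ""
                and not ends_sentence(merged[-1])
            )
            if not continues:
                merged.append("")
        merged.extend(page)
    return merged


pages = [["Run the"], ["tool daily.", "Then stop."], ["Next topic"]]
print(merge_pages(pages))
# ['Run the', 'tool daily.', 'Then stop.', '', 'Next topic']
```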

In claim 2,
In the process of generating the data, when the end of the character string on the last line of the page immediately before the break information is a punctuation mark or a full stop, it is determined that the end of the character string on the last line indicates the end of a sentence,
A data comparison program characterized by that.

In claim 2,
In the process of generating the data, when a character is included in the last line of the previous page, a process of changing the interval is performed.
A data comparison program characterized by that.

In claim 2,
In the process of generating the data, when a character is included in the first line of the immediately following page, a process of changing the interval is performed.
A data comparison program characterized by that.

In claim 1,
In the process of generating the plurality of blocks, when the generated data contains a blank line that includes only blanks, the plurality of blocks are generated so that the line immediately before the blank line, the blank line, and the line immediately after the blank line are each included in different blocks;
A data comparison program characterized by that.
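
The blank-line rule in this claim can be sketched as below. The sketch assumes that a block is a list of consecutive lines and that a line counts as blank when it contains only whitespace; the function name `split_on_blank_lines` is hypothetical.

```python
def split_on_blank_lines(lines):
    """Split lines into blocks so that every blank line is its own block.

    The line before a blank line, the blank line itself, and the line
    after it therefore always land in three different blocks.
    """
    blocks, current = [], []
    for line in lines:
        if line.strip() == "":
            if current:
                blocks.append(current)
                current = []
            blocks.append([line])  # the blank line forms a block by itself
        else:
            current.append(line)
    if current:
        blocks.append(current)
    return blocks


print(split_on_blank_lines(["Install the", "tool.", "  ", "Run the tool."]))
# [['Install the', 'tool.'], ['  '], ['Run the tool.']]
```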

In claim 1,
In the process of generating the plurality of blocks, when there is a specific line that is blank or indented from its middle to its end, the plurality of blocks are generated so that the specific line and the line next to the specific line are included in different blocks;
A data comparison program characterized by that.

In claim 1,
In the process of generating the plurality of blocks, when there is a specific line in which the end of the character string does not indicate the end of a sentence and the first character of the line next to the specific line is a single-byte character, the plurality of blocks are generated so that the specific line and the line next to the specific line are included in the same block.
A data comparison program characterized by that.

In claim 1,
In the process of generating the plurality of blocks, when there is a specific line in which the font of the characters in its character string differs from the font of the characters in the character string of the next line, the plurality of blocks are generated so that the specific line and the line next to the specific line are included in different blocks.
A data comparison program characterized by that.

In claim 1, further,
When it is determined that the plurality of blocks include a specific block for which an identical block is included in the other data, identifying the specific block and, among the blocks included in the other data, the block identical to the specific block,
Outputting information indicating that the identified blocks are identical blocks;
A data comparison program characterized by that.

In claim 10, further,
Identifying, among the plurality of blocks, a first block for which no identical block is included in the other data;
Identifying, among the blocks included in the other data, a second block for which no identical block is included in the plurality of generated blocks;
When the match rate between the characters included in the identified first block and the characters included in the identified second block is equal to or higher than a predetermined threshold, outputting information indicating that the first block and the second block are similar blocks;
A data comparison program characterized by that.

In claim 11,
In the process of outputting the information indicating that the blocks are similar, it is determined whether the match rate between the characters included in those first blocks that lie between consecutive identical blocks and the characters included in those second blocks that lie between the same consecutive identical blocks is equal to or higher than the predetermined threshold;
A data comparison program characterized by that.

A receiving unit that receives a plurality of data, each including character strings on a plurality of lines;
A data generation unit that, when any of the received plurality of data includes break information indicating a page break, generates data in which the break information is deleted from that data;
A block generation unit that divides the generated data to generate a plurality of blocks;
A determination unit that determines whether each of the generated blocks is included in other data among the received plurality of data;
A data comparison apparatus characterized by that.

Receiving a plurality of data, each including character strings on a plurality of lines,
When any of the received plurality of data includes break information indicating a page break, generating data in which the break information is deleted from that data,
Dividing the generated data to generate a plurality of blocks,
Determining whether each of the generated blocks is included in other data among the received plurality of data;
A data comparison method characterized by that.