JP2022144829A

JP2022144829A - Data correction method and data correction program

Info

Publication number: JP2022144829A
Application number: JP2021046008A
Authority: JP
Inventors: Yasutaka Moriwaki (森脇 康貴); Yui Noma (野間 唯)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2021-03-19
Filing date: 2021-03-19
Publication date: 2022-10-03

Abstract

To suggest appropriate correction candidates. SOLUTION: An information processing device detects an inconsistency between data in a data table and, when an inconsistency is detected, extracts a plurality of correction candidates for correcting it. The information processing device calculates a correction cost for each of the correction candidates and displays the correction candidates in association with their correction costs. SELECTED DRAWING: Figure 3

Description

The present invention relates to a data correction method and the like.

The terminology used in a given field is not always unified. For example, "氏名" (full name) and "名前" (name) mean the same thing but are written differently, so a system treats them as completely different data, which prevents data from being linked accurately.

Creating and using controlled vocabulary data that defines synonyms, hypernyms, and hyponyms absorbs this ambiguity and resolves the problem described above. Controlled vocabulary data is a manually created dictionary that summarizes the semantic relationships among multiple terms in order to prevent search omissions caused by ambiguous terms, by terms with the same spelling but different meanings, and by terms with different spellings but the same meaning.

FIG. 15 is a diagram illustrating an example of the data structure of controlled vocabulary data. As an example, the data structure is described in a tabular format. As shown in FIG. 15, the controlled vocabulary data 10 has a term name column 10a, a heading column 10b, and a hypernym column 10c.

Terms from a glossary used in a specific field are set in the term name column 10a. Headings are set in the heading column 10b. A heading is a name that represents several variant terms. In the example shown in FIG. 15, "ソフトウェア" is set as the heading for the term names "ソフト", "ソフトウェア", and "ソフトウエア" (three written forms of "software"), and "OS" is set as the heading for the term names "os" and "OS".

Hypernyms are set in the hypernym column 10c. A hypernym names another heading that is the broader concept of the heading in the same row. For example, the hypernym in the rows with the heading "OS" is "ソフトウェア", which means that the heading "ソフトウェア" is the broader concept of the heading "OS".

In the following description, a set of terms that are synonymous according to the values (terms) in the term name column 10a and the values (headings) in the heading column 10b is referred to as a "synonym group". In the example shown in FIG. 15, the records in rows 1 to 3 belong to synonym group 10-1, and the records in rows 4 and 5 belong to synonym group 10-2. In principle, each synonym group has a unique heading and a unique hypernym.

The unique heading "ソフトウェア" is set for synonym group 10-1. The unique heading "OS" and the unique hypernym "ソフトウェア" are set for synonym group 10-2.
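As a non-limiting illustration, the table of FIG. 15 and its division into synonym groups can be sketched as follows. The description does not state how records are assigned to groups, so the rule used here (records are linked whenever a term and a heading appear in the same row) is an assumption, as are the names `Record` and `synonym_groups`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    term: str
    heading: str
    hypernym: Optional[str] = None


def synonym_groups(records):
    """Group record indices by linking each term to its heading (assumed rule)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for r in records:
        parent[find(r.term)] = find(r.heading)

    groups = {}
    for i, r in enumerate(records):
        groups.setdefault(find(r.term), []).append(i)
    return list(groups.values())


# Rows of the controlled vocabulary data 10 as described for FIG. 15.
FIG15 = [
    Record("ソフト", "ソフトウェア"),
    Record("ソフトウェア", "ソフトウェア"),
    Record("ソフトウエア", "ソフトウェア"),
    Record("os", "OS", "ソフトウェア"),
    Record("OS", "OS", "ソフトウェア"),
]

print(synonym_groups(FIG15))  # row indices of group 10-1, then group 10-2
```

The hypernym column is deliberately not used for grouping, since a hypernym refers to the heading of a different synonym group.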

Next, "inconsistency" with respect to synonym groups is defined. FIG. 16 is a diagram for explaining inconsistencies. Inconsistencies A, B, C, and D are described as examples.

"Inconsistency A" is explained using the controlled vocabulary data 11a. Inconsistency A is the presence of two or more kinds of headings in the same synonym group. For example, in the controlled vocabulary data 11a, the two headings "ソフト" and "ソフトウェア" are set in the same synonym group 11a-1, which corresponds to inconsistency A.

"Inconsistency B" is explained using the controlled vocabulary data 11b. Inconsistency B is the presence of two kinds of hypernyms in the same synonym group. For example, in the controlled vocabulary data 11b, the two hypernyms "ソフトウェア" and "software" are set in the same synonym group 11b-1, which corresponds to inconsistency B.

"Inconsistency C" is explained using the controlled vocabulary data 11c. Inconsistency C occurs when the hypernym set for one synonym group designates a term that is not the heading of another synonym group. For example, the controlled vocabulary data 11c contains synonym groups 11c-1 and 11c-2. The hypernym "ソフト" of synonym group 11c-2 differs from the heading "ソフトウェア" of synonym group 11c-1, which corresponds to inconsistency C.

"Inconsistency D" is explained using the controlled vocabulary data 11d. Inconsistency D occurs when the hierarchical relationship between different synonym groups is circular. For example, the controlled vocabulary data 11d contains synonym groups 11d-1 and 11d-2. The hypernym "OS" of synonym group 11d-1 designates the heading "OS" of synonym group 11d-2, while the hypernym "ソフトウェア" of synonym group 11d-2 designates the heading "ソフトウェア" of synonym group 11d-1. The relationship is therefore circular, which corresponds to inconsistency D.
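The four definitions above can be checked mechanically. The following is a minimal sketch, not the claimed implementation: rows are assumed to be `(group, term, heading, hypernym)` tuples with the group already known, an empty hypernym is represented by `None`, and the function name `detect` and the sample rows (FIG. 16 is not reproduced here) are hypothetical.

```python
from collections import defaultdict


def detect(rows):
    """Return the set of inconsistency labels ("A".."D") found in rows."""
    headings = defaultdict(set)
    hypernyms = defaultdict(set)
    for group, _term, heading, hypernym in rows:
        headings[group].add(heading)
        if hypernym is not None:
            hypernyms[group].add(hypernym)

    found = set()
    if any(len(h) > 1 for h in headings.values()):
        found.add("A")  # two or more headings in one group
    if any(len(u) > 1 for u in hypernyms.values()):
        found.add("B")  # two or more hypernyms in one group

    edges = defaultdict(set)  # group -> groups its hypernyms point to
    for group, ups in hypernyms.items():
        for up in ups:
            targets = {g for g, hs in headings.items() if g != group and up in hs}
            if not targets:
                found.add("C")  # hypernym is not a heading of another group
            edges[group] |= targets

    state = {}  # 1 = on the current DFS path, 2 = finished

    def has_cycle(group):
        state[group] = 1
        for nxt in edges.get(group, ()):
            if state.get(nxt) == 1 or (nxt not in state and has_cycle(nxt)):
                return True
        state[group] = 2
        return False

    if any(g not in state and has_cycle(g) for g in list(headings)):
        found.add("D")  # circular hierarchy between groups
    return found


TABLE_11A = [(1, "ソフト", "ソフト", None), (1, "ソフトウェア", "ソフトウェア", None)]
TABLE_11B = [(1, "ソフトウェア", "ソフトウェア", None),
             (2, "os", "OS", "ソフトウェア"), (2, "OS", "OS", "software")]
TABLE_11C = [(1, "ソフトウェア", "ソフトウェア", None), (2, "OS", "OS", "ソフト")]
TABLE_11D = [(1, "ソフトウェア", "ソフトウェア", "OS"), (2, "OS", "OS", "ソフトウェア")]
```

A single table may trigger more than one check. In the hypothetical `TABLE_11B`, for instance, the hypernym "software" also matches no heading, so this sketch reports inconsistency C in addition to inconsistency B.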

The amount of work required to manually correct the cell values of controlled vocabulary data in order to resolve the inconsistencies described with reference to FIG. 16 is referred to as the "correction cost". FIG. 17 is a diagram for explaining the correction cost.

In the controlled vocabulary data 12a-1, the same synonym group contains two or more kinds of headings, resulting in inconsistency A. Correcting the heading "ソフト" in the controlled vocabulary data 12a-1 to "ソフトウェア" resolves inconsistency A and yields the controlled vocabulary data 12a-2. In this case, the correction cost is "1".

In the controlled vocabulary data 12b-1, the same synonym group contains two kinds of hypernyms, resulting in inconsistency B. Correcting the hypernym "software" in the controlled vocabulary data 12b-1 to "ソフトウェア" and the hypernym "ソフト" in the controlled vocabulary data 12b-1 to "ソフトウェア" resolves inconsistency B and yields the controlled vocabulary data 12b-2. In this case, the correction cost is "2".
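Read this way, the correction cost is simply the number of cells whose value changes. A minimal sketch under that reading follows. The function name `correction_cost` and the row contents are assumptions, because FIG. 17 is not reproduced here; only the number of changed cells is taken from the description.

```python
def correction_cost(before, after):
    """Count the cells that differ between two tables of the same shape."""
    if len(before) != len(after) or any(
        len(b) != len(a) for b, a in zip(before, after)
    ):
        raise ValueError("tables must have the same shape")
    return sum(
        cell_b != cell_a
        for row_b, row_a in zip(before, after)
        for cell_b, cell_a in zip(row_b, row_a)
    )


# Hypothetical rows (term, heading, hypernym) for 12a-1 -> 12a-2: one heading changes.
T12A_1 = [("ソフト", "ソフト", None), ("ソフトウェア", "ソフトウェア", None)]
T12A_2 = [("ソフト", "ソフトウェア", None), ("ソフトウェア", "ソフトウェア", None)]

# Hypothetical rows for 12b-1 -> 12b-2: two hypernym cells change.
T12B_1 = [("os", "OS", "ソフトウェア"), ("OS", "OS", "software"), ("オーエス", "OS", "ソフト")]
T12B_2 = [("os", "OS", "ソフトウェア"), ("OS", "OS", "ソフトウェア"), ("オーエス", "OS", "ソフトウェア")]

print(correction_cost(T12A_1, T12A_2), correction_cost(T12B_1, T12B_2))
```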

As described with reference to FIG. 17, input errors and oversights can occur when controlled vocabulary data is corrected manually. The correction work also takes time, and the quality of the controlled vocabulary data may deteriorate.

The following conventional technique is known as a method of detecting inconsistencies in controlled vocabulary data (tabular data) and presenting correction candidates.

FIG. 18 is a diagram for explaining the conventional technique. The explanation uses tabular data 13. In the tabular data 13, the data in rows 1 to 3 belong to synonym group 13-1. The tabular data 13 includes a term name column 13a and a heading column 13b.

For each column of the tabular data 13, the conventional technique calculates the similarity between character strings using an edit distance or the like, divides character strings that are close to one another into clusters, and presents the character strings cluster by cluster. The operator checks the character strings in each cluster and can correct spelling variations and the like in a single operation.
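The clustering step can be illustrated with a Levenshtein edit distance and a simple greedy grouping. This is only a sketch of the general idea. The description does not specify the distance measure beyond "an edit distance or the like", so the length-normalized threshold `max_ratio` and the greedy strategy are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def cluster(values, max_ratio=0.5):
    """Greedily put each distinct value into the first cluster with a close member."""
    clusters = []
    for value in dict.fromkeys(values):  # distinct values, original order
        for members in clusters:
            if any(
                edit_distance(value, m) / (max(len(value), len(m)) or 1) <= max_ratio
                for m in members
            ):
                members.append(value)
                break
        else:
            clusters.append([value])
    return clusters


print(cluster(["ソフト", "ソフト", "ソフトウェア", "OS", "OS"]))
```

The normalized threshold keeps short, unrelated strings such as "OS" and "ソフト" apart, which an absolute distance threshold of 3 would not.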

For example, in the heading column 13b, the same synonym group 13-1 contains two kinds of headings, "ソフト" and "ソフトウェア", which corresponds to inconsistency A. The conventional technique determines that "ソフト" and "ソフトウェア" in the heading column 13b belong to the same cluster and can present the correction candidate with the lowest correction cost. Here, replacing "ソフト" with "ソフトウェア" has a correction cost of "2", whereas replacing "ソフトウェア" with "ソフト" has a correction cost of "1", so "ソフト" is presented as the post-replacement value. The operator refers to the presented value and corrects "ソフトウェア" to "ソフト", which turns the tabular data 13 into tabular data 14. Inconsistency A is resolved in the tabular data 14.
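The cost comparison in this example amounts to counting, for each candidate value, how many cells would have to be rewritten. A minimal sketch, in which the function names and the row order of the heading column 13b are assumptions:

```python
from collections import Counter


def unification_costs(values):
    """For each distinct value, the number of cells that must change to unify on it."""
    counts = Counter(values)
    return {target: len(values) - n for target, n in counts.items()}


def cheapest_target(values):
    """The value that unifies the column with the fewest cell changes."""
    costs = unification_costs(values)
    return min(costs, key=costs.get)


# Heading column 13b of synonym group 13-1: two "ソフト" cells and one "ソフトウェア" cell.
HEADINGS_13B = ["ソフト", "ソフトウェア", "ソフト"]

print(unification_costs(HEADINGS_13B))
print(cheapest_target(HEADINGS_13B))
```

Because the choice is made one column and one group at a time, it ignores any hypernym in another group that refers to the value being replaced.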

Japanese Laid-open Patent Publication No. 2008-293532; Japanese Laid-open Patent Publication No. 06-266769

However, the conventional technique described above cannot present appropriate correction candidates when a hierarchical relationship exists between synonym groups.

FIG. 19 is a diagram for explaining the problem with the conventional technique. The explanation uses controlled vocabulary data 15. The controlled vocabulary data 15 includes a term name column 15a, a heading column 15b, and a hypernym column 15c. The records in rows 1 to 3 of the controlled vocabulary data 15 belong to synonym group 15-1, and the records in rows 4 and 5 belong to synonym group 15-2.

Focusing on the heading column 15b of synonym group 15-1, the conventional technique executes the processing described with reference to FIG. 18 and presents "ソフト" as the post-replacement value. The operator refers to the presented value and corrects "ソフトウェア" to "ソフト", which turns the controlled vocabulary data 15 into controlled vocabulary data 16. The controlled vocabulary data 16 includes a term name column 16a, a heading column 16b, and a hypernym column 16c. The records in rows 1 to 3 of the controlled vocabulary data 16 belong to synonym group 16-1, and the records in rows 4 and 5 belong to synonym group 16-2.

In the controlled vocabulary data 16, however, the hypernym in the hypernym column 16c of synonym group 16-2 differs from the heading in the heading column 16b of synonym group 16-1, so inconsistency C is present. In the conventional technique, the only cluster detected from the hypernym column of synonym group 16-2 is "ソフトウェア", so inconsistency C cannot be detected and no appropriate correction candidate can be presented.

In one aspect, an object of the present invention is to provide a data correction method and a data correction program capable of presenting appropriate correction candidates.

In a first proposal, a computer executes the following processing. The computer detects an inconsistency between data in a data table. When an inconsistency is detected, the computer extracts a plurality of correction candidates for correcting the inconsistency. The computer calculates a correction cost for each of the plurality of correction candidates. The computer displays the plurality of correction candidates in association with their corresponding correction costs.

Appropriate correction candidates can be presented.

FIG. 1 is a diagram for explaining the reference technique.
FIG. 2 is a diagram for explaining other correction candidates.
FIG. 3 is a diagram for explaining the processing of the information processing device according to the present embodiment.
FIG. 4 is a functional block diagram showing the configuration of the information processing device according to the present embodiment.
FIG. 5 is a diagram showing an example of the data structure of controlled vocabulary data.
FIG. 6 is a diagram showing an example of the data structure of the in-group correction candidate table.
FIG. 7 is a diagram showing an example of the data structure of the correction candidate table.
FIG. 8 is a diagram (1) showing an example of a correction result.
FIG. 9 is a diagram (2) showing an example of a correction result.
FIG. 10 is a diagram (3) showing an example of a correction result.
FIG. 11 is a diagram (4) showing an example of a correction result.
FIG. 12 is a diagram showing an example of screen information generated by the display control unit.
FIG. 13 is a flowchart showing the processing procedure of the information processing device according to the present embodiment.
FIG. 14 is a diagram showing an example of the hardware configuration of a computer that implements functions similar to those of the information processing device.
FIG. 15 is a diagram illustrating an example of the data structure of controlled vocabulary data.
FIG. 16 is a diagram for explaining inconsistencies.
FIG. 17 is a diagram for explaining the correction cost.
FIG. 18 is a diagram for explaining the conventional technique.
FIG. 19 is a diagram for explaining the problem with the conventional technique.

Embodiments of the data correction method and the data correction program disclosed in the present application are described in detail below with reference to the drawings. The present invention is not limited by these embodiments.

Before the present embodiment is described, a reference technique devised by the inventors is described. FIG. 1 is a diagram for explaining the reference technique.

The explanation of FIG. 1 uses the controlled vocabulary data 15. The controlled vocabulary data 15 includes a term name column 15a, a heading column 15b, and a hypernym column 15c. Terms are set in the term name column 15a, headings are set in the heading column 15b, and hypernyms are set in the hypernym column 15c. The records in rows 1 to 3 of the controlled vocabulary data 15 belong to synonym group 15-1, and the records in rows 4 and 5 belong to synonym group 15-2.

In the same manner as the processing described with reference to FIG. 18, the reference technique presents "ソフト" as the post-replacement value. The operator refers to the presented value and corrects "ソフトウェア" to "ソフト", which turns the controlled vocabulary data 15 into the controlled vocabulary data 16. The controlled vocabulary data 16 includes a term name column 16a, a heading column 16b, and a hypernym column 16c. The records in rows 1 to 3 of the controlled vocabulary data 16 belong to synonym group 16-1, and the records in rows 4 and 5 belong to synonym group 16-2.

Correcting the controlled vocabulary data 15 to the controlled vocabulary data 16 resolves inconsistency A. The correction cost of this step is "1".

When consistency is then checked between synonym group 16-1 and synonym group 16-2, inconsistency C is detected, because no value matching the "ソフトウェア" in the hypernym column 16c of synonym group 16-2 exists in the heading column of synonym group 16-1.

For the hypernym "ソフトウェア" of synonym group 16-2, the reference technique presents "ソフト" as the post-replacement value. The operator refers to the presented value and corrects each of the two occurrences of "ソフトウェア" to "ソフト", which turns the controlled vocabulary data 16 into controlled vocabulary data 17. The controlled vocabulary data 17 includes a term name column 17a, a heading column 17b, and a hypernym column 17c. The records in rows 1 to 3 of the controlled vocabulary data 17 belong to synonym group 17-1, and the records in rows 4 and 5 belong to synonym group 17-2.

Correcting the controlled vocabulary data 16 to the controlled vocabulary data 17 resolves inconsistency C. The correction cost of this step is "2".

As described with reference to FIG. 1, the reference technique requires a total correction cost of "3" to resolve the inconsistencies in the controlled vocabulary data 15.

The reference technique described above can correct inconsistencies not only within a single synonym group but also between different synonym groups. However, it may fail to present the correction candidate with the lowest correction cost.

FIG. 2 is a diagram for explaining other correction candidates. The explanation uses the controlled vocabulary data 15, which is the same as described with reference to FIG. 1.

For example, a correction candidate that corrects the two headings "ソフト" in the heading column 15b of synonym group 15-1 to "ソフトウェア" is conceivable. The correction cost of this candidate is "2", and it turns the controlled vocabulary data 15 into controlled vocabulary data 18. The controlled vocabulary data 18 includes a term name column 18a, a heading column 18b, and a hypernym column 18c. The records in rows 1 to 3 of the controlled vocabulary data 18 belong to synonym group 18-1, and the records in rows 4 and 5 belong to synonym group 18-2. The controlled vocabulary data 18 contains no inconsistency.

The reference technique described with reference to FIG. 1 has a total correction cost of "3", whereas the correction candidate described with reference to FIG. 2 has a total correction cost of "2". In other words, correcting the data with the reference technique does not minimize the correction cost.
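The two correction paths can be replayed on a small table to confirm the totals of "3" and "2". The sketch below is illustrative only: the term names and row order of `TABLE_15` are assumed (only the heading and hypernym values follow the description), and the helper names are hypothetical. The consistency check covers inconsistencies A to C and omits the cycle check D, which cannot arise in this example.

```python
def replace_cells(rows, group, column, old, new):
    """Return (new_rows, cost) after replacing old with new in one column of one group."""
    out, cost = [], 0
    for row in rows:
        row = dict(row)
        if row["group"] == group and row[column] == old:
            row[column] = new
            cost += 1
        out.append(row)
    return out, cost


def is_consistent(rows):
    """True when no inconsistency A, B, or C is present."""
    headings, hypernyms = {}, {}
    for r in rows:
        headings.setdefault(r["group"], set()).add(r["heading"])
        if r["hypernym"] is not None:
            hypernyms.setdefault(r["group"], set()).add(r["hypernym"])
    if any(len(h) > 1 for h in headings.values()):   # inconsistency A
        return False
    if any(len(u) > 1 for u in hypernyms.values()):  # inconsistency B
        return False
    for group, ups in hypernyms.items():             # inconsistency C
        others = set().union(*(h for g, h in headings.items() if g != group))
        if not ups <= others:
            return False
    return True


def row(group, term, heading, hypernym=None):
    return {"group": group, "term": term, "heading": heading, "hypernym": hypernym}


TABLE_15 = [
    row("15-1", "ソフト", "ソフト"),
    row("15-1", "ソフトウェア", "ソフトウェア"),
    row("15-1", "ソフトウエア", "ソフト"),
    row("15-2", "os", "OS", "ソフトウェア"),
    row("15-2", "OS", "OS", "ソフトウェア"),
]

# Reference technique (FIG. 1): cheapest fix per step, two steps.
t16, c1 = replace_cells(TABLE_15, "15-1", "heading", "ソフトウェア", "ソフト")
t17, c2 = replace_cells(t16, "15-2", "hypernym", "ソフトウェア", "ソフト")

# Alternative candidate (FIG. 2): one step.
t18, c3 = replace_cells(TABLE_15, "15-1", "heading", "ソフト", "ソフトウェア")

print(c1 + c2, c3)
```

The first step of the reference technique is locally cheapest but leaves the table inconsistent, which is what forces the second, more expensive step.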

Next, the processing of the information processing device according to the present embodiment is described. FIG. 3 is a diagram for explaining this processing. The explanation uses controlled vocabulary data 20. The controlled vocabulary data 20 includes a term name column 20a, a heading column 20b, and a hypernym column 20c. Based on the terms and headings of the controlled vocabulary data 20, the information processing device classifies the records in the controlled vocabulary data 20 into a plurality of synonym groups. The example in FIG. 3 shows synonym group 20-A and synonym group 20-Z, but other synonym groups are also included. When the information processing device detects an inconsistency in the controlled vocabulary data 20, it executes the following processing.

For each synonym group, the information processing device identifies a plurality of correction candidates and calculates the correction cost of each candidate. In the following description, a correction candidate identified for an individual synonym group is referred to as an "in-group correction candidate" where appropriate. When a synonym group contains multiple headings, the information processing device extracts correction candidates that unify the headings to one of them. When a synonym group contains multiple hypernyms, the information processing device extracts correction candidates that unify the hypernyms to one of them. For example, if a synonym group has n_1 kinds of headings and n_2 kinds of hypernyms, n_1 × n_2 in-group correction candidates are extracted from that synonym group.
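The enumeration of the n_1 × n_2 in-group correction candidates can be sketched with a Cartesian product. The following is an illustration under stated assumptions, not the embodiment itself: each row is a `(heading, hypernym)` pair, empty hypernym cells are `None` and are left untouched, and the function name `in_group_candidates` and the sample rows for synonym group 20-A are hypothetical.

```python
from itertools import product


def in_group_candidates(rows):
    """Return (heading, hypernym, cost) for every way to unify one synonym group."""
    headings = list(dict.fromkeys(h for h, _ in rows))
    hypernyms = list(dict.fromkeys(u for _, u in rows if u is not None)) or [None]
    candidates = []
    for heading, hypernym in product(headings, hypernyms):
        cost = sum(h != heading for h, _ in rows)
        cost += sum(u is not None and u != hypernym for _, u in rows)
        candidates.append((heading, hypernym, cost))
    return candidates


# Assumed contents of synonym group 20-A: two rows that disagree in both columns.
GROUP_20A = [("A1", "E1"), ("A2", "E2")]

for candidate in in_group_candidates(GROUP_20A):
    print(candidate)
```

With these assumed rows, each of the four candidates changes one heading cell and one hypernym cell.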

The in-group correction candidates identified from synonym group 20-A are described. The information processing device extracts in-group correction candidates ga1, ga2, ga3, and ga4 from synonym group 20-A. In-group correction candidate ga1 is "unify the heading to A1 and unify the hypernym to E1". The correction cost of ga1 is "2". In-group correction candidate ga2 is "unify the heading to A1 and unify the hypernym to E2". The correction cost of ga2 is "2".

In-group correction candidate ga3 is "unify the heading to A2 and unify the hypernym to E1". The correction cost of ga3 is "2". In-group correction candidate ga4 is "unify the heading to A2 and unify the hypernym to E2". The correction cost of ga4 is "2".

The in-group correction candidates identified from synonym group 20-Z are described. The information processing device identifies in-group correction candidates gz1, gz2, gz3, and gz4 from synonym group 20-Z. In-group correction candidate gz1 is "unify the heading to Z1 and unify the hypernym to H1". The correction cost of gz1 is "2". In-group correction candidate gz2 is "unify the heading to Z1 and unify the hypernym to H2". The correction cost of gz2 is "2".

In-group correction candidate gz3 is "unify the heading to Z2 and unify the hypernym to H1". The correction cost of gz3 is "2". In-group correction candidate gz4 is "unify the heading to Z2 and unify the hypernym to H2". The correction cost of gz4 is "2".

Although not illustrated, the information processing device executes the same processing for the other synonym groups in the controlled vocabulary data 20 and thereby extracts in-group correction candidates from every synonym group.

Next, the information processing device generates correction candidates by combining the in-group correction candidates extracted from the synonym groups. Each correction candidate contains exactly one in-group correction candidate from each synonym group. The example in FIG. 3 shows correction candidate (1) and correction candidate (2), but other correction candidates also exist. Correction candidate (1) contains in-group correction candidates ga1 and gz1, and correction candidate (2) contains in-group correction candidates ga1 and gz2. In both cases, the in-group correction candidates of the other synonym groups are not illustrated.

The information processing device calculates the total correction cost of correction candidate (1) by summing the correction costs of the in-group correction candidates it contains. The information processing device also determines whether any inconsistency would remain if correction candidate (1) were applied to the controlled vocabulary data 20.

The information processing device calculates the total correction cost of correction candidate (2) by summing the correction costs of the in-group correction candidates it contains. The information processing device also determines whether any inconsistency would remain if correction candidate (2) were applied to the controlled vocabulary data 20.

The information processing device repeats the above processing for the other correction candidates.

Based on the total correction cost of each correction candidate and on whether an inconsistency remains, the information processing device identifies the correction candidate that has the lowest total correction cost and for which no inconsistency is detected. The information processing device displays the identified correction candidate in association with its total correction cost. When multiple correction candidates share the lowest total correction cost, the information processing device displays all of them in association with the total correction cost.
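The combination and selection steps described above can be sketched as an exhaustive search over one in-group candidate per synonym group. This is a minimal illustration under assumptions. The function names are hypothetical, the consistency check covers only inconsistencies C and D (A and B cannot occur once each group is unified), duplicate headings across groups are not handled, and the sample input reuses the controlled vocabulary data 15 with assumed candidate costs of "1", "2", and "0".

```python
from itertools import product


def is_consistent(choice):
    """choice maps each group to its unified (heading, hypernym or None)."""
    owner = {heading: group for group, (heading, _) in choice.items()}
    parent = {}
    for group, (_, hypernym) in choice.items():
        if hypernym is None:
            continue
        target = owner.get(hypernym)
        if target is None or target == group:
            return False  # inconsistency C: not the heading of another group
        parent[group] = target
    for group in choice:
        seen = set()
        while group in parent:
            if group in seen:
                return False  # inconsistency D: circular hierarchy
            seen.add(group)
            group = parent[group]
    return True


def best_candidates(per_group):
    """per_group maps each group to a list of (heading, hypernym, cost) candidates."""
    groups = list(per_group)
    scored = []
    for combo in product(*(per_group[g] for g in groups)):
        choice = {g: (h, u) for g, (h, u, _) in zip(groups, combo)}
        if is_consistent(choice):
            scored.append((sum(cost for _, _, cost in combo), choice))
    if not scored:
        return []
    lowest = min(total for total, _ in scored)
    return [(total, choice) for total, choice in scored if total == lowest]


PER_GROUP_15 = {
    "15-1": [("ソフト", None, 1), ("ソフトウェア", None, 2)],
    "15-2": [("OS", "ソフトウェア", 0)],
}

print(best_candidates(PER_GROUP_15))
```

In this input the combination with total cost "1" is rejected because it leaves inconsistency C, so the candidate with total cost "2" is the one reported. The number of combinations grows as the product of the per-group candidate counts, so a sketch like this is practical only for small tables.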

As described above, when the controlled vocabulary data 20 contains an inconsistency, the information processing device according to the present embodiment extracts a plurality of correction candidates for correcting the inconsistency and displays the correction candidates in association with their total correction costs. This allows the operator to select a correction candidate with the correction cost in mind.

In addition, by identifying and displaying the correction candidate with the lowest total correction cost, the information processing device can present a correction candidate that minimizes the operator's workload and leaves no inconsistency.

次に、本実施例に係る情報処理装置の構成について説明する。図４は、本実施例に係る情報処理装置の構成を示す機能ブロック図である。図４に示すように、情報処理装置１００は、通信部１１０と、入力部１２０と、表示部１３０と、記憶部１４０と、制御部１５０とを有する。 Next, the configuration of the information processing apparatus according to this embodiment will be described. FIG. 4 is a functional block diagram showing the configuration of the information processing apparatus according to this embodiment. As shown in FIG. 4, the information processing apparatus 100 includes a communication unit 110, an input unit 120, a display unit 130, a storage unit 140, and a control unit 150.

通信部１１０は、ネットワークを介して外部装置から各種のデータを受信する。通信部１１０は、通信装置の一例である。たとえば、通信部１１０は、後述する統制語彙データ１４１を、外部装置から受信してもよい。 Communication unit 110 receives various data from an external device via a network. Communication unit 110 is an example of a communication device. For example, communication unit 110 may receive controlled vocabulary data 141, which will be described later, from an external device.

入力部１２０は、情報処理装置１００の制御部１５０に各種の情報を入力する入力装置である。入力部１２０は、キーボードやマウス、タッチパネル等に対応する。作業者は、入力部１２０を操作して、統制語彙データ１４１に関するデータを入力してもよい。 The input unit 120 is an input device that inputs various types of information to the control unit 150 of the information processing apparatus 100. The input unit 120 corresponds to a keyboard, mouse, touch panel, or the like. The operator may operate the input unit 120 to input data regarding the controlled vocabulary data 141.

表示部１３０は、制御部１５０から出力される情報を表示する表示装置である。たとえば、表示部１３０は、修正候補の情報を表示する。 The display unit 130 is a display device that displays information output from the control unit 150. For example, the display unit 130 displays information on correction candidates.

記憶部１４０は、統制語彙データ１４１、グループ内修正候補テーブル１４２、修正候補テーブル１４３を有する。記憶部１４０は、ＲＡＭ（Random Access Memory）、フラッシュメモリ（Flash Memory）などの半導体メモリ素子や、ＨＤＤ（Hard Disk Drive）などの記憶装置に対応する。 The storage unit 140 has controlled vocabulary data 141, an intra-group correction candidate table 142, and a correction candidate table 143. The storage unit 140 corresponds to a semiconductor memory device such as a RAM (Random Access Memory) or a flash memory, or to a storage device such as an HDD (Hard Disk Drive).

統制語彙データ１４１は、用語の曖昧さや同形異義、異形同義によって生じる検索の漏れ等を防ぐために、複数の用語間の意味的関係性をまとめた辞書のデータである。統制語彙データ１４１は、図３で説明した統制語彙データ２０に相当する。 The controlled vocabulary data 141 is dictionary data that summarizes the semantic relationships between a plurality of terms in order to prevent search omissions caused by ambiguity of terms, homographs, and variant-form synonyms. The controlled vocabulary data 141 corresponds to the controlled vocabulary data 20 described with reference to FIG. 3.

図５は、統制語彙データのデータ構造の一例を示す図である。図５に示すように、この統制語彙データ５０は、用語名列５０ａ、標目列５０ｂ、上位語列５０ｃを有する。後述する制御部１５０の処理によって、統制語彙データ１４１のレコードは、同義グループ１４１－１，同義グループ１４１－２に分類される。 FIG. 5 is a diagram showing an example of the data structure of controlled vocabulary data. As shown in FIG. 5, this controlled vocabulary data 50 has a term name column 50a, a headline column 50b, and a hypernym column 50c. Records of the controlled vocabulary data 141 are classified into a synonym group 141-1 and a synonym group 141-2 by the processing of the control unit 150, which will be described later.

グループ内修正候補テーブル１４２は、グループ内修正候補を保持するテーブルである。グループ内修正候補テーブル１４２のデータ構造の説明は、後述する。 The intra-group correction candidate table 142 is a table that holds intra-group correction candidates. The data structure of the intra-group correction candidate table 142 will be described later.

修正候補テーブル１４３は、修正候補を保持するテーブルである。修正候補テーブル１４３のデータ構造の説明は、後述する。 The correction candidate table 143 is a table that holds correction candidates. The data structure of the correction candidate table 143 will be described later.

制御部１５０は、検出部１５１、抽出部１５２、表示制御部１５３を有する。制御部１５０は、ＣＰＵ（Central Processing Unit）やＧＰＵ（Graphics Processing Unit）、ＡＳＩＣ（Application Specific Integrated Circuit）やＦＰＧＡ（Field Programmable Gate Array）などのハードワイヤードロジック等によって実現される。 The control unit 150 has a detection unit 151, an extraction unit 152, and a display control unit 153. The control unit 150 is implemented by a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit), or by hardwired logic such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

検出部１５１は、統制語彙データ１４１の不整合を検出する処理部である。検出部１５１の処理を、図５を用いて説明する。検出部１５１は、統制語彙データ１４１の用語名列１４１ａの用語と、標目列１４１ｂの標目とを基にして、統制語彙データ１４１のレコードを、同義グループに分類する。 The detection unit 151 is a processing unit that detects inconsistencies in the controlled vocabulary data 141. The processing of the detection unit 151 will be described with reference to FIG. 5. The detection unit 151 classifies the records of the controlled vocabulary data 141 into synonym groups based on the terms in the term name column 141a and the headings in the heading column 141b of the controlled vocabulary data 141.

たとえば、検出部１５１は、統制語彙データ１４１の２行目において用語「ソフトウェア」と、標目「ソフト」との連結成分を検出する。これにより、検出部１５１は、用語または標目に、「ソフト」、「ソフトウェア」が設定されたレコードを、同一の同義グループ（同義グループ１４１－１）に設定する。 For example, the detection unit 151 detects a connected component linking the term "software" and the heading "soft" in the second row of the controlled vocabulary data 141. As a result, the detection unit 151 places the records whose term or heading is set to "soft" or "software" in the same synonym group (synonym group 141-1).

検出部１５１は、統制語彙データ１４１の４行目において用語「ｏｓ」と、標目「ＯＳ」との連結成分を検出する。これにより、検出部１５１は、用語または標目に、「ｏｓ」、「ＯＳ」が設定されたレコードを、同一の同義グループ（同義グループ１４１－２）に設定する。 The detection unit 151 detects a connected component between the term “os” and the heading “OS” in the fourth line of the controlled vocabulary data 141 . As a result, the detection unit 151 sets records in which "os" and "OS" are set in terms or headings in the same synonym group (synonym group 141-2).

検出部１５１は、統制語彙データ１４１のレコードを、同義グループに分類した後に、統制語彙データ１４１に不整合が存在するか否かを判定する。図５に示す例では、同義グループ１４１－１の標目列１４１ｂに含まれる標目が２種類以上存在するため、検出部１５１は、不整合Ａを検出する。検出部１５１は、統制語彙データ１４１から不整合を検出した旨を、抽出部１５２に出力する。 After classifying the records of the controlled vocabulary data 141 into synonym groups, the detection unit 151 determines whether or not there is an inconsistency in the controlled vocabulary data 141. In the example shown in FIG. 5, the heading column 141b of the synonym group 141-1 contains two or more distinct headings, so the detection unit 151 detects inconsistency A. The detection unit 151 notifies the extraction unit 152 that an inconsistency has been detected in the controlled vocabulary data 141.
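
The grouping and the check for inconsistency A can be sketched with a union-find structure, as below. This is a sketch under stated assumptions: the function names and the sample rows are hypothetical, the rows are illustrative rather than a copy of the figure, and the English strings "soft" and "software" stand in for 「ソフト」 and 「ソフトウェア」. The source only states that records are grouped by connected components of terms and headings; union-find is one common way to compute them.

```python
from collections import defaultdict


def synonym_groups(records):
    """Group (term, heading, hypernym) records by connected components
    of the term-heading links."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for term, heading, _ in records:
        union(term, heading)

    groups = defaultdict(list)
    for record in records:
        groups[find(record[0])].append(record)
    return list(groups.values())


def has_inconsistency_a(group):
    """Inconsistency A: a synonym group holds two or more distinct headings."""
    return len({heading for _, heading, _ in group}) >= 2


# Illustrative rows (assumed): (term, heading, hypernym)
rows = [
    ("soft", "soft", None),
    ("software", "soft", None),
    ("SW", "software", None),
    ("os", "OS", "software"),
    ("OS", "OS", "software"),
]
groups = synonym_groups(rows)
flags = [has_inconsistency_a(g) for g in groups]
```

With these rows, the first three records fall into one group whose headings disagree, while the last two form a second group with a single heading.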

図５では説明を省略するが、検出部１５１は、他の不整合Ｂ～Ｄを検出した場合にも、統制語彙データ１４１から不整合を検出した旨を、抽出部１５２に出力する。検出部１５１は、同義グループの分類結果も、抽出部１５２に出力する。 Although not illustrated in FIG. 5, the detection unit 151 also notifies the extraction unit 152 that an inconsistency has been detected in the controlled vocabulary data 141 when it detects any of the other inconsistencies B to D. The detection unit 151 also outputs the classification result of the synonym groups to the extraction unit 152.

抽出部１５２は、統制語彙データ１４１から不整合が検出された場合に、不整合を修正するための複数の修正候補を抽出する処理部である。抽出部１５２は、同義グループ毎に、グループ内修正候補を抽出し、抽出したグループ内修正候補を、グループ内修正候補テーブル１４２に登録する。 The extraction unit 152 is a processing unit that, when inconsistency is detected from the controlled vocabulary data 141, extracts a plurality of correction candidates for correcting the inconsistency. The extraction unit 152 extracts an in-group correction candidate for each synonym group, and registers the extracted in-group correction candidate in the in-group correction candidate table 142 .

図６は、グループ内修正候補テーブルのデータ構造の一例を示す図である。図６に示すように、同義グループ識別情報、グループ内修正候補識別情報、修正コストを有する。同義グループ識別情報は、同義グループを識別する情報である。たとえば、同義グループ識別情報「ＳＧ１」は、図５の同義グループ１４１－１を示す。同義グループ識別情報「ＳＧ２」は、図５の同義グループ１４１－２を示す。グループ内修正候補識別情報は、グループ内修正候補を識別する情報である。グループ内修正候補は、グループ内修正候補の内容を示す。修正コストは、グループ内修正候補の修正コストである。 FIG. 6 is a diagram showing an example of the data structure of the intra-group correction candidate table. As shown in FIG. 6, the intra-group correction candidate table 142 has synonym group identification information, intra-group correction candidate identification information, and a correction cost. The synonym group identification information identifies a synonym group. For example, the synonym group identification information "SG1" indicates the synonym group 141-1 in FIG. 5, and the synonym group identification information "SG2" indicates the synonym group 141-2 in FIG. 5. The intra-group correction candidate identification information identifies an intra-group correction candidate. The intra-group correction candidate field indicates the content of the intra-group correction candidate. The correction cost is the correction cost of the intra-group correction candidate.

まず、抽出部１５２は、図５に示す統制語彙データ１４１の同義グループ１４１－１を基にして、グループ内修正候補を抽出する。 First, the extraction unit 152 extracts in-group correction candidates based on the synonym group 141-1 of the controlled vocabulary data 141 shown in FIG.

抽出部１５２は、同義グループ１４１－１の標目を「ソフト」に統一するグループ内修正候補を抽出する。同義グループ１４１－１の標目をソフトに統一する際の修正箇所は１箇所となるため、修正コストは「１」となる。抽出部１５２は、グループ内修正候補にユニークなグループ内修正候補識別情報を割り当て、グループ内修正候補識別情報（たとえば、ｇａ－１）、グループ内修正候補「標目を「ソフト」に統一する」、修正コスト「１」を、グループ内修正候補テーブル１４２に登録する。以下の説明では、グループ内修正候補識別情報ｇａ－１のグループ内修正候補を「グループ内修正候補ｇａ－１」と表記する。 The extraction unit 152 extracts an intra-group correction candidate that unifies the headings of the synonym group 141-1 to "soft". Unifying the headings of the synonym group 141-1 to "soft" requires a change in one place, so the correction cost is "1". The extraction unit 152 assigns unique intra-group correction candidate identification information to the intra-group correction candidate, and registers the intra-group correction candidate identification information (for example, ga-1), the intra-group correction candidate "unify the headings to 'soft'", and the correction cost "1" in the intra-group correction candidate table 142. In the following description, the intra-group correction candidate with the identification information ga-1 is referred to as "intra-group correction candidate ga-1".

抽出部１５２は、同義グループ１４１－１の標目を「ソフトウェア」に統一するグループ内修正候補を抽出する。同義グループ１４１－１の標目をソフトウェアに統一する際の修正箇所は２箇所となるため、修正コストは「２」となる。抽出部１５２は、グループ内修正候補にユニークなグループ内修正候補識別情報を割り当て、グループ内修正候補識別情報（たとえば、ｇａ－２）、グループ内修正候補「標目を「ソフトウェア」に統一する」、修正コスト「２」を、グループ内修正候補テーブル１４２に登録する。以下の説明では、グループ内修正候補識別情報ｇａ－２のグループ内修正候補を「グループ内修正候補ｇａ－２」と表記する。 The extraction unit 152 extracts an intra-group correction candidate that unifies the headings of the synonym group 141-1 to "software". Unifying the headings of the synonym group 141-1 to "software" requires changes in two places, so the correction cost is "2". The extraction unit 152 assigns unique intra-group correction candidate identification information to the intra-group correction candidate, and registers the intra-group correction candidate identification information (for example, ga-2), the intra-group correction candidate "unify the headings to 'software'", and the correction cost "2" in the intra-group correction candidate table 142. In the following description, the intra-group correction candidate with the identification information ga-2 is referred to as "intra-group correction candidate ga-2".

続いて、抽出部１５２は、図５に示す統制語彙データ１４１の同義グループ１４１－２を基にして、グループ内修正候補を抽出する。抽出部１５２は、統制語彙データ１４１全体の用語名と、標目との連結成分（ソフトおよびソフトウェアの連結成分）から、同義グループ１４１－２の上位語の修正候補として、「ソフトウェア」を「ソフト」に修正する修正候補を特定する。 Subsequently, the extraction unit 152 extracts intra-group correction candidates based on the synonym group 141-2 of the controlled vocabulary data 141 shown in FIG. 5. From the connected components of term names and headings across the whole controlled vocabulary data 141 (here, the connected component of "soft" and "software"), the extraction unit 152 identifies, as a correction candidate for the hypernym of the synonym group 141-2, a correction that changes "software" to "soft".

抽出部１５２は、同義グループ１４１－２の上位語を「ソフト」に統一するグループ内修正候補を抽出する。同義グループ１４１－２の上位語をソフトに統一する際の修正箇所は２箇所となるため、修正コストは「２」となる。抽出部１５２は、グループ内修正候補にユニークなグループ内修正候補識別情報を割り当て、グループ内修正候補識別情報（たとえば、ｇｂ－１）、グループ内修正候補「上位語を「ソフト」に統一する」、修正コスト「２」を、グループ内修正候補テーブル１４２に登録する。以下の説明では、グループ内修正候補識別情報ｇｂ－１のグループ内修正候補を「グループ内修正候補ｇｂ－１」と表記する。 The extraction unit 152 extracts an intra-group correction candidate that unifies the hypernyms of the synonym group 141-2 to "soft". Unifying the hypernyms of the synonym group 141-2 to "soft" requires changes in two places, so the correction cost is "2". The extraction unit 152 assigns unique intra-group correction candidate identification information to the intra-group correction candidate, and registers the intra-group correction candidate identification information (for example, gb-1), the intra-group correction candidate "unify the hypernyms to 'soft'", and the correction cost "2" in the intra-group correction candidate table 142. In the following description, the intra-group correction candidate with the identification information gb-1 is referred to as "intra-group correction candidate gb-1".

抽出部１５２は、同義グループ１４１－２の上位語を「ソフトウェア」に統一するグループ内修正候補を抽出する。同義グループ１４１－２の上位語をソフトウェアに統一する際の修正箇所は０箇所となるため、修正コストは「０」となる。抽出部１５２は、グループ内修正候補にユニークなグループ内修正候補識別情報を割り当て、グループ内修正候補識別情報（たとえば、ｇｂ－２）、グループ内修正候補「上位語を「ソフトウェア」に統一する」、修正コスト「０」を、グループ内修正候補テーブル１４２に登録する。以下の説明では、グループ内修正候補識別情報ｇｂ－２のグループ内修正候補を「グループ内修正候補ｇｂ－２」と表記する。 The extraction unit 152 extracts an intra-group correction candidate that unifies the hypernyms of the synonym group 141-2 to "software". Unifying the hypernyms of the synonym group 141-2 to "software" requires no changes, so the correction cost is "0". The extraction unit 152 assigns unique intra-group correction candidate identification information to the intra-group correction candidate, and registers the intra-group correction candidate identification information (for example, gb-2), the intra-group correction candidate "unify the hypernyms to 'software'", and the correction cost "0" in the intra-group correction candidate table 142. In the following description, the intra-group correction candidate with the identification information gb-2 is referred to as "intra-group correction candidate gb-2".

抽出部１５２が、上記の処理を実行することで、図６に示すグループ内修正候補テーブル１４２にデータが登録される。 Data is registered in the intra-group correction candidate table 142 shown in FIG. 6 by the extraction unit 152 executing the above process.
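
The correction costs above follow a simple rule: the cost of unifying a column to a target value is the number of records whose current value differs from that target. The sketch below illustrates the rule. The function names and the sample records are assumptions for illustration, chosen so that the resulting costs match the values 1, 2, 2, and 0 stated for ga-1, ga-2, gb-1, and gb-2; "soft" and "software" stand in for 「ソフト」 and 「ソフトウェア」.

```python
def heading_candidates(group):
    """Map each distinct heading in the group to the number of records
    that would have to change if headings were unified to it."""
    headings = [heading for _, heading, _ in group]
    return {
        target: sum(h != target for h in headings)
        for target in sorted(set(headings))
    }


def hypernym_candidates(group, targets):
    """Map each allowed hypernym target to the number of records whose
    hypernym would have to change."""
    hypernyms = [hypernym for _, _, hypernym in group]
    return {t: sum(h != t for h in hypernyms) for t in targets}


# Illustrative (term, heading, hypernym) records, assumed for this example.
sg1 = [("soft", "soft", None), ("software", "soft", None), ("SW", "software", None)]
sg2 = [("os", "OS", "software"), ("OS", "OS", "software")]

sg1_costs = heading_candidates(sg1)                       # corresponds to ga-1, ga-2
sg2_costs = hypernym_candidates(sg2, ["soft", "software"])  # corresponds to gb-1, gb-2
```

Here the hypernym targets are passed in explicitly because, in the description, they come from the connected component of terms and headings across the whole table rather than from the group itself.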

続いて、抽出部１５２は、グループ内修正候補テーブル１４２に含まれるグループ内修正候補を組み合わせることで、修正候補を生成し、修正候補テーブル１４３に登録する。 Subsequently, the extraction unit 152 generates correction candidates by combining the intra-group correction candidates included in the intra-group correction candidate table 142 , and registers them in the correction candidate table 143 .

図７は、修正候補テーブルのデータ構造の一例を示す図である。図７に示すように、修正候補テーブル１４３は、修正候補識別情報と、修正候補構成と、合計修正コストと、不整合有無を有する。修正候補識別情報は、修正候補を識別する情報である。修正候補構成は、修正候補に含まれるグループ内修正候補の識別情報（グループ内修正候補識別情報）を示す。合計修正コストは、修正候補に含まれるグループ内修正候補の修正コストの合計値を示す。不整合有無は、該当する修正候補によって、統制語彙データ１４１を修正した場合に、不整合が存在するか否かを示す。不整合が存在する場合には「有」となり、不整合が存在しない場合には「無」となる。 FIG. 7 is a diagram showing an example of the data structure of a correction candidate table. As shown in FIG. 7, the correction candidate table 143 has correction candidate identification information, correction candidate configuration, total correction cost, and presence/absence of inconsistency. The correction candidate identification information is information that identifies a correction candidate. The correction candidate configuration indicates identification information of in-group correction candidates (in-group correction candidate identification information) included in the correction candidates. The total correction cost indicates the total correction cost of the correction candidates within the group included in the correction candidates. The presence or absence of inconsistency indicates whether or not there is inconsistency when the controlled vocabulary data 141 is corrected by the corresponding correction candidate. If there is an inconsistency, it will be "present", and if there is no inconsistency, it will be "absent".

抽出部１５２は、グループ内修正候補ｇａ－１，ｇｂ－１からなる修正候補を生成する。グループ内修正候補ｇａ－１，ｇｂ－１からなる修正候補の修正候補識別情報を「Ａｍ１」とする。また、修正候補識別情報「Ａｍ１」の修正候補を「修正候補Ａｍ１」と表記する。図６に示すように、グループ内修正候補ｇａ－１の修正コストが「１」、グループ内修正候補ｇｂ－１の修正コストが「２」となるため、修正候補Ａｍ１の合計修正コストは「３」となる。 The extraction unit 152 generates a correction candidate consisting of the intra-group correction candidates ga-1 and gb-1. The correction candidate identification information of this correction candidate is "Am1", and the correction candidate with the identification information "Am1" is referred to as "correction candidate Am1". As shown in FIG. 6, the correction cost of the intra-group correction candidate ga-1 is "1" and that of the intra-group correction candidate gb-1 is "2", so the total correction cost of the correction candidate Am1 is "3".

図８は、修正結果の一例を示す図（１）である。抽出部１５２が、図５に示した統制語彙データ１４１に対して、修正候補Ａｍ１の修正を行うと、図８に示す統制語彙データ１４１－Ａｍ１となる。修正候補Ａｍ１の修正には、同義グループ１４１－１の標目を「ソフト」に統一する修正と、同義グループ１４１－２の上位語を「ソフト」に統一する修正が含まれる。統制語彙データ１４１－Ａｍ１には、不整合が存在しない。 FIG. 8 is a diagram (1) showing an example of a correction result. When the extraction unit 152 applies the correction candidate Am1 to the controlled vocabulary data 141 shown in FIG. 5, the result is the controlled vocabulary data 141-Am1 shown in FIG. 8. The correction candidate Am1 includes a correction that unifies the headings of the synonym group 141-1 to "soft" and a correction that unifies the hypernyms of the synonym group 141-2 to "soft". The controlled vocabulary data 141-Am1 contains no inconsistency.

上記処理を実行することで、抽出部１５２は、修正候補識別情報「Ａｍ１」、修正候補構成「ｇａ－１，ｇｂ－１」、合計修正コスト「３」、不整合有無「無」を、修正候補テーブル１４３に登録する。 By executing the above process, the extraction unit 152 registers the correction candidate identification information "Am1", the correction candidate configuration "ga-1, gb-1", the total correction cost "3", and the inconsistency presence "absent" in the correction candidate table 143.

抽出部１５２は、グループ内修正候補ｇａ－１，ｇｂ－２からなる修正候補を生成する。グループ内修正候補ｇａ－１，ｇｂ－２からなる修正候補の修正候補識別情報を「Ａｍ２」とする。また、修正候補識別情報「Ａｍ２」の修正候補を「修正候補Ａｍ２」と表記する。図６に示すように、グループ内修正候補ｇａ－１の修正コストが「１」、グループ内修正候補ｇｂ－２の修正コストが「０」となるため、修正候補Ａｍ２の合計修正コストは「１」となる。 The extraction unit 152 generates a correction candidate consisting of the intra-group correction candidates ga-1 and gb-2. The correction candidate identification information of this correction candidate is "Am2", and the correction candidate with the identification information "Am2" is referred to as "correction candidate Am2". As shown in FIG. 6, the correction cost of the intra-group correction candidate ga-1 is "1" and that of the intra-group correction candidate gb-2 is "0", so the total correction cost of the correction candidate Am2 is "1".

図９は、修正結果の一例を示す図（２）である。抽出部１５２が、図５に示した統制語彙データ１４１に対して、修正候補Ａｍ２の修正を行うと、図９に示す統制語彙データ１４１－Ａｍ２となる。修正候補Ａｍ２の修正には、同義グループ１４１－１の標目を「ソフト」に統一する修正と、同義グループ１４１－２の上位語を「ソフトウェア」に統一する修正（変更なし）が含まれる。統制語彙データ１４１－Ａｍ２には、同義グループ１４１－２の上位語「ソフトウェア」が、同義グループ１４１－１の標目「ソフト」と異なるため、不整合Ｃが検出される。 FIG. 9 is a diagram (2) showing an example of a correction result. When the extraction unit 152 applies the correction candidate Am2 to the controlled vocabulary data 141 shown in FIG. 5, the result is the controlled vocabulary data 141-Am2 shown in FIG. 9. The correction candidate Am2 includes a correction that unifies the headings of the synonym group 141-1 to "soft" and a correction that unifies the hypernyms of the synonym group 141-2 to "software" (no change). In the controlled vocabulary data 141-Am2, inconsistency C is detected because the hypernym "software" of the synonym group 141-2 differs from the heading "soft" of the synonym group 141-1.

上記処理を実行することで、抽出部１５２は、修正候補識別情報「Ａｍ２」、修正候補構成「ｇａ－１，ｇｂ－２」、合計修正コスト「１」、不整合有無「有」を、修正候補テーブル１４３に登録する。 By executing the above process, the extraction unit 152 registers the correction candidate identification information "Am2", the correction candidate configuration "ga-1, gb-2", the total correction cost "1", and the inconsistency presence "present" in the correction candidate table 143.

抽出部１５２は、グループ内修正候補ｇａ－２，ｇｂ－１からなる修正候補を生成する。グループ内修正候補ｇａ－２，ｇｂ－１からなる修正候補の修正候補識別情報を「Ａｍ３」とする。また、修正候補識別情報「Ａｍ３」の修正候補を「修正候補Ａｍ３」と表記する。図６に示すように、グループ内修正候補ｇａ－２の修正コストが「２」、グループ内修正候補ｇｂ－１の修正コストが「２」となるため、修正候補Ａｍ３の合計修正コストは「４」となる。 The extraction unit 152 generates a correction candidate consisting of the intra-group correction candidates ga-2 and gb-1. The correction candidate identification information of this correction candidate is "Am3", and the correction candidate with the identification information "Am3" is referred to as "correction candidate Am3". As shown in FIG. 6, the correction cost of the intra-group correction candidate ga-2 is "2" and that of the intra-group correction candidate gb-1 is "2", so the total correction cost of the correction candidate Am3 is "4".

図１０は、修正結果の一例を示す図（３）である。抽出部１５２が、図５に示した統制語彙データ１４１に対して、修正候補Ａｍ３の修正を行うと、図１０に示す統制語彙データ１４１－Ａｍ３となる。修正候補Ａｍ３の修正には、同義グループ１４１－１の標目を「ソフトウェア」に統一する修正と、同義グループ１４１－２の上位語を「ソフト」に統一する修正が含まれる。統制語彙データ１４１－Ａｍ３には、同義グループ１４１－２の上位語「ソフト」が、同義グループ１４１－１の標目「ソフトウェア」と異なるため、不整合Ｃが検出される。 FIG. 10 is a diagram (3) showing an example of a correction result. When the extraction unit 152 applies the correction candidate Am3 to the controlled vocabulary data 141 shown in FIG. 5, the result is the controlled vocabulary data 141-Am3 shown in FIG. 10. The correction candidate Am3 includes a correction that unifies the headings of the synonym group 141-1 to "software" and a correction that unifies the hypernyms of the synonym group 141-2 to "soft". In the controlled vocabulary data 141-Am3, inconsistency C is detected because the hypernym "soft" of the synonym group 141-2 differs from the heading "software" of the synonym group 141-1.

上記処理を実行することで、抽出部１５２は、修正候補識別情報「Ａｍ３」、修正候補構成「ｇａ－２，ｇｂ－１」、合計修正コスト「４」、不整合有無「有」を、修正候補テーブル１４３に登録する。 By executing the above process, the extraction unit 152 registers the correction candidate identification information "Am3", the correction candidate configuration "ga-2, gb-1", the total correction cost "4", and the inconsistency presence "present" in the correction candidate table 143.

抽出部１５２は、グループ内修正候補ｇａ－２，ｇｂ－２からなる修正候補を生成する。グループ内修正候補ｇａ－２，ｇｂ－２からなる修正候補の修正候補識別情報を「Ａｍ４」とする。また、修正候補識別情報「Ａｍ４」の修正候補を「修正候補Ａｍ４」と表記する。図６に示すように、グループ内修正候補ｇａ－２の修正コストが「２」、グループ内修正候補ｇｂ－２の修正コストが「０」となるため、修正候補Ａｍ４の合計修正コストは「２」となる。 The extraction unit 152 generates a correction candidate consisting of the intra-group correction candidates ga-2 and gb-2. The correction candidate identification information of this correction candidate is "Am4", and the correction candidate with the identification information "Am4" is referred to as "correction candidate Am4". As shown in FIG. 6, the correction cost of the intra-group correction candidate ga-2 is "2" and that of the intra-group correction candidate gb-2 is "0", so the total correction cost of the correction candidate Am4 is "2".

図１１は、修正結果の一例を示す図（４）である。抽出部１５２が、図５に示した統制語彙データ１４１に対して、修正候補Ａｍ４の修正を行うと、図１１に示す統制語彙データ１４１－Ａｍ４となる。修正候補Ａｍ４の修正には、同義グループ１４１－１の標目を「ソフトウェア」に統一する修正と、同義グループ１４１－２の上位語を「ソフトウェア」に統一する修正（修正なし）が含まれる。統制語彙データ１４１－Ａｍ４には、不整合が存在しない。 FIG. 11 is a diagram (4) showing an example of a correction result. When the extraction unit 152 applies the correction candidate Am4 to the controlled vocabulary data 141 shown in FIG. 5, the result is the controlled vocabulary data 141-Am4 shown in FIG. 11. The correction candidate Am4 includes a correction that unifies the headings of the synonym group 141-1 to "software" and a correction that unifies the hypernyms of the synonym group 141-2 to "software" (no change). The controlled vocabulary data 141-Am4 contains no inconsistency.

上記処理を実行することで、抽出部１５２は、修正候補識別情報「Ａｍ４」、修正候補構成「ｇａ－２，ｇｂ－２」、合計修正コスト「２」、不整合有無「無」を、修正候補テーブル１４３に登録する。 By executing the above process, the extraction unit 152 registers the correction candidate identification information "Am4", the correction candidate configuration "ga-2, gb-2", the total correction cost "2", and the inconsistency presence "absent" in the correction candidate table 143.

抽出部１５２が、上記処理を実行することで、図７に示す修正候補テーブル１４３にデータが登録される。 Data is registered in the correction candidate table 143 shown in FIG. 7 by the extraction unit 152 executing the above process.
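
The construction of the correction candidate table from the four intra-group correction candidates can be sketched as a Cartesian product. The identifiers, target values, and costs below are the ones given in the description; the function name, the tuple and dictionary layout, and the simplified consistency check are assumptions for illustration. The check covers only inconsistency C between the two groups in this example (the hypernym of the synonym group 141-2 must equal the heading of the synonym group 141-1) and is not a general detector.

```python
from itertools import product

# (identifier, target value, correction cost) as stated in the description.
# "soft" and "software" stand in for the Japanese values in the source.
HEADING_OPTIONS = [("ga-1", "soft", 1), ("ga-2", "software", 2)]
HYPERNYM_OPTIONS = [("gb-1", "soft", 2), ("gb-2", "software", 0)]


def build_candidates(heading_options, hypernym_options):
    """Combine one intra-group candidate per synonym group into correction
    candidates, with total cost and a simplified inconsistency flag."""
    table = []
    pairs = product(heading_options, hypernym_options)
    for n, ((h_id, heading, h_cost), (y_id, hypernym, y_cost)) in enumerate(pairs, start=1):
        table.append({
            "id": f"Am{n}",
            "parts": (h_id, y_id),
            "total_cost": h_cost + y_cost,
            # Simplified check for inconsistency C only.
            "inconsistent": heading != hypernym,
        })
    return table


candidate_table = build_candidates(HEADING_OPTIONS, HYPERNYM_OPTIONS)
```

Under these assumptions the rows reproduce the values registered above: total costs of 3, 1, 4, and 2 for Am1 to Am4, with Am2 and Am3 flagged as inconsistent.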

図４の説明に戻る。表示制御部１５３は、グループ内修正候補テーブル１４２、修正候補テーブル１４３を基にして、複数の修正候補と、複数の修正候補に対応する合計修正コストとを対応付けた画面情報を生成し、画面情報を表示部１３０に表示させる。 Returning to the description of FIG. 4. Based on the intra-group correction candidate table 142 and the correction candidate table 143, the display control unit 153 generates screen information in which a plurality of correction candidates are associated with their total correction costs, and causes the display unit 130 to display the screen information.

たとえば、表示制御部１５３は、修正候補テーブル１４３を基にして、複数の修正候補から、不整合有無「無」となる修正候補と、この修正候補に対応する合計修正コスト、修正候補構成を特定する。表示制御部１５３は、特定した修正候補構成と、グループ内修正候補テーブル１４２を基にして、修正候補に対応するグループ内修正候補を特定する。 For example, based on the correction candidate table 143, the display control unit 153 identifies, from the plurality of correction candidates, each correction candidate whose inconsistency presence is "absent", together with the total correction cost and the correction candidate configuration corresponding to that correction candidate. Based on the identified correction candidate configuration and the intra-group correction candidate table 142, the display control unit 153 identifies the intra-group correction candidates corresponding to the correction candidate.

表示制御部１５３は、上記処理によって特定した情報を基にして、画面情報を生成する。図１２は、表示制御部が生成する画面情報の一例を示す図である。図１２に示すように、画面情報６０には、修正候補Ａｍ４に関する情報を含む領域６０ａと、修正候補Ａｍ１に関する情報を含む領域６０ｂとが含まれる。 The display control unit 153 generates screen information based on the information identified by the above process. FIG. 12 is a diagram showing an example of the screen information generated by the display control unit. As shown in FIG. 12, the screen information 60 includes an area 60a containing information on the correction candidate Am4 and an area 60b containing information on the correction candidate Am1.

表示制御部１５３は、合計修正コストの降順に、各修正候補を画面情報６０に設定してもよいし、合計修正コストが最小となる修正候補のみを、画面情報６０に設定してもよい。表示制御部１５３は、画面情報６０を、表示部１３０に出力する。 The display control unit 153 may set each correction candidate in the screen information 60 in descending order of the total correction cost, or may set only the correction candidate with the lowest total correction cost in the screen information 60 . Display control unit 153 outputs screen information 60 to display unit 130 .

作業者は、表示部１３０に表示された画面情報６０を参照し、入力部１２０を操作し、統制語彙データ１４１を修正する。なお、作業者の設定に応じて、情報処理装置１００は、不整合を含まない修正候補を基にして、自動で、統制語彙データ１４１を修正してもよい。 The operator refers to the screen information 60 displayed on the display unit 130 , operates the input unit 120 , and corrects the controlled vocabulary data 141 . Note that the information processing apparatus 100 may automatically correct the controlled vocabulary data 141 based on correction candidates that do not contain inconsistencies, according to the operator's settings.

次に、本実施例に係る情報処理装置１００の処理手順の一例について説明する。図１３は、本実施例に係る情報処理装置の処理手順を示すフローチャートである。図１３に示すように、情報処理装置１００の検出部１５１は、統制語彙データ１４１を取得する（ステップＳ１０１）。 Next, an example of the processing procedure of the information processing apparatus 100 according to this embodiment will be described. FIG. 13 is a flow chart showing the processing procedure of the information processing apparatus according to this embodiment. As shown in FIG. 13, the detection unit 151 of the information processing device 100 acquires controlled vocabulary data 141 (step S101).

検出部１５１は、用語および標目を基にして、統制語彙データを同義グループに分類する（ステップＳ１０２）。検出部１５１は、複数の同義グループを基にして、不整合が存在するか否かを特定する（ステップＳ１０３）。 The detection unit 151 classifies the controlled vocabulary data into synonymous groups based on terms and headings (step S102). The detection unit 151 identifies whether or not there is an inconsistency based on a plurality of synonymous groups (step S103).

検出部１５１が、不整合を検出しない場合には（ステップＳ１０４，Ｎｏ）、処理を終了する。検出部１５１が、不整合を検出した場合には（ステップＳ１０４，Ｙｅｓ）、ステップＳ１０５に移行する。 If the detection unit 151 does not detect any inconsistency (step S104, No), the process ends. When the detection unit 151 detects mismatch (step S104, Yes), the process proceeds to step S105.

情報処理装置１００の抽出部１５２は、各同義グループについて、グループ内修正候補を抽出し、修正コストを算出する（ステップＳ１０５）。 For each synonym group, the extraction unit 152 of the information processing apparatus 100 extracts intra-group correction candidates and calculates their correction costs (step S105).

抽出部１５２は、複数の同義グループ間のグループ内修正候補を組み合わせた複数の修正候補を生成する（ステップＳ１０６）。抽出部１５２は、修正候補によって、統制語彙データ１４１を修正した場合の不整合の有無を特定する（ステップＳ１０７）。 The extraction unit 152 generates a plurality of correction candidates by combining intra-group correction candidates between a plurality of synonymous groups (step S106). The extraction unit 152 identifies the presence or absence of inconsistency when the controlled vocabulary data 141 is corrected based on the correction candidates (step S107).

抽出部１５２は、修正候補に含まれるグループ内修正候補の修正コストを合計した合計修正コストを算出する（ステップＳ１０８）。抽出部１５２は、修正候補テーブル１４３に修正候補の情報を登録する（ステップＳ１０９）。 The extraction unit 152 calculates a total correction cost by summing the correction costs of the correction candidates within the group included in the correction candidates (step S108). The extraction unit 152 registers the information of the correction candidate in the correction candidate table 143 (step S109).

表示制御部１５３は、不整合の存在しない修正候補を基にして画面情報を生成する（ステップＳ１１０）。表示制御部１５３は、画面情報を、表示部１３０に表示させる（ステップＳ１１１）。 The display control unit 153 generates screen information based on correction candidates that do not have inconsistency (step S110). The display control unit 153 causes the display unit 130 to display screen information (step S111).

次に、本実施例に係る情報処理装置１００の効果について説明する。情報処理装置１００は、統制語彙データ１４１の不整合を検出した場合に、不整合を修正するための複数の修正候補を抽出し、複数の修正候補に対する合計修正コストを算出する。情報処理装置１００は、複数の修正候補と、合計修正コストとを対応付けて表示する。これによって、作業者は、修正コストを考慮して、修正候補を選択することが可能となる。 Next, the effects of the information processing apparatus 100 according to this embodiment will be described. When the information processing apparatus 100 detects inconsistency in the controlled vocabulary data 141, it extracts a plurality of correction candidates for correcting the inconsistency, and calculates a total correction cost for the plurality of correction candidates. The information processing apparatus 100 associates and displays a plurality of correction candidates and the total correction cost. This allows the operator to select a correction candidate in consideration of the correction cost.

情報処理装置１００は、合計修正コストが最小となる修正候補を特定して、表示することで、作業者の負担が最小となる修正候補であって、不整合を含まない修正候補を提示することができる。 The information processing apparatus 100 identifies and displays a correction candidate that minimizes the total correction cost, thereby presenting a correction candidate that minimizes the burden on the operator and that does not include inconsistency. can be done.

情報処理装置１００は、統制語彙データ１４１に含まれる用語および標目を基にして、統制語彙データ１４１を複数の同義グループに分類し、複数の同義グループの間の不整合を検出する。これによって、同義グループの上位語に設定された値が、他の同義グループの標目に設定されていない不整合（不整合Ｃ）等を検出することができる。 The information processing apparatus 100 classifies the controlled vocabulary data 141 into a plurality of synonym groups based on the terms and headings included in the controlled vocabulary data 141, and detects inconsistencies between the synonym groups. This makes it possible to detect inconsistencies such as inconsistency C, in which the value set as the hypernym of one synonym group is not set as the heading of another synonym group.

情報処理装置１００は、修正候補に含まれる複数のグループ内修正候補の修正コストを合計した合計修正コストを算出する。これによって、各修正候補の合計修正コストを算出して、作業者の手間を定量的に特定することができる。 The information processing apparatus 100 calculates a total correction cost by totaling the correction costs of a plurality of correction candidates within a group included in the correction candidates. This makes it possible to calculate the total correction cost of each correction candidate and to quantitatively identify the labor of the operator.

次に、上記実施例に示した情報処理装置１００と同様の機能を実現するコンピュータのハードウェア構成の一例について説明する。図１４は、情報処理装置と同様の機能を実現するコンピュータのハードウェア構成の一例を示す図である。 Next, an example of the hardware configuration of a computer that implements the same functions as the information processing apparatus 100 shown in the above embodiment will be described. FIG. 14 is a diagram illustrating an example of a hardware configuration of a computer that implements functions similar to those of the information processing apparatus.

As shown in FIG. 14, a computer 200 has a CPU 201 that executes various arithmetic processes, an input device 202 that receives data input from a user, and a display 203. The computer 200 also has a communication device 204 that receives data from an external device, and an interface device 205 that connects to various devices. The computer 200 further has a RAM 206 that temporarily stores various information, and a hard disk device 207. The devices 201 to 207 are connected to a bus 208.

The hard disk device 207 stores a detection program 207a, an extraction program 207b, and a display control program 207c. The CPU 201 reads out the detection program 207a, the extraction program 207b, and the display control program 207c and loads them into the RAM 206.

The detection program 207a functions as a detection process 206a. The extraction program 207b functions as an extraction process 206b. The display control program 207c functions as a display control process 206c.

The processing of the detection process 206a corresponds to the processing of the detection unit 151. The processing of the extraction process 206b corresponds to the processing of the extraction unit 152. The processing of the display control process 206c corresponds to the processing of the display control unit 153.

Note that the programs 207a to 207c do not necessarily have to be stored in the hard disk device 207 from the beginning. For example, each program may be stored on a "portable physical medium" inserted into the computer 200, such as a flexible disk (FD), CD-ROM, DVD, magneto-optical disk, or IC card, and the computer 200 may read out and execute the programs 207a to 207c from that medium.

The following appendices are further disclosed regarding the embodiments including the above examples.

(Appendix 1) A data correction method executed by a computer, the method comprising:
detecting an inconsistency between data in a data table;
extracting, when the inconsistency is detected, a plurality of correction candidates for correcting the inconsistency;
calculating a correction cost corresponding to each of the plurality of correction candidates; and
displaying the plurality of correction candidates in association with the correction costs corresponding to the plurality of correction candidates.

(Appendix 2) The data correction method according to appendix 1, further comprising identifying, among the correction costs corresponding to the plurality of correction candidates, the correction candidate with the minimum correction cost, wherein the displaying displays the correction candidate with the minimum correction cost.

(Appendix 3) The data correction method according to appendix 1 or 2, wherein the detecting classifies the records of the data table into a plurality of synonym groups based on the terms and headings included in the data table, and detects an inconsistency between the plurality of synonym groups.

(Appendix 4) The data correction method according to appendix 3, wherein the extracting extracts a plurality of correction candidates for each of the plurality of synonym groups, and the calculating calculates a correction cost for each combination of the correction candidates of the plurality of synonym groups.

(Appendix 5) A data correction program that causes a computer to execute a process comprising:
detecting an inconsistency between data in a data table;
extracting, when the inconsistency is detected, a plurality of correction candidates for correcting the inconsistency;
calculating a correction cost corresponding to each of the plurality of correction candidates; and
displaying the plurality of correction candidates in association with the correction costs corresponding to the plurality of correction candidates.

(Appendix 6) The data correction program according to appendix 5, wherein the process further comprises identifying, among the correction costs corresponding to the plurality of correction candidates, the correction candidate with the minimum correction cost, and the displaying displays the correction candidate with the minimum correction cost.

(Appendix 7) The data correction program according to appendix 5 or 6, wherein the detecting classifies the records of the data table into a plurality of synonym groups based on the terms and headings included in the data table, and detects an inconsistency between the plurality of synonym groups.

(Appendix 8) The data correction program according to appendix 7, wherein the extracting extracts a plurality of correction candidates for each of the plurality of synonym groups, and the calculating calculates a correction cost for each combination of the correction candidates of the plurality of synonym groups.

100 information processing apparatus
110 communication unit
120 input unit
130 display unit
140 storage unit
141 controlled vocabulary data
142 in-group correction candidate table
143 correction candidate table
150 control unit
151 detection unit
152 extraction unit
153 display control unit

Claims

A data correction method executed by a computer, the method comprising:
detecting an inconsistency between data in a data table;
extracting, when the inconsistency is detected, a plurality of correction candidates for correcting the inconsistency;
calculating a correction cost corresponding to each of the plurality of correction candidates; and
displaying the plurality of correction candidates in association with the correction costs corresponding to the plurality of correction candidates.

The data correction method according to claim 1, further comprising identifying, among the correction costs corresponding to the plurality of correction candidates, the correction candidate with the minimum correction cost, wherein the displaying displays the correction candidate with the minimum correction cost.

The data correction method according to claim 1 or 2, wherein the detecting classifies the records of the data table into a plurality of synonym groups based on the terms and headings included in the data table, and detects an inconsistency between the plurality of synonym groups.

The data correction method according to claim 3, wherein the extracting extracts a plurality of correction candidates for each of the plurality of synonym groups, and the calculating calculates a correction cost for each combination of the correction candidates of the plurality of synonym groups.

A data correction program that causes a computer to execute a process comprising:
detecting an inconsistency between data in a data table;
extracting, when the inconsistency is detected, a plurality of correction candidates for correcting the inconsistency;
calculating a correction cost corresponding to each of the plurality of correction candidates; and
displaying the plurality of correction candidates in association with the correction costs corresponding to the plurality of correction candidates.