JP2023149188A

JP2023149188A - Correction support method, correction support program, and information processing apparatus

Info

Publication number: JP2023149188A
Application number: JP2022057621A
Authority: JP
Inventors: 康貴森脇; Yasutaka Moriwaki; 唯野間; Yui Noma
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2022-03-30
Filing date: 2022-03-30
Publication date: 2023-10-13

Abstract

To present a correction candidate on the basis of the contents of edits performed in the past by a worker.
SOLUTION: An information processing apparatus 100 accepts a data table including a plurality of records. On the basis of an analysis result of the accepted data table, the information processing apparatus 100 specifies, out of a plurality of pieces of data included in the plurality of records, a plurality of pieces of correction candidate data that are candidates for correction. The information processing apparatus 100 selects one piece of correction candidate data from the specified pieces on the basis of those records, among the plurality of records, whose data was edited in the past. The information processing apparatus 100 outputs the selected piece of correction candidate data.
SELECTED DRAWING: Figure 2

Description

The present invention relates to a correction support method and the like.

Terminology used in various fields is not always standardized. For example, "氏名" (full name) and "名前" (name) have the same meaning but are written differently. A system therefore treats them as completely different data, which prevents the data from being linked accurately.

Creating and using controlled vocabulary data that defines synonyms, broader terms (hypernyms), and narrower terms (hyponyms) absorbs the ambiguity of terms and resolves problems such as the one described above. Controlled vocabulary data is a dictionary that summarizes the semantic relationships among multiple terms. It is created manually to prevent search omissions and similar problems caused by ambiguity, homography (one written form with different meanings), and synonymy (different written forms with one meaning).

FIG. 15 is a diagram illustrating an example of the data structure of controlled vocabulary data. As an example, the data structure is shown in tabular format. As shown in FIG. 15, controlled vocabulary data 10 has a term name column 10a, a representative word column 10b, a language column 10c, a representative word URI column 10d, and a broader term URI column 10e.

The term name column 10a holds term names from a glossary used in a specific field. The representative word column 10b holds representative words. A representative word is a name (heading) that stands for several variant term names. For example, in FIG. 15, "3Dプリンタ" is set as the representative word for the term names "3Dプリンタ" and "3Dプリンター" (two Japanese spellings of "3D printer").

The language column 10c holds the language that the operator used when entering the controlled vocabulary data. For example, "ja" indicates Japanese and "en" indicates English. In FIG. 15, the language of the first and second rows of controlled vocabulary data 10 is "ja", which indicates that the operator entered those rows in Japanese. The language of the third row is "en", which indicates that the operator entered that row in English.

The representative word URI column 10d holds the URI (Uniform Resource Identifier) of the representative word. The broader term URI column 10e holds the URI of the broader term.

In the following description, a "synonym subgroup" is a set of terms whose term names (values in column 10a) and representative words (values in column 10b) are linked by synonymy. In FIG. 15, the records in the first and second rows belong to synonym subgroup 10-sub1, and the record in the third row belongs to synonym subgroup 10-sub2. Each synonym subgroup is assumed to have a single unique representative word, a single unique representative word URI, and a single unique broader term URI.

For example, synonym subgroup 10-sub1 has the unique representative word "3Dプリンタ", the unique representative word URI "http://myVocab/1", and the unique broader term URI "http://myVocab/24".

Synonym subgroups in different languages may share the same representative word URI. A "synonym group" is the set of all synonym subgroups that share the same representative word URI. In FIG. 15, the records in the first to third rows belong to synonym group 10-1. Each synonym group is assumed to have a unique broader term URI. For example, synonym group 10-1 has the unique broader term URI "http://myVocab/24".
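
The following Python is a minimal sketch of this data model. It represents each row as a record and groups the rows into synonym subgroups and synonym groups. For simplicity, it keys a subgroup by its representative word. The connected-component grouping that the identification unit uses later is more general. The row values are an illustrative reading of FIG. 15, not data taken from the figure. Record, subgroups_by_rep, and groups_by_rep_uri are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    term: str         # term name column 10a
    rep: str          # representative word column 10b
    lang: str         # language column 10c
    rep_uri: str      # representative word URI column 10d
    broader_uri: str  # broader term URI column 10e


# Illustrative rows in the spirit of FIG. 15 (values assumed).
records = [
    Record("3Dプリンタ", "3Dプリンタ", "ja", "http://myVocab/1", "http://myVocab/24"),
    Record("3Dプリンター", "3Dプリンタ", "ja", "http://myVocab/1", "http://myVocab/24"),
    Record("3D printer", "3D printer", "en", "http://myVocab/1", "http://myVocab/24"),
]


def subgroups_by_rep(rows):
    """Simplified synonym subgroups: rows that share a representative word."""
    groups = defaultdict(list)
    for r in rows:
        groups[r.rep].append(r)
    return dict(groups)


def groups_by_rep_uri(rows):
    """Synonym groups: rows whose subgroups share a representative word URI."""
    groups = defaultdict(list)
    for r in rows:
        groups[r.rep_uri].append(r)
    return dict(groups)


subgroups = subgroups_by_rep(records)
groups = groups_by_rep_uri(records)
print(len(subgroups), len(groups))  # 2 1
for uri, rows in groups.items():
    # A consistent synonym group carries exactly one broader term URI.
    print(uri, {r.broader_uri for r in rows})
```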

Next, "inconsistencies" concerning synonym subgroups and synonym groups are defined. FIGS. 16 and 17 are diagrams for explaining inconsistencies. FIG. 16 illustrates inconsistencies A, B, C, and D, and FIG. 17 illustrates inconsistencies E and F.

FIG. 16 is described first. Inconsistency A, illustrated with controlled vocabulary data 11a, occurs when two or more different representative words exist within the same synonym subgroup. For example, in controlled vocabulary data 11a, two representative words, "3Dプリンター" and "3Dプリンタ", are set within synonym subgroup 11a-sub1. This corresponds to inconsistency A.

Inconsistency B, illustrated with controlled vocabulary data 11b, occurs when two different representative word URIs exist within the same synonym subgroup. For example, in controlled vocabulary data 11b, two representative word URIs, "http://myVocab/1" and "http://myVocab/2", are set within synonym subgroup 11b-sub1. This corresponds to inconsistency B.

Inconsistency C, illustrated with controlled vocabulary data 11c, occurs when two or more different broader term URIs are set within a synonym subgroup. For example, in controlled vocabulary data 11c, two broader term URIs, "http://myVocab/25" and "http://myVocab/24", are set within synonym subgroup 11c-sub1. This corresponds to inconsistency C.

Inconsistency D, illustrated with controlled vocabulary data 11d, occurs when different synonym subgroups in the same language share the same representative word URI. For example, in controlled vocabulary data 11d, synonym subgroups 11c-sub1 and 11c-sub2 are both in the language "ja" and both have the representative word URI "http://myVocab/1". This corresponds to inconsistency D. If the language of either synonym subgroup were "en", inconsistency D would not apply.
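
Checks A to D can be sketched as simple set-cardinality tests. This minimal illustration assumes that each synonym subgroup is already given as a list of rows. The Row type, the function names, and the sample values are assumptions made for the example. The sample values include a hypothetical "プリンタ" row and the hypothetical URI "http://myVocab/23".

```python
from collections import defaultdict
from typing import NamedTuple


class Row(NamedTuple):
    term: str
    rep: str
    lang: str
    rep_uri: str
    broader_uri: str


def intra_subgroup_issues(rows):
    """Labels of inconsistencies A, B, C found inside a single synonym subgroup."""
    issues = set()
    if len({r.rep for r in rows}) > 1:
        issues.add("A")  # two or more representative words
    if len({r.rep_uri for r in rows}) > 1:
        issues.add("B")  # two or more representative word URIs
    if len({r.broader_uri for r in rows}) > 1:
        issues.add("C")  # two or more broader term URIs
    return issues


def inconsistency_d(subgroups):
    """(language, rep URI) pairs claimed by more than one subgroup (inconsistency D)."""
    owners = defaultdict(set)
    for sid, rows in subgroups.items():
        for r in rows:
            owners[(r.lang, r.rep_uri)].add(sid)
    return {key: ids for key, ids in owners.items() if len(ids) > 1}


# Hypothetical data in the spirit of 11a and 11d in FIG. 16.
sub_11a = [
    Row("3Dプリンタ", "3Dプリンター", "ja", "http://myVocab/1", "http://myVocab/24"),
    Row("3Dプリンター", "3Dプリンタ", "ja", "http://myVocab/1", "http://myVocab/24"),
]
print(intra_subgroup_issues(sub_11a))  # {'A'}

data_11d = {
    "sub1": [Row("3Dプリンタ", "3Dプリンタ", "ja", "http://myVocab/1", "http://myVocab/24")],
    "sub2": [Row("プリンタ", "プリンタ", "ja", "http://myVocab/1", "http://myVocab/23")],
}
print(inconsistency_d(data_11d))
```

Keying on the (language, URI) pair reflects the exemption in the text: the same URI in two languages is not reported.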

Turning to FIG. 17: inconsistency E, illustrated with controlled vocabulary data 11e, occurs when two or more different broader term URIs exist within the same synonym group. For example, in controlled vocabulary data 11e, two broader term URIs, "http://myVocab/24" and "http://myVocab/25", are set in synonym group 11e-1. This corresponds to inconsistency E.

Inconsistency F, illustrated with controlled vocabulary data 11f, occurs when the hierarchical relationships between representative word URIs and broader term URIs form a cycle across different synonym groups. Controlled vocabulary data 11f contains synonym groups 11f-1, 11f-2, and 11f-3:
- In synonym group 11f-3, the representative word URI is "http://myVocab/24" and the broader term URI is "http://myVcoab/2".
- In synonym group 11f-2, the representative word URI is "http://myVocab/2" and the broader term URI is "http://myVcoab/1".
- In synonym group 11f-1, the representative word URI is "http://myVocab/1" and the broader term URI is "http://myVcoab/24".

Each group's broader term is the representative word of the next group, and the last group points back to the first. The hierarchy among synonym groups 11f-1, 11f-2, and 11f-3 is therefore circular, which corresponds to inconsistency F.
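
Inconsistencies E and F are group-level checks:
- E counts the distinct broader term URIs in each synonym group.
- F is cycle detection in the directed graph that maps each representative word URI to its broader term URI.

The sketch below uses a standard three-color depth-first search. It assumes that the "myVcoab" spellings in the description of 11f are typos for "myVocab"; otherwise no cycle closes. The 11e rows are an assumption consistent with the two-cell correction described for FIG. 18. The function names are hypothetical.

```python
from collections import defaultdict


def inconsistency_e(rows):
    """Synonym groups (keyed by rep URI) that hold two or more broader term URIs."""
    broader = defaultdict(set)
    for rep_uri, broader_uri in rows:
        broader[rep_uri].add(broader_uri)
    return {g: uris for g, uris in broader.items() if len(uris) > 1}


def has_cycle(edges):
    """Inconsistency F: True if the rep URI -> broader URI graph contains a cycle."""
    graph = defaultdict(set)
    for child, parent in edges:
        graph[child].add(parent)
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = defaultdict(int)

    def visit(node):
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY:  # back edge: the path returns to a node on the stack
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(graph))


# 11e: one synonym group whose rows point at two broader terms (rows assumed).
rows_11e = [
    ("http://myVocab/1", "http://myVocab/24"),
    ("http://myVocab/1", "http://myVocab/24"),
    ("http://myVocab/1", "http://myVocab/25"),
]
# 11f: (representative word URI, broader term URI) for groups 11f-3, 11f-2, 11f-1.
edges_11f = [
    ("http://myVocab/24", "http://myVocab/2"),
    ("http://myVocab/2", "http://myVocab/1"),
    ("http://myVocab/1", "http://myVocab/24"),
]
print(sorted(inconsistency_e(rows_11e)["http://myVocab/1"]))
# ['http://myVocab/24', 'http://myVocab/25']
print(has_cycle(edges_11f))  # True
```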

The manual effort needed to correct cell values in controlled vocabulary data so as to resolve the inconsistencies above is called the "correction cost". It is counted as the number of cells that must be edited. FIG. 18 is a diagram for explaining correction cost.

In controlled vocabulary data 11a, two or more representative words exist within synonym subgroup 11a-sub1, which is inconsistency A. The operator corrects the representative word "3Dプリンター" in controlled vocabulary data 11a to "3Dプリンタ". This resolves inconsistency A, and the data becomes controlled vocabulary data 12a. In this case, the correction cost is 1.

In controlled vocabulary data 11e, two broader term URIs exist within synonym group 11e-1, which is inconsistency E. The operator corrects the broader term URI "http://myVocab/24" in controlled vocabulary data 11e to "http://myVocab/25" in two places. This resolves inconsistency E, and the data becomes controlled vocabulary data 12e. In this case, the correction cost is 2.
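
Under this definition, the correction cost is simply the number of cells that change. The minimal sketch below makes two assumptions. First, a unification target is chosen from values already present in the cells. Second, group 11e has exactly three rows, two holding ".../24" and one holding ".../25", which is consistent with the two-cell edit above. The function names are hypothetical.

```python
def unify_cost(values, target):
    """Correction cost: number of cells that must change so every value equals target."""
    return sum(1 for v in values if v != target)


def ranked_targets(values):
    """Every value already present, as a unification target, cheapest first."""
    return sorted((unify_cost(values, t), t) for t in set(values))


# Broader term URIs of synonym group 11e (row values assumed).
broader_11e = ["http://myVocab/24", "http://myVocab/24", "http://myVocab/25"]
print(unify_cost(broader_11e, "http://myVocab/25"))  # 2, the correction shown for 12e
print(ranked_targets(broader_11e))
# [(1, 'http://myVocab/24'), (2, 'http://myVocab/25')]

# Representative words of synonym subgroup 11a-sub1.
print(unify_cost(["3Dプリンター", "3Dプリンタ"], "3Dprinter".replace("printer", "プリンタ")))  # 1
```

The ranking shows that the edit chosen in FIG. 18 is not necessarily the cheapest one. This gap between minimum cost and the operator's actual choice is what the rest of the description addresses.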

Japanese Laid-open Patent Publication No. 2020-52690 (JP2020-52690A)

As described with reference to FIG. 18, manually correcting controlled vocabulary data can introduce input errors or oversights. In addition, correction work takes time, and the quality of the controlled vocabulary data may deteriorate.

Therefore, when controlled vocabulary data contains inconsistencies, correction candidates need to be presented. It is also preferable to present correction candidates based on the edits that the operator has made in the past. For example, suppose an operator is editing controlled vocabulary data in Japanese. If the correction candidates can be applied in Japanese, the operator's burden when making corrections is reduced.

In one aspect, an object of the present invention is to provide a correction support method, a correction support program, and an information processing apparatus capable of presenting correction candidates based on edits made by an operator in the past.

In a first proposal, a computer executes the following processing:
1. The computer accepts a data table containing multiple records.
2. Based on the result of analyzing the accepted data table, the computer identifies, among the pieces of data in the records, multiple pieces of correction candidate data that are candidates for correction.
3. The computer selects one piece of correction candidate data from the identified candidates, based on those records whose data has been edited in the past.
4. The computer outputs the selected correction candidate data.

Correction candidates can be presented based on edits made by the operator in the past.

FIG. 1 is a diagram for explaining a reference technique.
FIG. 2 is a functional block diagram showing the configuration of the information processing apparatus according to this embodiment.
FIG. 3 is a diagram showing an example of the data structure of controlled vocabulary data.
FIG. 4 is a diagram for explaining the processing of the identification unit.
FIG. 5 is a diagram showing an example of controlled vocabulary data corrected with correction candidate A.
FIG. 6 is a diagram showing an example of controlled vocabulary data corrected with correction candidate B.
FIG. 7 is a diagram showing an example of controlled vocabulary data corrected with correction candidate C.
FIG. 8 is a diagram showing an example of controlled vocabulary data corrected with correction candidate D.
FIG. 9 is a diagram showing correction candidates X1 to X8.
FIG. 10 is a diagram for explaining a first selection process.
FIG. 11 is a diagram for explaining a second selection process.
FIG. 12 is a diagram showing an example of a display screen generated by the display control unit.
FIG. 13 is a flowchart showing the processing procedure of the information processing apparatus according to this embodiment.
FIG. 14 is a diagram showing an example of the hardware configuration of a computer that implements the same functions as the information processing apparatus of the embodiment.
FIG. 15 is a diagram illustrating an example of the data structure of controlled vocabulary data.
FIG. 16 is a diagram (1) for explaining inconsistencies.
FIG. 17 is a diagram (2) for explaining inconsistencies.
FIG. 18 is a diagram for explaining correction cost.

DESCRIPTION OF EMBODIMENTS

Embodiments of the correction support method, the correction support program, and the information processing apparatus disclosed in the present application are described in detail below with reference to the drawings. The present invention is not limited to these embodiments.

Before the information processing apparatus according to this embodiment is described, a reference technique is explained. The reference technique presents correction candidates when controlled vocabulary data contains inconsistencies. This reference technique is not prior art. FIG. 1 is a diagram for explaining the reference technique. The reference technique detects inconsistencies in controlled vocabulary data and extracts correction candidates for them. It then calculates the correction cost of each candidate and preferentially displays the candidate with the lowest correction cost.

For example, in controlled vocabulary data 13, the records in the first to third rows form one synonym group. This group contains two broader term URIs, "http://myVocab/24" and "http://myVocab/25", so it is inconsistent (inconsistency E).

There are two correction candidates for the inconsistency in controlled vocabulary data 13: candidate (1) and candidate (2), described below.

Correction candidate (1) changes the broader term URI "http://myVocab/24" to "http://myVocab/25" in the record whose term name is "3D printer" (the third row). Its correction cost is 1.

Correction candidate (2) changes the broader term URI "http://myVocab/25" to "http://myVocab/24" in each of the records whose term names are "3Dプリンタ" and "3Dプリンター" (the first and second rows). Its correction cost is 2.

Correction candidate (1) has a lower correction cost than correction candidate (2). The reference technique therefore presents candidate (1) with priority over candidate (2).
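
The reference technique's ordering amounts to sorting candidates by correction cost. In the minimal sketch below, each candidate is a hypothetical list of (row, column, new value) edits, and each edited cell costs 1. The Candidate class and rank_by_cost are illustrative names, not part of the specification.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str
    # Hypothetical edit list: (row number, column, new value).
    edits: list = field(default_factory=list)

    @property
    def cost(self) -> int:
        # One edited cell costs 1, as in the FIG. 18 examples.
        return len(self.edits)


def rank_by_cost(candidates):
    """Reference technique: present candidates in ascending order of correction cost."""
    return sorted(candidates, key=lambda c: c.cost)


cand1 = Candidate("(1)", [(3, "broader_uri", "http://myVocab/25")])
cand2 = Candidate("(2)", [(1, "broader_uri", "http://myVocab/24"),
                          (2, "broader_uri", "http://myVocab/24")])
print([c.name for c in rank_by_cost([cand2, cand1])])  # ['(1)', '(2)']
```

The sort key ignores which rows, and therefore which languages, the edits touch. The next paragraphs identify exactly this as the weakness.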

Next, the problem with the reference technique is described. Each term carries language information, and an operator may, for a time, edit only the terms in one particular language. The reference technique, however, simply presents correction candidates in ascending order of correction cost. It does not consider which language's terms the operator is editing, so it cannot order the candidates according to the operator's edits.

For example, as described with reference to FIG. 1, the reference technique presents correction candidates for controlled vocabulary data 13 with candidate (1) ahead of candidate (2), because candidate (1) has the lower correction cost.

Suppose that the operator who is actually correcting controlled vocabulary data 13 has been editing terms in the language "ja". In that case, the highest-priority candidate based on the operator's edits is candidate (2), which corrects the first and second rows, whose language is "ja". The reference technique, however, presents candidate (1) ahead of candidate (2), and so fails to order the candidates according to the operator's edits.

Next, the information processing apparatus according to this embodiment is described. This apparatus uses information about which language's terms the operator is editing, and presents correction candidates that reflect the operator's edits.
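
The sketch below makes the contrast with the reference technique concrete. It ranks candidates by how many of their edits fall outside the editing language, and breaks ties by correction cost. This is a hypothetical rule chosen for illustration. It is not the first or second selection process that the specification describes with FIGS. 10 and 11, and all names are assumptions.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    edited_rows: tuple  # row numbers the candidate would change
    cost: int


def rank_for_editing_language(candidates, row_lang, editing_lang):
    """Hypothetical rule: fewest edits outside the editing language first, then lowest cost."""
    def key(c):
        foreign = sum(1 for r in c.edited_rows if row_lang[r] != editing_lang)
        return (foreign, c.cost)
    return sorted(candidates, key=key)


# Languages of rows 1 to 3 of controlled vocabulary data 13.
row_lang = {1: "ja", 2: "ja", 3: "en"}
cands = [Candidate("(1)", (3,), 1), Candidate("(2)", (1, 2), 2)]
print([c.name for c in rank_for_editing_language(cands, row_lang, "ja")])  # ['(2)', '(1)']
```

With "en" as the editing language, the same rule puts candidate (1) first again. This shows that the ordering follows the operator's language rather than cost alone.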

FIG. 2 is a functional block diagram showing the configuration of the information processing apparatus according to this embodiment. As shown in FIG. 2, information processing apparatus 100 has a communication unit 110, an input unit 120, a display unit 130, a storage unit 140, and a control unit 150.

The communication unit 110 receives various data from external devices via a network. The communication unit 110 is an example of a communication device. For example, the communication unit 110 may receive controlled vocabulary data 141, described later, from an external device.

The input unit 120 is an input device, such as a keyboard, a mouse, or a touch panel, that inputs various information to the control unit 150 of information processing apparatus 100. The operator may operate the input unit 120 to enter data concerning controlled vocabulary data 141.

The display unit 130 is a display device that displays information output from the control unit 150, such as controlled vocabulary data 141 and information on correction candidates.

The storage unit 140 holds controlled vocabulary data 141. The storage unit 140 is a semiconductor memory element, such as RAM (Random Access Memory) or flash memory, or a storage device, such as an HDD (Hard Disk Drive).

Controlled vocabulary data 141 is dictionary data that summarizes the semantic relationships among multiple terms. Its purpose is to prevent search omissions and similar problems caused by ambiguity, homography, and synonymy.

FIG. 3 is a diagram showing an example of the data structure of controlled vocabulary data. As shown in FIG. 3, controlled vocabulary data 141 has a term name column 10a, a representative word column 10b, a language column 10c, a representative word URI column 10d, and a broader term URI column 10e. These columns are the same as those described with reference to FIG. 15.

The control unit 150 has a reception unit 151, an identification unit 152, a selection unit 153, and a display control unit 154. The control unit 150 is implemented by a processor such as a CPU (Central Processing Unit) or GPU (Graphics Processing Unit). It may also be implemented by hard-wired logic such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

The reception unit 151 receives controlled vocabulary data 141 from the input unit 120 or the like and stores it in the storage unit 140. The reception unit 151 may also receive controlled vocabulary data 141 from an external device via the communication unit 110.

The identification unit 152 identifies multiple correction candidates based on controlled vocabulary data 141. Each process of the identification unit 152 is described below. FIG. 4 is a diagram for explaining the processing of the identification unit.

The processing in FIG. 4 is described below. First, the identification unit 152 estimates, from controlled vocabulary data 141, the language that the operator is editing. For example, the identification unit 152 scans the values in the language column 10c and takes the most frequent language as the language being edited. In the example of FIG. 4, the most frequent language is "ja", so the identification unit 152 estimates that the operator is editing in "ja". In the following description, the language that the identification unit 152 estimates the operator is editing is called the "editing language".
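
The frequency-based estimate is a mode computation over the language column. The minimal sketch below uses collections.Counter. The column values are hypothetical, since the text states only that "ja" is the most frequent language in FIG. 4. The tie-breaking behavior is a property of Counter.most_common, not something the specification defines.

```python
from collections import Counter


def estimate_editing_language(languages):
    """Most frequent value in the language column; on a tie, the first value encountered wins."""
    if not languages:
        raise ValueError("no language values to scan")
    lang, _count = Counter(languages).most_common(1)[0]
    return lang


# Hypothetical language column values for the six rows of FIG. 4.
print(estimate_editing_language(["ja", "ja", "en", "ja", "ja", "en"]))  # ja
```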

Next, the identification unit 152 classifies the records of controlled vocabulary data 141 into multiple synonym subgroups, based on the term names in the term name column 10a and the representative words in the representative word column 10b.

For example, the second row of controlled vocabulary data 141 links the term name "3次元プリンタ" ("three-dimensional printer") to the representative word "3Dプリンタ", and the identification unit 152 detects this connected component. As a result, the identification unit 152 assigns the records in the first to third rows to the same synonym subgroup 50a, because their term name or representative word is "3次元プリンタ" or "3Dプリンタ".

The term names and representative words in the fourth to sixth rows of controlled vocabulary data 141 are not shared with any other record. The identification unit 152 therefore assigns each of these records to its own synonym subgroup: the fourth row to synonym subgroup 50b, the fifth row to synonym subgroup 50c, and the sixth row to synonym subgroup 50d.
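
Grouping by connected components can be implemented with a union-find (disjoint-set) structure over term names and representative words. Each row links its term name to its representative word, and rows whose names end up in the same set form one synonym subgroup. In the sketch below, the row values are hypothetical except "3次元プリンタ" and "3Dプリンタ", which the text mentions. synonym_subgroups is an illustrative name.

```python
def synonym_subgroups(rows):
    """Group 1-based row numbers into connected components of term/representative links."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for term, rep in rows:
        union(term, rep)  # each row links its term name to its representative word

    groups = {}
    for i, (term, _rep) in enumerate(rows, start=1):
        groups.setdefault(find(term), []).append(i)
    return list(groups.values())


# (term name, representative word); rows 3 to 6 are hypothetical values.
rows = [
    ("3Dプリンタ", "3Dプリンタ"),
    ("3次元プリンタ", "3Dプリンタ"),
    ("3Dプリンター", "3次元プリンタ"),
    ("スキャナ", "スキャナ"),
    ("プリンタ", "プリンタ"),
    ("3D scanner", "3D scanner"),
]
print(synonym_subgroups(rows))  # [[1, 2, 3], [4], [5], [6]]
```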

Next, the identification unit 152 identifies correction candidates and their correction costs for each synonym subgroup. For example, based on the records of synonym subgroup 50a, it identifies correction candidate (a) and correction candidate (b).

Correction candidate (a) unifies the representative words of synonym subgroup 50a to "3Dプリンタ". Its correction cost is 2.

Correction candidate (b) unifies the representative words of synonym subgroup 50a to "3Dプリンタ". Its correction cost is 1.

Next, the identification unit 152 identifies correction candidates and correction costs between different synonym subgroups. For example, it compares synonym subgroups 50a, 50b, 50c, and 50d with one another and identifies correction candidates (c) and (d) between synonym subgroup 50a and synonym subgroup 50c.

Correction candidate (c) changes the representative word URI of synonym subgroup 50a to a single value other than "http://myVocab/1". Its correction cost is 3.

Correction candidate (d) changes the representative word URI of synonym subgroup 50c to a single value other than "http://myVocab/1". Its correction cost is 1.

Combining these, the correction candidates within and between synonym subgroups identified by the identification unit 152 are the following correction candidates A, B, C, and D.

修正候補Ａは、修正候補（ａ）と修正候補（ｃ）とを行う修正である。修正候補Ａの修正コストは、修正候補（ａ）の修正コスト「２」と、修正候補（ｃ）の修正コスト「３」とを合計した修正コスト「５」となる。 Modification candidate A is a modification that performs modification candidate (a) and modification candidate (c). The modification cost of modification candidate A is "5", which is the sum of the modification cost "2" of modification candidate (a) and the modification cost "3" of modification candidate (c).

修正候補Ｂは、修正候補（ａ）と修正候補（ｄ）とを行う修正である。修正候補Ｂの修正コストは、修正候補（ａ）の修正コスト「２」と、修正候補（ｄ）の修正コスト「１」とを合計した修正コスト「３」となる。 Modification candidate B is a modification that performs modification candidate (a) and modification candidate (d). The modification cost of modification candidate B is "3", which is the sum of the modification cost "2" of modification candidate (a) and the modification cost "1" of modification candidate (d).

修正候補Ｃは、修正候補（ｂ）と修正候補（ｃ）とを行う修正である。修正候補Ｃの修正コストは、修正候補（ｂ）の修正コスト「１」と、修正候補（ｃ）の修正コスト「３」とを合計した修正コスト「４」となる。 Modification candidate C is a modification that performs modification candidate (b) and modification candidate (c). The modification cost of modification candidate C is "4", which is the sum of the modification cost "1" of modification candidate (b) and the modification cost "3" of modification candidate (c).

修正候補Ｄは、修正候補（ｂ）と修正候補（ｄ）とを行う修正である。修正候補Ｄの修正コストは、修正候補（ｂ）の修正コスト「１」と、修正候補（ｄ）の修正コスト「１」とを合計した修正コスト「２」となる。 Modification candidate D is a modification that performs modification candidate (b) and modification candidate (d). The modification cost of the modification candidate D is "2", which is the sum of the modification cost "1" of the modification candidate (b) and the modification cost "1" of the modification candidate (d).
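The pairing rule in the preceding paragraphs (each first-level candidate applies one within-subgroup fix and one between-subgroup fix, and its cost is the sum of the two) can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation; the costs are those given for candidates (a) to (d), while the helper name `combine` and the dictionary layout are assumptions.

```python
from itertools import product

# Costs from the text: (a)/(b) are within-subgroup fixes for subgroup 50a,
# (c)/(d) are between-subgroup fixes for the 50a/50c conflict.
within_subgroup = {"(a)": 2, "(b)": 1}
between_subgroups = {"(c)": 3, "(d)": 1}


def combine(within, between):
    """Pair every within-subgroup fix with every between-subgroup fix.

    Hypothetical helper: the combined cost is simply the sum of both costs.
    """
    return [
        ((w_name, b_name), w_cost + b_cost)
        for (w_name, w_cost), (b_name, b_cost) in product(within.items(), between.items())
    ]


candidates = combine(within_subgroup, between_subgroups)
for label, (fixes, cost) in zip("ABCD", candidates):
    print(label, fixes, cost)
```

The labels A to D follow the enumeration order of `itertools.product`, which happens to match the order used in the text (A = (a)+(c), B = (a)+(d), C = (b)+(c), D = (b)+(d)).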

以降の説明では、修正候補Ａ、Ｂ、Ｃ、Ｄでそれぞれ修正した場合の統制語彙データについて説明する。 The following description explains the controlled vocabulary data obtained by applying each of modification candidates A, B, C, and D.

図５は、修正候補Ａで修正した場合の統制語彙データの一例を示す図である。図５に示す統制語彙データ１４１Ａは、図４に示した統制語彙データ１４１に対して、修正候補Ａで修正した結果である。 FIG. 5 is a diagram showing an example of controlled vocabulary data when corrected using correction candidate A. Controlled vocabulary data 141A shown in FIG. 5 is the result of correcting the controlled vocabulary data 141 shown in FIG. 4 using correction candidate A.

特定部１５２は、統制語彙データ１４１Ａの代表語のＵＲＩを基にして、同じ代表語のＵＲＩを持つ各同義サブグループをまとめることで、複数の同義グループ５１ａ，５１ｂ，５１ｃに分類する。 The specifying unit 152 classifies synonymous subgroups having the same representative word URI into a plurality of synonymous groups 51a, 51b, and 51c based on the URI of the representative word in the controlled vocabulary data 141A.
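Merging synonym subgroups that share a representative-word URI is a plain group-by operation. The sketch below assumes each subgroup can be reduced to an `(id, representative_uri)` pair; the IDs and `example.org` URIs are hypothetical and do not reproduce the data of FIG. 5.

```python
from collections import defaultdict


def group_by_representative_uri(subgroups):
    """Merge synonym subgroups that share the same representative-word URI.

    subgroups: iterable of (subgroup_id, representative_uri) pairs.
    Returns a dict mapping each URI to the list of subgroup IDs carrying it.
    """
    groups = defaultdict(list)
    for subgroup_id, rep_uri in subgroups:
        groups[rep_uri].append(subgroup_id)
    return dict(groups)


# Hypothetical subgroups (IDs and URIs are illustrative only).
subgroups = [
    ("s1", "http://example.org/vocab/1"),
    ("s2", "http://example.org/vocab/2"),
    ("s3", "http://example.org/vocab/1"),
]
groups = group_by_representative_uri(subgroups)
print(groups)
```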

続いて、特定部１５２は、同義グループ毎に、修正候補と修正コストとを特定する。たとえば、特定部１５２は、同義グループ５１ｂのレコードを基にして、修正候補（ａ１）および修正候補（ｂ１）を特定する。 Subsequently, the identification unit 152 identifies modification candidates and modification costs for each synonymous group. For example, the identifying unit 152 identifies the modification candidate (a1) and the modification candidate (b1) based on the records of the synonymous group 51b.

修正候補（ａ１）は、同義グループ５１ｂの上位語のＵＲＩを「http://myVocab/24」に統一する修正である。修正候補（ａ１）の修正コストは「１」となる。修正候補（ａ１）は、修正対象に、編集言語「ja」のレコードを含む修正候補である。 The modification candidate (a1) is a modification that unifies the URI of the hypernym of the synonym group 51b to "http://myVocab/24". The modification cost of the modification candidate (a1) is "1". The modification candidate (a1) is a modification candidate that includes a record of the editing language "ja" as a modification target.

修正候補（ｂ１）は、同義グループ５１ｂの上位語のＵＲＩを「http://myVocab/10」に統一する修正である。修正候補（ｂ１）の修正コストは「１」となる。 The modification candidate (b1) is a modification that unifies the URI of the hypernym of the synonym group 51b to "http://myVocab/10". The modification cost of the modification candidate (b1) is "1".
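Candidates such as (a1) and (b1) can be generated by proposing, for each hypernym URI that occurs in a synonym group, a fix that unifies the whole group to that URI. The sketch assumes, which this section does not state, that the modification cost is the number of records whose hypernym URI would change. The `Record` type and the two records are hypothetical: the URIs are borrowed from the text, but which record carries which URI is chosen only so that the output mirrors (a1) and (b1).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    term: str
    lang: str
    hypernym_uri: str


def unify_candidates(records, editing_lang):
    """For each hypernym URI present in the group, propose unifying the group to it.

    Assumed cost: the number of records whose hypernym URI must change.
    The flag is True if any changed record is in the editing language.
    Returns a list of (target_uri, cost, touches_editing_lang).
    """
    candidates = []
    for target in sorted({r.hypernym_uri for r in records}):
        changed = [r for r in records if r.hypernym_uri != target]
        touches = any(r.lang == editing_lang for r in changed)
        candidates.append((target, len(changed), touches))
    return candidates


# Hypothetical two-record synonym group.
group = [
    Record("term-ja", "ja", "http://myVocab/10"),
    Record("term-en", "en", "http://myVocab/24"),
]
candidates = unify_candidates(group, "ja")
for cand in candidates:
    print(cand)
```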

続いて、特定部１５２は、異なる同義グループ間の修正候補と修正コストとを特定する。たとえば、特定部１５２は、上記の修正候補（ｂ１）の修正を行うと、同義グループ５１ｂと、同義グループ５１ｃとの間に不整合が発生するため、修正候補（ｃ１）を特定する。 Next, the identification unit 152 identifies modification candidates and modification costs between different synonymous groups. For example, the identification unit 152 identifies modification candidate (c1), because applying modification candidate (b1) would create an inconsistency between the synonymous group 51b and the synonymous group 51c.

修正候補（ｃ１）は、同義グループ５１ｃの上位語のＵＲＩを「http://myVocab/1」以外の値に統一する修正である。修正候補（ｃ１）の修正コストは「１」となる。 The modification candidate (c1) is a modification that unifies the URI of the hypernym of the synonym group 51c to a value other than "http://myVocab/1". The modification cost of the modification candidate (c1) is "1".

以上により、特定部１５２が修正候補Ａで修正した場合の、各同義グループ内、同義グループ間の修正候補は、以下の修正候補Ａ－１、修正候補Ａ－２となる。 As described above, when the specifying unit 152 makes a correction using the correction candidate A, the correction candidates within each synonymous group and between synonymous groups are the following correction candidates A-1 and A-2.

修正候補Ａ－１は、修正候補（ａ１）を行う修正である。修正候補Ａ－１の修正コストは「１」となる。修正候補Ａ－１は、修正対象に、編集言語「ja」のレコードを含む修正候補である。 Modification candidate A-1 is a modification that performs modification candidate (a1). The modification cost of modification candidate A-1 is "1". Correction candidate A-1 is a correction candidate that includes a record of the editing language "ja" as a correction target.

修正候補Ａ－２は、修正候補（ｂ１）と修正候補（ｃ１）とを行う修正である。修正候補Ａ－２の修正コストは、修正候補（ｂ１）の修正コスト「１」と、修正候補（ｃ１）の修正コスト「１」とを合計した修正コスト「２」となる。 Modification candidate A-2 is a modification that performs modification candidate (b1) and modification candidate (c1). The modification cost of modification candidate A-2 is "2", which is the sum of the modification cost "1" of modification candidate (b1) and the modification cost "1" of modification candidate (c1).
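The expansion into A-1 and A-2 follows a simple rule: an intra-group fix that would create an inter-group inconsistency, such as (b1), is bundled with the inter-group fix it requires, such as (c1), and their costs are added. A minimal sketch, assuming the required follow-up fixes are already known for each option (detecting the inconsistency itself is not modeled); the data layout and the name `expand` are hypothetical, while the costs and "ja" flags are those in the text.

```python
def expand(options):
    """Bundle each intra-group fix with the inter-group fixes it triggers.

    options maps a fix name to (cost, touches_editing_lang, follow_ups), where
    follow_ups is a list of (name, cost, touches_editing_lang) tuples.
    Returns a list of (fix_names, total_cost, touches_editing_lang).
    """
    expanded = []
    for name, (cost, touches, follow_ups) in options.items():
        names = [name] + [f_name for f_name, _, _ in follow_ups]
        total = cost + sum(f_cost for _, f_cost, _ in follow_ups)
        flag = touches or any(f_touch for _, _, f_touch in follow_ups)
        expanded.append((names, total, flag))
    return expanded


# After applying candidate A: (b1) triggers the inter-group fix (c1).
after_a = expand({
    "(a1)": (1, True, []),
    "(b1)": (1, False, [("(c1)", 1, False)]),
})
# After applying candidate B: no inter-group inconsistency, so no follow-ups.
after_b = expand({
    "(a2)": (1, False, []),
    "(b2)": (3, True, []),
})
print(after_a)
print(after_b)
```

The same function covers the case of candidate B, where the inter-group step is skipped: an option with an empty follow-up list simply keeps its own cost.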

図６は、修正候補Ｂで修正した場合の統制語彙データの一例を示す図である。図６に示す統制語彙データ１４１Ｂは、図４に示した統制語彙データ１４１に対して、修正候補Ｂで修正した結果である。 FIG. 6 is a diagram showing an example of controlled vocabulary data when corrected using correction candidate B. Controlled vocabulary data 141B shown in FIG. 6 is the result of correcting the controlled vocabulary data 141 shown in FIG. 4 using correction candidate B.

特定部１５２は、統制語彙データ１４１Ｂの代表語のＵＲＩを基にして、同じ代表語のＵＲＩを持つ各同義サブグループをまとめることで、複数の同義グループ５２ａ，５２ｂ，５２ｃに分類する。 The specifying unit 152 classifies synonymous subgroups having the same representative word URI into a plurality of synonymous groups 52a, 52b, and 52c based on the URI of the representative word in the controlled vocabulary data 141B.

続いて、特定部１５２は、同義グループ毎に、修正候補と修正コストとを特定する。たとえば、特定部１５２は、同義グループ５２ａのレコードを基にして、修正候補（ａ２）および修正候補（ｂ２）を特定する。 Subsequently, the identification unit 152 identifies modification candidates and modification costs for each synonymous group. For example, the identifying unit 152 identifies a modification candidate (a2) and a modification candidate (b2) based on the records of the synonymous group 52a.

修正候補（ａ２）は、同義グループ５２ａの上位語のＵＲＩを「http://myVocab/25」に統一する修正である。修正候補（ａ２）の修正コストは「１」となる。 The modification candidate (a2) is a modification that unifies the URI of the hypernym of the synonym group 52a to "http://myVocab/25". The modification cost of the modification candidate (a2) is "1".

修正候補（ｂ２）は、同義グループ５２ａの上位語のＵＲＩを「http://myVocab/24」に統一する修正である。修正候補（ｂ２）の修正コストは「３」となる。修正候補（ｂ２）は、修正対象に、編集言語「ja」のレコードを含む修正候補である。 The modification candidate (b2) is a modification that unifies the URI of the hypernym of the synonym group 52a to "http://myVocab/24". The modification cost of the modification candidate (b2) is "3". The modification candidate (b2) is a modification candidate that includes a record of the editing language "ja" as a modification target.

続いて、特定部１５２は、異なる同義グループ間の修正候補と修正コストとを特定する。なお、同義グループ間で、不整合は存在しないため、特定部１５２は、係る処理をスキップする。 Subsequently, the identification unit 152 identifies modification candidates and modification costs between different synonymous groups. Note that since there is no inconsistency between the synonymous groups, the identifying unit 152 skips this process.

以上により、特定部１５２が修正候補Ｂで修正した場合の、各同義グループ内、同義グループ間の修正候補は、以下の修正候補Ｂ－１、修正候補Ｂ－２となる。 As described above, when the specifying unit 152 makes a correction using the correction candidate B, the correction candidates within each synonymous group and between synonymous groups are the following correction candidate B-1 and correction candidate B-2.

修正候補Ｂ－１は、修正候補（ａ２）を行う修正である。修正候補Ｂ－１の修正コストは「１」となる。 Modification candidate B-1 is a modification that performs modification candidate (a2). The modification cost of modification candidate B-1 is "1".

修正候補Ｂ－２は、修正候補（ｂ２）を行う修正である。修正候補Ｂ－２の修正コストは「３」となる。修正候補Ｂ－２は、修正対象に、編集言語「ja」のレコードを含む修正候補である。 Modification candidate B-2 is a modification that performs modification candidate (b2). The modification cost of modification candidate B-2 is "3". Correction candidate B-2 is a correction candidate that includes a record of the editing language "ja" as a correction target.

図７は、修正候補Ｃで修正した場合の統制語彙データの一例を示す図である。図７に示す統制語彙データ１４１Ｃは、図４に示した統制語彙データ１４１に対して、修正候補Ｃで修正した結果である。 FIG. 7 is a diagram showing an example of controlled vocabulary data when corrected using correction candidate C. Controlled vocabulary data 141C shown in FIG. 7 is the result of correcting the controlled vocabulary data 141 shown in FIG. 4 using correction candidate C.

特定部１５２は、統制語彙データ１４１Ｃの代表語のＵＲＩを基にして、同じ代表語のＵＲＩを持つ各同義サブグループをまとめることで、複数の同義グループ５３ａ，５３ｂ，５３ｃに分類する。 The specifying unit 152 classifies synonymous subgroups having the same representative word URI into a plurality of synonymous groups 53a, 53b, and 53c based on the URI of the representative word in the controlled vocabulary data 141C.

続いて、特定部１５２は、同義グループ毎に、修正候補と修正コストとを特定する。たとえば、特定部１５２は、同義グループ５３ｂのレコードを基にして、修正候補（ａ３）および修正候補（ｂ３）を特定する。 Subsequently, the identification unit 152 identifies modification candidates and modification costs for each synonymous group. For example, the identifying unit 152 identifies a modification candidate (a3) and a modification candidate (b3) based on the records of the synonymous group 53b.

修正候補（ａ３）は、同義グループ５３ｂの上位語のＵＲＩを「http://myVocab/24」に統一する修正である。修正候補（ａ３）の修正コストは「１」となる。修正候補（ａ３）は、修正対象に、編集言語「ja」のレコードを含む修正候補である。 The modification candidate (a3) is a modification that unifies the URI of the hypernym of the synonym group 53b to "http://myVocab/24". The modification cost of the modification candidate (a3) is "1". The modification candidate (a3) is a modification candidate that includes a record of the editing language "ja" as a modification target.

修正候補（ｂ３）は、同義グループ５３ｂの上位語のＵＲＩを「http://myVocab/10」に統一する修正である。修正候補（ｂ３）の修正コストは「１」となる。 The modification candidate (b3) is a modification that unifies the URI of the hypernym of the synonym group 53b to "http://myVocab/10". The modification cost of the modification candidate (b3) is "1".

続いて、特定部１５２は、異なる同義グループ間の修正候補と修正コストとを特定する。たとえば、特定部１５２は、上記の修正候補（ｂ３）の修正を行うと、同義グループ５３ｂと、同義グループ５３ｃとの間に不整合が発生するため、修正候補（ｃ３）を特定する。 Next, the identification unit 152 identifies modification candidates and modification costs between different synonymous groups. For example, the identification unit 152 identifies modification candidate (c3), because applying modification candidate (b3) would create an inconsistency between the synonymous group 53b and the synonymous group 53c.

修正候補（ｃ３）は、同義グループ５３ｃの上位語のＵＲＩを「http://myVocab/1」以外の値に統一する修正である。修正候補（ｃ３）の修正コストは「１」となる。 The modification candidate (c3) is a modification that unifies the URI of the hypernym of the synonym group 53c to a value other than "http://myVocab/1". The modification cost of the modification candidate (c3) is "1".

以上により、特定部１５２が修正候補Ｃで修正した場合の、各同義グループ内、同義グループ間の修正候補は、以下の修正候補Ｃ－１、修正候補Ｃ－２となる。 As described above, when the specifying unit 152 makes a correction using the correction candidate C, the correction candidates within each synonymous group and between synonymous groups are the following correction candidates C-1 and C-2.

修正候補Ｃ－１は、修正候補（ａ３）を行う修正である。修正候補Ｃ－１の修正コストは「１」となる。修正候補Ｃ－１は、修正対象に、編集言語「ja」のレコードを含む修正候補である。 Correction candidate C-1 is a correction that performs correction candidate (a3). The modification cost of modification candidate C-1 is "1". Correction candidate C-1 is a correction candidate that includes a record of the editing language "ja" as a correction target.

修正候補Ｃ－２は、修正候補（ｂ３）と修正候補（ｃ３）とを行う修正である。修正候補Ｃ－２の修正コストは、修正候補（ｂ３）の修正コスト「１」と、修正候補（ｃ３）の修正コスト「１」とを合計した修正コスト「２」となる。 Modification candidate C-2 is a modification that performs modification candidate (b3) and modification candidate (c3). The modification cost of modification candidate C-2 is "2", which is the sum of the modification cost "1" of modification candidate (b3) and the modification cost "1" of modification candidate (c3).

図８は、修正候補Ｄで修正した場合の統制語彙データの一例を示す図である。図８に示す統制語彙データ１４１Ｄは、図４に示した統制語彙データ１４１に対して、修正候補Ｄで修正した結果である。 FIG. 8 is a diagram showing an example of controlled vocabulary data when modified using modification candidate D. Controlled vocabulary data 141D shown in FIG. 8 is the result of correcting the controlled vocabulary data 141 shown in FIG. 4 using modification candidate D.

特定部１５２は、統制語彙データ１４１Ｄの代表語のＵＲＩを基にして、同じ代表語のＵＲＩを持つ各同義サブグループをまとめることで、複数の同義グループ５４ａ，５４ｂ，５４ｃに分類する。 The specifying unit 152 classifies synonymous subgroups having the same representative word URI into a plurality of synonymous groups 54a, 54b, and 54c based on the URI of the representative word in the controlled vocabulary data 141D.

続いて、特定部１５２は、同義グループ毎に、修正候補と修正コストとを特定する。たとえば、特定部１５２は、同義グループ５４ａのレコードを基にして、修正候補（ａ４）および修正候補（ｂ４）を特定する。 Next, the identification unit 152 identifies modification candidates and modification costs for each synonymous group. For example, the identification unit 152 identifies modification candidate (a4) and modification candidate (b4) based on the records of the synonymous group 54a.

修正候補（ａ４）は、同義グループ５４ａの上位語のＵＲＩを「http://myVocab/25」に統一する修正である。修正候補（ａ４）の修正コストは「１」となる。 The modification candidate (a4) is a modification that unifies the URI of the hypernym of the synonym group 54a to "http://myVocab/25". The modification cost of the modification candidate (a4) is "1".

修正候補（ｂ４）は、同義グループ５４ａの上位語のＵＲＩを「http://myVocab/24」に統一する修正である。修正候補（ｂ４）の修正コストは「３」となる。修正候補（ｂ４）は、修正対象に、編集言語「ja」のレコードを含む修正候補である。 The modification candidate (b4) is a modification that unifies the URI of the hypernym of the synonym group 54a to "http://myVocab/24". The modification cost of the modification candidate (b4) is "3". The modification candidate (b4) is a modification candidate that includes a record of the editing language "ja" as a modification target.

以上により、特定部１５２が修正候補Ｄで修正した場合の、各同義グループ内、同義グループ間の修正候補は、以下の修正候補Ｄ－１、修正候補Ｄ－２となる。 As described above, when the specifying unit 152 makes a correction using the correction candidate D, the correction candidates within each synonymous group and between synonymous groups are the following correction candidates D-1 and D-2.

修正候補Ｄ－１は、修正候補（ａ４）を行う修正である。修正候補Ｄ－１の修正コストは「１」となる。 Modification candidate D-1 is a modification that performs modification candidate (a4). The modification cost of modification candidate D-1 is "1".

修正候補Ｄ－２は、修正候補（ｂ４）を行う修正である。修正候補Ｄ－２の修正コストは「３」となる。修正候補Ｄ－２は、修正対象に、編集言語「ja」のレコードを含む修正候補である。 Modification candidate D-2 is a modification that performs modification candidate (b4). The modification cost of modification candidate D-2 is "3". Correction candidate D-2 is a correction candidate that includes a record of the editing language "ja" as a correction target.

特定部１５２は、図４～図８で説明した処理を実行することで、全修正候補と、修正コストとを特定する。たとえば、全修正候補には、次に説明する修正候補Ｘ１，Ｘ２，Ｘ３，Ｘ４，Ｘ５，Ｘ６，Ｘ７，Ｘ８が含まれる。図９は、修正候補Ｘ１～Ｘ８を示す図である。 The specifying unit 152 specifies all correction candidates and correction costs by executing the processes described in FIGS. 4 to 8. For example, all correction candidates include correction candidates X1, X2, X3, X4, X5, X6, X7, and X8, which will be described next. FIG. 9 is a diagram showing correction candidates X1 to X8.

図９に示すように、修正候補Ｘ１は、修正候補Ａと、修正候補Ａ－１とを行う修正である。修正候補Ａの修正コストは「５」、修正候補Ａ－１の修正コストは「１」であり、修正候補Ｘ１の修正コストは「６」となる。修正候補Ａ－１は、修正対象に、編集言語「ja」のレコードを含む修正候補である。 As shown in FIG. 9, modification candidate X1 is a modification that performs modification candidate A and modification candidate A-1. The modification cost of modification candidate A is "5", the modification cost of modification candidate A-1 is "1", and the modification cost of modification candidate X1 is "6". Correction candidate A-1 is a correction candidate that includes a record of the editing language "ja" as a correction target.

修正候補Ｘ２は、修正候補Ａと、修正候補Ａ－２とを行う修正である。修正候補Ａの修正コストは「５」、修正候補Ａ－２の修正コストは「２」であり、修正候補Ｘ２の修正コストは「７」となる。 Modification candidate X2 is a modification that performs modification candidate A and modification candidate A-2. The modification cost of modification candidate A is "5", the modification cost of modification candidate A-2 is "2", and the modification cost of modification candidate X2 is "7".

修正候補Ｘ３は、修正候補Ｂと、修正候補Ｂ－１とを行う修正である。修正候補Ｂの修正コストは「３」、修正候補Ｂ－１の修正コストは「１」であり、修正候補Ｘ３の修正コストは「４」となる。 Modification candidate X3 is a modification that performs modification candidate B and modification candidate B-1. The modification cost of modification candidate B is "3", the modification cost of modification candidate B-1 is "1", and the modification cost of modification candidate X3 is "4".

修正候補Ｘ４は、修正候補Ｂと、修正候補Ｂ－２とを行う修正である。修正候補Ｂの修正コストは「３」、修正候補Ｂ－２の修正コストは「３」であり、修正候補Ｘ４の修正コストは「６」となる。修正候補Ｂ－２は、修正対象に、編集言語「ja」のレコードを含む修正候補である。 Modification candidate X4 is a modification that performs modification candidate B and modification candidate B-2. The modification cost of modification candidate B is "3", the modification cost of modification candidate B-2 is "3", and the modification cost of modification candidate X4 is "6". Correction candidate B-2 is a correction candidate that includes a record of the editing language "ja" as a correction target.

修正候補Ｘ５は、修正候補Ｃと、修正候補Ｃ－１とを行う修正である。修正候補Ｃの修正コストは「３」、修正候補Ｃ－１の修正コストは「１」であり、修正候補Ｘ５の修正コストは「４」となる。修正候補Ｃ－１は、修正対象に、編集言語「ja」のレコードを含む修正候補である。 Modification candidate X5 is a modification that performs modification candidate C and modification candidate C-1. The modification cost of modification candidate C is "3", the modification cost of modification candidate C-1 is "1", and the modification cost of modification candidate X5 is "4". Correction candidate C-1 is a correction candidate that includes a record of the editing language "ja" as a correction target.

修正候補Ｘ６は、修正候補Ｃと、修正候補Ｃ－２とを行う修正である。修正候補Ｃの修正コストは「３」、修正候補Ｃ－２の修正コストは「２」であり、修正候補Ｘ６の修正コストは「５」となる。 Modification candidate X6 is a modification that performs modification candidate C and modification candidate C-2. The modification cost of modification candidate C is "3", the modification cost of modification candidate C-2 is "2", and the modification cost of modification candidate X6 is "5".

修正候補Ｘ７は、修正候補Ｄと、修正候補Ｄ－１とを行う修正である。修正候補Ｄの修正コストは「２」、修正候補Ｄ－１の修正コストは「１」であり、修正候補Ｘ７の修正コストは「３」となる。 Modification candidate X7 is a modification that performs modification candidate D and modification candidate D-1. The modification cost of modification candidate D is "2", the modification cost of modification candidate D-1 is "1", and the modification cost of modification candidate X7 is "3".

修正候補Ｘ８は、修正候補Ｄと、修正候補Ｄ－２とを行う修正である。修正候補Ｄの修正コストは「２」、修正候補Ｄ－２の修正コストは「３」であり、修正候補Ｘ８の修正コストは「５」となる。修正候補Ｄ－２は、修正対象に、編集言語「ja」のレコードを含む修正候補である。 Modification candidate X8 is a modification that performs modification candidate D and modification candidate D-2. The modification cost of modification candidate D is "2", the modification cost of modification candidate D-2 is "3", and the modification cost of modification candidate X8 is "5". Correction candidate D-2 is a correction candidate that includes a record of the editing language "ja" as a correction target.
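Each final candidate X1 to X8 applies one first-level candidate (A to D) followed by one of the second-level candidates derived from it, with the costs added and the "ja" flag carried over. The sketch below shows only the A and B branches for brevity; the costs and flags are those stated for FIG. 9, and the helper `combine_levels` and its data layout are hypothetical.

```python
def combine_levels(first_level, second_level):
    """Return (first, second, total_cost, touches_editing_lang) tuples.

    first_level maps a first-level candidate to its cost; second_level maps
    the same name to a list of (name, cost, touches_editing_lang) tuples.
    As in the text, only second-level candidates carry the "ja" flag here.
    """
    final = []
    for first, first_cost in first_level.items():
        for second, second_cost, touches in second_level[first]:
            final.append((first, second, first_cost + second_cost, touches))
    return final


first_level = {"A": 5, "B": 3}
second_level = {
    "A": [("A-1", 1, True), ("A-2", 2, False)],
    "B": [("B-1", 1, False), ("B-2", 3, True)],
}
final = combine_levels(first_level, second_level)
for i, (first, second, cost, touches) in enumerate(final, start=1):
    print(f"X{i}: {first} + {second}, cost={cost}, touches 'ja'={touches}")
```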

特定部１５２は、図９で説明した修正候補Ｘ１～Ｘ８の情報を、選定部１５３に出力する。以下の説明では、修正候補Ｘ１～Ｘ８の情報をまとめて「修正候補情報」と表記する。修正候補情報には、修正候補毎に、修正内容と、修正コストと、修正対象に、編集言語「ja」のレコードを含む修正候補であるか否かの情報とが設定されているものとする。 The specifying unit 152 outputs the information on the correction candidates X1 to X8 described with reference to FIG. 9 to the selection unit 153. In the following description, the information on the correction candidates X1 to X8 is collectively referred to as "correction candidate information." For each correction candidate, the correction candidate information is assumed to contain the modification content, the modification cost, and information indicating whether the candidate's modification targets include a record in the editing language "ja".

選定部１５３は、修正候補情報を基にして、修正候補Ｘ１～Ｘ８から、表示対象とする修正候補を選定する。たとえば、選定部１５３は、第１選定処理、または、第２選定処理のうち、いずれか一方の選定処理を実行する。いずれの選定処理を実行するかは、予め設定されているものとする。 The selection unit 153 selects a modification candidate to be displayed from among the modification candidates X1 to X8 based on the modification candidate information. For example, the selection unit 153 executes either the first selection process or the second selection process. It is assumed that which selection process is to be executed is set in advance.

まず、第１選定処理について説明する。図１０は、第１選定処理を説明するための図である。選定部１５３は、修正候補情報を参照し、修正候補Ｘ１～Ｘ８のうち、編集言語「ja」のレコードを修正対象とする修正候補を有する修正候補のグループＧ１と、編集言語「ja」のレコードを修正対象とする修正候補を有さない修正候補のグループＧ２とに分類する。 First, the first selection process will be explained. FIG. 10 is a diagram for explaining the first selection process. The selection unit 153 refers to the correction candidate information and classifies the correction candidates X1 to X8 into a group G1 of correction candidates that include a modification targeting a record in the editing language "ja", and a group G2 of correction candidates that include no such modification.

図１０に示すように、グループＧ１には、修正候補Ｘ１，Ｘ４，Ｘ５，Ｘ８が含まれる。選定部１５３は、グループＧ１の修正候補Ｘ１，Ｘ４，Ｘ５，Ｘ８を、修正コストの小さい順にソートする。修正コストが同一の修正候補については、どちらを先にしてもよい。たとえば、ソートされた結果、グループＧ１の修正候補Ｘ１，Ｘ４，Ｘ５，Ｘ８の並び順は、先頭から、修正候補Ｘ５，Ｘ８，Ｘ１，Ｘ４となる。選定部１５３は、修正コストの小さい修正候補を優先的に選定する。選定部１５３は、先頭からＮ個の修正候補を選定してもよい。Ｎは予め設定される自然数である。 As shown in FIG. 10, group G1 includes correction candidates X1, X4, X5, and X8. The selection unit 153 sorts the correction candidates X1, X4, X5, and X8 of group G1 in ascending order of correction cost. Correction candidates with the same correction cost may be placed in either order. For example, after sorting, the correction candidates of group G1 are arranged, from the top, as X5, X8, X1, and X4. The selection unit 153 preferentially selects correction candidates with low correction costs. The selection unit 153 may select the first N correction candidates from the top, where N is a preset natural number.

グループＧ２には、修正候補Ｘ２，Ｘ３，Ｘ６，Ｘ７が含まれる。選定部１５３は、グループＧ２の修正候補Ｘ２，Ｘ３，Ｘ６，Ｘ７を、修正コストの小さい順にソートする。修正コストが同一の修正候補については、どちらを先にしてもよい。たとえば、ソートされた結果、グループＧ２の修正候補Ｘ２，Ｘ３，Ｘ６，Ｘ７の並び順は、先頭から、修正候補Ｘ７，Ｘ３，Ｘ６，Ｘ２となる。選定部１５３は、修正コストの小さい修正候補を優先的に選定する。選定部１５３は、先頭からＮ個の修正候補を選定してもよい。 Group G2 includes correction candidates X2, X3, X6, and X7. The selection unit 153 sorts the correction candidates X2, X3, X6, and X7 of group G2 in ascending order of correction cost. Correction candidates with the same correction cost may be placed in either order. For example, after sorting, the correction candidates of group G2 are arranged, from the top, as X7, X3, X6, and X2. The selection unit 153 preferentially selects correction candidates with low correction costs. The selection unit 153 may select the first N correction candidates from the top.
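The first selection process amounts to a partition followed by an ascending sort within each group. A minimal sketch, with costs and flags taken from the description of FIG. 9; the tuple layout, the function name, and the choice of N are illustrative assumptions. Python's `sorted` is stable, so tied candidates (X1 and X4) keep their input order, which is one of the orders the text allows.

```python
# (name, cost, touches_editing_lang) for X1 to X8, as listed for FIG. 9.
CANDIDATES = [
    ("X1", 6, True), ("X2", 7, False), ("X3", 4, False), ("X4", 6, True),
    ("X5", 4, True), ("X6", 5, False), ("X7", 3, False), ("X8", 5, True),
]


def first_selection(candidates, n):
    """Split into G1 (touches the editing language) and G2 (does not),
    sort each group by ascending cost, and keep the first n of each."""
    g1 = [c for c in candidates if c[2]]
    g2 = [c for c in candidates if not c[2]]

    def by_cost(c):
        return c[1]

    return sorted(g1, key=by_cost)[:n], sorted(g2, key=by_cost)[:n]


g1, g2 = first_selection(CANDIDATES, n=4)
print([name for name, _, _ in g1])  # ['X5', 'X8', 'X1', 'X4']
print([name for name, _, _ in g2])  # ['X7', 'X3', 'X6', 'X2']
```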

選定部１５３は、選定した修正候補の情報を、表示制御部１５４に出力する。選定した修正候補の情報には、修正内容と、修正コストと、ソートした際の順番とが含まれる。 The selection unit 153 outputs information on the selected correction candidates to the display control unit 154. The information on the selected modification candidates includes the modification details, modification cost, and sorting order.

続いて、第２選定処理について説明する。図１１は、第２選定処理を説明するための図である。選定部１５３は、修正候補情報を参照し、修正候補Ｘ１～Ｘ８のうち、編集言語「ja」のレコードを修正対象とする修正候補を有する修正候補の修正コストを、所定の重みによって修正する。本実施例では、所定の重みを「０．６」とする。たとえば、所定の重みは、０より大きく、１未満となるように、予め設定されているものとする。 Next, the second selection process will be explained. FIG. 11 is a diagram for explaining the second selection process. The selection unit 153 refers to the correction candidate information and, for each of the correction candidates X1 to X8 that includes a modification targeting a record in the editing language "ja", adjusts its correction cost by a predetermined weight. In this embodiment, the predetermined weight is "0.6". The predetermined weight is assumed to be set in advance to a value greater than 0 and less than 1.

図９で説明したように、修正候補Ｘ１～Ｘ８のうち、編集言語「ja」のレコードを修正対象とする修正候補を有する修正候補は、修正候補Ｘ１，Ｘ４，Ｘ５，Ｘ８となる。このため、図１１に示すように、選定部１５３は、修正候補Ｘ１の修正コスト「６」に重み「０．６」を乗算した値「３．６」を、修正候補Ｘ１の新たな修正コストに設定する。選定部１５３は、修正候補Ｘ４の修正コスト「６」に重み「０．６」を乗算した値「３．６」を、修正候補Ｘ４の新たな修正コストに設定する。 As explained with reference to FIG. 9, among the correction candidates X1 to X8, those that include a modification targeting a record in the editing language "ja" are X1, X4, X5, and X8. Therefore, as shown in FIG. 11, the selection unit 153 sets the value "3.6", obtained by multiplying the correction cost "6" of correction candidate X1 by the weight "0.6", as the new correction cost of correction candidate X1. The selection unit 153 likewise sets the value "3.6", obtained by multiplying the correction cost "6" of correction candidate X4 by the weight "0.6", as the new correction cost of correction candidate X4.

選定部１５３は、修正候補Ｘ５の修正コスト「４」に重み「０．６」を乗算した値「２．４」を、修正候補Ｘ５の新たな修正コストに設定する。選定部１５３は、修正候補Ｘ８の修正コスト「５」に重み「０．６」を乗算した値「３」を、修正候補Ｘ８の新たな修正コストに設定する。 The selection unit 153 sets the value "2.4" obtained by multiplying the modification cost "4" of the modification candidate X5 by the weight "0.6" as the new modification cost of the modification candidate X5. The selection unit 153 sets the value "3" obtained by multiplying the modification cost "5" of the modification candidate X8 by the weight "0.6" as the new modification cost of the modification candidate X8.

選定部１５３は、修正後の修正コストを考慮して、修正候補Ｘ１～Ｘ８を、修正コストの小さい順にソートする。たとえば、ソートされた結果、修正候補Ｘ１～Ｘ８の並び順は、修正候補Ｘ５，Ｘ７，Ｘ８，Ｘ１，Ｘ４，Ｘ３，Ｘ６，Ｘ２となる。選定部１５３は、修正コストの小さい修正候補を優先的に選定する。選定部１５３は、先頭からＮ個の修正候補を選定してもよい。 Using the weighted correction costs where they apply, the selection unit 153 sorts the correction candidates X1 to X8 in ascending order of correction cost. For example, after sorting, the correction candidates are arranged in the order X5, X7, X8, X1, X4, X3, X6, X2. The selection unit 153 preferentially selects correction candidates with low correction costs. The selection unit 153 may select the first N correction candidates from the top.
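The second selection process changes only the sort key: a candidate whose modifications touch the editing language has its cost multiplied by the weight before a single ascending sort over all candidates. A sketch under the values given in the text (weight 0.6, costs from FIG. 9); the function name and tuple layout are hypothetical. The weight check enforces the range 0 < w < 1 stated for the predetermined weight, and the stable sort keeps tied candidates (X7 at 3 and X8 at 3.0) in input order.

```python
# (name, cost, touches_editing_lang) for X1 to X8, as listed for FIG. 9.
CANDIDATES = [
    ("X1", 6, True), ("X2", 7, False), ("X3", 4, False), ("X4", 6, True),
    ("X5", 4, True), ("X6", 5, False), ("X7", 3, False), ("X8", 5, True),
]


def second_selection(candidates, weight, n):
    """Rank all candidates by cost, discounting those that touch the editing language."""
    if not 0 < weight < 1:
        raise ValueError("weight must satisfy 0 < weight < 1")

    def weighted_cost(c):
        _, cost, touches = c
        return cost * weight if touches else cost

    return sorted(candidates, key=weighted_cost)[:n]


ranked = second_selection(CANDIDATES, weight=0.6, n=8)
print([name for name, _, _ in ranked])
# ['X5', 'X7', 'X8', 'X1', 'X4', 'X3', 'X6', 'X2']
```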

図２の説明に戻る。表示制御部１５４は、選定部１５３によって選定された修正候補の情報を基にして、表示画面を生成し、生成した表示画面を、表示部１３０に出力して表示させる。たとえば、表示制御部１５４は、修正候補の修正内容と、修正コストとを対応付けたテキスト情報を表示画面に配置する。表示制御部１５４は、修正候補のうち、順番（選定部１５３にソートされた際の順番）の低いものが表示画面の上方に来るように、設定する。 Returning to the explanation of FIG. 2. The display control unit 154 generates a display screen based on the information on the correction candidates selected by the selection unit 153, and outputs the generated display screen to the display unit 130 for display. For example, the display control unit 154 places on the display screen text information that associates the modification content of each correction candidate with its correction cost. The display control unit 154 arranges the correction candidates so that those earlier in the order (the order produced by the sorting in the selection unit 153) appear higher on the display screen.

図１２は、表示制御部が生成する表示画面の一例を示す図である。図１２に示す例では、表示画面６０に、修正候補（１）、修正候補（２）が設定されており、修正候補（１）および修正候補（２）の修正内容、修正コストが設定されている。 FIG. 12 is a diagram illustrating an example of a display screen generated by the display control unit. In the example shown in FIG. 12, correction candidate (1) and correction candidate (2) are shown on the display screen 60, together with the modification content and modification cost of each.

次に、本実施例に係る情報処理装置の処理手順の一例について説明する。図１３は、本実施例に係る情報処理装置の処理手順を示すフローチャートである。図１３に示すように、情報処理装置１００の受付部１５１は、統制語彙データ１４１を受け付ける（ステップＳ１０１）。情報処理装置１００の特定部１５２は、統制語彙データ１４１を基にして、作業者が編集している言語を推定する（ステップＳ１０２）。 Next, an example of a processing procedure of the information processing apparatus according to this embodiment will be described. FIG. 13 is a flowchart showing the processing procedure of the information processing apparatus according to this embodiment. As shown in FIG. 13, the reception unit 151 of the information processing device 100 receives controlled vocabulary data 141 (step S101). The identification unit 152 of the information processing device 100 estimates the language in which the operator is editing based on the controlled vocabulary data 141 (step S102).

特定部１５２は、統制語彙データ１４１を基にして、同義サブグループ内の修正候補および修正コストを特定する（ステップＳ１０３）。特定部１５２は、統制語彙データ１４１を基にして、異なる同義サブグループ間の修正候補および修正コストを特定する（ステップＳ１０４）。 The specifying unit 152 specifies correction candidates and correction costs in the synonymous subgroup based on the controlled vocabulary data 141 (step S103). The specifying unit 152 specifies correction candidates and correction costs between different synonymous subgroups based on the controlled vocabulary data 141 (step S104).

特定部１５２は、同義サブグループ内の修正候補と、同義サブグループ間の修正候補とに対する修正を、統制語彙データ１４１に対して実行する（ステップＳ１０５）。特定部１５２は、修正後の統制語彙データ１４１に対して、同義グループ内の修正候補および修正コストを特定する（ステップＳ１０６）。 The specifying unit 152 executes corrections to the correction candidates within the synonymous subgroup and the correction candidates between the synonymous subgroups to the controlled vocabulary data 141 (step S105). The specifying unit 152 specifies correction candidates and correction costs within the synonymous group for the corrected controlled vocabulary data 141 (step S106).

特定部１５２は、修正後の統制語彙データ１４１に対して、異なる同義グループ間の修正候補および修正コストを特定する（ステップＳ１０７）。特定部１５２は、修正候補情報を生成する（ステップＳ１０８）。 The specifying unit 152 specifies correction candidates and correction costs between different synonymous groups for the corrected controlled vocabulary data 141 (step S107). The specifying unit 152 generates correction candidate information (step S108).

情報処理装置１００の選定部１５３は、作成者が編集している言語と、修正候補情報とを基にして、表示対象となる修正候補を選定する（ステップＳ１０９）。情報処理装置１００の表示制御部１５４は、選定された修正候補を基にして、表示画面を生成する（ステップＳ１１０）。表示制御部１５４は、表示画面を表示部１３０に出力する（ステップＳ１１１）。 The selection unit 153 of the information processing device 100 selects correction candidates to be displayed based on the language edited by the creator and the correction candidate information (step S109). The display control unit 154 of the information processing device 100 generates a display screen based on the selected correction candidates (step S110). The display control unit 154 outputs the display screen to the display unit 130 (step S111).

次に、本実施例に係る情報処理装置１００の効果について説明する。情報処理装置１００は、統制語彙データ１４１を基にして、複数の修正候補を特定し、複数の修正候補から、作業員の編集内容に応じた修正候補を選定して表示する処理を実行する。これによって、作業者の編集内容に基づいた修正候補を提示することができ、作業者の負担を軽減させることができる。 Next, the effects of the information processing device 100 according to this embodiment will be explained. The information processing device 100 specifies a plurality of correction candidates based on the controlled vocabulary data 141, and performs a process of selecting and displaying a correction candidate according to the content edited by the worker from among the plurality of correction candidates. As a result, correction candidates can be presented based on the contents edited by the operator, and the burden on the operator can be reduced.

情報処理装置１００は、統制語彙データ１４１の言語列に設定された言語を基にして、出現頻度が他の言語の種別よりも大きい言語の種別を、作業者が編集している言語の種別として推定する。これによって、作業者が編集している言語の種別を特定することができる。 Based on the languages set in the language column of the controlled vocabulary data 141, the information processing device 100 estimates the language type that appears more frequently than any other language type to be the language type being edited by the operator. This allows the apparatus to identify the type of language the operator is editing.
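Estimating the editing language by frequency can be sketched with `collections.Counter`. The column values below are hypothetical. On ties, `Counter.most_common` returns elements in the order first encountered, which is an arbitrary tie-break that the text does not specify.

```python
from collections import Counter


def estimate_editing_language(language_column):
    """Return the language tag that occurs most often in the language column."""
    if not language_column:
        raise ValueError("language column is empty")
    lang, _count = Counter(language_column).most_common(1)[0]
    return lang


column = ["ja", "en", "ja", "ja", "en"]  # hypothetical language column
print(estimate_editing_language(column))  # ja
```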

情報処理装置１００は、各修正候補について修正コストを算出し、修正コストの小さい修正候補を優先して選定する。ここで、情報処理装置１００は、修正対象となるレコードが、作業者が編集している言語の種別に対応するレコードの場合には、係るレコードを修正対象とする修正候補の修正コストに重み（０＜重みの値＜１）をかけて修正する。これによって、作業者が編集している言語の種別に対応するレコードに関連する修正候補を優先して選定し易くすることができる。 The information processing apparatus 100 calculates a modification cost for each modification candidate and preferentially selects modification candidates with lower modification costs. Here, when a record to be modified corresponds to the language type being edited by the operator, the information processing device 100 multiplies the modification cost of any modification candidate that targets such a record by a weight w (0 < w < 1). This makes modification candidates related to records in the language being edited by the operator more likely to be selected preferentially.

Although the information processing device 100 identifies the language type the operator is editing from the appearance frequency of the languages set in the language column 10c of the controlled vocabulary data 141, the method is not limited to this. The identification unit 152 of the information processing device 100 may instead scan the editing time set for each record of the controlled vocabulary data 141 and identify the language type set in the record with the most recent editing time as the language type the operator is editing.
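The alternative based on editing time reduces to taking the record with the latest edit timestamp. In this sketch the "edited_at" field, its use of None for never-edited records and the function name language_of_latest_edit are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Optional


def language_of_latest_edit(records: List[Dict[str, object]]) -> Optional[str]:
    """Return the language type of the most recently edited record.

    Hypothetical layout: each record has a "language" value and an "edited_at"
    datetime (None if the record has never been edited).
    """
    edited = [r for r in records if r.get("edited_at") is not None]
    if not edited:
        return None  # no record carries an editing time
    latest = max(edited, key=lambda r: r["edited_at"])
    return latest["language"]


records = [
    {"term": "氏名", "language": "ja", "edited_at": datetime(2022, 3, 1, 9, 0)},
    {"term": "name", "language": "en", "edited_at": datetime(2022, 3, 2, 14, 30)},
    {"term": "名前", "language": "ja", "edited_at": None},
]
print(language_of_latest_edit(records))  # en
```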

Next, an example of the hardware configuration of a computer that implements the same functions as the information processing device 100 of the above embodiment is described. FIG. 14 is a diagram illustrating an example of the hardware configuration of a computer that implements the same functions as the information processing device of the embodiment.

As shown in FIG. 14, the computer 200 includes a CPU 201 that executes various arithmetic processes, an input device 202 that accepts data input from a user, and a display 203. The computer 200 also includes a communication device 204 that exchanges data with external devices and the like over a wired or wireless network, and an interface device 205. The computer 200 further includes a RAM 206 that temporarily stores various kinds of information, and a hard disk device 207. The devices 201 to 207 are connected to a bus 208.

The hard disk device 207 stores a reception program 207a, an identification program 207b, a selection program 207c, and a display control program 207d. The CPU 201 reads the programs 207a to 207d and loads them into the RAM 206.

The reception program 207a functions as a reception process 206a, the identification program 207b as an identification process 206b, the selection program 207c as a selection process 206c, and the display control program 207d as a display control process 206d.

The processing of the reception process 206a corresponds to the processing of the reception unit 151. The processing of the identification process 206b corresponds to the processing of the identification unit 152. The processing of the selection process 206c corresponds to the processing of the selection unit 153. The processing of the display control process 206d corresponds to the processing of the display control unit 154.

The programs 207a to 207d need not be stored in the hard disk device 207 from the beginning. For example, each program may be stored on a "portable physical medium" inserted into the computer 200, such as a flexible disk (FD), CD-ROM, DVD, magneto-optical disk, or IC card, and the computer 200 may read and execute the programs 207a to 207d from that medium.

The following supplementary notes are further disclosed with respect to the embodiments including the above examples.

(Supplementary Note 1) A correction support method in which a computer executes a process comprising:
accepting a data table containing a plurality of records;
identifying, based on an analysis result of the accepted data table, a plurality of pieces of correction candidate data that are candidates for correction among the plurality of pieces of data included in the plurality of records;
selecting one piece of correction candidate data from the identified plurality of pieces of correction candidate data based on those records, among the plurality of records, whose data has been edited in the past; and
outputting the selected correction candidate data.
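The four steps of Supplementary Note 1 can be outlined as a function that takes the analysis and selection rules as parameters. This is a sketch under assumptions: the analysis method is not specified in this passage, so toy_analyze and toy_choose are hypothetical stand-ins, and marking past edits with a non-None "edited_at" value is an invented convention.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Candidate = Dict[str, object]


def correction_support(
    table: List[Record],
    analyze: Callable[[List[Record]], List[Candidate]],
    choose: Callable[[List[Candidate], List[Record]], Candidate],
    output: Callable[[Candidate], None] = print,
) -> Candidate:
    # Accept a data table containing multiple records.
    records = list(table)
    # Identify correction candidates from the analysis result of the table.
    candidates = analyze(records)
    if not candidates:
        raise ValueError("the analysis produced no correction candidates")
    # Select one candidate based on the records edited in the past
    # (hypothetical convention: a non-None "edited_at" marks an edited record).
    edited = [r for r in records if r.get("edited_at") is not None]
    selected = choose(candidates, edited)
    # Output the selected correction candidate.
    output(selected)
    return selected


# "shimei" and "namae" stand for the synonymous terms 氏名 and 名前.
table = [
    {"id": 0, "term": "shimei", "edited_at": "2022-03-01"},
    {"id": 1, "term": "namae", "edited_at": None},
]


def toy_analyze(records: List[Record]) -> List[Candidate]:
    # Hypothetical analysis: propose unifying the two synonyms in either direction.
    return [
        {"target": 1, "fix": "namae -> shimei"},
        {"target": 0, "fix": "shimei -> namae"},
    ]


def toy_choose(candidates: List[Candidate], edited: List[Record]) -> Candidate:
    # Hypothetical rule: keep the term the operator already edited and
    # rewrite the other record to match it.
    edited_ids = {r["id"] for r in edited}
    return next(c for c in candidates if c["target"] not in edited_ids)


selected = correction_support(table, toy_analyze, toy_choose)
```

Passing the analysis and selection steps as callables keeps the outline neutral about how candidates are actually identified and ranked; the weighted-cost and language-estimation rules of the embodiment would be plugged in as the choose function.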

(Supplementary Note 2) The correction support method according to Supplementary Note 1, wherein the language type used when editing the data is set in each of the plurality of records, the identifying process further identifies a prioritized language type among the language types set in the plurality of records, and the selecting process selects one piece of correction candidate data from the identified plurality of pieces of correction candidate data based on the prioritized language type.

(Supplementary Note 3) The correction support method according to Supplementary Note 2, wherein the identifying process further identifies a modification cost based on the number of modifications to the data table indicated by the correction candidate data, and the selecting process reduces the modification cost of correction candidate data that modifies a record associated with the prioritized language type and preferentially selects correction candidate data with a lower modification cost.

(Supplementary Note 4) The correction support method according to Supplementary Note 2, wherein the identifying process identifies, as the prioritized language type, the language type whose appearance frequency is higher than that of the other language types among the language types set in the plurality of records.

(Supplementary Note 5) A correction support program that causes a computer to execute a process comprising:
accepting a data table containing a plurality of records;
identifying, based on an analysis result of the accepted data table, a plurality of pieces of correction candidate data that are candidates for correction among the plurality of pieces of data included in the plurality of records;
selecting one piece of correction candidate data from the identified plurality of pieces of correction candidate data based on those records, among the plurality of records, whose data has been edited in the past; and
outputting the selected correction candidate data.

(Supplementary Note 6) The correction support program according to Supplementary Note 5, wherein the language type used when editing the data is set in each of the plurality of records, the identifying process further identifies a prioritized language type among the language types set in the plurality of records, and the selecting process selects one piece of correction candidate data from the identified plurality of pieces of correction candidate data based on the prioritized language type.

(Supplementary Note 7) The correction support program according to Supplementary Note 6, wherein the identifying process further identifies a modification cost based on the number of modifications to the data table indicated by the correction candidate data, and the selecting process reduces the modification cost of correction candidate data that modifies a record associated with the prioritized language type and preferentially selects correction candidate data with a lower modification cost.

(Supplementary Note 8) The correction support program according to Supplementary Note 6, wherein the identifying process identifies, as the prioritized language type, the language type whose appearance frequency is higher than that of the other language types among the language types set in the plurality of records.

(Supplementary Note 9) An information processing device comprising a control unit that executes a process comprising:
accepting a data table containing a plurality of records;
identifying, based on an analysis result of the accepted data table, a plurality of pieces of correction candidate data that are candidates for correction among the plurality of pieces of data included in the plurality of records;
selecting one piece of correction candidate data from the identified plurality of pieces of correction candidate data based on those records, among the plurality of records, whose data has been edited in the past; and
outputting the selected correction candidate data.

(Supplementary Note 10) The information processing device according to Supplementary Note 9, wherein the language type used when editing the data is set in each of the plurality of records, and the control unit further identifies a prioritized language type among the language types set in the plurality of records and selects one piece of correction candidate data from the identified plurality of pieces of correction candidate data based on the prioritized language type.

(Supplementary Note 11) The information processing device according to Supplementary Note 9, wherein the control unit further identifies a modification cost based on the number of modifications to the data table indicated by the correction candidate data and, in the selecting process, reduces the modification cost of correction candidate data that modifies a record associated with the prioritized language type and preferentially selects correction candidate data with a lower modification cost.

(Supplementary Note 12) The information processing device according to Supplementary Note 10, wherein the control unit identifies, as the prioritized language type, the language type whose appearance frequency is higher than that of the other language types among the language types set in the plurality of records.

100 Information processing device
110 Communication unit
120 Input unit
130 Display unit
140 Storage unit
141 Controlled vocabulary data
150 Control unit
151 Reception unit
152 Identification unit
153 Selection unit
154 Display control unit

Claims

A correction support method in which a computer executes a process comprising:
accepting a data table containing a plurality of records;
identifying, based on an analysis result of the accepted data table, a plurality of pieces of correction candidate data that are candidates for correction among the plurality of pieces of data included in the plurality of records;
selecting one piece of correction candidate data from the identified plurality of pieces of correction candidate data based on those records, among the plurality of records, whose data has been edited in the past; and
outputting the selected correction candidate data.

The correction support method according to claim 1, wherein the language type used when editing the data is set in each of the plurality of records, the identifying process further identifies a prioritized language type among the language types set in the plurality of records, and the selecting process selects one piece of correction candidate data from the identified plurality of pieces of correction candidate data based on the prioritized language type.

The correction support method according to claim 2, wherein the identifying process further identifies a modification cost based on the number of modifications to the data table indicated by the correction candidate data, and the selecting process reduces the modification cost of correction candidate data that modifies a record associated with the prioritized language type and preferentially selects correction candidate data with a lower modification cost.

The correction support method according to claim 2, wherein the identifying process identifies, as the prioritized language type, the language type whose appearance frequency is higher than that of the other language types among the language types set in the plurality of records.

A correction support program that causes a computer to execute a process comprising:
accepting a data table containing a plurality of records;
identifying, based on an analysis result of the accepted data table, a plurality of pieces of correction candidate data that are candidates for correction among the plurality of pieces of data included in the plurality of records;
selecting one piece of correction candidate data from the identified plurality of pieces of correction candidate data based on those records, among the plurality of records, whose data has been edited in the past; and
outputting the selected correction candidate data.

An information processing device comprising a control unit that executes a process comprising:
accepting a data table containing a plurality of records;
identifying, based on an analysis result of the accepted data table, a plurality of pieces of correction candidate data that are candidates for correction among the plurality of pieces of data included in the plurality of records;
selecting one piece of correction candidate data from the identified plurality of pieces of correction candidate data based on those records, among the plurality of records, whose data has been edited in the past; and
outputting the selected correction candidate data.