JPWO2008126149A1 - Document anonymization device

Info

Publication number: JPWO2008126149A1
Application number: JP2009508703A
Authority: JP
Inventors: 山田　茂; 茂山田; 潤伊吹; 明彦小幡
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2007-03-30
Filing date: 2007-03-30
Publication date: 2010-07-15
Also published as: WO2008126149A1

Abstract

Proper nouns are extracted from a document. For each extracted proper noun, at least one piece of attribute information related to it is acquired, and a character string to replace the proper noun is generated by adding that attribute information to an anonymized expression of the proper noun. The generated character string is then substituted for the proper noun. The document is thereby anonymized while the loss of information is kept small.

Description

The present invention relates to a technique for anonymizing proper nouns present in a document.

In recent years, documents have increasingly been digitized, and an enormous number of documents are now accumulated as electronic data. The number of documents that can be browsed over a communication network is also increasing rapidly.
A document may contain proper nouns that constitute personal information that should not be made public. For example, some organizations such as companies publish documents describing on-site episodes whose information should be shared so that each employee can respond more appropriately. Such documents often contain proper nouns that should not be disclosed under their real names. For this reason, document anonymization devices that anonymize such proper nouns have been devised.

An example of a conventional document anonymization device is described in Patent Document 1. That device extracts proper nouns from the document to be anonymized, determines whether each extracted proper noun should be anonymized, and anonymizes those determined to require it. When a document such as the one shown in FIG. 1A is anonymized, a document such as the one shown in FIG. 1B is generated. In the document of FIG. 1B, “Mr. Sato” is replaced with “Mr. A”, “Maruhishi Electric” with “Company J”, “Mr. Omura” with “Mr. T”, and “Mr. Sakamoto” with “Mr. B”. The replacement may also take a form such as replacing “Mr. Sato” with “Mr. ○○”.

Such replacement anonymizes proper nouns so that they cannot be identified. However, simply anonymizing a proper noun reduces the amount of information, and that reduction can make it very difficult to grasp the meaning of the original document. The important meaning of the original document was therefore likely to be lost.

An example of a conventional document anonymization device that can suppress the loss of information caused by anonymization is described in Patent Document 2. In that device, a character string to replace each proper noun is set for that proper noun, so that a proper noun can be replaced with an arbitrary character string.

However, to realize such replacement, a character string must be set for each proper noun. Naturally, each character string must be set with the other character strings in mind, because replacing several different personal names in a document with the same character string must be avoided. Character strings therefore cannot be set independently of one another, and as the number of proper nouns to be replaced grows, setting the character strings becomes very laborious. For this reason, it is considered important to be able to suppress the loss of information caused by anonymization without individually setting the character strings that replace proper nouns.
Patent Document 1: JP 2002-269081 A
Patent Document 2: JP 2002-259368 A

An object of the present invention is to provide a document anonymization device that can suppress the loss of information caused by anonymizing proper nouns without individually setting the character strings that replace them.

The program according to the first aspect of the present invention causes a computer usable as a document anonymization device that anonymizes proper nouns in a document to realize: a proper noun extraction function that extracts proper nouns from the document; an information acquisition function that acquires at least one piece of attribute information related to a proper noun extracted by the proper noun extraction function; a character string generation function that, when the information acquisition function has acquired attribute information, uses that attribute information to generate a character string to replace the extracted proper noun; and a character string replacement function that replaces the proper noun extracted from the material with the character string generated by the character string generation function.

The program according to the second aspect further includes, in addition to the functions of the first aspect, a selection function for selecting the character string to be generated by the character string generation function. The character string generation function generates the character string from the attribute information acquired by the information acquisition function in accordance with the user's selection instruction given through the selection function.

The document anonymization device of the present invention anonymizes proper nouns in a document and comprises: proper noun extraction means for extracting proper nouns from the document; information acquisition means for acquiring at least one piece of attribute information related to a proper noun extracted by the proper noun extraction means; character string generation means for generating, when the information acquisition means has acquired attribute information, a character string to replace the extracted proper noun by using that attribute information; and character string replacement means for replacing the proper noun extracted from the material with the character string generated by the character string generation means.

The document anonymization method of the present invention anonymizes proper nouns in a document by computer. The method extracts proper nouns from the document; acquires, for each extracted proper noun, at least one piece of attribute information related to it; generates a character string to replace the proper noun by adding the attribute information to an anonymized expression of the proper noun; and replaces the proper noun with the generated character string.

In the present invention, proper nouns are extracted from a document; for each extracted proper noun, at least one piece of attribute information related to it is acquired; a character string to replace the proper noun is generated by adding the attribute information to an anonymized expression of the proper noun; and the generated character string is substituted for the proper noun. Because attribute information is added to the replacement character string, the loss of information is smaller than when the proper noun is simply masked. The laborious work of individually setting a replacement character string for each proper noun is avoided. Because the number, type, or combination of the pieces of attribute information used to generate the character string can be changed freely, a desired character string can also be generated.

A diagram showing an example document.
A diagram showing a document anonymized by the conventional document anonymization device described in Patent Document 1.
A diagram showing the configuration of the document anonymization device according to an embodiment of the present invention.
A diagram showing the configuration of the relevant organizational structure information table.
A diagram showing the configuration of the corresponding customer organizational structure information table.
A diagram showing the configuration of the company information table.
A diagram showing the configuration of the in-house employee information table.
A diagram showing the document of FIG. 1A after anonymization.
A flowchart of the document anonymization process.
A flowchart of the additional expression generation process.
A flowchart of the organization name additional expression generation process.
A flowchart of the personal name additional expression generation process.
A diagram showing the method of generating an organization name additional expression.
A diagram showing the method of generating a personal name additional expression.
A diagram showing an example of anonymizing a material.
A diagram showing an example of anonymization in which redundancy in the material is eliminated.
A diagram explaining a method of anonymizing proper nouns that makes it possible to eliminate redundancy.
A flowchart of the proper noun replacement process.
A diagram showing the method of selecting the attribute information additional expression to be generated (organization).
A diagram showing the method of selecting the attribute information additional expression to be generated (individual).
A flowchart of the additional expression selection process.
A diagram showing an example of the hardware configuration of a computer that can realize the document anonymization device according to the present embodiment.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
FIG. 2 is a diagram showing the configuration of a document anonymization device according to an embodiment of the present invention. The document anonymization device 2 is built, for example, to anonymize documents that an organization (company) discloses to its employees or to outside persons. As shown in FIG. 2, it comprises a material acquisition unit 21, a proper noun extraction unit 22, a proper noun conversion unit 23, an information storage unit 24, a character string replacement unit 25, an input unit 26, and a display control unit 27. The display control unit 27 allows the anonymized document and the various kinds of information used for anonymization to be displayed on a display device DP.

The operation of the above configuration is described next.
The material acquisition unit 21 acquires a material as the document to be anonymized and outputs it to both the proper noun extraction unit 22 and the character string replacement unit 25. The proper noun extraction unit 22 performs morphological analysis on the material received from the material acquisition unit 21 and extracts proper nouns. The extraction result is output to the proper noun conversion unit 23. Because the material acquisition unit 21 acquires a material as the document to be anonymized, “material” is used hereinafter to mean the document to be anonymized unless otherwise noted. The kind of document to be anonymized is not limited.

The proper noun conversion unit 23 generates, for each proper noun extracted by the proper noun extraction unit 22, a character string to replace that proper noun. For this purpose it includes an information acquisition unit 23a and an additional expression generation unit 23b. The information acquisition unit 23a acquires the information used to generate the character string from the information storage unit 24, and the additional expression generation unit 23b uses the information acquired by the information acquisition unit 23a to generate the character string (additional expression) that replaces the proper noun.

For character string generation, the information storage unit 24 stores a relevant organizational structure information table, a corresponding customer organizational structure information table, a company information table, and an in-house employee information table. These tables are described in detail with reference to FIGS. 3A to 3D.

FIG. 3A is a diagram showing the configuration of the relevant organizational structure information table. This table summarizes information on the unit within the company organization (a division, department, group, or the like) that anonymizes materials. As shown in FIG. 3A, it has the items role name, person in charge, affiliated organization, and attribute. For each employee who has been given a role, one record stores the role name, the employee's name, the affiliated organization, and the attribute as personal information.

FIG. 3B is a diagram showing the configuration of the corresponding customer organizational structure information table. This table summarizes information on the unit within the customer organization (company) that actually commissions the work. As shown in FIG. 3B, its configuration is basically the same as that of the relevant organizational structure information table.

FIG. 3C is a diagram showing the configuration of the company information table. This table summarizes, for each company that may appear in a material, information about that company. As shown in FIG. 3C, information for the items company name, annual sales, location, relationship, and industry is stored for each company. The relationship is defined relative to the company that has the unit anonymizing the material (the own company).

FIG. 3D is a diagram showing the configuration of the in-house employee information table. This table summarizes the personal information of each employee of the own company. As shown in FIG. 3D, the items that store this personal information include name, employee number, department, job title, and telephone number.
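The record layouts of the tables in FIGS. 3A to 3D can be sketched as simple data classes. This is only an illustrative sketch: the class names, field names, and sample values are assumptions made here and do not come from the patent, which specifies only the items each table holds.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OrgStructureRecord:
    """One row of an organizational structure table (FIG. 3A or FIG. 3B)."""
    role: str          # role name
    person: str        # name of the person in charge
    affiliation: str   # organization the person belongs to
    attribute: str     # attribute item


@dataclass
class CompanyRecord:
    """One row of the company information table (FIG. 3C)."""
    name: str
    annual_sales: str
    location: str
    relationship: str  # defined relative to the own company
    industries: List[str] = field(default_factory=list)  # may hold several attributes


@dataclass
class EmployeeRecord:
    """One row of the in-house employee information table (FIG. 3D)."""
    name: str
    employee_number: str
    department: str
    title: str
    phone: str


# Hypothetical sample rows, for illustration only.
own_structure = [OrgStructureRecord("project manager", "Sato", "Own Company", "")]
companies = [CompanyRecord("Maruhishi Electric", "", "", "customer company")]
```

The industry item is modeled as a list because the text states that it can hold multiple pieces of attribute information.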

In practice, the relevant organizational structure information table and the in-house employee information table are a database (DB) prepared for managing the employees working at the own company, or are created from that DB. The other two tables are, in practice, a DB prepared for managing the information the company needs to conduct its business, or are created from that DB. In the present embodiment, such tables are used, and more specifically the information stored in them is used, to generate the character strings that replace proper nouns. Generating the character strings from this information compensates for the loss of information that accompanies anonymization of proper nouns, which helps keep the important meaning of the document from being lost.

Returning to the description of FIG. 2.
The information acquisition unit 23a searches each table using the proper noun as the key, extracts the records that have that key, and outputs them to the additional expression generation unit 23b. The additional expression generation unit 23b generates a character string to stand in for the proper noun, adds the record information to that character string, and thereby generates the character string that is actually substituted. The generation method is described in detail with reference to FIGS. 9 and 10. To avoid confusion, the character string to which no information has been added is hereinafter called the “converted name”, and the character string after information has been added is called the “attribute information additional expression”.
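The key-based search performed by the information acquisition unit 23a can be sketched as follows. This is a minimal sketch under stated assumptions: the tables are modeled as lists of dictionaries, and the table names, key field names, and sample rows are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, str]


def find_records(
    proper_noun: str,
    tables: Dict[str, List[Record]],
    key_fields: Dict[str, str],
) -> Dict[str, List[Record]]:
    """Return, per table, the records whose key field equals the proper noun."""
    hits: Dict[str, List[Record]] = {}
    for table_name, rows in tables.items():
        key = key_fields[table_name]
        matched = [row for row in rows if row.get(key) == proper_noun]
        if matched:
            hits[table_name] = matched
    return hits


# Hypothetical table contents, for illustration only.
TABLES = {
    "own_structure": [{"role": "project manager", "person": "Sato"}],
    "customer_structure": [{"role": "application leader", "person": "Omura"}],
    "company": [{"name": "Maruhishi Electric", "relationship": "customer company"}],
}
KEY_FIELDS = {"own_structure": "person", "customer_structure": "person", "company": "name"}

sato_hits = find_records("Sato", TABLES, KEY_FIELDS)
```

A proper noun that appears in none of the tables yields an empty result, in which case only the converted name is available for the replacement.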

Proper nouns are broadly divided into organizations and individuals. Among the tables shown in FIGS. 3A to 3D, the tables from which information is acquired differ between organizations and individuals. In the present embodiment, therefore, whether a proper noun denotes an organization or an individual is determined before the attribute information additional expression is generated. Hereinafter, the attribute information additional expression generated for a proper noun denoting an individual is called the “personal name additional expression”, and the one generated for a proper noun denoting an organization is called the “organization name additional expression”.

FIG. 9 is a diagram showing the method of generating an organization name additional expression. As shown in FIG. 9, the organization name additional expression is generated by creating a converted name table, an anonymization information table, and an anonymization conversion table. The same applies when a personal name additional expression is generated.

The converted name table is generated to manage the converted name of each proper noun. The converted names themselves are generated according to a predetermined rule. For example, the first organization name appearing in the material is assigned “Company J”, the next organization name “Company K”, and so on.
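A converted name table of this kind can be sketched as below. The text says only that a predetermined rule is used and gives “Company J” and “Company K” as examples; the rule of assigning consecutive letters, the starting letter, and the class and method names are all assumptions made for illustration.

```python
import string
from typing import Dict


class ConvertedNameTable:
    """Assigns each distinct proper noun a converted name in order of first appearance."""

    def __init__(self, first_letter: str = "J", template: str = "Company {}"):
        start = string.ascii_uppercase.index(first_letter)
        # Iterator over the remaining letters; raises StopIteration once "Z" is used up.
        self._letters = iter(string.ascii_uppercase[start:])
        self._template = template
        self._names: Dict[str, str] = {}

    def name_for(self, proper_noun: str) -> str:
        # The same proper noun always maps to the same converted name,
        # and two different proper nouns never share one.
        if proper_noun not in self._names:
            self._names[proper_noun] = self._template.format(next(self._letters))
        return self._names[proper_noun]


table = ConvertedNameTable()
first = table.name_for("Maruhishi Electric")   # first organization in the material
second = table.name_for("Another Company")     # hypothetical second organization
again = table.name_for("Maruhishi Electric")   # repeated occurrence
```

Keeping the assignment in one table is what removes the need to choose each replacement string by hand while avoiding collisions between different names.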

The anonymization information table is generated to manage, for each proper noun denoting an organization name, the information used to generate the organization name additional expression. The table has the items proper noun, converted name, relationship, attribute 1, and attribute 2. The relationship, attribute 1, and attribute 2 items hold information stored in the record extracted from the company information table. Specifically, attribute 1 and attribute 2 hold the information stored in the industry item of the company information table, because the industry item is where a company's attribute information is stored. The industry item can hold multiple pieces of attribute information.

The anonymization conversion table is generated to manage the organization name additional expression (attribute information additional expression) for each proper noun. The organization name additional expression is generated simply according to a fixed conversion rule. Under that rule, as shown in FIG. 9, the expression is formed by placing the information of the relationship item (here, “customer company”) before the converted name and appending after the converted name a character string that gives the information of the attribute 1 item and/or the attribute 2 item in parentheses.
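The fixed conversion rule for organization names can be sketched as a small formatting function. This is a sketch only: the function name, the space and comma separators, and the sample attribute value are assumptions (the Japanese original concatenates without spaces).

```python
from typing import Sequence


def organization_expression(
    converted_name: str,
    relationship: str = "",
    attributes: Sequence[str] = (),
) -> str:
    """Build an organization name additional expression.

    Layout: <relationship> <converted name> (<attribute 1>, <attribute 2>)
    Missing parts are simply left out.
    """
    parts = [relationship, converted_name] if relationship else [converted_name]
    text = " ".join(parts)
    present = [a for a in attributes if a]
    if present:
        text += " (" + ", ".join(present) + ")"
    return text


# "electrical equipment" is a hypothetical industry value.
example = organization_expression("Company J", "customer company", ["electrical equipment"])
```

With these inputs the function yields `customer company Company J (electrical equipment)`; with no relationship and no attributes it returns the converted name unchanged.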

The organization name to be replaced with an attribute information additional expression need not be a company name. It may be a part of a company (organization). For this reason, FIG. 9 shows, in the anonymization information table, a record whose proper noun item is “Information Systems Department”. Such a record can be generated, for example, by preparing for each organization a table that summarizes information on the departments making up that organization. To avoid confusion, this point is not discussed further here.

FIG. 10 is a diagram showing the method of generating a personal name additional expression. Only the points that differ from generating an organization name additional expression are described here.
In a search using a personal name as the key, the record storing that personal name is extracted from either the relevant organizational structure information table or the corresponding customer organizational structure information table. A person with that name belongs to one of the organizations. As shown in FIG. 10, the anonymization information table is therefore generated using the record extracted from one of those two tables together with the record extracted from the company information table. Among the items of the anonymization information table, the relationship item holds the information of the record extracted from the company information table.

Like the organization name additional expression, the personal name additional expression is generated according to a fixed conversion rule. Under that rule, as shown in FIG. 10, the expression is formed by placing the information of the role item (here, “project manager” and “application leader”) before the converted name and appending after the converted name a character string that gives the information of the relationship item and/or the affiliation item in parentheses. To keep the expression from becoming too redundant, the information of the relationship item is not added for individuals who belong to the own company. As a result, the personal name additional expression (attribute information additional expression) for the proper noun (original expression) “Sato” is “project manager A”.
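The rule for personal names can be sketched in the same way. The function and parameter names, the separators, and the sample values other than “project manager”, “application leader”, and “customer company” are assumptions for illustration; the `is_own_company` flag models the statement that the relationship is omitted for individuals belonging to the own company.

```python
from typing import Optional


def personal_expression(
    converted_name: str,
    role: Optional[str] = None,
    relationship: Optional[str] = None,
    affiliation: Optional[str] = None,
    is_own_company: bool = False,
) -> str:
    """Build a personal name additional expression.

    Layout: <role> <converted name> (<relationship>, <affiliation>)
    The relationship is dropped for people in the own company to limit redundancy.
    """
    text = f"{role} {converted_name}" if role else converted_name
    details = []
    if relationship and not is_own_company:
        details.append(relationship)
    if affiliation:
        details.append(affiliation)
    if details:
        text += " (" + ", ".join(details) + ")"
    return text


sato = personal_expression("A", role="project manager",
                           relationship="own company", is_own_company=True)
omura = personal_expression("T", role="application leader",
                            relationship="customer company",
                            affiliation="Information Systems Department")
unknown = personal_expression("B")  # a name found in none of the tables
```

For a name with no record in any table, no role or detail is available, so the converted name alone serves as the expression.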

Although not specifically illustrated, for a personal name that exists in none of the relevant organizational structure information table, the corresponding customer organizational structure information table, and the in-house employee information table, the converted name itself serves as the attribute information additional expression.

As described above, the additional expression generation unit 23b determines whether each proper noun denotes an organization or an individual and generates an attribute information additional expression for each proper noun according to the result. The generated expressions are output to the character string replacement unit 25. The character string replacement unit 25 then anonymizes the material received from the material acquisition unit 21 by replacing each proper noun in it with the attribute information additional expression specified by the proper noun conversion unit 23.
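The replacement step of the character string replacement unit 25 can be sketched as a single-pass substitution. The patent does not describe how overlapping names are handled; matching longer names first and using one regular expression pass are choices made for this sketch, and the sample sentence and mapping are hypothetical.

```python
import re
from typing import Dict


def replace_proper_nouns(text: str, mapping: Dict[str, str]) -> str:
    """Replace every proper noun in the text with its additional expression."""
    if not mapping:
        return text
    # Try longer names first so a name contained in a longer one is not matched early.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # A single pass: text that has already been substituted is never scanned again.
    return pattern.sub(lambda m: mapping[m.group(0)], text)


MAPPING = {
    "Sato": "project manager A",
    "Maruhishi Electric": "customer company Company J",
}
anonymized = replace_proper_nouns("Sato visited Maruhishi Electric.", MAPPING)
```

Doing the substitution in one pass avoids the case where a replacement expression happens to contain another proper noun and would otherwise be rewritten a second time.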

Through this anonymization, the document shown in FIG. 1A is converted into the one shown in FIG. 4. Compared with the document shown in FIG. 1B, the document shown in FIG. 4 conveys the meaning of the original document more clearly. Thus, even after the document (material) has been anonymized, the meaning of the original document can be conveyed accurately to the reader.

Acquisition and anonymization of the material are carried out through the user's operations on the input unit 26. The anonymized material can be displayed on the display device DP by the display control unit 27 in response to a user instruction given through the input unit 26.

FIG. 16 is a diagram showing an example of the hardware configuration of a computer that can realize the document anonymization device 2. The configuration of such a computer is described in detail with reference to FIG. 16. To avoid confusion, the following description assumes that the document anonymization device 2 is realized by a single computer having the configuration shown in FIG. 16.

The computer shown in FIG. 16 has a CPU 61, a memory 62, an input device 63, an output device 64, an external storage device 65, a medium drive device 66, and a network connection device 67, which are connected to one another by a bus 68. The configuration shown in the figure is an example, and the invention is not limited to it.

The CPU 61 controls the entire computer.
The memory 62 is a memory such as a RAM that temporarily holds programs or data stored in the external storage device 65 (or on a portable recording medium 69) during program execution, data updating, and the like. The CPU 61 performs overall control by reading a program into the memory 62 and executing it.

The input device 63 is, for example, an interface connected to input devices such as a keyboard and a mouse, or a device that includes all of them. It detects the user's operations on the input devices and notifies the CPU 61 of the detection results.

The output device 64 is, for example, a display control device connected to the display device DP of FIG. 2, or a device that includes both. It outputs the data sent under the control of the CPU 61 to the display device DP of FIG. 2.

The network connection device 67 communicates with external devices via a network such as an intranet or the Internet. The external storage device 65 is, for example, a hard disk device, and is used mainly to store various data and programs.

The medium drive device 66 accesses a portable recording medium MD such as an optical disk or a magneto-optical disk.
The document anonymization apparatus 2 according to the present embodiment is realized by the CPU 61 executing a program that provides the necessary functions (hereinafter, "document anonymization software"). The document anonymization software may be distributed on the recording medium MD, or may be obtained via the network connection device 67. Here it is assumed to be stored on the external storage device 65.

Materials are acquired via the medium drive device 66 or the network connection device 67 under the control of the CPU 61, and are stored in the memory 62 or the external storage device 65. Under the above assumption, the material acquisition unit 21 of FIG. 2 is therefore realized by the CPU 61, the memory 62, the external storage device 65, the medium drive device 66, the network connection device 67, and the bus 68. If the anonymized material can be stored on the recording medium MD or output via the network connection device 67, the proper noun extraction unit 22, the proper noun conversion unit 23, and the character string replacement unit 25 are realized by the same combination of components. The information storage unit 24 corresponds to, for example, the external storage device 65, or the medium drive device 66 with the recording medium MD mounted. The input unit 26 is realized by, for example, the CPU 61, the memory 62, the input device 63, the external storage device 65, and the bus 68. The display control unit 27 is realized by, for example, the CPU 61, the memory 62, the output device 64, the external storage device 65, and the bus 68.

The generation of attribute-added expressions and the replacement of proper nouns described above are realized by the CPU 61 executing the processing below. The processing executed by the CPU 61 is described in detail with reference to FIGS. 5 to 8. Each process is realized by the CPU 61 executing the document anonymization software.

FIG. 5 is a flowchart of the document anonymization process, which is described in detail first. The process starts when the user designates a material via the input unit 26. Multiple materials can be designated, but to avoid confusion the description here assumes that only one is designated.

First, in step S1, the proper nouns to be anonymized are extracted from the material (document). Extraction is performed by morphological analysis with reference to the tables shown in FIGS. 3A to 3D. In step S2, an added-expression generation process is executed to generate, for each extracted proper noun, the attribute-added expression that will replace it. The process then moves to step S3, where the proper nouns in the material are replaced with the generated attribute-added expressions to produce the anonymized material, which is saved to, for example, the external storage device 65. Once the material has been saved, the document anonymization process ends.
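The three steps S1 to S3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names are hypothetical, and the morphological analysis of step S1 is replaced by a plain substring lookup against a list of known names.

```python
import re


def extract_proper_nouns(text, known_names):
    """Step S1 (simplified): return the known names that occur in the text.

    Assumption: a substring lookup stands in for morphological analysis.
    """
    return [name for name in known_names if name in text]


def anonymize_document(text, known_names, make_expression):
    """Run steps S1 to S3 on one material and return the anonymized text."""
    nouns = extract_proper_nouns(text, known_names)                # S1
    conversion = {noun: make_expression(noun) for noun in nouns}   # S2
    if not conversion:
        return text
    # S3: replace every occurrence in a single pass. Longer names come first
    # in the pattern so a name contained in another name is not matched early.
    ordered = sorted(conversion, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(noun) for noun in ordered))
    return pattern.sub(lambda match: conversion[match.group(0)], text)
```

For example, with the hypothetical mapping `{"Acme Corp": "Company A", "Sato": "X"}` passed as `mapping.get`, the text `"Sato visited Acme Corp."` becomes `"X visited Company A."`.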

FIG. 6 is a flowchart of the added-expression generation process executed as step S2, described in detail next. The proper noun extraction results are passed to this process from the document anonymization process.

First, in step S11, one of the extracted proper nouns is selected, and it is determined whether it represents an individual or an organization. The determination is made, for example, by identifying which of the tables in FIGS. 3A to 3D holds a record containing the proper noun. If such a record exists in the corresponding organizational structure information table, the corresponding customer organizational structure information table, or the in-house employee information table, the proper noun is determined to represent an individual, and the process moves to step S13. Otherwise, that is, if a record containing the proper noun exists in the company information table, the proper noun is determined to represent an organization, and the process moves to step S12.

In step S12, an organization name added-expression generation process is executed to generate an organization name added expression, which is the attribute-added expression that will replace the proper noun. In step S13, a personal name added-expression generation process is executed to generate a personal name added expression, which is likewise the attribute-added expression that will replace the proper noun. After either generation process has been executed, the process moves to step S14, which determines whether all proper nouns have been processed, that is, whether an attribute-added expression has been generated for every proper noun to be replaced. If a proper noun still lacks an attribute-added expression, the determination is NO and the process returns to step S11, where one of the remaining proper nouns is selected and the determination is made again. Otherwise the determination is YES, and the added-expression generation process ends.
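The loop over steps S11 to S14 can be sketched as below. This is only an illustration: the table keys, the use of sets of names in place of table records, and the two generator callbacks are assumptions introduced for the example.

```python
# Hypothetical keys for the three tables whose records describe individuals.
PERSON_TABLES = ("org_structure", "customer_org_structure", "employees")


def classify(noun, tables):
    """Step S11: decide whether a proper noun names an individual or an organization.

    `tables` maps a table key to the set of names stored in that table.
    Following the description, anything not found in a person table is
    treated as an organization (a record in the company information table).
    """
    if any(noun in tables[key] for key in PERSON_TABLES):
        return "individual"
    return "organization"


def generate_added_expressions(nouns, tables, make_org, make_person):
    """Steps S11 to S14: build one attribute-added expression per proper noun."""
    result = {}
    for noun in nouns:  # S14: repeat until every proper noun is processed
        if classify(noun, tables) == "organization":
            result[noun] = make_org(noun)      # S12
        else:
            result[noun] = make_person(noun)   # S13
    return result
```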

As described in detail with reference to FIGS. 9 and 10, the present embodiment generates anonymization information tables with different structures depending on whether the proper noun represents an organization or an individual. This is why the determination is made in step S11 and different processing is executed according to its result. The added-expression generation processes executed according to that result are described in detail below.

FIG. 7 is a flowchart of the organization name added-expression generation process executed as step S12, described in detail first. Each time this generation process is executed, one record is added to each of the converted name table, the anonymization information table, and the anonymization conversion table shown in FIG. 9. Although not specifically illustrated, these tables are created, for example, the first time the generation process is executed.

First, in step S21, the record containing the target proper noun (organization name) is extracted from the company information table (FIG. 3C). In step S22, a converted name to replace the proper noun is set and one record is added to the converted name table; in addition, one record holding the information used to generate the proper noun's attribute-added expression is added to the anonymization information table. In step S23, the attribute-added expression is generated from the information in the record just added to the anonymization information table, and one record containing that expression and the proper noun (original expression) is added to the anonymization conversion table. The organization name added-expression generation process then ends.
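Steps S21 to S23 might look like the following sketch, in which the three tables of FIG. 9 are modeled as dictionaries keyed by the original name. The field names (`industry`, `location`, `size`), the "Company A" naming scheme, and the output format are assumptions made for illustration; the actual table layout is the one shown in FIG. 9.

```python
import string


def add_organization(name, company_info, converted_names, anonymization_info, conversion_table):
    """Steps S21 to S23 for one organization name (hypothetical data layout)."""
    record = company_info[name]  # S21: look up the company information record

    # S22: assign the next converted name and store the generation inputs.
    # Assumption: names are "Company A", "Company B", ... (at most 26 here).
    converted = f"Company {string.ascii_uppercase[len(converted_names)]}"
    converted_names[name] = converted
    info = {"converted": converted, **record}
    anonymization_info[name] = info

    # S23: one attribute is attached directly, the others go in parentheses.
    expression = f"{info['industry']} {converted} ({info['location']}, {info['size']})"
    conversion_table[name] = expression
    return expression
```

Each call adds exactly one record to each of the three dictionaries, mirroring the one-record-per-execution behavior described above.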

FIG. 8 is a flowchart of the personal name added-expression generation process executed as step S13, described in detail next. Each time this generation process is executed, one record is added to each of the converted name table, the anonymization information table, and the anonymization conversion table shown in FIG. 10. Although not specifically illustrated, these tables are created, for example, the first time the generation process is executed.

First, in step S31, the record containing the target proper noun (personal name) is extracted from one of the corresponding organizational structure information table (FIG. 3A), the corresponding customer organizational structure information table, and the in-house employee information table (FIG. 3D). In step S32, the record holding the company information identified from the table from which that record was extracted is in turn extracted from the company information table (FIG. 3C). The process then moves to step S33.

In step S33, a converted name to replace the proper noun (individual) is set and one record is added to the converted name table; in addition, one record holding the information used to generate the proper noun's attribute-added expression is added to the anonymization information table. The record added to the anonymization information table is built from the information in the record extracted from the company information table. In step S34, the attribute-added expression is generated from the information in the record added to the anonymization information table, in a manner that depends on whether the individual represented by the proper noun is a member of the own organization, and one record containing that expression and the proper noun (original expression) is added to the anonymization conversion table. The personal name added-expression generation process then ends.

As shown in FIG. 10, the attribute-added expression for a member of the own organization, unlike that for other individuals, carries no attribute information in parentheses. The presence or absence of parenthesized attribute information thus makes it possible to tell whether an individual belongs to the own organization.
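The distinction made in step S34 can be illustrated with a short sketch. The parameter names and the exact output format are assumptions; only the rule that own-organization members receive no parenthesized attribute information comes from the description.

```python
def person_expression(converted_name, job_title, affiliation, is_own_member):
    """Step S34 (sketch): build the attribute-added expression for an individual.

    Assumption: the job title is attached directly to the converted name and
    the affiliation is the attribute that goes in parentheses.
    """
    base = f"{job_title} {converted_name}"
    if is_own_member:
        # Members of the own organization get no parenthesized attributes.
        return base
    return f"{base} ({affiliation})"
```

Under these assumptions, `person_expression("X", "Manager", "Company A", False)` yields `"Manager X (Company A)"`, while the same call with `True` yields `"Manager X"`.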

In the present embodiment, attribute-added expressions are basically generated, and proper nouns replaced, as described above. In addition, the following two functions are provided so that materials can be anonymized in a way better suited to the user. For each function, whether to use (enable) it is selected by operating the input unit 26.

Many materials (documents) contain the same proper noun more than once. If every occurrence in such a material is replaced with the same attribute-added expression, the anonymized document may read as redundant, as shown in FIG. 11A. One of the two functions is intended to prevent anonymization from making the material redundant.

With this function, a material that would otherwise be anonymized as shown in FIG. 11A is anonymized as shown in FIG. 11B. To achieve this, two expressions are generated for each proper noun: one is the attribute-added expression described above, and the other is a concise one with no attribute information attached (the concise expression). When the same proper noun appears multiple times, its first occurrence is replaced with the normal attribute-added expression, and its second and subsequent occurrences are replaced with the concise expression (the converted name). This avoids redundancy and yields anonymization as shown in FIG. 11B. This function is hereinafter called the "redundancy elimination function."

FIG. 12 illustrates a method of anonymizing proper nouns that allows redundancy to be eliminated. The method shown in FIG. 12 is for the case where the proper noun being anonymized is an organization name.
As described above, the attribute-added expression is generated from the information stored in the anonymization information table and is stored in the anonymization conversion table (FIGS. 9 and 10). When the redundancy elimination function is used, a "concise expression" field and a "counter" field are added to the anonymization conversion table, as shown in FIG. 12. The concise expression field stores the concise expression, and the counter field stores the count of occurrences of the proper noun. The "0" shown in FIG. 12 is the initial value. A value (count) other than 0 therefore means that the proper noun has already appeared at least once. Accordingly, either the attribute-added expression or the concise expression is selected depending on whether the count is 0, and the proper noun is replaced with the selected expression. The same applies when the proper noun is a personal name.

A record such as that shown in FIG. 12 is added to the anonymization conversion table in step S23 of FIG. 7 if the proper noun is an organization name, and in step S34 of FIG. 8 if it is a personal name. The replacement itself is realized by executing the proper noun replacement process shown in FIG. 13 as step S3 of FIG. 5. This replacement process is described in detail below with reference to the flowchart of FIG. 13. It works through the words identified as proper nouns by the morphological analysis in step S1 of FIG. 5, in order from the beginning of the material.

First, step S41 determines whether the proper noun being processed exists in the anonymization conversion table as an original expression. If a record storing the proper noun as its original expression can be extracted from the anonymization conversion table, the determination is YES and the process moves to step S43. Otherwise the determination is NO; the processing target is changed to the next proper noun (word) in step S42, and the determination of step S41 is made again.

Step S43 determines whether the value of the counter field of the extracted record is 0. If it is 0, the determination is YES: the proper noun (original character string) is replaced with the record's attribute-added expression in step S44, the counter value is incremented in step S45, and the process moves to step S47. Otherwise the determination is NO: the proper noun (original character string) is replaced with the record's concise expression in step S46, and the process moves to step S47.

Step S47 determines whether the proper noun just replaced is the last one in the material (text). If no proper nouns remain to be processed, the determination is YES and the proper noun replacement process ends. Otherwise the determination is NO and the process moves to step S42.
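The counter-based replacement of steps S41 to S47 can be sketched as follows. The `ConversionRecord` class and the token-list input are assumptions for illustration; in particular, the sketch walks over all words and passes non-matching ones through unchanged, whereas the flowchart iterates only over the words identified as proper nouns.

```python
from dataclasses import dataclass


@dataclass
class ConversionRecord:
    """One row of the anonymization conversion table of FIG. 12 (hypothetical layout)."""
    full: str         # attribute-added expression
    concise: str      # concise expression (converted name only)
    counter: int = 0  # occurrences replaced so far; 0 is the initial value


def replace_proper_nouns(tokens, table):
    """Replace proper nouns in order of appearance (steps S41 to S47)."""
    output = []
    for token in tokens:                  # S42/S47: advance through the material
        record = table.get(token)         # S41: is it an original expression?
        if record is None:
            output.append(token)          # not in the table, leave as is
        elif record.counter == 0:         # S43: first occurrence
            output.append(record.full)    # S44
            record.counter += 1           # S45
        else:
            output.append(record.concise) # S46: second or later occurrence
    return output
```

Because the counter lives in the table record, the first-occurrence state persists across calls; a caller that processes several materials with one table would need to reset the counters between materials.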

Next, the remaining function is described in detail.
As described above, an attribute-added expression is generated by attaching one or more pieces of attribute information to the converted name. Because it is generated this way, the number and kinds of attribute information to attach can be chosen. The second function exploits this to let the user choose which attribute-added expression is generated. It is hereinafter called the "expression selection function."

FIGS. 14A and 14B illustrate how the attribute-added expression to be generated is selected. FIG. 14A is for the case where the proper noun is an organization name, and FIG. 14B is for the case where it is a personal name.

As shown in FIGS. 14A and 14B, the present embodiment displays on the display device DP a slider that lets the user choose the amount of information (level of detail) carried by the attribute-added expression, that is, the number of pieces of attribute information used to generate it. The user operates the input unit 26 to move the slider knob, thereby selecting the amount of information in the generated attribute-added expression. The figures also show, to the left of the slider, the anonymization conversion table generated at the maximum amount of information and, to its right, the one generated at the minimum. As these tables make clear, the knob position changes the amount of information by increasing or decreasing the number of pieces of attribute information written in parentheses.
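The effect of the slider position can be sketched as a function of a numeric level of detail. The function name, the integer scale, and the choice to always keep the directly attached attribute are assumptions for illustration; the description only states that the knob position changes how many attributes are written in parentheses.

```python
def build_expression(converted_name, direct_attribute, extra_attributes, detail_level):
    """Build an attribute-added expression at a given level of detail.

    `detail_level` is the number of attributes to show in parentheses
    (assumption: 0 corresponds to the minimum-information end of the slider).
    """
    shown = extra_attributes[:max(0, detail_level)]
    head = f"{direct_attribute} {converted_name}"
    if not shown:
        return head
    return f"{head} ({', '.join(shown)})"
```

For instance, with the hypothetical attributes `["Tokyo", "large"]`, level 2 gives `"manufacturer Company A (Tokyo, large)"`, level 1 gives `"manufacturer Company A (Tokyo)"`, and level 0 gives `"manufacturer Company A"`.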

What can be selected is not limited to the amount of information; the attribute information used to generate the attribute-added expression may itself be made selectable. For an organization, for example, the location, industry, or size may be selectable, and for an individual, the affiliation, gender, or job title may be selectable.

In the present embodiment, what the attribute-added expression contains can be selected separately for organization names and personal names. This is because detailed information about the organization is often of little importance to readers inside the own organization, while detailed information about individuals is often of little importance to outside readers. In other words, the information required of the attribute-added expressions for personal names and organization names often differs depending on who reads the material.

As described above, attribute-added expressions are generated from the information stored in the anonymization information tables shown in FIGS. 9 and 10. Because the content of the generated expressions can be changed, the expressions actually generated are displayed so that the user can check them. This check is realized by executing the added-expression selection process whose flowchart is shown in FIG. 15. The selection process is described in detail next with reference to FIG. 15.

The selection process is executed, for example, when the determination in step S14 of FIG. 6 is YES, before the added-expression generation process ends. The attribute-added expressions (the anonymization conversion table) are thus first generated according to a predetermined amount of information (level of detail) and are regenerated when the user changes the level of detail. Two anonymization conversion tables are generated, one for organizations and one for individuals, and the level of detail can be changed for each; for convenience, however, the description here assumes a single anonymization conversion table.

First, in step S51, the generated anonymization conversion table is displayed together with the sliders shown in FIGS. 14A and 14B, and the level of detail is changed according to the user's operation of the input unit 26. The process moves to step S52 when the user changes the level of detail. If the user does not change it, that is, if the user instructs anonymization with the most recently generated anonymization conversion table, the added-expression selection process ends at this point, although this is not specifically illustrated.

In step S52, the rule for generating attribute-added expressions (labeled "attribute information additional expression conversion rule" in the figure) is selected according to the changed level of detail. Here, the rule determines which of the attribute information available for generating an attribute-added expression is actually used. In step S53, the anonymization conversion table is generated (created) from the anonymization information table using the selected rule, and in step S54 that conversion table is displayed.

Step S55, to which the process moves after the conversion table is displayed, determines whether input has finished, that is, whether the user has performed an operation instructing anonymization with that conversion table. If the user has performed that operation, the determination is YES and the added-expression selection process ends. Otherwise the determination is NO and the process returns to step S51. This allows the process to follow further changes to the level of detail made by the user with the slider.
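The loop of steps S51 to S55 can be sketched without a real user interface by feeding it a sequence of simulated user events. Everything here is an illustrative assumption: an integer event stands for the slider being moved to that level, the string `"confirm"` stands for the operation that instructs anonymization, and `build` stands for the rule selection and table generation of steps S52 and S53.

```python
def select_expressions(info_table, build, initial_level, events):
    """Regenerate the conversion table until the user confirms (steps S51 to S55).

    `build(info_table, level)` returns a conversion table for a level of detail.
    Returns the final level and the conversion table to use for anonymization.
    """
    level = initial_level
    table = build(info_table, level)       # table generated at the preset level
    for event in events:                   # S51: wait for a user operation
        if event == "confirm":             # S55 YES, or accepted without change
            break
        level = event                      # slider moved to a new level
        table = build(info_table, level)   # S52 and S53: select rule, regenerate
        # S54 would display `table` here before the next operation.
    return level, table
```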

When the expression selection function is enabled, the added-expression selection process described above is executed. The expression selection function can be enabled together with the redundancy elimination function. When the redundancy elimination function is also enabled, an anonymization conversion table such as that shown in FIG. 12 is generated.

In the present embodiment, an attribute-added expression attaches one piece of attribute information directly to the converted name and the remaining pieces in parentheses, but the way attribute information is attached is not limited to this and may be changed as desired. Proper nouns are broadly divided into two kinds, organizations and individuals, but they may be divided more finely, for example organizations into companies and the departments that make up a company, and individuals by whether they hold a post and by their rank, with the way of attaching attribute information determined for each kind.

Claims

A program for causing a computer usable as a document anonymization device that anonymizes proper nouns in a document to realize:
a proper noun extraction function for extracting proper nouns in the document;
an information acquisition function for acquiring at least one piece of attribute information related to the proper noun extracted by the proper noun extraction function;
a character string generation function for generating, when the attribute information is acquired by the information acquisition function, a character string to replace the proper noun extracted by the proper noun extraction function, using the attribute information; and
a character string replacement function for replacing the proper noun extracted from the material by the proper noun extraction function with the character string generated by the character string generation function.

The program according to claim 1, wherein
the character string generation function generates, for each proper noun for which the information acquisition function has acquired the attribute information, an expression in which the proper noun is anonymized, and generates the character string by adding the attribute information to the generated expression.

The program according to claim 1, wherein
the character string generation function generates, for each proper noun for which the information acquisition function has acquired the attribute information, a concise expression that is a character string carrying less information than the character string, and
the character string replacement function, when the same proper noun appears multiple times in the document, replaces the first occurrence of the proper noun with the character string and replaces the second and subsequent occurrences with the concise expression.

The program according to claim 1, further causing the computer to realize
a selection function for selecting the character string to be generated by the character string generation function, wherein
the character string generation function generates the character string using the attribute information acquired by the information acquisition function in accordance with a user's selection instruction given via the selection function.

A document anonymization device that anonymizes proper nouns in a document, comprising:
proper noun extraction means for extracting proper nouns in the document;
information acquisition means for acquiring at least one piece of attribute information related to the proper noun extracted by the proper noun extraction means;
character string generation means for generating, when the information acquisition means acquires the attribute information, a character string to replace the proper noun extracted by the proper noun extraction means, using the attribute information; and
character string replacement means for replacing the proper noun extracted from the material by the proper noun extraction means with the character string generated by the character string generation means.

The document anonymization device according to claim 5, wherein
the character string generation means generates, for each proper noun for which the information acquisition means has acquired the attribute information, a concise expression that is a character string carrying less information than the character string, and
the character string replacement means, when the same proper noun appears multiple times in the document, replaces the first occurrence of the proper noun with the character string and replaces the second and subsequent occurrences with the concise expression.

The document anonymization device according to claim 5, further comprising
selection means for selecting the character string to be generated by the character string generation means, wherein
the character string generation means generates the character string using the attribute information acquired by the information acquisition means in accordance with a user's selection instruction given via the selection means.

A document anonymization method in which a computer anonymizes proper nouns in a document, the method comprising:
extracting proper nouns in the document;
acquiring, for each extracted proper noun, at least one piece of attribute information related to the proper noun;
generating a character string to replace the proper noun for which the attribute information was acquired, by adding the attribute information to an anonymized expression of the proper noun; and
replacing the proper noun with the generated character string.