JP2013246547A

JP2013246547A - Data converter

Info

Publication number: JP2013246547A
Application number: JP2012118365A
Authority: JP
Inventors: Toshihiko Sasaki; 俊彦佐々木
Original assignee: Nomura Research Institute Ltd
Current assignee: Nomura Research Institute Ltd
Priority date: 2012-05-24
Filing date: 2012-05-24
Publication date: 2013-12-09
Anticipated expiration: 2032-05-24
Also published as: JP5687656B2

Abstract

[Problem] To suppress the loss of quality as test data that document data suffers after its personal information is masked.
[Solution] An original document acquisition unit 32 of a data conversion device 14 acquires original document data from a production machine 10. A personal information detection unit 34 detects personal information in the original document data. A replacement data acquisition unit 38 acquires, for the personal information detected by the personal information detection unit 34, replacement data representing its hash value. A converted document output unit 42 outputs, to a recording medium 16 as test data, document data in which the personal information in the original document data has been replaced with the replacement data.
[Selected drawing] FIG. 2

Description

The present invention relates to data processing technology, and in particular to technology for converting the contents of document data.

In the operation phase of an information system, data accumulated in the production environment (in other words, the commercial environment) is sometimes extracted as test data for maintenance work. The extracted data is then loaded into a development environment (in other words, a test environment), where various tests are run.

Data accumulated in the production environment may include personal information. Now that protection of personal information is taken seriously, techniques for masking personal information have been proposed.

JP 2006-221560 A
JP 2010-152459 A

As described above, data extracted from the production environment is sometimes used as test data in the development environment. The inventor recognized that masking the personal information contained in data extracted from the production environment can lower its quality as test data.

The present invention is based on this recognition by the inventor, and its main object is to provide a technique that suppresses the loss of quality as test data in document data whose personal information has been masked.

To solve the above problem, a data conversion device according to one aspect of the present invention includes: a detection unit that detects personal information in original document data; a replacement data acquisition unit that acquires, for the personal information detected by the detection unit, replacement data representing its hash value; and an output unit that outputs document data in which the personal information in the original document data has been replaced with the replacement data.

Any combination of the above components, and any conversion of the expression of the present invention between a method, a system, a program, a recording medium storing a program, and the like, are also valid as aspects of the present invention.

According to the present invention, the loss of quality as test data can be suppressed for document data whose personal information has been masked.

A diagram showing the configuration of the information system of the embodiment.
A block diagram showing the functional configuration of the data conversion device of FIG. 1.
A diagram showing the extraction results for personal information.
A diagram showing the correspondence rules.
A diagram showing the replacement rules.
A diagram showing the user setting screen.
A flowchart showing the operation of the data conversion device.
A flowchart showing the operation following FIG. 7(a).
A diagram schematically showing masking of multiple tables.
A diagram schematically showing masking of log data.

FIG. 1 shows the configuration of an information system 100 according to the embodiment. The information system 100 provides, for example, an information processing service for retailers or financial businesses, and a systems integration (SI) company is responsible for its development, operation, and maintenance. The information system 100 includes a production machine 10, a development machine 12, and a data conversion device 14.

The production machine 10 is an information processing device installed in the production environment, such as a web server, an application server, or a database server. It provides a commercial information processing service to client companies and end users, and holds document data containing various information that constitutes personal information of those client companies and end users. This document data includes the data of tables managed by the database server, as well as CSV files, free-format log files, fixed-length files, and the like.

The development machine 12 is an information processing device installed in the development environment. It is used to perform work such as trouble analysis, bug fixing, and feature addition (hereinafter collectively called "maintenance work") on the applications installed on the production machine 10. In the embodiment, to make maintenance work on the development machine 12 more efficient, document data corresponding to the document data held in the production environment is used as the test data for that work.

The data conversion device 14 acquires document data in the production environment (hereinafter also called "original document data") from the production machine 10. It converts the original document data into document data in which the personal information is masked (hereinafter also called "test document data") and records the result on a recording medium 16. The data conversion device 14 may be an ordinary PC. A person in charge at the SI company loads the test document data recorded on the recording medium 16 into the development machine 12 and carries out maintenance work there.

FIG. 2 is a block diagram showing the functional configuration of the data conversion device 14 of FIG. 1. The data conversion device 14 includes a data holding unit 20, which is a storage area holding various data, and a data processing unit 30, which executes various data processing. The data holding unit 20 includes an extracted data holding unit 22, a correspondence holding unit 24, a replacement rule holding unit 26, and an exclusion rule holding unit 28. The data processing unit 30 includes an original document acquisition unit 32, a personal information detection unit 34, a replacement rule determination unit 36, a replacement data acquisition unit 38, a document conversion unit 40, a converted document output unit 42, and a user setting support unit 44.

In hardware terms, each block shown in the block diagrams of this specification can be implemented by elements such as a computer's CPU or by mechanical devices, and in software terms by a computer program or the like; what is drawn here are functional blocks realized by their cooperation. Those skilled in the art will therefore understand that these functional blocks can be implemented in various forms by combinations of hardware and software. For example, each block in FIG. 2 may be stored on a recording medium as a program module and installed from that medium into the storage of the data conversion device 14. The function of each block may then be realized by reading the corresponding program module into main memory as needed and executing it on the CPU.

The extracted data holding unit 22 holds the results of extracting personal information contained in the original document data. FIG. 3 shows these extraction results. The record number field records the record number in the original document data; this may be, for example, a number indicating the line position in a CSV file, or an identification number assigned to each record of a table managed by a database. The item name field records the name of the information item of the original document data in which the personal information is set. The character string field holds the character string detected as personal information. The position field holds the position at which that character string is set within the information item, specifically its byte offset counting the first byte as 1. The detection type field records information identifying the kind of personal information, for example whether it is a person name, a place name, a telephone number, an organization, an e-mail address, or the like.

Returning to FIG. 2, the correspondence holding unit 24 holds correspondence rules that associate each detection type of personal information with a mask pattern, which indicates the kind and form of the replacement when the personal information is replaced with another character string (in this embodiment, character strings include digit strings). The correspondence rules are predetermined in the data conversion device 14, but the user can change them through the user setting support unit 44 described later. FIG. 4 shows the correspondence rules. As shown there, each detection type is in principle associated with a mask pattern matching the attributes of the character string detected as personal information (for example, a random kanji string for personal information written in kanji).

When the mask pattern is a "custom pattern", for which the user can set an arbitrary format, a custom character string indicating the form of character string chosen by the user is also held. For example, the custom character string "<2n>-<4n>-<4n>" denotes the concatenation of a random digit string of length 2, "-", a random digit string of length 4, "-", and a random digit string of length 4. Likewise, the custom character string "<5a>@<3a>.<2a>.<2a>" denotes the concatenation of a random ASCII string of length 5, "@", a random ASCII string of length 3, ".", a random ASCII string of length 2, ".", and a random ASCII string of length 2.
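
The custom character string notation can be illustrated with a short Python sketch. The token grammar below is inferred only from the two examples in the text; writing the tokens in half-width ASCII, treating "a" as lowercase letters, and the function name `expand_custom_pattern` are all assumptions made for illustration, not details given in the specification.

```python
import random
import re
import string

# Token syntax inferred from the examples in the text:
# "<2n>" = 2 random digits, "<5a>" = 5 random ASCII characters.
TOKEN = re.compile(r"<(\d+)([na])>")

ALPHABETS = {
    "n": string.digits,
    "a": string.ascii_lowercase,  # assumption: "ASCII" is narrowed to lowercase letters
}

def expand_custom_pattern(pattern: str, rng: random.Random) -> str:
    """Replace each <Nn>/<Na> token with N random characters; copy other text as-is."""
    def fill(match: re.Match) -> str:
        length, kind = int(match.group(1)), match.group(2)
        return "".join(rng.choice(ALPHABETS[kind]) for _ in range(length))
    return TOKEN.sub(fill, pattern)

rng = random.Random(0)  # seeded only to make the sketch repeatable
phone = expand_custom_pattern("<2n>-<4n>-<4n>", rng)
mail = expand_custom_pattern("<5a>@<3a>.<2a>.<2a>", rng)
```

Literal characters such as "-", "@", and "." pass through unchanged, so the masked value keeps the shape of a telephone number or e-mail address.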

Returning to FIG. 2, the replacement rule holding unit 26 holds replacement rules that associate each information item in the original document data with a mask pattern. FIG. 5 shows the replacement rules. The item name field records the name of an information item in the original document data. The target field records whether that information item contains personal information (TRUE) or not (FALSE). The maximum detection type field records the kind of personal information most often determined as the detection type for that item, and the detection count field records the number of detections of that type. The mask pattern field records the mask pattern that replaces the personal information; for a custom pattern, the custom character string is recorded as well.

In the replacement rules of FIG. 5, a hash value is specified as the mask pattern for the item "ACCOUNT_NUMBER". This is because the data of the item "ACCOUNT_NUMBER" is also used in another table, not shown in FIG. 5, and serves as the key relating the two tables. How masking personal information with hash values preserves, in the test document data, the relationships among multiple tables in the original document data is described later with reference to FIG. 8 and other figures.

Returning to FIG. 2, the exclusion rule holding unit 28 holds exclusion rules for identifying, among the character strings detected as personal information, those to be excluded from masking. The exclusion rules of this embodiment define one or more character strings that should not be masked (hereinafter also called "mask-exempt strings"). In this embodiment, a character string detected as personal information is excluded from masking if it exactly matches a mask-exempt string. As variations, a character string that contains a mask-exempt string as a part may be excluded, and when a mask-exempt string is given as a regular expression, any character string matched by that regular expression may be excluded.
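
The three matching modes described above (exact match in the embodiment, plus the partial-match and regular-expression variations) could be sketched as follows. The function name `is_excluded` and the `mode` parameter are hypothetical, and treating "matched by the regular expression" as a full match is an assumption.

```python
import re

def is_excluded(text: str, exclusions: list[str], mode: str = "exact") -> bool:
    """Return True if `text` should be left unmasked.

    mode="exact":    text equals an exclusion string (the embodiment's behavior)
    mode="contains": text contains an exclusion string (first variation)
    mode="regex":    text fully matches an exclusion pattern (second variation)
    """
    if mode == "exact":
        return text in exclusions
    if mode == "contains":
        return any(e in text for e in exclusions)
    if mode == "regex":
        return any(re.fullmatch(e, text) for e in exclusions)
    raise ValueError(f"unknown mode: {mode}")
```

Exact matching is the most conservative choice: a string is exempted only when the user has listed it verbatim, so a listed company name does not accidentally exempt a longer string that merely contains it.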

The original document acquisition unit 32 acquires the original document data from the production machine 10. As already noted, the original document data may be records stored in database tables, or file data of various kinds such as CSV files, fixed-length files, and free-format log files.

The personal information detection unit 34 detects the personal information contained in the original document data and records the detection results in the extracted data holding unit 22 in the form shown in FIG. 3. The personal information detection unit 34 may be implemented by known personal information extraction means, for example the software product "TRUE TELLER Personal Information Filter (registered trademark)" provided by Nomura Research Institute, Ltd.

The replacement rule determination unit 36 refers to the detection results stored in the extracted data holding unit 22 and the correspondence rules stored in the correspondence holding unit 24, determines the replacement rules for the personal information contained in the original document data, and records them in the replacement rule holding unit 26. Specifically, for each information item of the original document data, it determines whether personal information was detected (for example, whether a detection type was recorded) and records the result. For each information item, it also counts the detections of each detection type, determines the maximum detection type, and records it. Then, following the correspondence rules stored in the correspondence holding unit 24, it identifies and records the mask pattern (and custom character string) associated with the maximum detection type.

To make the attributes of the masked character string resemble those of the character string before masking (a character string containing personal information, hereinafter also called the "original character string"), the replacement rule determination unit 36, when determining the maximum detection type, refines the detection type identified by the personal information detection unit 34 according to the attributes of the characters. For example, if the maximum detection type in the detection results is [person name] and the character string in the character string field is kanji, it records the maximum detection type "[person name]KANJI". If the maximum detection type is [person name] and the character string is hiragana, it records "[person name]KANA".
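
The rule determination described here (count detection types per item, take the most frequent one, refine it by script, then look up the mask pattern) might look like the following sketch. The `CORRESPONDENCE` table contents, the tuple layout of `detections`, and the choice to refine each detection before counting are assumptions; the specification does not fix these details.

```python
from collections import Counter
import unicodedata

# Hypothetical correspondence rules (detection type -> mask pattern).
CORRESPONDENCE = {
    "[person name]KANJI": "random kanji",
    "[person name]KANA": "random kana",
    "[phone number]": "custom:<2n>-<4n>-<4n>",
}

def refine(detection_type: str, text: str) -> str:
    """Append KANJI or KANA to a person-name type based on the script of `text`."""
    if detection_type != "[person name]" or not text:
        return detection_type
    names = [unicodedata.name(ch, "") for ch in text]
    if all(n.startswith("CJK UNIFIED IDEOGRAPH") for n in names):
        return detection_type + "KANJI"
    if all(n.startswith("HIRAGANA") for n in names):
        return detection_type + "KANA"
    return detection_type

def decide_rules(detections):
    """detections: iterable of (item_name, detection_type, text) tuples."""
    counts = {}
    for item, dtype, text in detections:
        counts.setdefault(item, Counter())[refine(dtype, text)] += 1
    rules = {}
    for item, counter in counts.items():
        max_type, n = counter.most_common(1)[0]  # the maximum detection type
        rules[item] = {
            "target": True,
            "max_type": max_type,
            "count": n,
            "mask_pattern": CORRESPONDENCE.get(max_type),
        }
    return rules

rules = decide_rules([
    ("NAME", "[person name]", "野村"),
    ("NAME", "[person name]", "山田"),
    ("NAME", "[phone number]", "03-1234-5678"),
])
```

Here the item NAME is classified as "[person name]KANJI" because two of its three detections are kanji person names, so the stray telephone-number detection does not change the mask pattern chosen for the column.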

The replacement data acquisition unit 38 refers to the replacement rules stored in the replacement rule holding unit 26 and acquires, for each record and each information item of the original document data, replacement data for masking the personal information (hereinafter also called "mask data"). For example, when the mask pattern is a random character string (kanji), it acquires as mask data a random kanji string whose length corresponds to the length of the original character string (the same length in this embodiment).
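
A minimal sketch of the random-kanji case follows. Drawing from the Unicode range U+4E00 to U+9FA5 is an assumption made for brevity; that range includes many ideographs not used in Japanese, so a practical implementation would more likely draw from a curated character list. The function name `random_kanji` is hypothetical.

```python
import random

def random_kanji(length: int, rng: random.Random) -> str:
    """Return `length` random characters from the CJK Unified Ideographs range."""
    return "".join(chr(rng.randint(0x4E00, 0x9FA5)) for _ in range(length))

original = "野村太郎"
mask = random_kanji(len(original), random.Random(1))  # same length as the original
```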

When the mask pattern is a hash value, the original character string is input to a hash function (SHA-2 in the embodiment), and a character string representing the resulting hash value (hereinafter also called a "hash string") is acquired as mask data. Another kind of hash function, such as SHA-1 or MD5, may be used instead. The hash string may be a HEX string giving a hash value of predetermined length in hexadecimal notation, or a digit string. The replacement data acquisition unit 38 may also acquire as mask data the result of trimming the hash string to a length corresponding to the length of the original character string.
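
The hash-based mask data can be sketched with Python's standard `hashlib`. SHA-256 is used here as one member of the SHA-2 family; the UTF-8 encoding, the decimal rendering used for the digit-string variant, and the function name `hash_mask` are assumptions for illustration.

```python
import hashlib

def hash_mask(original: str, length=None, digits_only: bool = False) -> str:
    """Return a hash string for `original`, optionally trimmed to `length` characters."""
    digest = hashlib.sha256(original.encode("utf-8")).hexdigest()  # 64 hex characters
    if digits_only:
        digest = str(int(digest, 16))  # decimal rendering of the same hash value
    return digest if length is None else digest[:length]

account = "1234567"
masked = hash_mask(account, length=len(account))  # trimmed to the original length
```

Trimming keeps the masked value the same width as the original field, but a shorter hash string has fewer possible values, so two different originals become more likely to collide on the same mask.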

As described later with reference to FIG. 6, a hash value may also be specified when the mask pattern is a custom pattern. In that case the replacement data acquisition unit 38 obtains the hash value of the original character string and acquires as mask data the result of trimming the character string representing that hash value to the length specified by the custom character string.

For each record and each information item of the original document data, the document conversion unit 40 replaces the character string detected as personal information by the personal information detection unit 34 with the mask data acquired by the replacement data acquisition unit 38. The original document data is thereby converted into test document data in which the personal information is masked.

The document conversion unit 40 also determines whether the original character string to be converted matches a mask-exempt string defined by the exclusion rules of the exclusion rule holding unit 28; if it does not match, the original character string is replaced with the mask data. If it matches, replacement of that original character string with mask data is suppressed. In other words, masking of that original character string is skipped and processing moves on to the next character string to be converted.

The converted document output unit 42 records the test document data generated by the document conversion unit 40 on the recording medium 16. For example, when the original document data is table data of a database, the table data with personal information masked is stored on the recording medium 16 as test document data. When the original document data is a free-format log file, the log file data with personal information masked is stored on the recording medium 16 as test document data. The test document data recorded on the recording medium 16 is loaded into the development machine 12 and used in maintenance work there (for example, as input data or reference data for tests).

The user setting support unit 44 supports the user's operations for setting the correspondence rules held in the correspondence holding unit 24 and the replacement rules held in the replacement rule holding unit 26. Specifically, it displays a user setting screen for editing the correspondence rules and replacement rules on a predetermined display, and reflects the information the user enters on that screen in the correspondence rules of the correspondence holding unit 24 and the replacement rules of the replacement rule holding unit 26.

FIG. 6 shows the user setting screen, here the screen for editing the correspondence rules held in the correspondence holding unit 24. When the user enters the contents shown in the figure, the correspondence is updated so that the detection type [person name]KANJI is associated with a mask pattern of an 8-character hash string. A hash string can also be selected from the pull-down menu of mask patterns.

The user setting support unit 44 also displays a user setting screen on which the user enters mask-exempt strings. It acquires the mask-exempt strings the user enters on that screen and adds them to the exclusion rules of the exclusion rule holding unit 28.

The operation of the data conversion device 14 configured as described above is explained below.

FIG. 7(a) is a flowchart showing the operation of the data conversion device 14. When a user operation instructing determination of mask patterns for document data in the production environment is detected at the data conversion device 14 (Y in S10), the original document acquisition unit 32 acquires the original document data specified by that user operation from the production machine 10 (S12). The personal information detection unit 34 detects the personal information written in the original document data and records attribute information, including its position, in the extracted data holding unit 22 (S14). The replacement rule determination unit 36 determines the mask pattern for each piece of personal information based on the attributes of the personal information detected by the personal information detection unit 34 and the correspondence rules held in the correspondence holding unit 24, and records the replacement rule for each piece of personal information in the replacement rule holding unit 26 (S16). If no user operation instructing determination of mask patterns is detected (N in S10), S12 to S16 are skipped.

When a user operation instructing a change to the masking settings is detected at the data conversion device 14 (Y in S18), the user setting support unit 44 displays the user setting screen (S20). The user setting support unit 44 reflects update information for the correspondence rules entered on the user setting screen in the correspondence holding unit 24, or reflects update information for the replacement rules entered on the user setting screen in the replacement rule holding unit 26 (S22). If no such user operation is detected (N in S18), S20 and S22 are skipped. Although not shown in FIG. 7(a), when a mask-exempt string is entered on the user setting screen, the user setting support unit 44 records it in the exclusion rule holding unit 28.

Typically, the user of the data conversion device 14 sets a hash value as the mask pattern for items of personal information that appear at multiple places in the original document data and whose mutual relationship must be preserved. More specifically, a hash value is set as the mask pattern for key information that relates multiple tables in a relational database (for example, a foreign key in a first table that is the primary key of a second table). Because the same original value always yields the same hash value, relationships between items in the original document data, such as the relations among multiple tables in a relational database, are preserved in the masked test document data.
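
The effect on table relations can be checked with a small sketch. The two tables, their column names other than ACCOUNT_NUMBER, the 8-character trim, and the helper names are all hypothetical; only the key column is masked so that the join result stays easy to read.

```python
import hashlib

def mask_key(value: str) -> str:
    # Deterministic: the same input always yields the same 8-character mask.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]

# Hypothetical tables that share ACCOUNT_NUMBER as a join key.
accounts = [
    {"ACCOUNT_NUMBER": "1000001", "NAME": "A"},
    {"ACCOUNT_NUMBER": "1000002", "NAME": "B"},
]
orders = [
    {"ORDER_ID": 1, "ACCOUNT_NUMBER": "1000002"},
    {"ORDER_ID": 2, "ACCOUNT_NUMBER": "1000001"},
]

def mask_table(rows, column):
    """Return a copy of `rows` with `column` replaced by its hash mask."""
    return [{**row, column: mask_key(row[column])} for row in rows]

def join(left, right, column):
    """Inner join on `column`, reduced to (NAME, ORDER_ID) pairs."""
    return sorted(
        (l["NAME"], r["ORDER_ID"]) for l in left for r in right if l[column] == r[column]
    )

masked_accounts = mask_table(accounts, "ACCOUNT_NUMBER")
masked_orders = mask_table(orders, "ACCOUNT_NUMBER")
```

Joining the masked tables pairs the same rows as joining the originals, which a random mask applied independently to each table would not guarantee.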

FIG. 7(b) is a flowchart showing the operation following FIG. 7(a). When a user operation instructing the start of masking is detected at the data conversion device 14 (Y in S24), the replacement data acquisition unit 38 acquires, for each original character string of personal information written in the original document data, mask data based on the mask pattern corresponding to its attributes. If an original character string detected as personal information does not match any mask-exempt string recorded in the exclusion rule holding unit 28 (N in S26), the document conversion unit 40 replaces that original character string with the mask data (S28). Specifically, starting from the start position of the original character string recorded in the extracted data holding unit 22, data spanning the length of the original character string (that is, the original character string itself) is replaced with the mask data string. If the original character string matches a mask-exempt string (Y in S26), S28 is skipped.
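
The position-based replacement in S28 can be sketched as below. The position is treated as a 1-based byte offset, as in the position field of the extraction results; the UTF-8 encoding and the function name `replace_at` are assumptions, since the specification does not name an encoding.

```python
def replace_at(field: str, position: int, original: str, mask: str,
               encoding: str = "utf-8") -> str:
    """Replace `original`, located at 1-based byte offset `position`, with `mask`."""
    data = field.encode(encoding)
    target = original.encode(encoding)
    start = position - 1  # the text counts the first byte as 1
    if data[start:start + len(target)] != target:
        raise ValueError("original string not found at the recorded position")
    masked = data[:start] + mask.encode(encoding) + data[start + len(target):]
    return masked.decode(encoding)

line = replace_at("tel: 03-1234-5678 (home)", 6, "03-1234-5678", "99-0000-1111")
```

Checking that the recorded bytes really are the original string before overwriting guards against stale positions, for example when an earlier replacement in the same field changed its length.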

When replacement, or the skipping of replacement, has been completed for every original character string of personal information in the original document data (Y in S30), the converted document output unit 42 outputs the test document data to the recording medium 16 (S32). If unprocessed original character strings of personal information remain (N in S30), the process returns to S26, and the replacement data acquisition unit 38 acquires mask data for the unprocessed original character string. If no user operation instructing the start of mask processing has been detected (N in S24), S26 through S32 are skipped.
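The replacement loop of S26 to S30 can be sketched as follows. This is a minimal illustration, not the implementation described in the specification: the `Detection` record, the `mask_document` function, the `mask_for` callback, and the sample text are all hypothetical names and values introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Set


@dataclass
class Detection:
    """One detected personal-information string (hypothetical record layout)."""
    start: int      # start position of the original string in the document
    text: str       # the original character string
    attribute: str  # attribute of the personal information, e.g. "name"


def mask_document(document: str,
                  detections: Iterable[Detection],
                  mask_for: Callable[[Detection], str],
                  excluded: Set[str]) -> str:
    """Replace each detected string with mask data unless it is mask-exempt."""
    pieces = []
    cursor = 0
    for d in sorted(detections, key=lambda d: d.start):
        pieces.append(document[cursor:d.start])
        if d.text in excluded:
            pieces.append(d.text)        # Y in S26: skip the replacement
        else:
            pieces.append(mask_for(d))   # N in S26: replace (S28)
        cursor = d.start + len(d.text)
    pieces.append(document[cursor:])     # text after the last detection
    return "".join(pieces)


document = "user=Sasaki item=Yamada300"
detections = [
    Detection(document.index("Sasaki"), "Sasaki", "name"),
    Detection(document.index("Yamada300"), "Yamada300", "name"),
]
# Assumed mask pattern for this sketch: a run of "X" of the same length.
masked = mask_document(document, detections,
                       mask_for=lambda d: "X" * len(d.text),
                       excluded={"Yamada300"})
print(masked)
```

Because each replacement is located by start position and length, the same loop applies whether the detections come from table columns or from free-format text.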

With the data conversion device 14 of the present embodiment, the user does not need to define, one by one, the correspondence between each information item of the original document data and a mask pattern. The device automatically detects the personal information contained in the original document data and automatically determines the mask pattern for each piece of personal information based on correspondence rules between personal information attributes and mask patterns. This suppresses human error, typically omissions in mask settings. For example, the original document data may contain a spare item that is unused at the time of initial development and is later changed to hold personal information when a function is added. When mask patterns are set manually, the masking of such spare items is easily overlooked, and omissions in mask settings tend to occur. The data conversion device 14 suppresses human error by automating both the detection of character strings to be masked and the determination of the mask data that should replace them.

On the user setting screen, the user can edit not only the replacement rules but also the correspondence rules on which the replacement rules are based. By editing a correspondence rule, the user can set, in a single operation, the mask pattern for multiple information items of the original document data that share the same personal information attribute, which makes the user's work for mask processing more efficient.
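One way to picture the relationship between correspondence rules (attribute to mask pattern) and replacement rules (information item to mask pattern) is the sketch below. The item names, the attribute labels, the `"fixed"` pattern name, and the `derive_replacement_rules` helper are hypothetical and chosen only for illustration; only the idea of a hash-value mask pattern comes from the description above.

```python
from typing import Dict, List, Tuple

# Correspondence rules: personal-information attribute -> mask pattern.
correspondence_rules: Dict[str, str] = {
    "name": "fixed",
    "account_number": "fixed",
}

# Detected information items with the attribute assigned at detection time.
detected_items: List[Tuple[str, str]] = [
    ("holder.name", "name"),
    ("holder.account_no", "account_number"),
    ("balance.account_no", "account_number"),
    ("log.spare_1", "name"),  # a spare item that later came to hold names
]


def derive_replacement_rules(items: List[Tuple[str, str]],
                             rules: Dict[str, str]) -> Dict[str, str]:
    """Derive a per-item mask pattern from the per-attribute rules."""
    return {item: rules[attribute] for item, attribute in items}


# Editing one correspondence rule changes every item sharing that attribute.
correspondence_rules["account_number"] = "hash"
replacement_rules = derive_replacement_rules(detected_items, correspondence_rules)
for item, pattern in replacement_rules.items():
    print(item, "->", pattern)
```

In this sketch a single edit to the `account_number` rule switches both account-number columns to the hash pattern, while the spare item is covered automatically because its attribute was detected rather than configured by hand.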

Furthermore, the data conversion device 14 masks each of a plurality of personal information items that appear in different places in the original document data, and whose mutual relationship should be preserved after masking, with the hash value of the character string representing the personal information. As a result, the relationships between information items in the original document data are preserved in the test document data after the personal information has been masked, which suppresses degradation of its quality as test data.

FIG. 8 schematically shows mask processing for a plurality of tables. The holder information table and the balance information table are related to each other with the account number as the key. FIG. 8A shows the holder information table and the balance information table as original document data, indicating, for example, that the balance for the name "Sasaki" is "300,000". FIG. 8B shows the result of masking the account numbers in the holder information table and the account numbers in the balance information table with random values; the relationship between the holder information table and the balance information table has been lost.

In contrast, FIG. 8C shows the result of masking the account numbers in the holder information table and the account numbers in the balance information table with hash values. As the figure shows, the original values of the account numbers in both tables are concealed, yet their mutual relationship is preserved. For example, it can still be determined after masking that the balance for the name "XXX" (the masked form of "Sasaki") is "300,000". Using the holder information table and the balance information table of FIG. 8C as test document data makes it easier to run tests in the development environment that reflect the production environment.
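The difference between FIG. 8B and FIG. 8C comes down to determinism: the same input always yields the same hash value, so equal keys stay equal after masking. The sketch below illustrates this under stated assumptions. SHA-256 and the 8-character truncation are choices made here (the specification does not name a hash function), and the account numbers and second row are made-up sample values; only the pairing of "Sasaki" with a balance of 300,000 follows the figure description.

```python
import hashlib


def hash_mask(value: str, length: int = 8) -> str:
    """Deterministic mask: a truncated SHA-256 hex digest (assumed choice)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:length]


# Original tables, related through the account number.
holders = [{"account": "1234567", "name": "Sasaki"},
           {"account": "7654321", "name": "Yamada"}]
balances = [{"account": "7654321", "balance": 120000},
            {"account": "1234567", "balance": 300000}]

# Mask the key column of both tables with the same hash function.
masked_holders = [{"account": hash_mask(r["account"]), "name": "XXX"}
                  for r in holders]
masked_balances = [{"account": hash_mask(r["account"]), "balance": r["balance"]}
                   for r in balances]

# The masked tables can still be joined on the masked account number.
balance_by_account = {r["account"]: r["balance"] for r in masked_balances}
joined = [(r["account"], r["name"], balance_by_account[r["account"]])
          for r in masked_holders]
for row in joined:
    print(row)
```

Masking each column with independent random values, as in FIG. 8B, would make the lookup in `balance_by_account` fail, which is the loss of relationship the text describes.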

The data conversion device 14 can also automatically mask personal information in free-format file data (for example, log data), in which the positions where personal information appears are not fixed. In addition, the user can designate any character string as a mask-exempt character string, which helps balance the protection of personal information against the degradation of test data quality.

FIG. 9 schematically shows mask processing for log data. FIG. 9A shows original log data from the production environment containing two log messages. The two log messages have different formats. FIG. 9B shows the test data obtained by masking the personal information contained in the log data of FIG. 9A. As described earlier, the personal information detection unit 34 records the attribute and position of each piece of personal information contained in the original log data. The replacement data acquisition unit 38 then acquires mask data according to the attribute of the personal information, and the document conversion unit 40 locates the personal information by its recorded position and replaces it with the mask data. This makes it possible to mask personal information in free-format file data.

Note that in FIG. 9B, the product name "Yamada 300", which should not be masked, has also been masked as personal information. By designating the character string "Yamada 300" as a mask-exempt character string, the user can have the product name "Yamada 300" output unchanged in the test log data.

Although not mentioned above, the personal information detection unit 34 may additionally output a personal information detection list that records the character strings detected as personal information, as shown in FIG. 9C. The personal information detection list may be shown on a display and presented to the user. The personal information extraction results stored in the extracted data holding unit 22 may be used in place of this list. Presenting the personal information detection list to the user helps the user designate mask-exempt character strings appropriately.

The present invention has been described above based on an embodiment. Those skilled in the art will understand that this embodiment is illustrative, that various modifications are possible in the combinations of its constituent elements and processing steps, and that such modifications also fall within the scope of the present invention. Modifications are described below.

A first modification will be described.
In the above embodiment, the user designates the personal information items to be masked with hash values. Alternatively, the replacement rule determination unit 36 may automatically determine the personal information items to be masked with hash values and record them in the replacement rules. For example, it may refer to the definition information of a relational database and detect referential integrity constraints set between multiple tables in that database. It may then set a hash value as the mask pattern for the columns on which a referential integrity constraint is set (typically both the foreign key column in the first table and the primary key column in the second table).
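As a rough illustration of reading referential integrity constraints from database definition information, the sketch below uses SQLite, whose `PRAGMA foreign_key_list` reports the foreign keys declared on a table. SQLite, the table and column names, and the `hash_masked_columns` helper are assumptions made for this example; the specification does not name a particular database product.

```python
import sqlite3
from typing import Set, Tuple


def hash_masked_columns(conn: sqlite3.Connection) -> Set[Tuple[str, str]]:
    """Collect (table, column) pairs that take part in a foreign key."""
    columns: Set[Tuple[str, str]] = set()
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        # Row layout: id, seq, referenced table, from column, to column, ...
        for row in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            referenced_table, from_column, to_column = row[2], row[3], row[4]
            columns.add((table, from_column))          # foreign key side
            columns.add((referenced_table, to_column))  # primary key side
    return columns


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE holder (account_no TEXT PRIMARY KEY, name TEXT);
CREATE TABLE balance (
    account_no TEXT REFERENCES holder(account_no),
    amount INTEGER
);
""")
columns = hash_masked_columns(conn)
print(sorted(columns))
```

Both ends of each constraint are collected so that the foreign key column and the primary key column it references receive the same hash-value mask pattern.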

A second modification will be described.
As partly described in the above embodiment, when the mask data is obtained by trimming a hash character string to a length corresponding to that of the original character string, different original character strings may be converted into the same mask data. As a modification, therefore, the replacement data acquisition unit 38 may hold a table (hereinafter the "allocation history table") that associates each original character string with the mask data obtained by trimming its hash value.

When the replacement data acquisition unit 38 needs to acquire mask data (here, a character string obtained by trimming a hash character string) for a given original character string (referred to as "the original character string in question"), it refers to the allocation history table and determines whether mask data has already been assigned to a character string identical to the original character string in question. If so, it assigns that already-assigned mask data to the original character string in question.

If no character string identical to the original character string in question is recorded in the allocation history table, the replacement data acquisition unit 38 acquires mask data by trimming the hash value of the original character string in question. It then refers to the allocation history table and determines whether that mask data has already been assigned to another original character string. If it has not, the unit assigns the mask data to the original character string in question and records the assignment in the allocation history table. If it has already been assigned to another original character string, the unit inputs the hash value of the original character string in question into the hash function and takes the resulting hash value as new mask data. This process is repeated until unique mask data is obtained.

According to this modification, when the mask pattern is set to a hash value and the hash character string is trimmed, assigning duplicate mask data to different original character strings can be avoided. This prevents the test document data from acquiring relationships between information items that are unrelated in the original document data.
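The allocation history table of the second modification can be sketched as below. This is one possible reading rather than the specified implementation: SHA-256, the `AllocationHistory` class, and the decision to trim each re-hashed value to the same length are assumptions made here.

```python
import hashlib
from typing import Dict, Optional, Set


def _hash(text: str) -> str:
    """Hash function used for masking (SHA-256 is an assumed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class AllocationHistory:
    """Remembers which trimmed hash was assigned to which original string."""

    def __init__(self) -> None:
        self.by_original: Dict[str, str] = {}
        self.assigned: Set[str] = set()

    def mask_for(self, original: str, length: Optional[int] = None) -> str:
        # Already assigned: reuse it so equal originals stay equal.
        if original in self.by_original:
            return self.by_original[original]
        if length is None:
            length = len(original)  # trim to the original string's length
        digest = _hash(original)
        candidate = digest[:length]
        # Collision with another original: feed the hash value back into
        # the hash function until an unused trimmed value appears. This
        # cannot terminate once every value of this length is taken.
        while candidate in self.assigned:
            digest = _hash(digest)
            candidate = digest[:length]
        self.by_original[original] = candidate
        self.assigned.add(candidate)
        return candidate


history = AllocationHistory()
originals = ["Sasaki", "Yamada", "Suzuki", "Tanaka", "Sato"]
# A one-character mask makes collisions likely, which exercises the retry.
masks = [history.mask_for(name, length=1) for name in originals]
print(masks)
```

Looking up the original string first keeps the mapping stable across repeated occurrences, which is what preserves genuine relationships while the uniqueness check prevents spurious ones.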

Those skilled in the art will also understand that the functions to be performed by the constituent features recited in the claims are realized by the individual constituent elements shown in the embodiment and modifications, or by their cooperation.

DESCRIPTION OF SYMBOLS: 14 data conversion device, 22 extracted data holding unit, 24 correspondence holding unit, 26 replacement rule holding unit, 28 exclusion rule holding unit, 32 original document acquisition unit, 34 personal information detection unit, 36 replacement rule determination unit, 38 replacement data acquisition unit, 40 document conversion unit, 42 converted document output unit, 44 user setting support unit.

Claims

1. A data conversion device comprising:
a detection unit that detects personal information from original document data;
a replacement data acquisition unit that acquires, for the personal information detected by the detection unit, replacement data indicating a hash value of that personal information; and
an output unit that outputs document data in which the personal information in the original document data has been replaced with the replacement data.

2. The data conversion device according to claim 1, further comprising a holding unit that holds a replacement rule defining replacement data according to the attribute of personal information,
wherein the replacement rule defines, for personal information that appears in a plurality of locations in the original document data and whose mutual relationship should be maintained, the hash value of that personal information as the replacement data, and
the replacement data acquisition unit acquires replacement data for the detected personal information in accordance with the replacement rule.

3. The data conversion device according to claim 1 or 2,
wherein the original document data includes data of a plurality of tables associated with one another using personal information as a key, and
the replacement data acquisition unit acquires, as the replacement data, the hash value of the personal information serving as the key, thereby maintaining the association between the plurality of tables in the document data output from the output unit.

4. The data conversion device according to any one of claims 1 to 3, further comprising an exclusion rule holding unit that holds an exclusion rule defining a specific character string to be excluded from replacement,
wherein the output unit suppresses replacement with the replacement data for a character string that was detected as personal information by the detection unit and matches a character string defined by the exclusion rule.

5. A computer program causing a computer to implement:
a function of detecting personal information from original document data;
a function of acquiring replacement data indicating a hash value of the personal information detected by the detecting function; and
a function of outputting document data in which the personal information in the original document data has been replaced with the replacement data.