JP2013105207A - Information processing method and apparatus for retrieving concealed data

Info

Publication number: JP2013105207A
Application number: JP2011246817A
Authority: JP
Inventors: Ketsu Ko; 杰高; Yoshinori Katayama; 佳則片山; Ikuya Morikawa; 郁也森川; Hiroshi Tsuda; 宏津田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2011-11-10
Filing date: 2011-11-10
Publication date: 2013-05-30
Anticipated expiration: 2031-11-10
Also published as: JP5720536B2

Abstract

PROBLEM TO BE SOLVED: To extract similar data while the data remains concealed. SOLUTION: The method includes the steps of: extracting, from text data that is stored in a data storage unit and includes a first numeric value, the first numeric value and a plurality of feature words present in the vicinity of the first numeric value; generating, from the extracted first numeric value, one or a plurality of second numeric values that serve as a reference for determining whether a value approximates the first numeric value; and performing a concealing process on each of the one or the plurality of second numeric values and the plurality of feature words to generate concealed data, and storing the concealed data in the data storage unit.

Description

The present technology relates to a technique for searching concealed data.

As the cloud spreads, information is increasingly deposited in the cloud so that it can be shared and used in ways that exploit the cloud's inherent characteristics. In this context, confidential data is expected to be put to use in collaboration and division of labor in the cloud. For example, an individual might deposit health-related information in the cloud and have a trusted public organization analyze and organize it.

In such situations, text data that includes numeric values is shared. In the medical field, for example, patient examination data such as body temperature and blood pressure contains numeric values. Sharing such data is useful to the parties involved.

On the other hand, for security and privacy protection, such data is generally concealed before it is deposited in the cloud. This protects security and privacy but restricts how the data can be used: concealed data cannot be processed properly by conventional analysis applications or search services. For example, even when one wants to search for treatment cases similar to a patient's symptoms, it is difficult to find appropriate cases with a simple search if the treatment data is concealed.

There is a technique in which documents are searched with a search keyword and, when the keyword appears, a numeric value related to the keyword in the document is compared with a numeric value specified together with the keyword. However, this technique does not consider concealing the data, so once the data is concealed it is difficult to extract appropriate data by searching.

There is also a technique in which the data to be searched is concealed and held on a server, and the search condition is concealed in the same way at search time. However, when encryption or hash value calculation is used, only data that matches exactly can be extracted.

Further, there is a technique in which a numeric range can be specified as search condition input data and data that at least matches the numeric range is extracted. However, this technique does not consider encryption or hash value calculation.

There is also a technique in which concealed data is first restored in a secure location and then matched against the search condition. Because matching is performed on plaintext, similar data can also be extracted. However, the search condition is also entered in plaintext, so it becomes known to the server that performs the matching.

JP 2000-11001 A; JP 2007-52698 A; JP 2005-242740 A; JP 2002-108911 A; JP 2004-213649 A; JP-A-1-58019

Accordingly, an object of the present technology, in one aspect, is to provide a technique that makes it possible to extract similar data while the data remains concealed.

An information processing method according to a first form of the present technology includes: (A) extracting, from text data that is stored in a data storage unit and includes a first numeric value, the first numeric value and a plurality of feature words present in the vicinity of the first numeric value; (B) generating, from the extracted first numeric value, one or a plurality of second numeric values that serve as a reference for determining whether a value approximates the first numeric value; and (C) performing a concealing process on each of the one or the plurality of second numeric values and the plurality of feature words to generate concealed data, and storing the concealed data in the data storage unit.

An information processing method according to a second form of the present technology includes: (A) receiving a search request that includes one or a plurality of search data blocks, each including a first concealed data value of a first numeric value and second concealed data values of a plurality of first feature words; (B) for each of a plurality of case data blocks stored in a data storage unit, each case data block including identification information and one or a plurality of data blocks that each include third concealed data values of a plurality of second numeric values and fourth concealed data values of a plurality of second feature words, calculating a third similarity that is the sum of second similarities over the combinations of a data block included in the case data block being processed and a search data block, each second similarity being obtained from a first similarity for the numeric values, calculated from the first concealed data value and the third concealed data values, and from the number of fourth concealed data values that match the second concealed data values; and (C) transmitting, to the source of the search request, the identification information of the case data blocks whose third similarity exceeds a threshold, or the identification information of a predetermined number of case data blocks having the highest third similarity.

According to one aspect of the present technology, similar data can be extracted while the data remains concealed.

FIG. 1 is a system configuration diagram of the embodiment.
FIG. 2 is a functional block diagram of the registration device.
FIG. 3 is a functional block diagram of the management device.
FIG. 4 is a functional block diagram of the search device.
FIG. 5 is a diagram showing a processing flow at the time of registration.
FIG. 6 is a diagram showing a processing flow of the FP generation processing.
FIG. 7 is a diagram showing an example of confidential data.
FIG. 8A is a diagram showing an example of a data block (first scheme).
FIG. 8B is a diagram showing an example of a data block (second scheme).
FIG. 9A is a diagram showing another example of a data block (first scheme).
FIG. 9B is a diagram showing another example of a data block (second scheme).
FIG. 10 is a diagram showing an example of data accumulated in the DB of the management device.
FIG. 11 is a diagram showing a processing flow at the time of search.
FIG. 12 is a diagram showing a processing flow of the second FP generation processing.
FIG. 13 is a diagram showing an example of confidential data serving as a search condition.
FIG. 14A is a diagram showing an example of search FP data (first example of the first scheme).
FIG. 14B is a diagram showing an example of search FP data (second example of the first scheme).
FIG. 15A is a diagram showing an example of search FP data (first example of the second scheme).
FIG. 15B is a diagram showing an example of search FP data (second example of the second scheme).
FIG. 16 is a diagram showing a processing flow of the search processing.
FIG. 17 is a diagram showing a processing flow of the similarity calculation processing.
FIG. 18 is a diagram for explaining the comparison of hash values of numeric values when the first scheme is adopted.
FIG. 19 is a diagram for explaining the comparison of hash values of numeric values when the second scheme is adopted.
FIG. 20 is a diagram showing a processing flow of the similarity calculation processing.
FIG. 21 is a diagram showing a processing flow of the similarity calculation processing.
FIG. 22 is a diagram showing an output example.
FIG. 23 is a diagram showing another example of the FP generation processing.
FIG. 24 is a functional block diagram of a computer.

FIG. 1 shows a configuration example of a system according to an embodiment of the present technology. As shown in FIG. 1, a registration device 3, a management device 5, and a search device 7 are connected to a network 1, which is, for example, the Internet. The registration device 3 performs the processing described below to conceal confidential data and register it in the management device 5; the number of registration devices 3 is not limited. The search device 7 performs the processing described below to conceal the confidential data used as a search condition, transmits a search request including the concealed data and other search conditions to the management device 5, and receives a search result from the management device 5; the number of search devices 7 is not limited. The registration device 3 and the search device 7 may be dedicated devices, or a single device may function as the registration device 3 when registering concealed data and as the search device 7 when performing a search.

FIG. 2 shows a functional block diagram of the registration device 3. The registration device 3 includes an input unit 31, a confidential data storage unit 32, an FP (fingerprint) generation unit 33, an FP rule data acquisition unit 34, an FP rule data storage unit 35, an FP data storage unit 36, and a transmission unit 37. In response to instructions from the user, the input unit 31 stores data to be registered in the management device 5 in the confidential data storage unit 32, or accepts a selection of confidential data from the user and outputs the selection to the FP generation unit 33. The FP generation unit 33 generates FP data according to the FP rule data stored in the FP rule data storage unit 35 and stores the FP data in the FP data storage unit 36. When no FP rule data is stored in the FP rule data storage unit 35, the FP generation unit 33 instructs the FP rule data acquisition unit 34 to acquire FP rule data from the management device 5. In response to this instruction, the FP rule data acquisition unit 34 acquires the FP rule data from the management device 5 and stores it in the FP rule data storage unit 35. The transmission unit 37 transmits the FP data stored in the FP data storage unit 36 to the management device 5.

FIG. 3 shows a functional block diagram of the management device 5. The management device 5 includes an FP rule data storage unit 51, an FP rule data distribution unit 52, an FP registration unit 53, a database (DB) 54, a search processing unit 55, a search request reception unit 56, and a search result transmission unit 57. The FP rule data distribution unit 52 distributes the FP rule data stored in the FP rule data storage unit 51 on request. The FP registration unit 53 receives FP data from the registration device 3 and stores it in the DB 54. The search request reception unit 56 receives a search request from the search device 7 and outputs the data of the received search request to the search processing unit 55. On receiving a search result from the search processing unit 55, the search result transmission unit 57 transmits it to the search device 7 that sent the search request. The search processing unit 55 performs search processing according to the FP rule data, using the concealed data, search conditions, and other items included in the search request received from the search request reception unit 56, and outputs the search result to the search result transmission unit 57.

FIG. 4 shows a functional block diagram of the search device 7. The search device 7 includes an input unit 71, a confidential data storage unit 72, an FP generation unit 73, an FP rule data acquisition unit 74, an FP rule data storage unit 75, a search condition data storage unit 76, an FP data storage unit 77, a search request unit 78, and an output unit 79. In response to instructions from the user, the input unit 71 stores confidential data for a search in the confidential data storage unit 72, or accepts a selection of confidential data from the user and outputs the selection to the FP generation unit 73. The input unit 71 also accepts search condition data from the user and stores it in the search condition data storage unit 76.

The FP generation unit 73 generates FP data and the like according to the FP rule data stored in the FP rule data storage unit 75 and stores it in the FP data storage unit 77. When no FP rule data is stored in the FP rule data storage unit 75, the FP generation unit 73 instructs the FP rule data acquisition unit 74 to acquire FP rule data from the management device 5. In response to this instruction, the FP rule data acquisition unit 74 acquires the FP rule data from the management device 5 and stores it in the FP rule data storage unit 75. The search request unit 78 reads the FP data and the like stored in the FP data storage unit 77 and the search condition data stored in the search condition data storage unit 76, generates a search request, and transmits it to the management device 5. On receiving a search result from the management device 5, the search request unit 78 passes it to the output unit 79, which presents it on, for example, a display device.

Next, the processing performed by the devices shown in FIGS. 1 to 4 will be described. First, the FP data registration processing will be described with reference to FIGS. 5 to 10. The input unit 31 accepts a designation of the confidential data for which an FP is to be generated (FIG. 5: step S1). For example, the confidential data stored in the confidential data storage unit 32 may be listed for the user to select from, or the designated confidential data may be acquired from another computer or the like and stored in the confidential data storage unit 32. The input unit 31 then notifies the FP generation unit 33 of the designated confidential data.

The FP generation unit 33 checks whether FP rule data is stored in the FP rule data storage unit 35 (step S3). If no FP rule data is stored (step S5: No route), the FP generation unit 33 causes the FP rule data acquisition unit 34 to acquire the FP rule data and store it in the FP rule data storage unit 35 (step S7).

If, on the other hand, FP rule data is stored in the FP rule data storage unit 35 (step S5: Yes route), or after step S7, the FP generation unit 33 performs FP generation processing on the confidential data designated by the user according to the FP rule data (step S9). The FP generation processing is described in detail later. The FP data generated in this way is stored in the FP data storage unit 36.
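The check-then-acquire behavior of steps S3 to S7 can be sketched as a simple local cache. This is only an illustration: `RuleStore`, `get_rules`, and the `fetch_from_manager` callback are hypothetical names, and the shape of the rule data (a dictionary) is an assumption, since the text does not specify a format.

```python
from typing import Callable, Dict, Optional


class RuleStore:
    """Hypothetical stand-in for the FP rule data storage unit."""

    def __init__(self) -> None:
        self.rules: Optional[Dict] = None


def get_rules(store: RuleStore, fetch_from_manager: Callable[[], Dict]) -> Dict:
    """Return locally held rule data, acquiring it first if it is missing."""
    # Steps S3/S5: check whether rule data is already stored locally.
    if store.rules is None:
        # Step S7: acquire it from the management device and keep a copy.
        store.rules = fetch_from_manager()
    return store.rules
```

With this arrangement the management device is contacted only on the first call; later FP generation reuses the stored rules.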

The transmission unit 37 then transmits the FP data stored in the FP data storage unit 36 to the management device 5 (step S11). On receiving the FP data from the registration device 3, the FP registration unit 53 of the management device 5 stores the received FP data, together with identification information and the like, in the DB 54 (step S13). The identification information includes, for example, the registrant ID of the registration device 3 and the registration date, as well as an FPID issued by the FP registration unit 53.

By repeating this processing, FP data accumulates in the DB 54.

Next, the FP generation processing will be described with reference to FIGS. 6 to 9. The FP generation unit 33 performs normalization processing on the designated confidential data (FIG. 6: step S21). The confidential data in this embodiment is text data that includes numeric values. However, a numeric value may be written with half-width digits, full-width digits, kanji numerals, Arabic numerals, and so on, and the units may also differ. The normalization processing in this embodiment unifies these different notations. For example, full-width "７０００" is converted to half-width "7000", and "１万円" (10,000 yen) is converted to the half-width number "10000". This kind of normalization is well known and is not described further.
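As a minimal sketch of the normalization in step S21, the following covers only the two conversions given as examples. The function name `normalize_numbers` and the rule for "万" are assumptions made for illustration; kanji numerals such as "一万" and other unit conversions are not handled.

```python
import re
import unicodedata


def normalize_numbers(text: str) -> str:
    """Unify numeric notation for the two example cases in the text."""
    # NFKC folds full-width digits (and letters) into their half-width forms.
    text = unicodedata.normalize("NFKC", text)
    # Assumed rule: "<digits>万" (optionally followed by "円") means digits x 10,000;
    # the unit is dropped, matching the "１万円" -> "10000" example.
    return re.sub(r"(\d+)万円?", lambda m: str(int(m.group(1)) * 10000), text)
```

Running normalization on both the registration side and the search side matters because the concealed values are compared for equality, so any difference in notation would prevent a match.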

The FP generation unit 33 then extracts the numeric values and feature words from the designated confidential data and stores them in a storage device such as a main memory (step S23). For example, the text of the confidential data is broken down into morphemes by morphological analysis, and the numeric values and feature words (for example, common nouns and proper nouns) are extracted from them.

Consider, for example, processing the text shown in FIG. 7. In this example, "patient", "basic", "information", "chief complaint", "morning", "body temperature", "38.5", "fever", "symptom", "examination", "heart rate", "measurement", "result", "85", "or more", "value", "blood test", ..., "treatment", "policy", and so on are extracted.

Next, the FP generation unit 33 selects one unprocessed numeric value from among the extracted numeric values (step S25). The FP generation unit 33 then generates numeric values for the FP from the selected value according to the FP rule data stored in the FP rule data storage unit 35, and stores them in a storage device such as a main memory (step S27). In this embodiment, so that approximation between numeric values can be judged, the selected value is not simply concealed as it is; instead it is expanded using, for example, one of two schemes.

In the first scheme, the selected numeric value is expressed with several different numbers of significant digits. For example, "38.2" is rewritten as "3 × 10¹" with one significant digit, "3.8 × 10¹" with two significant digits, and "3.82 × 10¹" with three significant digits. The numbers of significant digits to be used are specified in the FP rule data. In this way, numeric values that represent the width of the approximation judgment are generated.
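The first scheme can be sketched as follows, using 38.2 as the input. The function name `significant_digit_forms`, the string format `<mantissa>e<exponent>`, and the use of truncation rather than rounding are assumptions; truncation is chosen because it reproduces the one-digit form "3 × 10¹" given in the text.

```python
from decimal import Decimal, ROUND_DOWN
from typing import List, Sequence


def significant_digit_forms(value: str, digits: Sequence[int] = (1, 2, 3)) -> List[str]:
    """Express `value` at each requested number of significant digits."""
    d = Decimal(value)
    exponent = d.adjusted()            # power of ten of the leading digit
    mantissa_full = d.scaleb(-exponent)  # e.g. 38.2 -> 3.82
    forms = []
    for n in digits:
        # Keep n significant digits by truncating (assumption: no rounding).
        step = Decimal(1).scaleb(-(n - 1))
        mantissa = mantissa_full.quantize(step, rounding=ROUND_DOWN)
        forms.append(f"{mantissa}e{exponent}")
    return forms


print(significant_digit_forms("38.2"))  # ['3e1', '3.8e1', '3.82e1']
```

Each string is later concealed separately, so two values that share a leading form (for instance the same two-digit form) produce the same concealed value at that precision.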

In the second scheme, it is determined which of a set of predetermined numeric ranges the selected value belongs to, and the upper limit and lower limit of that range are identified. As auxiliary data, the difference from the lower limit and the difference from the upper limit are also calculated. For example, when the ranges are defined in steps of 10, the value "38.2" belongs to the range from 30 to 40, so the upper limit "40" and the lower limit "30" are identified. The auxiliary data consists of the difference from the lower limit, "8.2", and the difference from the upper limit, "−1.8". The FP rule data includes the definition of the numeric ranges. In this way, numeric values that represent the width of the approximation judgment, and their auxiliary values, are generated.
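A sketch of the second scheme for ranges of a fixed step is given below. The function name `range_bounds`, the dictionary keys, and the restriction to non-negative values are assumptions; the differences are computed as value minus bound, so the difference from the upper limit is negative.

```python
from decimal import Decimal
from typing import Dict


def range_bounds(value: str, step: str = "10") -> Dict[str, Decimal]:
    """Find the fixed-width range containing `value` and the offsets to its bounds."""
    d, s = Decimal(value), Decimal(step)
    lower = (d // s) * s   # assumes value >= 0; '//' truncates toward zero
    upper = lower + s
    return {
        "lower": lower,            # to be concealed
        "upper": upper,            # to be concealed
        "from_lower": d - lower,   # auxiliary data, kept in the clear
        "from_upper": d - upper,   # auxiliary data, kept in the clear
    }
```

Only the two bounds are concealed. The offsets stay readable, which is what later allows an actual numeric difference to be judged when a bound matches.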

The FP generation unit 33 then generates a hash value for the portion to be concealed of each numeric value generated for the FP, and stores it in a storage device such as a main memory (step S29). Encryption may be used instead of hashing. When a key is used, the registration device 3 and the search device 7 use a common key. In the first scheme, if the numbers of significant digits are 1 to 3, a hash value is calculated for each of "3 × 10¹", "3.8 × 10¹", and "3.82 × 10¹". In the second scheme, a hash value is calculated for each of the upper limit and the lower limit of the range to which the selected value belongs. No hash value is calculated for the auxiliary data.
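The concealment in step S29 might look like the sketch below. The text names neither a hash function nor a keyed construction, so SHA-256 and HMAC-SHA-256 are assumptions chosen for illustration, as is the function name `conceal`.

```python
import hashlib
import hmac
from typing import Optional


def conceal(token: str, key: Optional[bytes] = None) -> str:
    """Return a hex digest of `token`; keyed if a shared key is supplied."""
    data = token.encode("utf-8")
    if key is None:
        # Unkeyed variant: a plain hash of the normalized token.
        return hashlib.sha256(data).hexdigest()
    # Keyed variant: the registration and search sides must hold the same key.
    return hmac.new(key, data, hashlib.sha256).hexdigest()
```

Because both variants are deterministic, equal tokens yield equal concealed values, which is the property the matching relies on. A keyed variant also makes it harder for a party without the key to test guessed values against stored digests.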

Thus, in the FP data registration processing, multiple hash values are calculated for multiple numeric values. If only a single hash value of the numeric value were calculated, it would only be possible to judge whether two values match exactly. With the first scheme, however, match or mismatch can be judged at each level of significant digits (one-digit match, two-digit match, three-digit match), so the presence of an approximate value can be determined. With the second scheme, cases in which the upper limit or lower limit of the range containing the value matches can be identified, and, as described below, the actual difference between the values can also be judged, so the presence of an approximate value can likewise be determined.
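The idea of judging one-digit, two-digit, and three-digit matches from concealed values alone can be illustrated as follows. This is not the similarity calculation of the embodiment, which is described separately; `matched_precision` and the helper `h` are hypothetical, and SHA-256 is an assumed hash.

```python
import hashlib
from typing import Sequence


def h(token: str) -> str:
    """Assumed concealment: SHA-256 hex digest of the token."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def matched_precision(stored: Sequence[str], query: Sequence[str]) -> int:
    """Count how many leading precision levels have equal concealed values."""
    matched = 0
    for s, q in zip(stored, query):
        if s != q:
            break
        matched += 1
    return matched


stored = [h(t) for t in ("3e1", "3.8e1", "3.82e1")]  # forms of 38.2
query = [h(t) for t in ("3e1", "3.8e1", "3.87e1")]   # forms of 38.7
print(matched_precision(stored, query))  # 2
```

The comparison never sees 38.2 or 38.7, yet it can tell that the two values agree to two significant digits.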

The FP generation unit 33 identifies a predetermined number of feature words in the vicinity of the selected numeric value (step S31). The predetermined number is specified, for example, in the FP rule data. The FP generation unit 33 then calculates a hash value for each identified feature word and stores it in a storage device such as a main memory (step S33).

The FP generation unit 33 then stores, in the FP data storage unit 36, a data block that includes the hash values of the numeric values (together with the auxiliary data, if any) and the hash values of the feature words (step S35).
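Steps S31 to S35 can be sketched as below for an already tokenized text. The text does not define how "vicinity" is measured, so choosing the nearest feature words by token distance is an assumption, as are the names `nearby_feature_words` and `build_block`, the dictionary layout, and the SHA-256 hash.

```python
import hashlib
from typing import Dict, List, Sequence


def h(token: str) -> str:
    """Assumed concealment: SHA-256 hex digest of the token."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def is_number(token: str) -> bool:
    try:
        float(token)
        return True
    except ValueError:
        return False


def nearby_feature_words(tokens: Sequence[str], index: int, count: int) -> List[str]:
    """Pick the `count` non-numeric tokens closest to tokens[index]."""
    candidates = [(abs(i - index), i, t) for i, t in enumerate(tokens)
                  if i != index and not is_number(t)]
    candidates.sort()  # nearest first; ties resolved by position in the text
    return [t for _, _, t in candidates[:count]]


def build_block(tokens: Sequence[str], index: int,
                numeric_forms: Sequence[str], count: int = 4) -> Dict[str, List[str]]:
    """Assemble one data block: concealed numeric forms plus concealed feature words."""
    words = nearby_feature_words(tokens, index, count)
    return {"numeric": [h(f) for f in numeric_forms],
            "keywords": [h(w) for w in words]}
```

For example, with tokens `["morning", "body temperature", "38.2", "fever", "symptom", "examination"]` and the number at index 2, the four selected words are "body temperature", "fever", "morning", and "symptom".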

Once processing has reached this point, a data block such as that shown in FIG. 8A is stored in the FP data storage unit 36. In the example of FIG. 8A, hash values are calculated for the four feature words around the numeric value. FIG. 8A shows an example in which the first scheme is adopted, and Hash(X) denotes the hash value of X. When the second scheme is adopted, a data block such as that shown in FIG. 8B is generated instead.

The FP generation unit 33 then determines whether any unprocessed numeric value remains among the numeric values extracted from the confidential data (step S37). If an unprocessed numeric value remains, the processing returns to step S25. If none remains, the processing returns to the calling process. When the confidential data of FIG. 7 is processed with the first scheme, another data block such as that shown in FIG. 9A is generated. When it is processed with the second scheme, another data block such as that shown in FIG. 9B is generated. The FP data thus includes one or a plurality of data blocks.

In general, the DB 54 of the management device 5 accumulates data such as that shown in FIG. 10. In the example of FIG. 10, an FPID, a registrant ID, a registration date, and FP data are registered. The FP data includes a block number, which is the ID of a data block, a numeric part, and a feature word part. The numeric part of each data block includes the hash values of a plurality of numeric values (NUM(1,1), NUM(1,2), and so on) and, if there is auxiliary data, the auxiliary data (AUX1 and so on). The feature word part includes the hash values of a plurality of feature words (the hash values of M feature words, such as KW(1,1) and KW(1,2)). The example of FIG. 10 contains N data blocks, each with M feature words.
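The record layout described for FIG. 10 can be modeled roughly as follows. The class and field names, and the use of a sequential integer as the issued FPID, are assumptions made for illustration; the text states only which items are stored.

```python
from dataclasses import dataclass
from datetime import date
from itertools import count
from typing import List, Optional, Tuple


@dataclass
class DataBlock:
    block_no: int                           # ID of the data block
    numeric_hashes: List[str]               # NUM(i,1), NUM(i,2), ...
    keyword_hashes: List[str]               # KW(i,1) ... KW(i,M)
    aux: Optional[Tuple[str, str]] = None   # AUXi, present only in the second scheme


@dataclass
class FPRecord:
    fp_id: int
    registrant_id: str
    registered_on: date
    blocks: List[DataBlock]


class FPDatabase:
    """Hypothetical in-memory stand-in for the DB of the management device."""

    def __init__(self) -> None:
        self._next_id = count(1)
        self.records: List[FPRecord] = []

    def register(self, registrant_id: str, blocks: List[DataBlock],
                 today: Optional[date] = None) -> int:
        # The registering side issues the FPID and stores it with the FP data.
        record = FPRecord(next(self._next_id), registrant_id,
                          today or date.today(), blocks)
        self.records.append(record)
        return record.fp_id
```

Note that the stored record holds only concealed values and auxiliary offsets; the original text and numeric values are not part of it.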

Next, the processing performed at search time will be described with reference to FIGS. 11 to 21. First, the input unit 71 of the search device 7 accepts from the user a designation of the confidential data to be used for the search, and outputs the designation to the FP generation unit 73 (FIG. 11: step S41). If the designated confidential data is not stored in the confidential data storage unit 72, it may be acquired from, for example, another computer and output to the FP generation unit 73.

The input unit 71 also receives search conditions from the user and stores them in the search condition data storage unit 76 (step S43). The user specifies the parameters used in the search processing described below, for example the similarity threshold and the number of results to output. Which parameters should be specified may be defined in, for example, the FP rule data. Parameters used in generating the FP data are output to the FP generation unit 73.

The FP generation unit 73 then determines whether FP rule data is stored in the FP rule data storage unit 75 (step S45). If it is not (step S47: No route), the FP generation unit 73 causes the FP data acquisition unit 74 to acquire the FP rule data and store it in the FP rule data storage unit 75 (step S49).

If the FP rule data is stored in the FP rule data storage unit 75 (step S47: Yes route), or after step S49, the FP generation unit 73 performs the second FP generation process on the confidential data designated by the user, in accordance with the FP rule data (step S51). The second FP generation process is described with reference to FIGS. 12 to 15B.

The FP generation unit 73 performs normalization processing on the designated confidential data (FIG. 12: step S71). This is the same as step S21.

Thereafter, the FP generation unit 73 extracts the numerical values and feature words from the designated confidential data and stores them in a storage device such as a main memory (step S73). This is the same as step S23.

For example, consider processing text such as that shown in FIG. 13. In this example, "patient", "basic", "information", "chief complaint", "body temperature", "38", "measurement", "fever", "determination", "examination", "heart rate", "measurement", "value", "80", "normal value", ..., "treatment", "policy", and so on are extracted.

Next, the FP generation unit 73 identifies one unprocessed numerical value among the extracted numerical values (step S75). In accordance with the FP rule data stored in the FP rule data storage unit 75, it then generates one or more numerical values for the FP from the identified numerical value and stores them in a storage device such as a main memory (step S77). This step is basically the same as step S27.

In this embodiment, however, when the first method is adopted, the search conditions may include a designation of the number of significant digits. In that case, instead of generating a plurality of numerical values for the FP, a single numerical value with the designated number of significant digits is generated.

In the example of FIG. 13, if the designated number of significant digits is "2", a numerical value expressed as "3.8 × 10¹" is generated for "38".
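The significant-digit representation above can be sketched in Python as follows. This is only an illustration: the function name `to_significant` and the output spelling `3.8x10^1` are assumptions, and the use of rounding (rather than truncation) to reach the designated number of digits is also an assumption, since the text does not state which is used.

```python
def to_significant(value: float, digits: int) -> str:
    """Render `value` with `digits` significant digits in scientific notation.

    Assumption: the value is rounded, not truncated, to the requested digits.
    """
    # Python's "e" format keeps (digits - 1) decimals after the leading digit.
    mantissa, exponent = f"{value:.{digits - 1}e}".split("e")
    return f"{mantissa}x10^{int(exponent)}"


print(to_significant(38, 2))  # 3.8x10^1
```

The resulting string, not the raw number, would then be the input to the concealment (hashing) step.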

When the second method is adopted, a method similar to the one described above may be used. For example, when numerical ranges are defined in steps of 10, "38" belongs to the range from 30 to 40, so the upper limit "40" and the lower limit "30" are identified. As auxiliary data, the difference "8" from the lower limit and the difference "−2" from the upper limit are calculated. In this case, the lower and upper limits are the representative values and are treated as the numerical values whose hash values are calculated.

In the second method, however, the range judged to be approximate is designated as a search condition, so the identified numerical value may be expanded on the basis of that range. For example, when plus or minus 1 is designated as the approximation range and the identified numerical value is "38", values from "37" to "39" are judged to be approximate. The approximation range therefore never extends beyond the numerical range to which the identified value belongs, so the lower limit "30" of that range is identified as the representative value and the difference "8" from it is identified as the auxiliary data. On the other hand, when the identified numerical value is "41" and plus or minus 3 is designated as the approximation range, values from "38" to "44" are judged to be approximate. Accordingly, the lower limit "30" of the range immediately below and the lower limit "40" of the range to which the value belongs are identified as representative values, and the difference "11" from the first lower limit and the difference "1" from the second lower limit are calculated as auxiliary data.

The example above assumes that the approximation range is never specified to be wider than the step of the numerical ranges. When this assumption does not hold, the following is done. For example, when ranges are defined in steps of 10, the identified numerical value is "123", and plus or minus 15 is designated as the approximation range, values from "108" to "138" are judged to be approximate. Accordingly, "100", "110", "120", and "130" are identified as representative values, and the difference from each is generated as auxiliary data.
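The three worked examples above (38 with ±1, 41 with ±3, and 123 with ±15) are consistent with taking the lower limit of every range that overlaps the approximation interval. A minimal sketch under that reading, restricted to integers, is shown below; the function name `representatives` is hypothetical and the rule itself is inferred from the examples rather than quoted from the FP rule data.

```python
import math


def representatives(value: int, step: int, tolerance: int) -> list[tuple[int, int]]:
    """Return (representative value, auxiliary difference) pairs.

    Assumption: a representative is the lower limit of each `step`-wide range
    that overlaps [value - tolerance, value + tolerance].
    """
    lowest = math.floor((value - tolerance) / step) * step
    highest = math.floor((value + tolerance) / step) * step
    return [(rep, value - rep) for rep in range(lowest, highest + step, step)]


print(representatives(38, 10, 1))    # [(30, 8)]
print(representatives(41, 10, 3))    # [(30, 11), (40, 1)]
print(representatives(123, 10, 15))  # [(100, 23), (110, 13), (120, 3), (130, -7)]
```

Only the representative values would be hashed; the differences are kept as plain auxiliary data.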

The FP generation unit 73 then generates hash values for the concealed portion of the generated numerical value or values and stores them in a storage device such as a main memory (step S79). This is the same as step S29. In the second method, no hash value is calculated for the auxiliary data.

Thus, when a search is performed, one or more numerical values are generated as numerical values for the FP from the identified numerical value. The FP data may, however, be generated in the same way as at FP registration. Options such as those described above are assumed to be defined in the FP rule data.

The FP generation unit 73 identifies a predetermined number of feature words in the vicinity of the identified numerical value (step S81). The predetermined number is defined in, for example, the FP rule data. The FP generation unit 73 then calculates a hash value for each identified feature word and stores it in a storage device such as a main memory (step S83).

The FP generation unit 73 then stores, in the FP data storage unit 77, a data block that includes the hash values of the numerical values (together with the auxiliary data, if any) and the hash values of the feature words (step S85). This is the same as step S35.
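Steps S81 to S85 can be illustrated with the sketch below. Several points are assumptions made only for the example: SHA-256 stands in for the unspecified hash function, "vicinity" is taken to mean the nearest non-numeric tokens by position, and the dictionary layout and the names `nearby_feature_words` and `build_block` are hypothetical.

```python
import hashlib


def h(text: str) -> str:
    # Assumption: SHA-256 stands in for the concealment hash.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_number(token: str) -> bool:
    try:
        float(token)
        return True
    except ValueError:
        return False


def nearby_feature_words(tokens: list[str], index: int, count: int) -> list[str]:
    """Pick the `count` non-numeric tokens closest to position `index`."""
    candidates = [(abs(i - index), i) for i, tok in enumerate(tokens)
                  if i != index and not is_number(tok)]
    return [tokens[i] for _, i in sorted(candidates)[:count]]


def build_block(tokens, index, count, numbers, aux=None):
    """Hash the numeric representations and feature words; keep aux data as is."""
    return {
        "num": [h(n) for n in numbers],
        "aux": list(aux or []),
        "kw": [h(w) for w in nearby_feature_words(tokens, index, count)],
    }


tokens = ["patient", "basic", "information", "chief complaint",
          "body temperature", "38", "measurement", "fever", "determination"]
block = build_block(tokens, index=5, count=4, numbers=["30"], aux=[8])
```

Here the block for "38" carries the hash of the representative value "30", the unhashed difference 8, and four feature word hashes.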

Thereafter, the FP generation unit 73 determines whether any unprocessed numerical value remains among the numerical values extracted from the confidential data (step S87). If an unprocessed numerical value remains, the process returns to step S75. If none remains, the process returns to the calling process.

For example, when the simple first method is applied to the confidential data shown in FIG. 13, FP data as shown in FIG. 14A is generated. When the first method is used with "2" significant digits, FP data as shown in FIG. 14B is generated. With the simple second method, FP data as shown in FIG. 15A is generated. Furthermore, if the approximation range in the search conditions is plus or minus 3, FP data as shown in FIG. 15B is generated.

In this way, it becomes possible to determine not only whether numerical values match exactly but also whether they are approximate.

Returning to the processing of FIG. 11, the search request unit 78 of the search device 7 transmits to the management apparatus 5 a search request that includes the FP data stored in the FP data storage unit 77 (hereinafter called search FP data for distinction) and the data stored in the search condition data storage unit 76 (step S53).

When the search request receiving unit 56 of the management apparatus 5 receives from the search device 7 the search request including the search FP data and the search conditions (step S55), it outputs the search request data to the search processing unit 55. On receiving the search request data, the search processing unit 55 performs search processing (step S57). The search processing is described with reference to FIGS. 16 to 21.

The search processing unit 55 reads the FP rule data from the FP rule data storage unit 51 (FIG. 16: step S91). It then initializes a similar-FP array, which stores identification information on FPs judged to be similar (step S93). Further, the search processing unit 55 sets the threshold T for similarity determination from the FP rule data or the search conditions (step S95). The threshold may be fixed, in which case it is included in the FP rule data.

Thereafter, the search processing unit 55 identifies one piece of unprocessed FP data in the DB 54 (step S97). It then performs similarity calculation processing on the identified FP data and the search FP data (step S99). The similarity calculation processing is described with reference to FIGS. 17 to 21.

First, the search processing unit 55 initializes to 0 a variable c1, which accumulates the degree of commonality of feature words weighted by numerical similarity, and a variable c2, which counts the number of data blocks included in the identified FP data (FIG. 17: step S111). The search processing unit 55 also identifies the numerical data N1 of an unprocessed data block among the data blocks included in the search FP data (step S113). When there are multiple hash values, or when there is auxiliary data, these are all included in N1.

Further, the search processing unit 55 identifies the numerical data N2 of an unprocessed data block among the data blocks included in the identified FP data (step S115). Here too, when there are multiple hash values, or when there is auxiliary data, these are all included in N2.

The search processing unit 55 then compares the numerical data N1 with the numerical data N2 and sets the numerical similarity Sim (step S119). In this embodiment, the two methods described above exist. The simple comparison approach is described first.

In the first method, the numerical data N1 includes one or more hash values, and the numerical data N2 includes a plurality of hash values. FIG. 18 shows an example in which N1 includes a plurality of hash values. In FIG. 18, the original numerical value of N1 is 38.2, and N1 includes the hash values for 1 to 3 significant digits. FIG. 18 also shows, as comparison targets, the numerical data N2 of data block (A), whose original value is 38.2, of data block (B), whose original value is 38, and of data block (C), whose original value is 39. When N1 includes a plurality of hash values in this way, Sim is set to 1 if any of them matches a hash value included in the N2 being compared, and to 0 if none of them does. In the example of FIG. 18, every one of data blocks (A) to (C) matches at 1 significant digit, so Sim = 1 is set.

On the other hand, when the numerical data N1 includes only one hash value, for the designated number of significant digits, it is determined whether that single hash value matches. For example, when N1 includes only "3.8 × 10¹", which has 2 significant digits, the hash values for 2 significant digits match for data blocks (A) and (B), but it is determined that data block (C) has no matching hash value.

In the second method, the numerical data N1 includes one or more hash values with their corresponding auxiliary data, and the numerical data N2 includes a plurality of hash values with their corresponding auxiliary data. FIG. 19 schematically shows an example of comparing N1 with N2. For example, the numerical data N2 for the original value "38.2" includes Hash(30) with auxiliary data "8.2" and Hash(40) with auxiliary data "−1.8". The numerical data N1 for the original value "39.1" includes Hash(30) with auxiliary data "9.1" and Hash(40) with auxiliary data "−0.9". The range judged to be approximate is designated as a search condition; here plus or minus 1 is assumed to be designated.

In this case, the hash values in N1 are compared with the hash values in N2 to determine whether any match. In the example of FIG. 19, both Hash(30) and Hash(40) match. For Hash(30), it is then determined whether the difference between the auxiliary data "8.2" of N2 and the auxiliary data "9.1" of N1 is within the designated range. Here |9.1 − 8.2| = 0.9, which is within the designated range, so in this embodiment the numerical similarity is set to Sim = 1. If the difference between the auxiliary data exceeds the designated range, Sim = 0 is set. Hash(40) yields the same difference, so it need not be processed.

Next, an approach is described in which the numerical similarity Sim is set to a real number from 0 to 1 according to how similar the numerical values are. In the first method, as shown in FIG. 18, both N1 and N2 include a plurality of hash values. Hash values for the same number of significant digits are therefore compared with each other, and the number of matches is counted. For data block (A), the original values are identical, so there are three matches. For data block (B), the values agree up to 2 significant digits, so there are two matches. For data block (C), the values agree only at 1 significant digit, so there is one match. Accordingly, Sim = 3/3 = 1 is set for data block (A), where the denominator is the number of significant-digit variants, Sim = 2/3 = 0.67 for data block (B), and Sim = 1/3 = 0.33 for data block (C).
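Both the 0/1 comparison and the graded comparison for the first method can be sketched as below, using the FIG. 18 values. The sketch assumes SHA-256 as the hash, rounding to the given number of significant digits, and a dictionary keyed by digit count; the names `fingerprint`, `binary_sim`, and `graded_sim` are hypothetical.

```python
import hashlib

DIGITS = (1, 2, 3)  # significant-digit variants stored per numerical value


def sig_hash(value: float, digits: int) -> str:
    # Assumption: hash the rounded scientific-notation string with SHA-256.
    text = f"{value:.{digits - 1}e}"
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def fingerprint(value: float, digits=DIGITS) -> dict[int, str]:
    return {d: sig_hash(value, d) for d in digits}


def binary_sim(n1: dict, n2: dict) -> int:
    """1 if any hash in n1 matches the hash for the same digit count in n2."""
    return 1 if any(d in n2 and n1[d] == n2[d] for d in n1) else 0


def graded_sim(n1: dict, n2: dict) -> float:
    """Fraction of significant-digit variants whose hashes match."""
    matches = sum(1 for d in DIGITS if d in n1 and d in n2 and n1[d] == n2[d])
    return matches / len(DIGITS)


query = fingerprint(38.2)
for label, original in (("A", 38.2), ("B", 38), ("C", 39)):
    stored = fingerprint(original)
    print(label, binary_sim(query, stored), round(graded_sim(query, stored), 2))
```

A query that carries only the 2-significant-digit hash, `fingerprint(38.2, digits=(2,))`, matches blocks (A) and (B) but not (C) under `binary_sim`, in line with the single-hash case described earlier.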

In the second method, the difference between the auxiliary data is calculated as described above, so Sim is calculated as (designated range − absolute value of the auxiliary data difference) / (designated range). In the example above, Sim = |1 − 0.9| / 1 = 0.1.
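The second-method comparison, in both its 0/1 and graded forms, might look like the following sketch for the FIG. 19 values. The assumptions are: SHA-256 as the hash, a mapping from hashed representative value to auxiliary difference, `Decimal` arithmetic to keep 9.1 − 8.2 exact, and "within the designated range" read as less than or equal to the tolerance. All function names are hypothetical.

```python
import hashlib
from decimal import Decimal


def h(n: int) -> str:
    # Assumption: SHA-256 stands in for the concealment hash.
    return hashlib.sha256(str(n).encode("utf-8")).hexdigest()


def numeric_data(value: str, lower: int, upper: int) -> dict[str, Decimal]:
    """Map each hashed representative value to its unhashed auxiliary difference."""
    v = Decimal(value)
    return {h(lower): v - lower, h(upper): v - upper}


def aux_difference(n1: dict, n2: dict):
    """Absolute auxiliary difference for a shared representative, or None."""
    shared = n1.keys() & n2.keys()
    if not shared:
        return None
    key = min(shared)  # every shared representative yields the same difference
    return abs(n1[key] - n2[key])


def binary_sim(n1: dict, n2: dict, tolerance: Decimal) -> int:
    diff = aux_difference(n1, n2)
    return 1 if diff is not None and diff <= tolerance else 0


def graded_sim(n1: dict, n2: dict, tolerance: Decimal) -> float:
    diff = aux_difference(n1, n2)
    if diff is None or diff > tolerance:
        return 0.0
    return float((tolerance - diff) / tolerance)


n1 = numeric_data("39.1", 30, 40)  # search side
n2 = numeric_data("38.2", 30, 40)  # registered side
print(binary_sim(n1, n2, Decimal("1")), graded_sim(n1, n2, Decimal("1")))  # 1 0.1
```

The comparison works on the plain differences only after a hashed representative has matched, so the original values 39.1 and 38.2 are never exchanged.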

Thereafter, the search processing unit 55 determines whether the numerical similarity Sim exceeds 0 (step S121). If Sim is 0, the process proceeds via terminal B to step S137 in FIG. 21. This is because the similarity for a data block is determined by multiplication with Sim, so if Sim = 0 the result for that data block is 0 regardless of the feature word comparison. If Sim > 0, the process proceeds via terminal A to step S123 in FIG. 20.

Turning to the processing of FIG. 20, the search processing unit 55 identifies the hash value KW1 of an unprocessed feature word among the feature words included in the data block corresponding to N1 (step S123). It also identifies the hash value KW2 of an unprocessed feature word among the feature words in the data block corresponding to N2 (step S125). The search processing unit 55 then compares KW1 with KW2 (step S127).

This embodiment assumes that the similarity is 1 when it is calculated between identical FP data. In general, however, the same hash value for a numerical value can appear in different data blocks. This poses no particular problem when hash values of different feature words are associated with it, but when hash values of the same feature words are associated with it, calculating the similarity between identical FP data yields an overall similarity greater than 1. Therefore, when a combination of a numerical hash value and a feature word hash value is found to have already appeared, its comparison result is not reflected in the similarity.

Accordingly, the search processing unit 55 determines whether KW1 = KW2 and the combination of N1 and KW1 is appearing for the first time (step S129). If KW1 and KW2 do not match, or if the combination of N1 and KW1 has already appeared, the process proceeds to step S133.

If KW1 = KW2 and the combination of N1 and KW1 is appearing for the first time, the search processing unit 55 adds Sim to c1 and sets the result as the new value of c1 (step S131). When Sim is 0 or 1, c1 holds the number of common feature words. When Sim varies between 0 and 1, c1 accumulates the number of common feature words weighted by the Sim of each data block.

The search processing unit 55 then determines whether an unprocessed feature word hash value KW2 remains in the data block corresponding to N2 (step S133). If one remains, the process returns to step S125. If none remains, the search processing unit 55 determines whether an unprocessed feature word remains in the data block corresponding to N1 (step S135). If one remains, the process returns to step S123. If none remains, the process proceeds via terminal B to step S137 in FIG. 21.

Turning to the processing of FIG. 21, the search processing unit 55 increments c2 by 1 (step S137). It then determines whether an unprocessed data block remains among the data blocks in the identified FP data (step S139). If one remains, the process returns via terminal C to step S115 in FIG. 17. If none remains, the search processing unit 55 determines whether an unprocessed data block remains among the data blocks in the search FP data (step S141). If one remains, the process returns via terminal D to step S113 in FIG. 17. If none remains, the search processing unit 55 calculates the similarity as c1 / (c2 × block size) and stores it, in association with the identification information of the FP data, in a storage device such as a main memory (step S143). The block size is the number of feature words in one data block. The process then returns to the calling process.
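The loop structure of FIGS. 17, 20, and 21 can be condensed into the sketch below. It takes c2 to be the number of data blocks in the compared FP data, as the text describes that variable, and records each (N1, KW1) combination so that a repeated combination is not counted twice. Plain strings stand in for hash values, and the names `fp_similarity` and `any_shared` are hypothetical.

```python
def any_shared(n1, n2) -> int:
    """Simple 0/1 numerical similarity: 1 if the two hash lists share a value."""
    return 1 if set(n1) & set(n2) else 0


def fp_similarity(query_blocks, stored_blocks, block_size, num_sim=any_shared):
    """Compute c1 / (c2 x block size) for one stored FP against the search FP."""
    if not stored_blocks:
        return 0.0
    c1 = 0.0
    c2 = len(stored_blocks)  # assumption: c2 = block count of the compared FP data
    seen = set()             # (N1, KW1) combinations already counted
    for q in query_blocks:
        n1 = tuple(q["num"])
        for d in stored_blocks:
            sim = num_sim(q["num"], d["num"])
            if sim <= 0:
                continue  # Sim = 0: feature words cannot contribute
            for kw1 in q["kw"]:
                for kw2 in d["kw"]:
                    if kw1 == kw2 and (n1, kw1) not in seen:
                        seen.add((n1, kw1))
                        c1 += sim
    return c1 / (c2 * block_size)


query = [{"num": ["h30"], "kw": ["a", "b", "c"]},
         {"num": ["h80"], "kw": ["d", "e", "f"]}]
partial = [{"num": ["h30"], "kw": ["a", "b", "x"]}]
print(fp_similarity(query, query, block_size=3))  # 1.0
```

Passing a graded function as `num_sim` turns c1 into the similarity-weighted count of common feature words described for step S131.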

This processing makes it possible to search, with confidentiality preserved, for confidential data whose feature words are also similar, using numerical values as the basis. Whether numerical values are approximate can also be determined while they remain concealed. The search FP data is concealed as well, so what kind of search is being performed is kept secret even from the management apparatus 5.

The similarity calculated in step S143 focuses on the identified FP data and uses the number of blocks c2 in that data in the similarity formula. Alternatively, the number of blocks N_Q in the search FP data may be used in place of c2, giving a similarity that expresses what proportion of the size of the search FP data (number of blocks N_Q × block size) the degree of commonality of feature words c1 accounts for. This similarity is expressed by a formula using the symbols defined below. Similarly, depending on the use case, the larger of c2 and N_Q, max(c2, N_Q), or the smaller, min(c2, N_Q), may be used in place of c2.
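The formula itself does not survive in this text. Based only on the symbol definitions given in the description, it can plausibly be reconstructed as follows; the upper limit N_D (the number of data blocks in D, that is, c2) is a symbol introduced here for the reconstruction and does not appear in the original.

```latex
\mathrm{Sim}(Q, D) = \frac{1}{N_Q \times \mathrm{Block\_size}}
  \sum_{i=1}^{N_Q} \sum_{j=1}^{N_D}
  \mathrm{Sim}(\mathrm{Num}_{Qi}, \mathrm{Num}_{Dj}) \times \lvert B_{Qi} \cap B_{Dj} \rvert
```

Replacing N_Q in the denominator with max(c2, N_Q) or min(c2, N_Q) gives the other variants mentioned above.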

Here, Q denotes the search FP data and D denotes the FP data being compared. Block_size is the block size described above, and N_Q is the number of data blocks in the search FP data. Num_Qi is the numerical data of the i-th data block in the search FP data, and Num_Dj is the numerical data of the j-th data block in the FP data being compared. Sim(Num_Qi, Num_Dj) is the similarity Sim between the numerical data of the i-th data block in the search FP data and the numerical data of the j-th data block in the FP data being compared. B_Qi ∩ B_Dj is the number of hash values common to the feature word hash values in the i-th data block of the search FP data and the feature word hash values in the j-th data block of the FP data being compared.

Returning to the processing of FIG. 16, the search processing unit 55 determines whether the calculated similarity exceeds the threshold T designated in the search conditions (step S101). If it does, the search processing unit 55 adds bibliographic data including the identification information of the identified FP data (such as the FPID, registrant ID, and registration date in FIG. 10) to the similar-FP array (step S103). The similarity value itself may be included in the bibliographic data for the searcher's reference. If the similarity is equal to or less than T, the process proceeds to step S105.

When the similarity is determined in step S101 to be equal to or less than T, or after step S103, the search processing unit 55 determines whether unprocessed FP data remains in the DB 54 (step S105). If it does, the process returns to step S97. If not, the search processing unit 55 outputs the data of the similar-FP array to the search result transmission unit 57 (step S107), and the process returns to the calling process. Detailed data on the registrants may be added to the output to the search result transmission unit 57.

In this way, not only exactly matching FP data but also similar FP data is identified, and the data related to that FP data is extracted.

Returning to the processing of FIG. 11, the search result transmission unit 57 transmits the search result data received from the search processing unit 55 to the search device 7 that sent the search request (step S59). The search request unit 78 of the search device 7 receives the search results from the management apparatus 5 and outputs them to the output unit 79 (step S61). The output unit 79 then outputs the search results to a display device or the like. For example, data as shown in FIG. 22 is displayed on the display device. In the example of FIG. 22, the FPID, registrant, registration date, and similarity are displayed. The results may be presented sorted in descending order of similarity, as in this example.
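The threshold test of step S101 and the optional descending sort of the displayed results can be sketched as follows. The record fields mirror the columns named for FIG. 22, but the concrete values and the function name `select_similar` are invented for illustration only.

```python
def select_similar(scored, threshold):
    """Keep records whose similarity exceeds the threshold, highest first."""
    hits = [rec for rec in scored if rec["similarity"] > threshold]
    return sorted(hits, key=lambda rec: rec["similarity"], reverse=True)


# Hypothetical scored records; none of these values come from the source.
scored = [
    {"fpid": "FP001", "registrant": "R01", "date": "2011-10-01", "similarity": 0.4},
    {"fpid": "FP002", "registrant": "R02", "date": "2011-10-05", "similarity": 0.9},
    {"fpid": "FP003", "registrant": "R03", "date": "2011-10-09", "similarity": 0.7},
]
results = select_similar(scored, threshold=0.5)
print([rec["fpid"] for rec in results])  # ['FP002', 'FP003']
```

A record whose similarity equals the threshold is excluded, matching the "equal to or less than T" branch of step S101.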

This allows the searcher to identify the registrant of similar FP data and to ask that registrant to provide specific information.

For example, when FP data is registered for medical care data, the medical care data itself is never disclosed, so no problem arises from the standpoint of privacy protection or security protection, and registration of FP data in the management device 5 is encouraged. On the search side as well, the patient's data remains concealed and poses no problem in terms of privacy or security protection, so use of the system is also encouraged. Once the existence of a specifically similar case has been confirmed, information such as a treatment method can be obtained early by making a separate inquiry, which also benefits the patient.

An embodiment of the present technology has been described above, but the present technology is not limited to it. For example, the processing described above identifies feature words from confidential data; in addition, a processing flow such as that shown in FIG. 23 may be carried out. The flow is the same as in FIG. 6 except for steps S201 to S205. The FP generation unit 33 extracts synonyms of the identified feature words from a dictionary (step S201) and calculates a hash value for each feature word and each synonym (step S203). The FP generation unit 33 then stores, in the FP data storage unit 36, a data block that includes the hash values of the synonyms in addition to those of the feature words (step S205). In this way, hash values for synonyms may also be included in the FP data.
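The synonym handling of steps S201 to S205 can be sketched roughly as follows. This is only an illustration under stated assumptions: the synonym dictionary, the function names `conceal` and `build_data_block`, and the use of SHA-256 as the hash are all hypothetical, since the text says only that synonyms come from a dictionary and that hash values are calculated.

```python
import hashlib

# Hypothetical synonym dictionary; its contents are illustrative only.
SYNONYMS = {
    "fever": ["pyrexia", "high temperature"],
    "cough": ["tussis"],
}


def conceal(term: str) -> str:
    """Stand-in concealment: SHA-256 hex digest (hash choice is an assumption)."""
    return hashlib.sha256(term.encode("utf-8")).hexdigest()


def build_data_block(feature_words):
    """Expand feature words with synonyms, hash every term, and return the block."""
    terms = []
    for word in feature_words:
        terms.append(word)
        terms.extend(SYNONYMS.get(word, []))      # S201: look up synonyms
    hashes = sorted({conceal(t) for t in terms})  # S203: hash words and synonyms
    return {"word_hashes": hashes}                # S205: block to be stored


block = build_data_block(["fever", "cough"])
```

With the synonym hashes in the block, a search whose feature word is "pyrexia" can match a record that was registered with the word "fever", without either word being disclosed.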

Furthermore, although an example in which the threshold T is included in the search condition was shown above, a designated number of FP data items may instead be extracted in descending order of similarity, for example.
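The two selection policies, threshold T and a designated top count, can be contrasted in a short sketch. The function names and the sample scores are hypothetical; the only behavior taken from the text is that entries whose similarity is equal to or less than T are not selected.

```python
def select_by_threshold(scored, threshold):
    """Keep entries whose similarity strictly exceeds the threshold T."""
    hits = [(fpid, s) for fpid, s in scored if s > threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


def select_top_n(scored, n):
    """Keep the n entries with the highest similarity."""
    return sorted(scored, key=lambda item: item[1], reverse=True)[:n]


# Illustrative (FPID, similarity) pairs; the values are made up.
scored = [("FP001", 0.42), ("FP002", 0.91), ("FP003", 0.67), ("FP004", 0.15)]
above_threshold = select_by_threshold(scored, 0.5)
top_three = select_top_n(scored, 3)
```

A top-count policy always returns a bounded result set, whereas a threshold policy may return nothing or a very long list depending on the data.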

The functional block diagrams shown above are examples and do not necessarily match the actual program module configuration. As for the processing flows, the order of the processing steps may be changed, or steps may be executed in parallel, as long as the processing result does not change.

The FP rule data may also be managed by a device other than the management device 5.

The registration device 3, the management device 5, and the search device 7 described above are computer devices in which, as shown in FIG. 24, a memory 2501, a CPU (Central Processing Unit) 2503, a hard disk drive (HDD) 2505, a display control unit 2507 connected to a display device 2509, a drive device 2513 for a removable disk 2511, an input device 2515, and a communication control unit 2517 for connecting to a network are connected by a bus 2519. An operating system (OS) and an application program for carrying out the processing in this embodiment are stored in the HDD 2505 and are read from the HDD 2505 into the memory 2501 when executed by the CPU 2503. The CPU 2503 controls the display control unit 2507, the communication control unit 2517, and the drive device 2513 according to the processing content of the application program so that they perform predetermined operations. Data in the middle of processing is stored mainly in the memory 2501 but may also be stored in the HDD 2505. In the embodiment of the present technology, the application program for carrying out the processing described above is stored on the computer-readable removable disk 2511, distributed, and installed from the drive device 2513 onto the HDD 2505. The program may also be installed onto the HDD 2505 via a network such as the Internet and the communication control unit 2517. Such a computer device realizes the various functions described above through organic cooperation between hardware such as the CPU 2503 and the memory 2501 and programs such as the OS and the application program.

The embodiment described above can be summarized as follows.

The information processing method according to a first aspect of the present embodiment includes: (A) a process of extracting, from text data that is stored in a data storage unit and includes a first numerical value, the first numerical value and a plurality of feature words present around the first numerical value; (B) a generation process of generating, from the extracted first numerical value, one or more second numerical values that serve as a reference for determining whether a value approximates the first numerical value; and (C) a process of performing concealment processing on each of the one or more second numerical values and the plurality of feature words to generate concealed data, and storing the concealed data in the data storage unit.

Generating the second numerical values and their concealed data in this way makes it possible to detect not only exactly matching values but also approximate ones. Generating a plurality of second numerical values rather than just one makes approximate values easier to detect. This holds both at data registration and at data search. The numerical value that serves as a reference for determining whether a value is similar to the first numerical value can also be regarded as a value representing the width of the approximation determination.

The generation process described above may be a process of generating a plurality of second numerical values that express the extracted first numerical value with different numbers of significant digits. The precision of the approximation can then be adjusted through the number of significant digits.
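As a minimal sketch of this variant, the code below rounds one value to several numbers of significant digits and conceals each result. The function names, the half-up rounding rule, the digit counts (1, 2, 3), and SHA-256 as the concealment are assumptions made for illustration; the text does not fix any of them.

```python
import hashlib
from decimal import Decimal, ROUND_HALF_UP


def conceal(text: str) -> str:
    """Stand-in concealment: SHA-256 hex digest (hash choice is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def round_sig(value: str, digits: int) -> str:
    """Express a decimal string with `digits` significant digits."""
    d = Decimal(value)
    if d == 0:
        return "0"
    # Exponent of the last digit that is kept.
    exponent = d.adjusted() - digits + 1
    quantum = Decimal(1).scaleb(exponent)
    return format(d.quantize(quantum, rounding=ROUND_HALF_UP), "f")


def concealed_variants(value: str, digit_counts=(1, 2, 3)) -> dict:
    """Map each number of significant digits to the concealed rounded value."""
    return {n: conceal(round_sig(value, n)) for n in digit_counts}


registered = concealed_variants("36.84")
query = concealed_variants("37.2")
# Digit counts at which the two concealed values coincide.
matching_digits = [n for n in registered if registered[n] == query[n]]
```

Here 36.84 and 37.2 coincide at one and two significant digits but not at three, so the highest matching digit count gives a rough indication of how close the two hidden values are.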

The generation process described above may also include a process of identifying a plurality of second numerical values that are the upper limit and the lower limit of a predetermined numerical range containing the extracted first numerical value, and a process of calculating the difference between the first numerical value and the lower limit and the difference between the first numerical value and the upper limit and storing them in the data storage unit. This makes it easier to calculate the difference from the original numerical value at search time.
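The range-based variant can be sketched as follows, assuming non-negative values and ranges aligned to multiples of a fixed width. The function names, the default width of 10, and SHA-256 as the concealment are hypothetical choices, not taken from the text.

```python
import hashlib
from decimal import Decimal


def conceal(text: str) -> str:
    """Stand-in concealment: SHA-256 hex digest (hash choice is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def bounds_and_differences(value: str, width: str = "10") -> dict:
    """Return concealed range limits plus the plain differences to each limit.

    Assumes non-negative values and ranges aligned to multiples of `width`.
    """
    v, w = Decimal(value), Decimal(width)
    lower = (v // w) * w
    upper = lower + w
    return {
        "lower_hash": conceal(str(lower)),
        "upper_hash": conceal(str(upper)),
        "diff_from_lower": v - lower,
        "diff_to_upper": upper - v,
    }


record = bounds_and_differences("137")
neighbour = bounds_and_differences("142")
```

Because 137 and 142 fall in adjacent ranges, the concealed upper limit of the first equals the concealed lower limit of the second, and the stored differences (3 and 2) give the gap between the originals without revealing either value.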

The generation process described above may also include a process of identifying, from the extracted first numerical value and a setting of numerical ranges for classifying numerical values, one or more second numerical values that represent the first numerical value, and a process of calculating the difference between the one or more second numerical values and the first numerical value and storing it in the data storage unit. For example, when data for a search is generated, generating the second numerical values with the range regarded as approximate taken into account allows concealed data for approximate values to be identified accurately at search time.

The generation process described above may also be a process of generating a single second numerical value that expresses the extracted first numerical value with a designated number of significant digits. Designating the number of significant digits in this way at search time allows approximation to be determined with the desired precision.

The information processing method according to the first aspect of the present embodiment may further include a process of extracting synonyms of the plurality of feature words, and a process of concealing the synonyms to generate concealed data and storing the concealed data in the data storage unit. This makes similar concealed data easier to extract.

The information processing method according to a second aspect of the present embodiment includes: (A) a process of receiving a search request including one or more search data blocks, each including a first concealed data value of a first numerical value and second concealed data values of a plurality of first feature words; (B) a calculation process performed for each of the case data blocks stored in a data storage unit, the data storage unit storing a plurality of case data blocks each including identification information and one or more data blocks, each data block including third concealed data values of a plurality of second numerical values and fourth concealed data values of a plurality of second feature words, the calculation process calculating, from a first similarity for the numerical values that is calculated from the first concealed data value and the third concealed data values, and from the number of fourth concealed data values that match the second concealed data values, a third similarity that is the total of the second similarities for the respective combinations of a data block included in the case data block being processed and a search data block; and (C) a process of transmitting, to the sender of the search request, the identification information of the case data blocks whose third similarity exceeds a threshold, or the identification information of a predetermined number of case data blocks ranked highest by third similarity.

In this way, the similarity of the numerical values can be determined, and an overall similarity can be calculated, while both the case data blocks stored in the data storage unit and the data blocks included in the search request remain concealed. Case data blocks with higher similarity can therefore be identified.
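The aggregation over block combinations can be sketched as follows. The text does not state here how the first similarity and the number of matching feature-word values are combined into a second similarity, so the plain sum used below is an assumption, as are all function names, field names, and the placeholder strings that stand in for concealed values.

```python
def first_similarity(query_num_hash, stored_num_hashes):
    """1 if the concealed query value matches any stored concealed value, else 0."""
    return 1 if query_num_hash in stored_num_hashes else 0


def second_similarity(search_block, data_block):
    """Similarity of one (search block, data block) pair.

    Assumed combination: numeric similarity plus the count of matching word values.
    """
    numeric = first_similarity(search_block["num_hash"], data_block["num_hashes"])
    words = len(set(search_block["word_hashes"]) & set(data_block["word_hashes"]))
    return numeric + words


def third_similarity(search_blocks, case_block):
    """Total of the second similarities over every block combination."""
    return sum(
        second_similarity(s, d) for s in search_blocks for d in case_block["blocks"]
    )


# Placeholder strings stand in for concealed (hashed) values.
search_blocks = [{"num_hash": "h:37", "word_hashes": ["h:temperature", "h:fever"]}]
case_block = {
    "id": "FP001",
    "blocks": [
        {"num_hashes": ["h:37", "h:36.8"], "word_hashes": ["h:fever", "h:cough"]},
        {"num_hashes": ["h:120"], "word_hashes": ["h:pressure"]},
    ],
}
score = third_similarity(search_blocks, case_block)
```

Only equality tests on concealed values are needed, so the party computing the score never sees the underlying numbers or words.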

The calculation process described above may include a process of setting the first similarity to 1 when a third concealed data value that matches the first concealed data value exists, and setting the first similarity to 0 when no such third concealed data value exists. For example, if third concealed data values are prepared for a plurality of second numerical values that represent the approximation determination for the original value, the presence of an approximate numerical value becomes easier to detect.

The first concealed data value described above may be the concealed data value of a representative value for the original numerical value of the first numerical value. The search request described above may further include a first auxiliary numerical value, which is the difference between the original numerical value of the first numerical value and the representative value, and data on a range for approximation determination. The plurality of second numerical values may be the lower limit and the upper limit of the value range to which the original numerical value belongs, and the data block described above may further include a second auxiliary numerical value, which is the difference between the lower limit of the value range to which the original numerical value of the second numerical values belongs and that original numerical value, and a third auxiliary numerical value, which is the difference between that original numerical value and the upper limit. In such a case, the calculation process described above may include: a process of calculating, when a third concealed data value that matches the first concealed data value exists, the difference between the first auxiliary numerical value for the first concealed data value and the second or third auxiliary numerical value for the matching third concealed data value; and a process of setting the first similarity to 1 if that difference is within the range for approximation determination, and to 0 if it is not.
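One way to read the auxiliary-value comparison is sketched below. Several points are assumptions rather than statements from the text: the representative value of the query is taken to be the lower limit of its range, both stored auxiliary values are kept as signed offsets (original value minus the limit), ranges are aligned to multiples of a fixed width, and SHA-256 stands in for the concealment. All names are hypothetical.

```python
import hashlib


def conceal(text: str) -> str:
    """Stand-in concealment: SHA-256 hex digest (hash choice is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def register(value: int, width: int = 10) -> dict:
    """Build a stored data block for a non-negative integer value."""
    lower = (value // width) * width
    upper = lower + width
    return {
        "lower_hash": conceal(str(lower)),
        "upper_hash": conceal(str(upper)),
        "offset_from_lower": value - lower,  # second auxiliary value (>= 0)
        "offset_from_upper": value - upper,  # third auxiliary value (<= 0), signed by assumption
    }


def make_query(value: int, tolerance: int, width: int = 10) -> dict:
    """Build a search block; the representative value is the range's lower limit."""
    rep = (value // width) * width
    return {"rep_hash": conceal(str(rep)), "offset": value - rep, "tolerance": tolerance}


def first_similarity(query: dict, block: dict) -> int:
    """1 if a limit matches and the offset difference is within the tolerance."""
    if query["rep_hash"] == block["lower_hash"]:
        stored = block["offset_from_lower"]
    elif query["rep_hash"] == block["upper_hash"]:
        stored = block["offset_from_upper"]
    else:
        return 0
    # With signed offsets this equals (query value - stored value).
    return 1 if abs(query["offset"] - stored) <= query["tolerance"] else 0


block = register(37)
```

For a stored value of 37, a query for 42 with tolerance 5 matches through the shared limit 40 and a gap of exactly 5, while a query for 31 shares the limit 30 but is rejected because the gap is 6.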

Furthermore, when a plurality of first concealed data values of the first numerical value are included in the data block, the calculation process described above may include a process of setting, as the first similarity, a similarity that depends on the number of third concealed data values that match the first concealed data values. This allows the first similarity to take values other than just 0 or 1.
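A count-based first similarity might look like the following sketch. Normalizing the match count by the number of query values is an assumption, since the text says only that the similarity depends on the count; the function name and the placeholder strings standing in for concealed values are hypothetical.

```python
def count_based_similarity(query_hashes, stored_hashes):
    """Fraction of the query's concealed values that also appear in the stored block."""
    if not query_hashes:
        return 0.0
    stored = set(stored_hashes)
    matched = sum(1 for h in query_hashes if h in stored)
    return matched / len(query_hashes)


# Placeholder strings stand in for concealed values at 1, 2 and 3 significant digits.
query_hashes = ["h:40", "h:37", "h:37.2"]
stored_hashes = ["h:40", "h:37", "h:36.8"]
similarity = count_based_similarity(query_hashes, stored_hashes)
```

Two of the three concealed values coincide here, so the numeric similarity is 2/3 rather than a flat 0 or 1.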

The first concealed data value may also be the concealed data value of a representative value for the original numerical value of the first numerical value, and the search request may further include a first auxiliary numerical value, which is the difference between that original numerical value and the representative value, and data on a range for approximation determination. Further, the plurality of second numerical values described above may be the lower limit and the upper limit of the value range to which the original numerical value belongs, and the data block described above may further include a second auxiliary numerical value, which is the difference between the lower limit of the value range to which the original numerical value of the second numerical values belongs and that original numerical value, and a third auxiliary numerical value, which is the difference between that original numerical value and the upper limit. The calculation process described above may then include: a process of calculating, when a third concealed data value that matches the first concealed data value exists, the difference between the first auxiliary numerical value for the first concealed data value and the second or third auxiliary numerical value for the matching third concealed data value; and a process of setting, as the first similarity, a similarity that depends on how the difference between the first auxiliary numerical value and the second or third auxiliary numerical value compares with the numerical value representing the range for approximation determination. This allows the first similarity to take values other than just 0 or 1.

A program for causing a computer to carry out the processing described above can be created, and the program is stored in a computer-readable storage medium or storage device such as a flexible disk, an optical disk such as a CD-ROM, a magneto-optical disk, a semiconductor memory (for example, a ROM), or a hard disk.

The following appendices are further disclosed regarding the embodiments, including the examples above.

(Appendix 1)
An information processing method in which a computer executes processing including:
a process of extracting, from text data that is stored in a data storage unit and includes a first numerical value, the first numerical value and a plurality of feature words present around the first numerical value;
a generation process of generating, from the extracted first numerical value, one or more second numerical values that serve as a reference for determining whether a value approximates the first numerical value; and
a process of performing concealment processing on each of the one or more second numerical values and the plurality of feature words to generate concealed data, and storing the concealed data in the data storage unit.

(Appendix 2)
The information processing method according to Appendix 1, wherein the generation process is a process of generating a plurality of second numerical values that express the extracted first numerical value with different numbers of significant digits.

(Appendix 3)
The information processing method according to Appendix 1, wherein the generation process includes:
a process of identifying a plurality of second numerical values that are the upper limit and the lower limit of a predetermined numerical range containing the extracted first numerical value; and
a process of calculating the difference between the first numerical value and the lower limit and the difference between the first numerical value and the upper limit, and storing the differences in the data storage unit.

(Appendix 4)
The information processing method according to Appendix 1, wherein the generation process includes:
a process of identifying, from the extracted first numerical value and a setting of numerical ranges for classifying numerical values, one or more second numerical values that represent the first numerical value; and
a process of calculating the difference between the one or more second numerical values and the first numerical value, and storing the difference in the data storage unit.

(Appendix 5)
The information processing method according to Appendix 1, wherein the generation process is a process of generating a single second numerical value that expresses the extracted first numerical value with a designated number of significant digits.

(Appendix 6)
The information processing method according to any one of Appendices 1 to 5, wherein the processing further includes:
a process of extracting synonyms of the plurality of feature words; and
a process of concealing the synonyms to generate concealed data, and storing the concealed data in the data storage unit.

(Appendix 7)
An information processing method in which a computer executes processing including:
a process of receiving a search request including one or more search data blocks, each including a first concealed data value of a first numerical value and second concealed data values of a plurality of first feature words;
a calculation process performed for each of the case data blocks stored in a data storage unit, the data storage unit storing a plurality of case data blocks each including identification information and one or more data blocks, each data block including third concealed data values of a plurality of second numerical values and fourth concealed data values of a plurality of second feature words, the calculation process calculating, from a first similarity for the numerical values that is calculated from the first concealed data value and the third concealed data values, and from the number of fourth concealed data values that match the second concealed data values, a third similarity that is the total of the second similarities for the respective combinations of a data block included in the case data block being processed and a search data block; and
a process of transmitting, to the sender of the search request, the identification information of the case data blocks whose third similarity exceeds a threshold, or the identification information of a predetermined number of case data blocks ranked highest by third similarity.

(Appendix 8)
The information processing method according to Appendix 7, wherein the calculation process includes a process of setting the first similarity to 1 when a third concealed data value that matches the first concealed data value exists, and setting the first similarity to 0 when no third concealed data value that matches the first concealed data value exists.

(Appendix 9)
The information processing method according to Appendix 7, wherein
the first concealed data value is the concealed data value of a representative value for the original numerical value of the first numerical value,
the search request further includes a first auxiliary numerical value, which is the difference between the original numerical value of the first numerical value and the representative value, and data on a range for approximation determination,
the plurality of second numerical values are the lower limit and the upper limit of the value range to which the original numerical value belongs,
the data block further includes a second auxiliary numerical value, which is the difference between the lower limit of the value range to which the original numerical value of the second numerical values belongs and that original numerical value, and a third auxiliary numerical value, which is the difference between that original numerical value and the upper limit, and
the calculation process includes:
a process of calculating, when a third concealed data value that matches the first concealed data value exists, the difference between the first auxiliary numerical value for the first concealed data value and the second auxiliary numerical value or the third auxiliary numerical value for the matching third concealed data value; and
a process of setting the first similarity to 1 if the difference between the first auxiliary numerical value and the second auxiliary numerical value or the third auxiliary numerical value is within the range for approximation determination, and setting the first similarity to 0 if the difference is not within that range.

(Appendix 10)
The information processing method according to Appendix 7, wherein
a plurality of first concealed data values of the first numerical value are included in the data block, and
the calculation process includes a process of setting, as the first similarity, a similarity that depends on the number of third concealed data values that match the first concealed data values.

(Appendix 11)
The information processing method according to Appendix 7, wherein
the first concealed data value is the concealed data value of a representative value for the original numerical value of the first numerical value,
the search request further includes a first auxiliary numerical value, which is the difference between the original numerical value of the first numerical value and the representative value, and data on a range for approximation determination,
the plurality of second numerical values are the lower limit and the upper limit of the value range to which the original numerical value belongs,
the data block further includes a second auxiliary numerical value, which is the difference between the lower limit of the value range to which the original numerical value of the second numerical values belongs and that original numerical value, and a third auxiliary numerical value, which is the difference between that original numerical value and the upper limit, and
the calculation process includes:
a process of calculating, when a third concealed data value that matches the first concealed data value exists, the difference between the first auxiliary numerical value for the first concealed data value and the second auxiliary numerical value or the third auxiliary numerical value for the matching third concealed data value; and
a process of setting, as the first similarity, a similarity that depends on how the difference between the first auxiliary numerical value and the second auxiliary numerical value or the third auxiliary numerical value compares with the numerical value representing the range for approximation determination.

(Appendix 12)
An information processing apparatus including:
a data storage unit; and
a generation unit that extracts, from text data that is stored in the data storage unit and includes a first numerical value, the first numerical value and a plurality of feature words present around the first numerical value, generates, from the extracted first numerical value, one or more second numerical values that serve as a reference for determining whether a value approximates the first numerical value, performs concealment processing on each of the one or more second numerical values and the plurality of feature words to generate concealed data, and stores the concealed data in a second data storage unit.

(Appendix 13)
An information processing apparatus including:
a receiving unit that receives a search request including one or more search data blocks, each including a first concealed data value of a first numerical value and second concealed data values of a plurality of first feature words;
a search processing unit that, for each of the case data blocks stored in a data storage unit, the data storage unit storing a plurality of case data blocks each including identification information and one or more data blocks, each data block including third concealed data values of a plurality of second numerical values and fourth concealed data values of a plurality of second feature words, calculates, from a first similarity for the numerical values that is calculated from the first concealed data value and the third concealed data values, and from the number of fourth concealed data values that match the second concealed data values, a third similarity that is the total of the second similarities for the respective combinations of a data block included in the case data block being processed and a search data block; and
a transmission unit that transmits, to the sender of the search request, the identification information of the case data blocks whose third similarity exceeds a threshold, or the identification information of a predetermined number of case data blocks ranked highest by third similarity.

3 Registration device
31 Input unit
32 Confidential data storage unit
33 FP generation unit
34 FP rule data acquisition unit
35 FP rule data storage unit
36 PF data storage unit
37 Transmission unit
5 Management device
51 FP rule data storage unit
52 FP rule data distribution unit
53 PF registration unit
54 DB
55 Search processing unit
56 Search request reception unit
57 Search result transmission unit
7 Search device
71 Input unit
72 Confidential data storage unit
73 FP generation unit
74 FP rule data acquisition unit
75 FP rule data storage unit
76 Search condition data storage unit
77 FP data storage unit
78 Search request unit
79 Output unit

Claims

An information processing method in which a computer executes processing comprising:
A process of extracting, from text data that is stored in a data storage unit and includes a first numerical value, the first numerical value and a plurality of feature words existing around the first numerical value;
A generation process of generating, from the extracted first numerical value, one or more second numerical values serving as a reference for determining whether a value approximates the first numerical value; and
A process of generating concealed data by performing concealment processing on each of the one or more second numerical values and the plurality of feature words, and storing the concealed data in the data storage unit.

The information processing method according to claim 1, wherein the generation process is a process of generating a plurality of second numerical values that represent the extracted first numerical value with different numbers of significant digits.
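As an illustration only, the generation of several second numerical values with different numbers of significant digits, followed by concealment, might be sketched as below. The function names `significant_variants` and `conceal`, the half-up rounding rule, and the use of a salted SHA-256 digest as the concealment processing are all assumptions made for this sketch; the claim does not specify them.

```python
import hashlib
from decimal import Decimal, ROUND_HALF_UP


def significant_variants(value: str) -> list[str]:
    """Round the value to every number of significant digits, from all down to one."""
    d = Decimal(value)
    if d == 0:
        return ["0"]
    total_digits = len(d.as_tuple().digits)
    variants = []
    for digits in range(total_digits, 0, -1):
        # Power of ten of the last digit that is kept.
        exponent = d.adjusted() - digits + 1
        quantum = Decimal(1).scaleb(exponent)
        rounded = d.quantize(quantum, rounding=ROUND_HALF_UP)
        variants.append(format(rounded, "f"))
    return variants


def conceal(text: str, salt: str = "demo-salt") -> str:
    """Hypothetical concealment processing: salted SHA-256 hex digest."""
    return hashlib.sha256((salt + text).encode("utf-8")).hexdigest()


# A body temperature of 36.54 yields one second numerical value per precision.
variants = significant_variants("36.54")
concealed = [conceal(v) for v in variants]
print(variants)
```

Because each precision level is concealed separately, two values that agree only at a coarser precision still share at least one concealed value, which is what allows an approximate match without revealing the originals.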

The information processing method according to claim 1, wherein the generation process includes:
A process of specifying, as the plurality of second numerical values, an upper limit value and a lower limit value of a predetermined numerical range that includes the extracted first numerical value; and
A process of calculating a difference between the first numerical value and the lower limit value and a difference between the first numerical value and the upper limit value, and storing the differences in the data storage unit.
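A minimal sketch of this range-based generation process follows, assuming fixed-width ranges aligned to multiples of the width and non-negative input values. The function name `range_reference` and the default width of 1 are hypothetical choices for illustration, not taken from the source.

```python
from decimal import Decimal


def range_reference(value: str, width: str = "1"):
    """Return (lower, upper, value - lower, upper - value) for a non-negative value."""
    v = Decimal(value)
    w = Decimal(width)
    # Floor to the start of the range; Decimal's // truncates toward zero,
    # so this sketch assumes non-negative input.
    lower = (v // w) * w
    upper = lower + w
    return lower, upper, v - lower, upper - v


# 36.7 falls in the range [36, 37]: the limits are the second numerical values,
# and the two differences are the values stored alongside them.
lower, upper, diff_lower, diff_upper = range_reference("36.7")
print(lower, upper, diff_lower, diff_upper)
```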

The information processing method according to claim 1, wherein the generation process includes:
A process of specifying, from the extracted first numerical value and settings of numerical ranges for classifying numerical values, one or more second numerical values that represent the first numerical value; and
A process of calculating a difference between each of the one or more second numerical values and the first numerical value, and storing the difference in the data storage unit.

The information processing method according to claim 1, wherein the generation process is a process of generating one second numerical value that represents the first numerical value with a designated number of significant digits.

The information processing method according to any one of claims 1 to 5, wherein the processing further includes:
A process of extracting synonyms of the plurality of feature words; and
A process of generating concealed data by performing concealment processing on the synonyms, and storing the concealed data in the data storage unit.

An information processing method in which a computer executes processing comprising:
A process of receiving a search request including one or more search data blocks, each including a first concealed data value of a first numerical value and second concealed data values of a plurality of first feature words;
A calculation process of calculating, for each of a plurality of case data blocks stored in a data storage unit, each case data block including identification information and one or more data blocks that each include third concealed data values of a plurality of second numerical values and fourth concealed data values of a plurality of second feature words, a third similarity that is the total of second similarities over the combinations of a data block included in the case data block being processed and a search data block, each second similarity being obtained from a first similarity regarding the numerical values, calculated from the first concealed data value and the third concealed data values, and from the number of fourth concealed data values that match the second concealed data values; and
A process of transmitting, to the transmission source of the search request, the identification information of case data blocks whose third similarity exceeds a threshold, or the identification information of a predetermined number of case data blocks ranked highest by third similarity.

The information processing method according to claim 7, wherein the calculation process includes a process of setting the first similarity to 1 if a third concealed data value that matches the first concealed data value exists, and setting the first similarity to 0 if no third concealed data value matches the first concealed data value.
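The search-side calculation described above, with the 1-or-0 first similarity, might look roughly like the following sketch. The class and function names are hypothetical, short strings such as "H(37)" stand in for concealed data values, and multiplying the first similarity by the number of matching feature words to obtain the second similarity is an assumption: the claims state only that the second similarity is derived from those two quantities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchBlock:
    number: str        # first concealed data value
    words: frozenset   # second concealed data values


@dataclass(frozen=True)
class DataBlock:
    numbers: frozenset  # third concealed data values
    words: frozenset    # fourth concealed data values


def first_similarity(search: SearchBlock, data: DataBlock) -> int:
    """1 if any third concealed value equals the first concealed value, else 0."""
    return 1 if search.number in data.numbers else 0


def second_similarity(search: SearchBlock, data: DataBlock) -> int:
    # Assumption: combine the two quantities by multiplication.
    matched_words = len(search.words & data.words)
    return first_similarity(search, data) * matched_words


def third_similarity(search_blocks, data_blocks) -> int:
    """Total of the second similarities over every block combination."""
    return sum(second_similarity(s, d) for s in search_blocks for d in data_blocks)


def find_cases(search_blocks, cases: dict, threshold: int) -> list:
    """Return identification information of cases whose third similarity exceeds the threshold."""
    return [case_id for case_id, blocks in cases.items()
            if third_similarity(search_blocks, blocks) > threshold]


query = [SearchBlock("H(37)", frozenset({"H(fever)", "H(temperature)"}))]
cases = {
    "case-1": [DataBlock(frozenset({"H(36.5)", "H(37)"}),
                         frozenset({"H(fever)", "H(temperature)", "H(cough)"}))],
    "case-2": [DataBlock(frozenset({"H(120)"}),
                         frozenset({"H(blood)", "H(pressure)"}))],
}
print(find_cases(query, cases, threshold=1))
```

Only equality of concealed values is ever tested, so the party running the search never needs the plaintext numbers or words.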

The information processing method according to claim 7, wherein:
The first concealed data value is a concealed data value of a representative value for the original numerical value of the first numerical value;
The search request further includes a first auxiliary numerical value, which is the difference between the original numerical value of the first numerical value and the representative value, and range data for approximation determination;
The plurality of second numerical values are a lower limit value and an upper limit value of a range to which the original numerical value of the second numerical values belongs;
The data block includes a second auxiliary numerical value, which is the difference between the original numerical value of the second numerical values and the lower limit value of the range to which that original numerical value belongs, and a third auxiliary numerical value, which is the difference between that original numerical value and the upper limit value; and
The calculation process includes:
A process of calculating, if a third concealed data value that matches the first concealed data value exists, the difference between the first auxiliary numerical value for the first concealed data value and the second auxiliary numerical value or the third auxiliary numerical value for the matching third concealed data value; and
A process of setting the first similarity to 1 if that difference is within the range for the approximation determination, and setting the first similarity to 0 if it is not.
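The approximation determination with auxiliary numerical values can be sketched as follows. The claim does not fix sign conventions, so this sketch assumes every auxiliary value is turned into a signed offset from the matched range limit; the dictionary keys, the function name, and the placeholder strings standing in for concealed values are likewise hypothetical.

```python
from decimal import Decimal


def first_similarity(query: dict, record: dict) -> int:
    """Return 1 when the hidden originals are within the tolerance, else 0."""
    if query["rep"] == record["lower"]:
        record_offset = record["aux_lower"]    # original minus lower limit
    elif query["rep"] == record["upper"]:
        record_offset = -record["aux_upper"]   # original lies below the upper limit
    else:
        return 0                               # no matching concealed value
    gap = abs(query["aux"] - record_offset)
    return 1 if gap <= query["tolerance"] else 0


# Registered value 36.7 in the range [36, 37]; strings stand in for concealed values.
record = {"lower": "H(36)", "upper": "H(37)",
          "aux_lower": Decimal("0.7"), "aux_upper": Decimal("0.3")}

# Query value 36.2 with representative value 36 and a tolerance of 0.5.
near = {"rep": "H(36)", "aux": Decimal("0.2"), "tolerance": Decimal("0.5")}
# Query value 36.0 with the same representative value.
far = {"rep": "H(36)", "aux": Decimal("0.0"), "tolerance": Decimal("0.5")}

print(first_similarity(near, record), first_similarity(far, record))
```

The concealed limit values establish that both numbers refer to the same range boundary, and only the small offsets are compared in the clear.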

The information processing method according to claim 7, wherein:
A plurality of first concealed data values of the first numerical value are included in the data block; and
The calculation process includes a process of setting, as the first similarity, a similarity according to the number of third concealed data values that match the first concealed data values.
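A count-based first similarity of this kind might be sketched as below. Normalizing the match count by the number of query-side concealed values is an assumption of this sketch, as are the function name and the placeholder strings standing in for concealed values.

```python
def count_similarity(query_values: set, record_values: set) -> float:
    """Fraction of the query's concealed values that also appear in the record."""
    if not query_values:
        return 0.0
    return len(query_values & record_values) / len(query_values)


# Concealed variants of 36.54 (query) and 36.51 (record) at four precisions;
# they agree at every precision except the finest one.
query_values = {"H(36.54)", "H(36.5)", "H(37)", "H(40)"}
record_values = {"H(36.51)", "H(36.5)", "H(37)", "H(40)"}
similarity = count_similarity(query_values, record_values)
print(similarity)
```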

The information processing method according to claim 7, wherein:
The first concealed data value is a concealed data value of a representative value for the original numerical value of the first numerical value;
The search request further includes a first auxiliary numerical value, which is the difference between the original numerical value of the first numerical value and the representative value, and range data for approximation determination;
The plurality of second numerical values are a lower limit value and an upper limit value of a range to which the original numerical value of the second numerical values belongs;
The data block includes a second auxiliary numerical value, which is the difference between the original numerical value of the second numerical values and the lower limit value of the range to which that original numerical value belongs, and a third auxiliary numerical value, which is the difference between that original numerical value and the upper limit value; and
The calculation process includes:
A process of calculating, if a third concealed data value that matches the first concealed data value exists, the difference between the first auxiliary numerical value for the first concealed data value and the second auxiliary numerical value or the third auxiliary numerical value for the matching third concealed data value; and
A process of setting, as the first similarity, a similarity according to that difference and the numerical value representing the range for the approximation determination.

An information processing apparatus comprising:
A data storage unit; and
A processing unit that extracts, from text data that is stored in the data storage unit and includes a first numerical value, the first numerical value and a plurality of feature words existing around the first numerical value; generates, from the extracted first numerical value, one or more second numerical values serving as a reference for determining whether a value approximates the first numerical value; generates concealed data by performing concealment processing on each of the one or more second numerical values and the plurality of feature words; and stores the concealed data in a second data storage unit.

An information processing apparatus comprising:
A receiving unit that receives a search request including one or more search data blocks, each including a first concealed data value of a first numerical value and second concealed data values of a plurality of first feature words;
A search processing unit that, for each of a plurality of case data blocks stored in a data storage unit, each case data block including identification information and one or more data blocks that each include third concealed data values of a plurality of second numerical values and fourth concealed data values of a plurality of second feature words, calculates a third similarity that is the total of second similarities over the combinations of a data block included in the case data block being processed and a search data block, each second similarity being obtained from a first similarity regarding the numerical values, calculated from the first concealed data value and the third concealed data values, and from the number of fourth concealed data values that match the second concealed data values; and
A transmission unit that transmits, to the transmission source of the search request, the identification information of case data blocks whose third similarity exceeds a threshold, or the identification information of a predetermined number of case data blocks ranked highest by third similarity.