JP2005215717A

JP2005215717A - Document processor with security function

Info

Publication number: JP2005215717A
Application number: JP2004017780A
Authority: JP
Inventors: Naoto Akira; 直人秋良; Hiroyuki Kumai; 裕之隈井; Yasutsugu Morimoto; 康嗣森本
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2004-01-27
Filing date: 2004-01-27
Publication date: 2005-08-11
Anticipated expiration: 2024-01-27
Also published as: JP4281561B2

Abstract

PROBLEM TO BE SOLVED: To make more effective use of document data with enhanced security, without impairing the informativeness of a document, by masking disclosure-restricted words contained in the document data.

SOLUTION: A user's access level is acquired by a user authentication means, and the disclosure-restricted words contained in the document to be disclosed are identified according to the access level by collating the words in the document with disclosure-restricted word data. A masking character string, composed of a character string indicating the type of the disclosure-restricted word and an additional character string for distinguishing different words, is generated, and the document is disclosed with the disclosure-restricted word replaced by the masking character string.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to a method and an apparatus for masking (replacing a designated word with another character string) words whose disclosure is to be restricted and which are contained in documents and word information (such as information on words extracted from documents) that are to be disclosed.

Companies accumulate large amounts of document data, such as histories of customer inquiries and in-house documents, and there is a growing need to make effective use of this data. Document processing apparatuses such as document search systems and text mining systems are examples of apparatuses for doing so. However, document data contains words whose disclosure should be restricted for some users (hereinafter, “disclosure-restricted words”), such as the personal names and company names found among proper nouns. For this reason, the users who could access the document data, or the word information extracted from it, were limited to a few users such as administrators.

However, disclosing document data to the extent that causes no problems is very important for making effective use of it, and a mechanism for browsing document data securely has been desired. Techniques have therefore appeared that disclose a document after replacing its disclosure-restricted words with other character strings. For example, Japanese Patent Laid-Open No. 11-143871 (Patent Document 1) identifies the parts whose disclosure is to be restricted according to the user's profile and discloses the document with those parts blacked out. Japanese Patent Laid-Open No. 2002-312362 (Patent Document 2) discloses a document after replacing proper nouns with character strings such as initials.

Patent Document 1: Japanese Patent Laid-Open No. 11-143871

Patent Document 2: Japanese Patent Laid-Open No. 2002-312362

The conventional techniques above have the following problems. When a disclosure-restricted word is masked by blacking it out (filling the designated part with black), the type of the masked word is unknown and the content is hard to grasp. For example, masking “Mr. Saito” in the sentence “My computer has broken down, so I would like to speak to Mr. Saito” yields “My computer has broken down, so I would like to speak to ■”, and the reader cannot tell whether “■” stands for a company name, a personal name, a product name, or some other type of word. Masking with a character string that reveals the type, or with initials, makes the content easier to grasp because the type of the masked part can be confirmed. However, once the word behind a masking character string is identified in one document, the word behind the same masking character string in other documents is identified as well. For example, suppose the sentences “Mr. Sato was hospitalized” and “Mr. Sato is unkind” are masked as “[Employee S] was hospitalized” and “[Employee S] is unkind”. A person who knows that Mr. Sato is in the hospital can tell from the first document that “Employee S” is “Mr. Sato”, and therefore that “Employee S” in “[Employee S] is unkind” also refers to “Mr. Sato”. Masking simply with “[Employee]” is also conceivable, but when the same document contains several masking targets of the same type, the reader cannot tell whether the masked words refer to the same person or to different people, and the content is again hard to grasp.

An object of the present invention is to provide masking that uses masking character strings from which the type is apparent, in which identifying the word behind a masking character string in one document does not identify the words behind masking character strings in other documents, and in which different words in the same document remain distinguishable.
Another object of the present invention is to disclose document information extracted from documents securely, and to provide a mechanism for retrieving documents from document information that contains disclosure-restricted words.

To achieve the above objects, the invention disclosed in the present application is outlined as follows.
The document processing apparatus with a security function according to the present invention identifies the disclosure-restricted words contained in a document and in word information extracted from the document (for example, pairs of words in a dependency relationship), and discloses them after replacing each such word with a masking character string consisting of a type and an additional character string. The additional character string is generated either randomly or according to a predetermined rule. To preserve the informativeness of a document, masking character strings for different words in the same document differ from one another, and masking character strings for the same word in the same document are identical. For information security, the masking character string for a given word must not be common to all documents. In document search by word information, when the word information given as a search condition contains a disclosure-restricted word, a screen is displayed asking whether to search using the other words that make up the word information, or to search after expanding the disclosure-restricted word to words of the same type, so that the user can specify whether the search should be executed.

Because the document processing apparatus of the present invention masks the same disclosure-restricted word with different character strings in different documents, identifying the word behind a masking character string in one document does not identify the words behind masking character strings in other documents. In addition, because different words in the same document receive different masking character strings, the informativeness of the document is not impaired and misreading of the content is prevented.
As a further effect, document search by word information can be performed securely and easily even when the disclosure-restricted words contained in the word information are masked.

The object of securely disclosing documents that contain disclosure-restricted words, while keeping information loss to a minimum, is achieved by improving the method of generating masking character strings.

A first embodiment of the present invention is described below with reference to the drawings.
FIG. 1 is a configuration diagram of the document disclosure apparatus of this embodiment. The apparatus comprises a central processing unit (CPU) 101, a main memory 102, a display device 103, an input device 104, and a storage device 110. The storage device 110 stores an OS (operating system) 111, document data 112, disclosure-restricted word data 113, user information data 114, a user authentication program 115, a document search program 116, a disclosed document generation program 117, and a document display program 118.

The disclosure-restricted word data 113 registers the words whose disclosure is restricted. For each such word, the access level of the users to whom disclosure is restricted, the type of the word, and so on are registered together with the word. The user information data 114 registers authentication information and an access level for each user. The user authentication program 115 receives a user ID and password from the input device 104, identifies the user by collating the input with the user information data 114, and acquires the user's access level. The document search program 116 receives a word from the input device 104 and extracts the documents containing that word from the document data 112. The disclosed document generation program 117 uses the user's access level and the disclosure-restricted word data 113 to identify the disclosure-restricted words contained in a document to be disclosed (hereinafter, a “disclosed document”), generates character strings for masking those words, and generates a disclosed document in which the disclosure-restricted words are replaced with the generated character strings. The document display program 118 displays the disclosed document on the display device 103. These programs are read into the main memory 102 and executed under the control of the CPU 101.

Next, the processing flow of this embodiment is described with reference to the flowchart of FIG. 2. First, the user authentication program 115 identifies the user who wishes to view a document by a user authentication means such as a user ID and password, an IC card, or biometric authentication (S201), and acquires the user's access level. For example, if user information is managed in the table shown in FIG. 3 (user information data 114) and the user ID “1002” and password “gfddf” are entered, the user is identified as the user with user ID “1002” and access level 2 is acquired. Any other method may be used as long as the user's access level can be acquired.
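The lookup in S201 might be sketched as follows. This is only an illustration in Python, assuming the user information data 114 is held as an in-memory table keyed by user ID. The names `USER_INFO` and `authenticate` are hypothetical, and only the row for user ID “1002” is taken from the example above; the other row is invented. Passwords are kept in plaintext here purely to mirror the example table.

```python
import hmac

# Hypothetical stand-in for user information data 114 (FIG. 3).
# Only the "1002" row comes from the description; "1001" is invented.
USER_INFO = {
    "1001": {"password": "abcde", "access_level": 1},
    "1002": {"password": "gfddf", "access_level": 2},
}

def authenticate(user_id, password):
    """Return the user's access level, or None if authentication fails (S201)."""
    record = USER_INFO.get(user_id)
    if record is None:
        return None
    # Constant-time comparison of the stored and entered passwords.
    if not hmac.compare_digest(record["password"].encode(), password.encode()):
        return None
    return record["access_level"]

print(authenticate("1002", "gfddf"))  # 2
```

The returned access level is the only value the later steps need, which is why the description allows any authentication method that can produce it.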

Next, in the document search program 116, a search condition such as a word contained in a document is specified, the documents matching the search condition are extracted, and the disclosed document is determined (S202). For example, if documents are managed in the table shown in FIG. 4 (disclosure-restricted word data 113) and “hard disk” is the search condition, the document with document ID “10002”, which contains “hard disk”, becomes the disclosed document. The disclosed document may be determined by any method that can identify it, such as specifying a document ID, specifying a registration date or category attached to the document data, or specifying the full text. There may be more than one disclosed document. Next, the disclosed document is read from a storage device such as a memory or hard disk (S203), the words contained in the disclosed document are collated with the disclosure-restricted word data, and the disclosure-restricted words contained in the disclosed document are identified (S205). Here, if the user's access level value is smaller than the access level value set for a disclosure-restricted word, that word is not treated as a disclosure-restricted word. For example, if disclosure-restricted words are managed in the table shown in FIG. 5 and the disclosed document is “Mr. Yamada was unkind”, then for a user with access level 2, “Mr. Yamada”, which is contained in the disclosure-restricted word data, is identified as a disclosure-restricted word. When the same disclosed document is displayed for a user with access level 1, “Mr. Yamada” is not identified as a disclosure-restricted word and is displayed as is, without masking. A single sentence may contain several disclosure-restricted words, in which case disclosure is restricted for each of them.
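The access-level comparison in S205 can be illustrated with a short Python sketch. The table contents, the level value 2 assigned to each word, and the names `RESTRICTED_WORDS` and `find_restricted_words` are assumptions made for this example; the actual contents of FIG. 5 are not reproduced here. Plain substring matching stands in for the word collation step.

```python
# Hypothetical stand-in for the disclosure-restricted word table (FIG. 5):
# word -> (access level set for the word, word type). Values are assumed.
RESTRICTED_WORDS = {
    "Mr. Yamada": (2, "Employee"),
    "Mr. Sato": (2, "Employee"),
}

def find_restricted_words(text, user_level, table=RESTRICTED_WORDS):
    """Return {word: type} for the words in `text` that must be masked (S205)."""
    found = {}
    for word, (word_level, word_type) in table.items():
        # A user whose level value is smaller than the word's level may see it.
        if user_level < word_level:
            continue
        if word in text:  # simplification: substring match instead of word collation
            found[word] = word_type
    return found

print(find_restricted_words("Mr. Yamada was unkind", 2))  # {'Mr. Yamada': 'Employee'}
print(find_restricted_words("Mr. Yamada was unkind", 1))  # {}
```

With these assumed values the sketch reproduces the example above: the word is masked for a level 2 user and shown unmasked to a level 1 user.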

Next, the disclosed document generation program 117 acquires the type of each disclosure-restricted word contained in the disclosed document from the table shown in FIG. 5, and generates a masking character string composed of that type and an additional character string (S206). The additional character string is generated randomly, so that the same disclosure-restricted word receives different additional character strings in different documents. For example, suppose the type of the disclosure-restricted word “Mr. Yamada” is “Employee” and the character string generated from a random real number R (0 ≤ R < 1) is
int(R × 26) + 1 = 2 … B (1 denotes A, 2 denotes B, 3 denotes C, …, 26 denotes Z)
In this case, “Employee B”, the combination of the type “Employee” and the additional character string “B”, is the generated masking character string. However, if the generated masking character string is identical to the masking character string of another disclosure-restricted word already in the same document, the masking character string is generated again, so that different disclosure-restricted words in the same document receive different masking character strings. Consequently, different disclosure-restricted words in the same document remain distinguishable after replacement. Furthermore, generating masking character strings separately for each document prevents the original word from being revealed by comparing places that are masked identically across several documents, and thus improves the security of the document information.
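A minimal Python sketch of S206 under stated assumptions: the function names `additional_string` and `make_masks` are hypothetical, the additional character string is a single letter as in the example, and the caller supplies the restricted words of one document as a `{word: type}` mapping. The guard against more than 26 words of one type is an addition for the sketch, since a single letter cannot distinguish more.

```python
import random
import string
from collections import defaultdict

def additional_string(rng):
    """Map a random R in [0, 1) to a letter via int(R * 26) + 1 (1 = A ... 26 = Z)."""
    index = int(rng.random() * 26) + 1
    return string.ascii_uppercase[index - 1]

def make_masks(restricted, rng=None):
    """Build {word: masking string} for ONE document (S206)."""
    rng = rng or random.Random()
    used = defaultdict(set)  # type -> letters already assigned in this document
    masks = {}
    for word, word_type in restricted.items():
        if len(used[word_type]) >= 26:
            raise ValueError("a single letter cannot distinguish more than 26 words")
        letter = additional_string(rng)
        while letter in used[word_type]:  # regenerate on a collision
            letter = additional_string(rng)
        used[word_type].add(letter)
        masks[word] = f"{word_type} {letter}"
    return masks

print(make_masks({"Mr. Yamada": "Employee", "Mr. Sato": "Employee"}))
```

Calling `make_masks` once per document gives each document its own independent assignment. A sketch like this does not by itself guarantee that a word never receives the same letter in two documents; it only removes any fixed link between a word and its letter.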

The additional character string may be any character string that allows masking character strings to be told apart, such as “001”, “AB”, or “○”. Color information may be used instead of a character string; for example, “Mr. Yamada” may be displayed as “Employee” in red and “Mr. Sato” as “Employee” in blue. The additional character string need not be generated randomly; character strings may instead be taken in a fixed order from a predetermined set, for example by using the letters A to Z in sequence. Alternatively, the masking character string may consist of the type string plus an additional character string only when the same document contains several disclosure-restricted words of the same type, and of the type alone otherwise.
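Two of the variations above, sequential letters and omitting the additional string when a type occurs only once, might be combined as in the following Python sketch. The name `make_masks_sequential` is hypothetical, and the sketch assumes at most 26 words per type in a document.

```python
import string
from collections import Counter

def make_masks_sequential(restricted):
    """Assign A, B, C ... per type; use the type alone if it occurs only once.

    `restricted` maps each restricted word in one document to its type.
    Assumes no more than 26 words of one type in the document.
    """
    type_counts = Counter(restricted.values())
    next_index = Counter()  # type -> position of the next unused letter
    masks = {}
    for word, word_type in restricted.items():
        if type_counts[word_type] == 1:
            masks[word] = word_type
        else:
            letter = string.ascii_uppercase[next_index[word_type]]
            next_index[word_type] += 1
            masks[word] = f"{word_type} {letter}"
    return masks

print(make_masks_sequential(
    {"Mr. Yamada": "Employee", "Mr. Sato": "Employee", "Company X": "Company"}
))
# {'Mr. Yamada': 'Employee A', 'Mr. Sato': 'Employee B', 'Company X': 'Company'}
```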

Next, the document display program 118 generates the disclosed document by replacing the disclosure-restricted words contained in it with the generated masking character strings (S207), and displays the disclosed document on a display device such as a PC monitor (S208). Convenience improves if the masking character strings are highlighted when displayed, for example by inverting their display color or enclosing them in a box, so that the masked places can be recognized at a glance. For a user who is authorized to view all words, steps S205 to S207 can be skipped and the document disclosed as is (S204). When a request to disclose another document is received after the specified document has been disclosed, steps S202 to S208 are executed again (S209).
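The replacement in S207 can be sketched in Python as a single regular-expression pass. The function name `apply_masks` is hypothetical, and square brackets are used here as a plain-text stand-in for the highlighting mentioned above. A single pass is chosen so that text already inserted as a masking string is never scanned again.

```python
import re

def apply_masks(text, masks):
    """Replace each restricted word in `text` with its bracketed masking string (S207)."""
    if not masks:
        return text
    # Longer words first, so a word containing another word is matched whole.
    words = sorted(masks, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(word) for word in words))
    return pattern.sub(lambda m: f"[{masks[m.group(0)]}]", text)

masks = {"Mr. Sato": "Employee B", "Mr. Yamada": "Employee A"}
print(apply_masks("Mr. Sato was unkind to Mr. Yamada.", masks))
# [Employee B] was unkind to [Employee A].
```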

According to the above embodiment, different masking character strings are used for the same disclosure-restricted word in different documents, so even if the disclosure-restricted word behind a masking character string is identified in one document, the viewing of other documents is not affected and document security is maintained. In addition, disclosure-restricted words contained in the same document are given different masking character strings so that they can be told apart, which prevents the content of the document from being misread.

A second embodiment of the present invention provides, in addition to the document disclosure apparatus described in the first embodiment, a document processing apparatus that can safely disclose document information such as word information extracted from multiple documents. Here, word information is a set of one or more words, such as a pair of words in a dependency relationship, but any method may be used as long as sets of words can be extracted from the document data. Word information is used to grasp trends in the content of the document data; for example, if the word pair “computer - break down” occurs many times, it can be seen that the document data contains many reports of computers breaking down.

FIG. 6 is a configuration diagram of the document processing apparatus of this embodiment. The apparatus comprises a central processing unit (CPU) 601, a main memory 602, a display device 603, an input device 604, and a storage device 610. The storage device 610 stores an OS (operating system) 611, document data 612, disclosure-restricted word data 613, user information data 614, a user authentication program 615, a document search program 616, a disclosed document generation program 617, a document display program 618, word information data 619, a word information data generation program 620, and a word information data display program 621. The configuration differs from the first embodiment in that the word information data 619, the word information data generation program 620, and the word information data display program 621 are added.

The word information data 619 stores the sets of words extracted from the document data and is generated by the word information data generation program 620. The word information data display program 621 displays the word information data 619 on the display device 603; when a search request is received for a displayed set of words, the document search program 616 identifies the documents matching the search condition, and those documents are displayed on the display device 603.

Next, the processing flow of this embodiment is described with reference to the flowchart of FIG. 7. First, the document data stored in a storage device such as a hard disk or memory is read (S702), the document data is divided into words by morphological analysis, and the words and their part-of-speech information are acquired (S703). Next, pairs of words in a dependency relationship are obtained using the words and part-of-speech information, and word information is generated by counting the word pairs (S704). For example, the Japanese sentence meaning “My computer has broken down, so I want to have it repaired” yields the words “pasokon [computer] (noun), ga (particle), koshō [failure] (noun), shi (verb), ta (auxiliary verb), node (particle), shūri [repair] (noun), shi (verb), tai (auxiliary verb)”. Using the part-of-speech information, the word pairs “computer - failure” and “repair - do” are extracted, and counting the word pairs yields word information data such as that shown in FIG. 8. Although pairs of words in a dependency relationship are used here, any method may be used as long as it captures the characteristics of the documents, such as words of a specified part of speech or sets of several words extracted under some condition. If the word information has already been generated, steps S702 to S704 are unnecessary.
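The counting in S704 might look like the following Python sketch. It assumes the morphological analysis and dependency extraction of S703 have already been done by some other component and that each document arrives as a list of word pairs; the function name `count_word_pairs` and the sample data are hypothetical.

```python
from collections import Counter

def count_word_pairs(pairs_per_document):
    """Count how often each dependency word pair occurs across all documents (S704)."""
    counts = Counter()
    for pairs in pairs_per_document:
        counts.update(pairs)
    return counts

# Invented sample input: one list of (word, word) pairs per document.
documents = [
    [("computer", "failure"), ("repair", "do")],
    [("computer", "failure")],
]
counts = count_word_pairs(documents)
print(counts.most_common(1))  # [(('computer', 'failure'), 2)]
```

`Counter.most_common(n)` also gives a direct way to narrow the display to the most frequent pairs.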

Next, the user's access level is determined by the user authentication means (S705), and the disclosure-restricted words contained in the word information are identified. A masking character string consisting of the type and a randomly generated additional character string is generated for each such word, in such a way that the same word contained in different word pairs does not receive the same masking character string, and the word information to be disclosed is masked with these masking character strings and disclosed (S706). User authentication and the identification of disclosure-restricted words are as described in the first embodiment. As also described there, color information or the like may be used instead of a character string for the masking, provided the type is apparent. The additional character string need not be generated randomly; character strings may instead be taken in a fixed order from a predetermined set, for example by using the letters A to Z in sequence.
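A possible Python sketch of S706, in which every occurrence of a restricted word in a different word pair receives a fresh masking string. The name `mask_word_pairs` is hypothetical, and two-letter additional strings are an assumption made here so that more than 26 pairs can be handled.

```python
import random
import string
from collections import defaultdict

def mask_word_pairs(pairs, restricted, rng=None):
    """Mask restricted words in word pairs, with a new mask for every pair (S706).

    pairs: list of (word, word) tuples; restricted: {word: type}.
    """
    rng = rng or random.Random()
    used = defaultdict(set)  # type -> additional strings already handed out
    masked_pairs = []
    for pair in pairs:
        masked = []
        for word in pair:
            if word not in restricted:
                masked.append(word)
                continue
            word_type = restricted[word]
            if len(used[word_type]) >= 26 ** 2:
                raise ValueError("no unused two-letter additional string left")
            suffix = "".join(rng.choice(string.ascii_uppercase) for _ in range(2))
            while suffix in used[word_type]:  # regenerate on a collision
                suffix = "".join(rng.choice(string.ascii_uppercase) for _ in range(2))
            used[word_type].add(suffix)
            masked.append(f"[{word_type} {suffix}]")
        masked_pairs.append(tuple(masked))
    return masked_pairs

pairs = [("Mr. Yamada", "unkind"), ("Mr. Yamada", "late"), ("computer", "failure")]
print(mask_word_pairs(pairs, {"Mr. Yamada": "Employee"}))
```

Because the mask is drawn per pair rather than per word, learning who is behind one masked pair says nothing about the other pairs.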

Next, the word information to be displayed is narrowed down according to a specified condition, such as the N most frequent word pairs, and is displayed on a display device such as a monitor, as in the example of FIG. 9 (S707).
Next, when a document search request specifying a word pair is received from the user, the documents containing that word pair are extracted from the document data and displayed on the display device (S709). If the specified word pair contains a disclosure-restricted word, the screen shown in FIG. 10 is displayed, stating that the search cannot be performed because a disclosure-restricted word is included and asking the user to confirm whether to search using the words in the pair other than the disclosure-restricted word. For example, when a search request is received for the word pair “[Employee A] - unkind”, “[Employee A]” is a disclosure-restricted word, so a confirmation screen asking whether to search with the word “unkind” is displayed; if “Yes” is selected, documents containing the word “unkind” are retrieved, and if “No” is selected, no search is executed. Thus, even for a word pair containing a disclosure-restricted word, a search that uses the unproblematic word as the search condition can be run with a simple operation. If there is no document search request, step S709 is skipped.
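The confirmation logic of S709 can be sketched in Python as follows. The sketch assumes the apparatus keeps the original (unmasked) word pair on its side, and it models the FIG. 10 screen as a callback; `plan_pair_search`, `search`, and the `confirm` parameter are all hypothetical names.

```python
def plan_pair_search(original_pair, restricted, confirm):
    """Decide which words to search with, or return None if no search is run.

    confirm: callable taking the remaining words and returning True ("Yes")
    or False ("No"); it stands in for the confirmation screen of FIG. 10.
    """
    allowed = [word for word in original_pair if word not in restricted]
    if len(allowed) == len(original_pair):
        return list(original_pair)  # nothing restricted: search with the full pair
    if allowed and confirm(allowed):
        return allowed              # search with the unrestricted words only
    return None

def search(documents, words):
    """Return the IDs of documents containing every word in `words`."""
    return [doc_id for doc_id, text in documents.items()
            if all(word in text for word in words)]

restricted = {"Mr. Yamada": "Employee"}
words = plan_pair_search(("Mr. Yamada", "unkind"), restricted, lambda ws: True)
print(words)  # ['unkind']
print(search({1: "Mr. Yamada was unkind", 2: "The disk failed"}, words))  # [1]
```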

According to the above embodiment, even if the word behind a masking character string in one item of word information, such as a word pair, is identified, other word pairs are not affected, because a different masking character string is used for the same disclosure-restricted word in other word pairs. In addition, restricting searches by word pairs that contain disclosure-restricted words prevents the masked word from being identified by browsing the body text. A further effect is that searching from word pairs that contain masking character strings is made easy.

Next, a third embodiment of the present invention is described with reference to the drawings.
This embodiment provides a document processing apparatus based on that of the second embodiment in which, for a word pair containing a disclosure-restricted word, the body text can be searched by expanding the disclosure-restricted word, according to the user's access rights, into the words that share its type. To this end, the disclosure-restricted word table is extended, as in the table shown in FIG. 11, to consist of the disclosure-restricted word, its access level, its type, a superordinate concept, and the access level of the superordinate concept. In the example of FIG. 11, one superordinate concept is set for each disclosure-restricted word, but if several superordinate concepts are wanted for different access levels, the table may be extended further so that several superordinate concepts are set for one disclosure-restricted word.

Next, the processing flow of this embodiment will be described. When a search request using a word pair is received on the word information display screen described in the second embodiment and the word pair contains a disclosure-restricted word, the apparatus consults the disclosure-restricted word database, checks the access level of the superordinate concept, and decides whether a search using the superordinate concept is permitted. If it is, the screen shown in FIG. 12 indicates that a search is possible at the level of the superordinate concept and asks the user to confirm whether to search. The apparatus then extracts from the disclosure-restricted word database the disclosure-restricted words that have the same superordinate concept, and searches the documents using all of those words together with the words, other than the disclosure-restricted word, contained in the document information.

For example, suppose the word pair "Yamada-san - unkind" has been masked as "[Employee A] - unkind" and a user with access authority 2 issues a search request with this word pair. Because the superordinate concept is "Department A employee", which users with access authority 2 may view, the apparatus extracts "Sato-san", which has the same superordinate concept "Department A employee", from the disclosure-restricted word database and searches the documents using "(Yamada-san or Sato-san) and unkind" as the search key. As in the document disclosure apparatus described in the first embodiment, the disclosure-restricted words in the retrieved documents are masked with different masking character strings in different documents even when the word is the same, so the word behind the masking character string in any individual retrieved document cannot be identified.
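The expansion in this example can be sketched as follows. It is an illustration under stated assumptions: the `Entry` layout, the "Suzuki-san" row, the sample documents, substring matching as the search method, and the rule that access is granted when the user's level is at least the concept's level are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    word: str
    concept: str        # superordinate concept
    concept_level: int  # assumed: minimum user level allowed to use the concept


# Hypothetical table; only the Yamada-san and Sato-san rows follow the text.
TABLE = [
    Entry("Yamada-san", "Department A employee", 2),
    Entry("Sato-san", "Department A employee", 2),
    Entry("Suzuki-san", "Department B employee", 2),
]

# Hypothetical document bodies.
DOCS = [
    "Yamada-san was unkind on the phone.",
    "Sato-san was unkind at the counter.",
    "Suzuki-san was unkind.",
    "Yamada-san was helpful.",
]


def expand(restricted_word, user_level, table):
    """Return every restricted word sharing the concept, or None if not permitted."""
    entry = next((e for e in table if e.word == restricted_word), None)
    if entry is None or user_level < entry.concept_level:
        return None
    return [e.word for e in table if e.concept == entry.concept]


def search(docs, any_of, all_of):
    """Evaluate "(w1 or w2 or ...) and other words" by substring matching."""
    return [
        d for d in docs
        if any(w in d for w in any_of) and all(w in d for w in all_of)
    ]


group = expand("Yamada-san", 2, TABLE)
hits = search(DOCS, group, ["unkind"]) if group else []
print(group)
print(hits)
```

In a full system the retrieved documents would still be masked per document before display, so the user sees that some Department A employee was called unkind without learning which one.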

According to this embodiment, even a word pair containing a disclosure-restricted word can be searched by expanding the word, according to the access authority, into the words that belong to its superordinate concept. Searches using such word pairs can therefore be performed securely while returning results close to those obtained with the disclosure-restricted word itself.

This method masks the same word with different character strings, so that little harm results even if the word behind one masking character string is identified from part of the information. It is applicable to document processing apparatuses in general that disclose information with some words concealed.

FIG. 1 is a diagram showing the configuration of the document disclosure apparatus (Embodiment 1).
FIG. 2 is an explanatory diagram showing the processing flow of the document processing apparatus (Embodiment 1).
FIG. 3 is a diagram showing an example of user information data (Embodiment 1).
FIG. 4 is a diagram showing an example of document data (Embodiment 1).
FIG. 5 is a diagram showing an example of disclosure-restricted word data (Embodiment 1).
FIG. 6 is a diagram showing the configuration of the document processing apparatus (Embodiment 2).
FIG. 7 is a diagram showing the processing flow of the document processing apparatus (Embodiment 2).
FIG. 8 is a diagram showing an example of word information (Embodiment 2).
FIG. 9 is a diagram showing an example of masked word information (Embodiment 2).
FIG. 10 is a diagram showing an example of the document search confirmation screen (Embodiment 2).
FIG. 11 is a diagram showing an example of extended disclosure-restricted word data (Embodiment 3).
FIG. 12 is a diagram showing an example of the document search confirmation screen (Embodiment 3).

Explanation of symbols

101: CPU, 102: main memory, 103: display device, 104: input device, 110: storage device, 111: OS, 112: document data, 113: disclosure-restricted word data, 114: user information data, 115: user authentication program, 116: document search program, 117: disclosed-document generation program, 118: document display program.

Claims

1. A document publishing method comprising the steps of:
obtaining a user's access authority;
identifying, based on the access authority, a disclosure-restricted word contained in a document to be published;
generating a masking character string composed of a character string indicating the type of the disclosure-restricted word and a character string generated randomly or in a predetermined order; and
displaying the document to be published with the disclosure-restricted word replaced by the masking character string.

2. The document publishing method according to claim 1, wherein, as the masking character string, color information for distinguishing words is applied to the character string indicating the type of the disclosure-restricted word, or color information for distinguishing the type of the disclosure-restricted word is applied to the character string for distinguishing words.

3. A document publishing apparatus having a storage device that stores a document to be published and disclosure-restricted word data, a computing unit, and a display device, wherein
the computing unit identifies, using the disclosure-restricted word data, a disclosure-restricted word contained in the document stored in the storage device, generates a masking character string that combines a character string indicating the type of the disclosure-restricted word with a character string for distinguishing words within the type, and performs replacement processing that replaces the disclosure-restricted word contained in the document with the masking character string, and
the display device displays the document subjected to the replacement processing.

4. The document publishing apparatus according to claim 3, wherein color information is used as a character string for distinguishing words in the type.

5. A document publishing program that causes the computing unit of a document publishing apparatus having a storage device that stores a document to be published and disclosure-restricted word data, the computing unit, and a display device to execute the steps of:
reading the document from the storage device;
identifying, using the disclosure-restricted word data, a disclosure-restricted word contained in the read document;
generating a masking character string that combines a character string indicating the type of the disclosure-restricted word with a character string for distinguishing words within the type; and
replacing the disclosure-restricted word contained in the document with the masking character string,
and that causes the display device to display the document after the replacement.

6. The document publishing program according to claim 5, wherein color information is used as a character string for distinguishing words in the type.

7. Performing morphological analysis on document data and dividing it into words;
Extracting one or more word sets from the divided words, counting the extracted word sets, and generating word information;
Identifying a disclosure-restricted word among the words constituting the word set;
Generating a masking character string composed of a character string indicating the type of the disclosure-restricted word and a character string generated randomly or in a predetermined order;
And a step of displaying the word information with the disclosure-restricted word replaced by the masking character string.

8. The document processing apparatus according to claim 7, wherein, when a search request is received for a word set and the word set contains a disclosure-restricted word, the apparatus indicates that the search cannot be performed, presents conditions under which a search is possible, and prompts for a selection of whether to execute the search.

9. The document processing apparatus according to claim 3, wherein, when a search request is received for a word set, the word set contains a disclosure-restricted word, and a search using a superordinate concept of the disclosure-restricted word is possible, the apparatus indicates that a search using the superordinate concept is possible and prompts for a selection of whether to execute the search.