JP5217513B2

JP5217513B2 - An information analysis processing method, an information analysis processing program, an information analysis processing device, an information registration processing method, an information registration processing program, an information registration processing device, an information registration analysis processing method, and an information registration analysis processing program.

Info

Publication number: JP5217513B2
Application number: JP2008053776A
Authority: JP
Inventors: 洋一金井
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2008-03-04
Filing date: 2008-03-04
Publication date: 2013-06-19
Anticipated expiration: 2028-03-04
Also published as: JP2009211404A

Description

The present invention relates to an information analysis processing method, an information analysis processing program, and an information analysis processing device that analyze and classify information such as documents; to an information registration processing method, an information registration processing program, and an information registration processing device that register classification information used to classify information such as documents; and to an information registration analysis processing method and an information registration analysis processing program that register such classification information, analyze the information, and classify it according to the registered classification information.

With recent advances in natural language processing technology and improvements in computer processing power, it has become possible to extract documents with similar content from a large collection of stored documents and to classify documents based on their degree of similarity.

For example, the following technique is known for determining whether documents are similar. First, the target document is decomposed into elements whose units are character strings, words, or phrases, and a feature quantity is calculated from the combination of those elements. Then the similarity between feature quantities is computed for every pair of documents, and a pair is regarded as similar if the similarity is at or above a given threshold.

Various schemes have been devised for calculating the feature quantity. In one known method, the target document is decomposed into elements whose units are character strings, words, or phrases; a weight is then obtained for each element from its frequency of occurrence in the document collection and its frequency of occurrence in the target document; and the feature quantity is expressed as a vector made up of the elements and their weights. The similarity is calculated, for example, as the inner product of two such vectors. For similarity-based classification, one approach is to compute the mean of the feature quantities (vectors) of the group of documents defined as belonging to the same class, and to judge that the target document belongs to that class if the similarity between its feature quantity (vector) and the mean vector is at or above a given threshold.
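The weighting, inner-product similarity, and mean-vector classification described above can be sketched roughly as follows. This is only an illustration: the whitespace tokenizer, the smoothed TF-IDF formula, and all function names are assumptions made for the example and are not taken from the cited prior art.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: the "elements" are lowercase, whitespace-separated words.
    return text.lower().split()


def build_doc_freq(corpus_tokens):
    # Number of documents in the collection that contain each element.
    doc_freq = Counter()
    for tokens in corpus_tokens:
        doc_freq.update(set(tokens))
    return doc_freq


def tfidf_vector(tokens, doc_freq, n_docs):
    # Weight = frequency in this document x (assumed) smoothed inverse
    # document frequency; the vector is normalized to unit length.
    vec = {}
    for term, count in Counter(tokens).items():
        idf = math.log((1 + n_docs) / (1 + doc_freq.get(term, 0))) + 1.0
        vec[term] = count * idf
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def inner_product(a, b):
    # Similarity of two sparse vectors stored as dicts.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def mean_vector(vectors):
    # Mean feature vector of the documents defined as one class.
    total = Counter()
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {t: w / len(vectors) for t, w in total.items()}
```

A target document would then be assigned to a class when `inner_product(target_vector, mean_vector(class_vectors))` is at or above a chosen threshold; the threshold value itself is a tuning decision that the text does not specify.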

For example, as a technique for classifying inquiry texts, an inquiry document classification apparatus and method, and a recording medium storing a program that describes the method, have been disclosed (see Patent Document 1). Patent Document 1 describes a method of automatically classifying a document collection by clustering and a method of classifying new documents using the result. That is, accumulated inquiry and answer document information is classified into a plurality of categories based on its content, and for a newly received inquiry the category whose content is closest is selected.

In addition, a security information estimation device, a security information estimation method, a security information estimation program, and a recording medium that appropriately protect information not associated with security information have been disclosed (see Patent Document 2). Patent Document 2 describes a system for estimating the security attributes of a document: the system searches the existing stored documents for documents similar to the document under estimation and presents the security attributes of the similar stored documents it finds as the estimation result.

Also disclosed are an image forming apparatus, an image forming system, a security management apparatus, and a security management method that maintain document management security even when a document ID cannot be acquired from a scanned original (see Patent Document 3). Patent Document 3 describes a system in which, when an original is copied on a copier and security information cannot be read from the original, character strings and the like are extracted from the scanned image to determine whether the original matches a document in the existing document management data; if it matches, copying is stopped.

Also disclosed are an image processing apparatus, an image processing method, an image processing program, and a recording medium that can appropriately determine text information contained as an image in image data (see Patent Document 4). In a system that estimates security attributes, when the document under estimation is a scanned image, text is extracted by applying OCR (Optical Character Reader) to the scanned image, and documents similar to that text are searched for among the existing stored documents; however, OCR errors can prevent the search from working well. To solve this, in Patent Document 4 a print image of each existing stored document is created in advance, and the text obtained by applying OCR to that print image is retained. Matching against that text allows similar documents to be found more accurately.

Also disclosed are a document classification support method and apparatus that, when classifying documents into categories, group documents with similar content based on the estimated categories and present them to the user in sequence to prompt checking (see Patent Document 5). When electronic documents are run through a document classification system, it is necessary to check for classification errors, and Patent Document 5 makes that check easy to perform. It also describes the document classification system itself.

In addition, products that automatically classify huge volumes of text information are commercially available (see Non-Patent Document 1). With the product of Non-Patent Document 1, the system is trained in advance by associating texts with one or more classification codes; then, as shown in FIG. 14, inputting the text of a target document to be classified yields the applicable classification code or codes.

In recent years, companies and other organizations have been required to prevent leakage of the trade secrets and personal information they handle. Known measures include controlling access so that only authorized users can access confidential information, and encrypting confidential information so that only authorized users can view or print it. However, once confidential information has been accessed by an authorized user and, for example, printed, it can no longer be managed as confidential.

To address this problem, systems have been proposed that, when an e-mail is sent or when a paper document is scanned, copied, or faxed on an MFP (Multi Function Peripheral, a digital multifunction machine), analyze the content of the document, search a document DB (database) for confidential documents similar to it, determine whether it resembles a confidential document, and estimate the security attributes of the target document (see, for example, Patent Documents 2 to 4).

Patent Document 1: JP 2000-148770 A
Patent Document 2: JP 2006-185153 A
Patent Document 3: JP 2005-166023 A
Patent Document 4: JP 2006-293917 A
Patent Document 5: Japanese Patent No. 3603392
Non-Patent Document 1: http://www.justsystem.co.jp/km/product/cb102.html

In practice, however, a document to be analyzed may contain content from multiple confidential documents. For example, an electronic document file attached to an e-mail may have been created by copying and pasting from several documents. Likewise, an original copied on an MFP may be a mixture of confidential and general documents.

Patent Documents 1 to 5 and Non-Patent Document 1 all perform similarity determination on an entire piece of information such as a document. Consequently, when an e-mail attachment or an original copied on an MFP, as described above, is checked as a whole against existing confidential documents, the result may be that it is not similar as a whole even though parts of it are similar. Worse, this can be exploited: by burying confidential information in a general document, one can cause the document to be judged dissimilar to any confidential document.

In addition, for documents not yet registered in the confidential document DB, such as work-in-progress or draft documents, the determination may be that the text is not similar to any confidential document, even when the document is sent by e-mail or scanned, copied, or faxed on an MFP. Such a document is then judged not to be confidential because its wording differs, even though it contains the same kind of content as existing confidential documents.

The present invention has been made in view of the above, and its object is to provide an information analysis processing method, an information analysis processing program, an information analysis processing device, an information registration processing method, an information registration processing program, an information registration processing device, an information registration analysis processing method, and an information registration analysis processing program that prevent leakage of confidential information and improve convenience.

To solve the above problems and achieve the object, the invention according to claim 1 is an information analysis processing method executed by an information analysis processing device, wherein the information analysis processing device comprises a similar information search storage unit that stores a first feature quantity, which is a feature quantity of partial information obtained by dividing registration target information (information to be registered), and an information classification storage unit that stores a second feature quantity, which is a feature quantity of the registration target information, for each item of attribute information, the method comprising: an information analysis reception step in which an information analysis reception unit accepts an analysis request for analysis target information (information to be analyzed) by receiving the analysis target information from an external device; an analysis target information division step in which an information division analysis unit divides the analysis target information into partial analysis information, each piece being a part of the analysis target information; a similar information search step in which a similar information search unit calculates a third feature quantity of the partial analysis information based on the elements constituting the partial analysis information, and searches for the partial information similar to the partial analysis information based on the calculated third feature quantity and the first feature quantity stored in the similar information search storage unit; an information classification step in which, when the partial analysis information is determined not to be similar to any of the partial information, an information classification unit classifies the analysis target information into one of the items of attribute information stored in the information classification storage unit, based on the third feature quantity of the partial analysis information and the second feature quantity stored in the information classification storage unit; and an analysis result output step in which an information analysis processing unit outputs, as an analysis result, the analysis target information and the search result of the similar information search step when every piece of partial analysis information is determined to be similar to some partial information, and outputs, as an analysis result, the analysis target information, the search result, and the classification result of the information classification step when at least one piece of partial analysis information is determined not to be similar to any of the partial information.
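The flow of claim 1 (divide, search for similar parts, classify only when some part has no match, then assemble the result) can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: splitting on blank lines, term-frequency features with cosine similarity, the 0.8 threshold, taking the best score over unmatched parts when classifying, and every name in the code are hypothetical choices made for the example.

```python
import math
from collections import Counter


def feature(text):
    # Assumed feature quantity: unit-length term-frequency vector.
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def similarity(a, b):
    # Inner product of two unit vectors (cosine similarity).
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def split_parts(text):
    # Assumption: a "part" is a block of text separated by blank lines.
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def analyze(text, part_store, class_store, threshold=0.8):
    """part_store: list of (part_id, first feature quantity).
    class_store: dict mapping attribute -> second feature quantity."""
    search_result = []
    unmatched = []
    for part in split_parts(text):
        third = feature(part)  # third feature quantity of this part
        hits = [pid for pid, first in part_store
                if similarity(third, first) >= threshold]
        search_result.append((part, hits))
        if not hits:
            unmatched.append(third)

    result = {"target": text, "search_result": search_result}
    if unmatched and class_store:
        # Classify only when at least one part matched nothing.
        # Assumption: score each attribute by its best unmatched part.
        scores = {attr: max(similarity(t, second) for t in unmatched)
                  for attr, second in class_store.items()}
        result["classification"] = max(scores, key=scores.get)
    return result
```

When every part has at least one hit, the returned dictionary carries only the target and the search result; the `classification` key appears only when some part matched no registered part, mirroring the two output cases of the analysis result output step.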

The invention according to claim 2 is the information analysis processing method according to claim 1, further comprising a policy processing step in which a policy processing unit performs, on the analysis target information for which the analysis result has been received, processing based on a security policy that defines, in association with the attribute information, the processing to be executed on the analysis target information.

The invention according to claim 3 is the information analysis processing method according to claim 1 or 2, wherein the analysis target information is information processed by an image forming apparatus.

The invention according to claim 4 is an information registration analysis processing method executed by an information registration analysis processing device, wherein the information registration analysis processing device comprises a similar information search storage unit that stores a first feature quantity, which is a feature quantity of partial information obtained by dividing registration target information (information to be registered), and an information classification storage unit that stores a second feature quantity, which is a feature quantity of the registration target information, for each item of attribute information, the method comprising: an information registration reception step in which an information registration reception unit accepts a registration request including the registration target information and the attribute information of the registration target information; a registration target information division step in which, when the registration request is accepted, an information division registration unit divides the registration target information into the partial information; a partial information storage step in which a similar information registration unit calculates the first feature quantity based on the elements constituting the partial information and stores the calculated first feature quantity in the similar information search storage unit; a classification storage step in which an information classification registration unit calculates the second feature quantity based on the elements constituting the registration target information and stores the calculated second feature quantity in the information classification storage unit according to the classification of the attribute information; an information analysis reception step in which an information analysis reception unit accepts an analysis request for analysis target information (information to be analyzed) by receiving the analysis target information from an external device; an analysis target information division step in which an information division analysis unit divides the analysis target information into partial analysis information, each piece being a part of the analysis target information; a similar information search step in which a similar information search unit calculates a third feature quantity of the partial analysis information based on the elements constituting the partial analysis information, and searches for the partial information similar to the partial analysis information based on the calculated third feature quantity and the first feature quantity stored in the similar information search storage unit; an information classification step in which, when the partial analysis information is determined not to be similar to any of the partial information, an information classification unit classifies the analysis target information into one of the items of attribute information stored in the information classification storage unit, based on the third feature quantity of the partial analysis information and the second feature quantity stored in the information classification storage unit; and an analysis result output step in which an information analysis processing unit outputs, as an analysis result, the analysis target information and the search result of the similar information search step when every piece of partial analysis information is determined to be similar to some partial information, and outputs, as an analysis result, the analysis target information, the search result, and the classification result of the information classification step when at least one piece of partial analysis information is determined not to be similar to any of the partial information.

The invention according to claim 5 is the information registration analysis processing method according to claim 4, further comprising an information registration sending step of sending image data formed by an image forming apparatus to the information registration reception unit as the registration target information, wherein, in the information registration reception step, the information registration reception unit accepts the registration request including the image data as the registration target information and the attribute information.

The invention according to claim 6 is the information registration analysis processing method according to claim 4, further comprising an information analysis sending step of sending image data formed by an image forming apparatus to the information analysis reception unit as the analysis target information, wherein, in the information analysis reception step, the information analysis reception unit receives the image data as the analysis target information from the external device.

The invention according to claim 7 is a program that causes a computer to execute the method according to claims 1 to 6.

The invention according to claim 8 comprises: a similar information search storage unit that stores a first feature quantity of partial information obtained by dividing registration target information (information to be registered); an information classification storage unit that stores a second feature quantity of the registration target information for each item of attribute information; an information analysis reception unit that accepts an analysis request for analysis target information (information to be analyzed) by receiving the analysis target information from an external device; an information division analysis unit that divides the analysis target information into partial analysis information, each piece being a part of the analysis target information; a similar information search unit that searches for the partial information similar to the partial analysis information, based on a third feature quantity calculated from the elements constituting the partial analysis information and the first feature quantity stored in the similar information search storage unit; an information classification unit that, when the partial analysis information is determined not to be similar to any of the partial information, classifies the analysis target information into one of the items of attribute information stored in the information classification storage unit, based on the third feature quantity of the partial analysis information and the second feature quantity stored in the information classification storage unit; and an information analysis processing unit that outputs, as an analysis result, the analysis target information and the search result of the similar information search step when every piece of partial analysis information is determined to be similar to some partial information, and outputs, as an analysis result, the analysis target information, the search result, and the classification result of the information classification step when at least one piece of partial analysis information is determined not to be similar to any of the partial information.

According to the present invention, dividing analysis target information such as confidential information makes the unit of similarity determination finer, so that it can be determined whether each piece of partial analysis information, which is a part of the analysis target information, is similar to any registered partial information. Furthermore, when partial analysis information is determined not to be similar to any registered partial information, the analysis target information can be classified into one of the categories. As a result, determining whether analysis target information is confidential, and classifying it, can be done more accurately, which prevents leakage of confidential information and improves convenience.

The best mode for carrying out the information analysis processing method, information analysis processing program, information analysis processing device, information registration processing method, information registration processing program, information registration processing device, information registration analysis processing method, and information registration analysis processing program according to the present invention is described in detail below with reference to the accompanying drawings.

The following embodiment shows an example in which the information is the document data of confidential documents; document data to be registered is called registered document data, and document data to be analyzed is called analysis document data. The example applies the information analysis processing device to a document analysis processing unit, the information registration processing device to a document registration unit, and the information registration analysis processing device to a document registration analysis server, but the invention is not limited to this. That is, the present invention can be applied to any device or server capable of registering and analyzing information such as documents.

FIG. 1 is a diagram showing the overall configuration of the document registration analysis server and related peripheral devices according to the embodiment. As shown in FIG. 1, the document registration analysis server 1 mainly comprises a similar document search DB (database) 300, a document classification DB 310, a document registration unit 100, a document analysis processing unit 200, and a document monitoring unit 400. The document registration analysis server 1 is connected, mainly via a network or the like, to an image log DB 2, a file server 3, and an MFP 5.

First, an overview of the document registration processing and the document analysis processing is given. The document monitoring unit 400 accepts the settings of the folders to be monitored from administrator D and, according to those settings, monitors whether registered document data of a confidential document has been saved in a document folder of the file server 3. The document monitoring unit 400 then acquires the registered document data from the file server 3 and sends a registration request for the registered document data to the document registration unit 100.

The document registration unit 100 divides the registered document data of a confidential document received from the document monitoring unit 400 and registers it in the similar document search DB 300, and also classifies the registered document data into one of the confidential categories and registers it in the document classification DB 310.
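The two registrations performed by the document registration unit can be pictured with the following sketch. It is illustrative only: the in-memory dictionaries standing in for the similar document search DB and the document classification DB, the blank-line splitting rule, the term-frequency feature, and all class and method names are assumptions introduced for the example.

```python
import math
from collections import Counter, defaultdict


def feature(text):
    # Assumed feature quantity: unit-length term-frequency vector.
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


class DocumentRegistry:
    """Hypothetical stand-in for the two databases of the server."""

    def __init__(self):
        # part id -> feature quantity of one part of a registered document
        self.similar_search_db = {}
        # confidential category -> feature quantities of whole documents
        self.classification_db = defaultdict(list)

    def register(self, doc_id, text, category):
        # Divide the registered document (assumption: on blank lines)
        # and store one feature quantity per part.
        parts = [p.strip() for p in text.split("\n\n") if p.strip()]
        for index, part in enumerate(parts):
            self.similar_search_db[f"{doc_id}#{index}"] = feature(part)
        # Store the whole-document feature quantity under its category.
        self.classification_db[category].append(feature(text))
        return len(parts)
```

Keeping per-part features separate from per-category whole-document features is what later allows a part-level similarity search first and a category-level classification as a fallback.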

The document analysis processing unit 200 accepts an analysis request for analysis document data by receiving from the MFP 5 document data that general user E has printed or otherwise output from the PC 4, or document data produced when general user F scans, faxes, or copies a paper original P. The document analysis processing unit 200 analyzes, for example, whether the received analysis document data is similar to the registered document data of an existing confidential document and, if it is not, whether it can nevertheless be classified into one of the confidential categories of the existing confidential documents. Then, based on the analysis result, the document analysis processing unit 200 performs processing on the classified analysis document data in accordance with the security policy set by administrator D, namely notification by e-mail and recording in the image log DB 2.
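One simple way to picture the security policy is as a table from confidential category to the processing to run. The sketch below assumes exactly that; the enum members merely name the two kinds of processing mentioned above (e-mail notification and recording in the image log DB), and the table layout, the function name, and the category names are hypothetical.

```python
from enum import Enum


class Action(Enum):
    NOTIFY_BY_EMAIL = "notify_by_email"
    RECORD_IMAGE_LOG = "record_image_log"


def actions_for(categories, policy):
    """Collect the actions the policy assigns to the given categories.

    categories: confidential categories found by the analysis.
    policy: dict mapping category -> list of Action (assumed layout).
    """
    actions = []
    for category in categories:
        for action in policy.get(category, []):
            if action not in actions:  # keep order, drop duplicates
                actions.append(action)
    return actions
```

For instance, with a policy of `{"CONTRACT": [Action.NOTIFY_BY_EMAIL, Action.RECORD_IMAGE_LOG]}`, a document analyzed as belonging to `CONTRACT` would trigger both actions, while a category absent from the policy would trigger none.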

Next, the document registration processing for registered document data will be described in detail. FIG. 2 is a configuration diagram of the document monitoring unit and the document registration unit. The document monitoring unit 400 monitors whether registered document data of a new confidential document has been saved in a document folder of the file server 3, and mainly includes a monitoring folder setting unit 401, a document storage monitoring unit 402, and a document registration request unit 403.

The monitoring folder setting unit 401 accepts setting operations by administrator D on the monitoring folder setting screen, and generates and outputs monitoring folder setting data according to those operations. FIG. 3 is a diagram illustrating an example of the monitoring folder setting screen. As shown in FIG. 3, the monitoring folder setting screen 50 displays a name input field 51 for the confidential category of the registered document data and a path input field 52 for the monitoring folder. Through the name input field 51 and the path input field 52, the monitoring folder setting unit 401 accepts administrator D's input (setting operation) of a confidential category name and a monitoring folder path, generates monitoring folder setting data, and saves it in a storage medium such as a memory. The lower part of the monitoring folder setting screen 50 displays a list 53 of monitoring folders that have already been set. In the example of FIG. 3, registered document data stored in the monitoring folder "D:\Documents\Contracts\" is registered as a document of the confidential category "CONTRACT".

Here, the registered document data refers to, for example, document data consisting of a document, but the document data to be registered is not limited to electronic document files. For example, the registration processing may be performed on scanned image data produced by scanning a paper document with an MFP (digital multifunction peripheral). In that case, the confidential category of the image data may be entered from the operation panel of the MFP or the like when the paper document is scanned, or the "destination" of the scan delivery may be mapped to a confidential category.

The document storage monitoring unit 402 reads the monitoring folder setting data generated by the monitoring folder setting unit 401 and, according to that data, monitors whether registered document data of a new confidential document has been saved in a document folder of the file server 3. When registered document data of a new confidential document is saved to the file server 3, the document storage monitoring unit 402 sends to the document registration request unit 403 the file path of the saved registered document data and the confidential category corresponding to that data, which it determines by referring to the monitoring folder setting data.
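The lookup from a saved file's path to its confidential category can be sketched as follows. This is only an illustration: the dictionary layout, the function name `category_for`, and the decision to treat files in subfolders as belonging to the monitored folder are assumptions, not details given in the description. Only the folder "D:\Documents\Contracts\" and the category "CONTRACT" come from FIG. 3.

```python
from pathlib import PureWindowsPath

# Hypothetical in-memory form of the monitoring folder setting data:
# monitored folder -> confidential category (the one entry shown in FIG. 3).
MONITOR_SETTINGS = {
    PureWindowsPath(r"D:\Documents\Contracts"): "CONTRACT",
}

def category_for(file_path, settings=MONITOR_SETTINGS):
    """Return the confidential category for a saved file, or None if the
    file is not under any monitored folder (subfolders are assumed to count)."""
    path = PureWindowsPath(file_path)
    for folder, category in settings.items():
        if folder in path.parents:
            return category
    return None
```

With a result other than `None`, the file path and the category would be passed on together, which mirrors what the document storage monitoring unit 402 sends to the document registration request unit 403.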

When registered document data has been saved to the file server 3, the document registration request unit 403 reads the registered document data from the file server 3 and sends a registration request containing the read registered document data, the file path of the registered document data, and the confidential category of the registered document data to the document registration receiving unit 101 of the document registration unit 100.

次に、文書登録部１００は、文書監視部４００から送出されてくる機密文書の登録文書データを登録するものであり、文書登録受付部１０１と、文書登録処理部１０２と、文書ピース分割登録部１０３と、類似文書検索ＤＢ登録部１０４と、文書分類ＤＢ登録部１０５とを主に備え、類似文書検索ＤＢ３００と、文書分類ＤＢ３１０と接続されている。 Next, the document registration unit 100 registers the registered document data of the confidential document transmitted from the document monitoring unit 400, and includes a document registration reception unit 101, a document registration processing unit 102, and a document piece division registration unit. 103, a similar document search DB registration unit 104, and a document classification DB registration unit 105, which are connected to the similar document search DB 300 and the document classification DB 310.

When the document registration receiving unit 101 receives, from the document registration request unit 403 of the document monitoring unit 400, a registration request containing the file path of the registered document data, the confidential category of the registered document data, and the registered document data itself, it sends the received file path, confidential category, and registered document data to the document registration processing unit 102.

The document registration processing unit 102 receives the file path, the confidential category, and the registered document data from the document registration receiving unit 101. It sends the file path, the confidential category, and the registered document data to the document piece division registration unit 103, and sends the confidential category and the registered document data to the document classification DB registration unit 105.

The document piece division registration unit 103 receives the file path, the confidential category, and the registered document data from the document registration processing unit 102, divides the received registered document data into a plurality of registered document pieces based on a predetermined division rule, and assigns a piece number to each piece. The document piece division registration unit 103 then sends the file path, the confidential category, the piece number, and the registered document piece as a set to the similar document search DB registration unit 104. If the registered document data is image data, text information is obtained by applying OCR processing to the image data, and the obtained text information is divided into registered document pieces for registration.

Here, the "predetermined division rule" mentioned above is, for example, a rule that divides by character count (such as every 500 characters), a rule that treats each half page of the registered document data as one registered document piece, a rule that divides by paragraph, or a rule that divides at sentence-ending periods.
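Three of the division rules named above can be sketched in a few lines each. The function names are hypothetical, paragraphs are assumed to be separated by blank lines, and both the Japanese full stop "。" and the ASCII period are treated as sentence ends. The half-page rule is omitted because it depends on page layout information that plain text does not carry.

```python
import re

def split_by_length(text, size=500):
    """Divide by character count, e.g. every 500 characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def split_by_paragraph(text):
    """Divide by paragraph; a blank line is assumed to separate paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def split_by_sentence(text):
    """Divide just after each sentence-ending period ('。' or '.')."""
    return [s.strip() for s in re.split(r"(?<=[。.])", text) if s.strip()]

def number_pieces(pieces):
    """Assign piece numbers starting from 1."""
    return list(enumerate(pieces, start=1))
```

Whichever rule is chosen, the same rule has to be applied at registration and at analysis time, since the description states that the analysis side uses the same division rule as the registration side.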

The similar document search DB registration unit 104 calculates, from the elements that make up each registered document piece produced by the document piece division registration unit 103, a feature amount (first feature amount) that characterizes that piece, and stores the calculated feature amount in the similar document search DB 300 in association with the file path, the confidential category, and the piece number. This enables similarity search in units of registered document pieces. The first feature amount is used to search for whether registered document pieces are similar, and is calculated by a known method from the elements that make up each registered document piece.

The document classification DB registration unit 105 receives the confidential category and the registered document data from the document registration processing unit 102, calculates from the elements that make up the registered document data a classification feature amount (second feature amount) that characterizes the registered document data, and stores the calculated classification feature amount in the document classification DB 310 as a new classification feature amount for the corresponding confidential category. This enables classification of registered document data. The second feature amount is used to classify registered document data, and is calculated by a known method from the elements that make up each item of registered document data.

類似文書検索ＤＢ３００は、算出された登録文書ピースの特徴量を、ファイルパスと機密カテゴリとピース番号と対応づけて登録するＨＤＤ（Hard Disk Drive）などの記録媒体である。 The similar document search DB 300 is a recording medium such as an HDD (Hard Disk Drive) that registers the calculated feature amount of a registered document piece in association with a file path, a confidential category, and a piece number.

文書分類ＤＢ３１０は、算出された登録文書データの分類特徴量を、機密カテゴリごとに登録するＨＤＤなどの記録媒体である。 The document classification DB 310 is a recording medium such as an HDD for registering the calculated classification feature amount of registered document data for each confidential category.

次に、解析文書データの文書解析処理の詳細について説明する。図４は、文書解析処理部の構成図である。文書解析処理部２００は、ＭＦＰ５などの外部装置から受信した解析文書データを解析し、解析結果に基づいてセキュリティポリシーに従った処理を行うものであり、文書解析受付部２０１と、文書解析処理部２０２と、文書ピース分割解析部２０３と、類似文書検索部２０４と、文書分類部２０５と、ポリシー処理部２０６と、ポリシー設定部２０７とを主に備えている。 Next, details of the document analysis processing of the analysis document data will be described. FIG. 4 is a configuration diagram of the document analysis processing unit. The document analysis processing unit 200 analyzes analysis document data received from an external device such as the MFP 5 and performs processing according to the security policy based on the analysis result. The document analysis reception unit 201 and the document analysis processing unit 202, a document piece division analysis unit 203, a similar document search unit 204, a document classification unit 205, a policy processing unit 206, and a policy setting unit 207.

As described above, the document analysis reception unit 201 accepts an analysis request for analysis document data by receiving the following from the MFP 5 via the network: the analysis document data, produced by printing in response to an instruction from the PC 4 or by scanning, faxing, or copying the paper document P; a user ID, which is identification information unique to the user; and a document processing type (for example, scan processing) indicating which kind of processing was applied to the received analysis document data. The document analysis reception unit 201 then sends the received analysis document data, user ID, and document processing type to the document analysis processing unit 202.

ここで、解析文書データとは、例えば、文書で構成された文書データを示しているが、解析する文書データは電子文書ファイルに限定されることはない。すなわち、例えば、ＭＦＰ（デジタル複合機）で紙原稿をスキャンすることにより形成されたスキャン画像データに対して解析処理を行うように構成してもよい。 Here, the analysis document data indicates, for example, document data composed of documents, but the document data to be analyzed is not limited to an electronic document file. In other words, for example, the analysis processing may be performed on the scanned image data formed by scanning a paper document with an MFP (digital multifunction peripheral).

文書解析処理部２０２は、文書解析受付部２０１から解析文書データとユーザＩＤと文書処理種別とを受け取り、解析文書データを文書ピース分割解析部２０３に送出する。 The document analysis processing unit 202 receives the analysis document data, the user ID, and the document processing type from the document analysis reception unit 201 and sends the analysis document data to the document piece division analysis unit 203.

When the document piece division analysis unit 203 receives analysis document data from the document analysis processing unit 202, it divides the analysis document data into a plurality of analysis document pieces based on a predetermined division rule and assigns an analysis piece number to each piece. The document piece division analysis unit 203 then sends each analysis piece number together with its analysis document piece to the similar document search unit 204. If the analysis document data is image data, text information is obtained by applying OCR processing to the image data, and the obtained text information is divided into analysis document pieces for analysis. The predetermined division rule here is the same as in the registration processing.

The similar document search unit 204 receives each analysis piece number and its associated analysis document piece from the document piece division analysis unit 203, and searches for whether a registered document piece similar to the received analysis document piece is registered in the similar document search DB 300. Specifically, the similar document search unit 204 calculates a feature amount (third feature amount) that characterizes the analysis document piece from the elements that make up the piece, and compares the calculated feature amount with the feature amounts of the registered document pieces stored in the similar document search DB 300. If any registered document piece has a feature amount similar to that of the analysis document piece, the analysis document piece and that registered document piece are judged to be similar. If no registered document piece has a similar feature amount, the analysis document piece is judged not to be similar to any registered document piece. Feature amounts can be judged similar when, for example, the difference between the numerical values being compared falls within a predetermined range, but the judgment is not limited to this method.
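The description leaves the feature amount to known methods, so the following is only a sketch of the comparison step with a stand-in feature. It assumes character bigram counts as the feature amount and cosine similarity with a hypothetical threshold of 0.8 as the "predetermined range"; neither choice is taken from the description.

```python
import math
from collections import Counter

def features(text):
    """Stand-in feature amount: counts of character bigrams."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def cosine(a, b):
    """Cosine similarity between two bigram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def find_similar(analysis_piece, registered, threshold=0.8):
    """Return the keys (file path, piece number) of every registered piece
    whose stored feature is close enough to the analysis piece's feature."""
    target = features(analysis_piece)
    return [key for key, stored in registered.items()
            if cosine(target, stored) >= threshold]
```

An empty list corresponds to the case where no similar registered document piece is found, and a list with several keys to the case where more than one registered piece matches.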

類似検索を行った結果、解析文書ピースと類似する登録文書ピースが検索された場合、類似文書検索部２０４は、その登録文書ピースに対応づけられているファイルパスとピース番号と機密カテゴリとを含めた検索結果を生成し、生成した検索結果を文書ピース分割解析部２０３に送出する。 When a registered document piece similar to the analysis document piece is searched as a result of the similarity search, the similar document search unit 204 includes the file path, the piece number, and the confidential category associated with the registered document piece. The search result is generated, and the generated search result is sent to the document piece division analysis unit 203.

図５は、類似文書検索部により生成された検索結果の一例を示す図である。図５に示すように、検索結果６０には、解析文書ピースと、解析文書ピースの解析ピース番号とが必ず含まれている。そして、類似する登録文書ピースの情報として、ファイルパス、ピース番号、機密カテゴリを含める。また、類似する登録文書ピースが複数存在した場合には、その全ての類似する登録文書ピースの情報を検索結果に含める。図５では、２つの類似する登録文書ピース（１）および（２）の情報が検索された場合を示している。 FIG. 5 is a diagram illustrating an example of a search result generated by the similar document search unit. As shown in FIG. 5, the search result 60 always includes an analysis document piece and an analysis piece number of the analysis document piece. Then, the file path, piece number, and confidential category are included as information on similar registered document pieces. If there are a plurality of similar registered document pieces, information on all the similar registered document pieces is included in the search result. FIG. 5 shows a case where information on two similar registered document pieces (1) and (2) is retrieved.

On the other hand, if the similarity search finds no registered document piece similar to the analysis document piece, the similar document search unit 204 generates a search result that contains only the analysis document piece and its analysis piece number, leaving the part for similar registered document piece information empty, and sends the generated search result to the document piece division analysis unit 203.

そして、上述した文書ピース分割解析部２０３は、さらに、類似文書検索部２０４から送出された、解析ピース番号が付与された全ての解析文書ピースについての検索結果を受け取ると、それらの検索結果を集積したピース解析結果を生成して、生成したピース解析結果を文書解析処理部２０２に送出する。図６は、文書ピース分割解析部により生成されたピース解析結果の一例を示す図である。図６に示すように、ピース解析結果６１には、類似文書検索部２０４から送出された全ての検索結果（図５参照）が含まれている。図６では、１〜Ｎまでの検索結果（１）（２）・・・（Ｎ）が含まれている。 Further, when the document piece division analysis unit 203 described above receives the search results for all the analysis document pieces to which the analysis piece numbers are given and sent from the similar document search unit 204, the search result is accumulated. The generated piece analysis result is generated, and the generated piece analysis result is sent to the document analysis processing unit 202. FIG. 6 is a diagram illustrating an example of a piece analysis result generated by the document piece division analysis unit. As shown in FIG. 6, the piece analysis result 61 includes all the search results (see FIG. 5) sent from the similar document search unit 204. In FIG. 6, search results (1), (2),... (N) from 1 to N are included.
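The search result of FIG. 5 and the piece analysis result of FIG. 6 could be modeled roughly as below. The class and field names are hypothetical; only the contents (analysis piece, analysis piece number, and zero or more similar registered pieces with file path, piece number, and confidential category) follow the description.

```python
from dataclasses import dataclass, field

@dataclass
class SimilarPiece:
    file_path: str
    piece_number: int
    category: str

@dataclass
class SearchResult:
    analysis_piece_number: int
    analysis_piece: str
    # Left empty when no similar registered document piece was found.
    similar: list = field(default_factory=list)

@dataclass
class PieceAnalysisResult:
    results: list  # one SearchResult per analysis document piece

    def unmatched(self):
        """Search results for which no similar registered piece was found."""
        return [r for r in self.results if not r.similar]
```

Keeping the empty `similar` list in the result, rather than dropping the entry, lets a later stage pick out exactly the pieces that still need classification.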

When the document analysis processing unit 202 described above receives the piece analysis result sent from the document piece division analysis unit 203, it extracts from that result the analysis piece numbers and analysis document pieces of the search results for which no similar registered document piece was found, and sends the extracted analysis piece numbers and analysis document pieces to the document classification unit 205.

The document classification unit 205 receives from the document analysis processing unit 202 the analysis piece numbers and analysis document pieces of the search results for which no similar registered document piece was found, and classifies the analysis document data into one of the confidential categories based on the feature amounts of the received analysis document pieces. Specifically, the document classification unit 205 calculates the feature amount of each received analysis document piece, compares it with the classification feature amounts of the registered document data stored for each confidential category (confidential categories A, B, and C) in the document classification DB 310, determines which confidential category the piece belongs to, and generates a piece classification result. The document classification unit 205 then generates a classification result that collects the generated piece classification results, and sends it to the document analysis processing unit 202.

図７は、文書分類部により生成されたピース分類結果の一例を示す図である。図７に示すように、ピース分類結果６２には、解析文書ピースと、解析文書ピースの解析ピース番号とが必ず含まれている。そして、文書分類部２０５により算出された解析文書ピースの特徴量に基づいて分類された全ての機密カテゴリが含まれている。図７では、解析文書ピースが２つの機密カテゴリ（１）および（２）に分類されていることを示している。 FIG. 7 is a diagram illustrating an example of the piece classification result generated by the document classification unit. As shown in FIG. 7, the piece classification result 62 always includes an analysis document piece and an analysis piece number of the analysis document piece. Then, all classified categories classified based on the feature amount of the analysis document piece calculated by the document classification unit 205 are included. FIG. 7 shows that the analysis document piece is classified into two confidential categories (1) and (2).

図８は、文書分類部により生成された分類結果の一例を示す図である。図８に示すように、分類結果６３には、文書分類部２０５により生成された全てのピース分類結果（図７参照）が含まれている。図８では、１〜Ｎまでのピース分類結果（１）（２）・・・（Ｎ）が含まれている。 FIG. 8 is a diagram illustrating an example of the classification result generated by the document classification unit. As shown in FIG. 8, the classification result 63 includes all the piece classification results (see FIG. 7) generated by the document classification unit 205. In FIG. 8, pieces classification results (1), (2),... (N) from 1 to N are included.

If the similar document search unit 204 found a similar registered document piece for every analysis document piece, the document analysis processing unit 202 described above sends the user ID, the analysis document data, the document processing type, and the piece analysis result obtained by the similar document search unit 204 to the policy processing unit 206 as the analysis result.

If, on the other hand, the similar document search unit 204 found no similar registered document piece for at least one analysis document piece, the document analysis processing unit 202 sends the user ID, the analysis document data, the document processing type, the piece analysis result obtained by the similar document search unit 204, and the classification result produced by the document classification unit 205 to the policy processing unit 206 as the analysis result.
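The two cases above differ only in whether a classification result is attached. A minimal sketch, assuming each search result is a plain dictionary with hypothetical keys `number`, `piece`, and `similar`, and that `classify` is a caller-supplied function returning a list of confidential categories for a piece:

```python
def build_analysis_result(user_id, document, processing_type,
                          search_results, classify):
    """Assemble the analysis result handed to the policy processing stage.

    A classification result is attached only when at least one analysis
    piece had no similar registered document piece."""
    result = {
        "user_id": user_id,
        "document": document,
        "processing_type": processing_type,
        "piece_analysis": search_results,
    }
    unmatched = [r for r in search_results if not r["similar"]]
    if unmatched:
        result["classification"] = [
            {"number": r["number"], "piece": r["piece"],
             "categories": classify(r["piece"])}
            for r in unmatched
        ]
    return result
```

Only the unmatched pieces are passed to `classify`, which matches the description: pieces that already matched a registered piece carry their confidential category in the search result.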

ポリシー処理部２０６は、セキュリティポリシーを取得し、取得したセキュリティポリシーを参照して、文書解析処理部２０２から受け取ったユーザＩＤ、解析文書データ、文書処理種別、ピース解析結果、分類結果に対して、機密カテゴリごとにそのセキュリティポリシーの設定内容に従った処理、すなわち、解析文書データをメールサーバ７に送出することによる電子メール送信処理や、解析文書データを画面ログサーバ６に送出することによる画像ログの記録処理等を行うものである。 The policy processing unit 206 acquires a security policy, refers to the acquired security policy, and with respect to the user ID, analysis document data, document processing type, piece analysis result, and classification result received from the document analysis processing unit 202, Processing according to the setting contents of the security policy for each confidential category, that is, e-mail transmission processing by sending analysis document data to the mail server 7, and image log by sending analysis document data to the screen log server 6 The recording process is performed.

ポリシー設定部２０７は、監視フォルダ設定部４０１により生成された監視フォルダ設定データを取得し、取得した監視フォルダ設定データに設定されている機密カテゴリごとに、セキュリティポリシーの設定画面を表示し、管理者Ｄによるポリシーの設定操作を受け付け、受け付けた設定操作に従ってセキュリティポリシーを生成するものである。図９は、ポリシー設定部により生成されたセキュリティポリシー設定画面の一例を示す図である。図１０は、設定されたセキュリティポリシーの構造を示す図である。図９に示すように、セキュリティポリシー設定画面７０には、機密カテゴリの表示欄７１と、解析文書データにいずれの処理が施されたかを示す文書処理種別の選択欄７２と、分類された解析文書データに対する処理を示すアクションの選択欄７３とが表示されている。また、図９では、セキュリティポリシー設定画面の下部に、設定済みのセキュリティポリシー７４が表示されている。 The policy setting unit 207 acquires the monitoring folder setting data generated by the monitoring folder setting unit 401, displays a security policy setting screen for each confidential category set in the acquired monitoring folder setting data, and A policy setting operation by D is received, and a security policy is generated according to the received setting operation. FIG. 9 is a diagram illustrating an example of a security policy setting screen generated by the policy setting unit. FIG. 10 is a diagram showing the structure of the set security policy. As shown in FIG. 9, the security policy setting screen 70 includes a confidential category display column 71, a document processing type selection column 72 indicating which processing has been performed on the analysis document data, and a classified analysis document. An action selection field 73 indicating processing for data is displayed. In FIG. 9, a set security policy 74 is displayed at the bottom of the security policy setting screen.
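One way to picture a security policy of this kind is a table keyed by confidential category and document processing type, with a list of actions as the value. The layout, the action names, and the sample entries below are all assumptions for illustration; the actual structure is the one shown in FIG. 10, which is not reproduced here.

```python
# Hypothetical policy table: category -> processing type -> actions.
POLICY = {
    "CONTRACT": {
        "scan": ["notify_by_email", "record_image_log"],
        "copy": ["record_image_log"],
    },
}

def actions_for(categories, processing_type, policy=POLICY):
    """Collect the actions configured for the given categories and
    processing type, without duplicates and in first-seen order."""
    actions = []
    for category in categories:
        for action in policy.get(category, {}).get(processing_type, []):
            if action not in actions:
                actions.append(action)
    return actions
```

A category or processing type with no entry simply yields no action, so documents that match nothing in the policy pass through without notification or logging in this sketch.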

ここで、上記の文書解析処理では、文書解析を要求する外部装置として、ＭＦＰ５（デジタル複合機）が図示されているが、解析文書データはデジタル複合機で処理された文書に限定されることはない。すなわち、例えば、ファイルサーバのある特定の文書フォルダに配置される電子文書ファイルを監視しておき、それを解析文書データとしたり、メールサーバを通過する電子メール本文やメールへの添付ファイルを解析文書データとしてもよい。 Here, in the above document analysis processing, MFP 5 (digital multifunction peripheral) is illustrated as an external device that requests document analysis. However, the analysis document data is not limited to documents processed by the digital multifunction peripheral. Absent. That is, for example, an electronic document file placed in a specific document folder on a file server is monitored and used as analysis document data, or an e-mail body passing through a mail server or an attached file to an e-mail is analyzed. It may be data.

次に、以上のように構成された本実施の形態にかかる文書登録解析サーバ１による処理について説明する。まず、文書登録解析サーバ１における文書登録部１００による文書登録処理について説明する。図１１は、実施の形態における文書登録部による文書登録処理の手順を示すフローチャートである。 Next, processing by the document registration analysis server 1 according to the present embodiment configured as described above will be described. First, document registration processing by the document registration unit 100 in the document registration analysis server 1 will be described. FIG. 11 is a flowchart illustrating a procedure of document registration processing by the document registration unit according to the embodiment.

First, when the document registration receiving unit 101 receives the file path of the registered document data, the confidential category of the registered document data, and the registered document data from the document registration request unit 403 of the document monitoring unit 400 (step S10), it sends the file path, the confidential category, and the registered document data to the document registration processing unit 102.

The document registration processing unit 102 then receives the file path, the confidential category, and the registered document data from the document registration receiving unit 101 (step S11). It sends the received file path, confidential category, and registered document data to the document piece division registration unit 103, and sends the confidential category and the registered document data to the document classification DB registration unit 105.

次に、文書ピース分割登録部１０３は、文書登録処理部１０２からファイルパスと機密カテゴリと登録文書データとを受け取り、受け取った登録文書データを登録文書ピースに分割し、ピース番号を付与する（ステップＳ１２）。そして、文書ピース分割登録部１０３は、ファイルパスと機密カテゴリとピース番号と登録文書ピースとを組にして、類似文書検索ＤＢ登録部１０４に送出する。 Next, the document piece division registration unit 103 receives the file path, the confidential category, and the registered document data from the document registration processing unit 102, divides the received registration document data into registered document pieces, and gives a piece number (step). S12). Then, the document piece division registration unit 103 sends the file path, the confidential category, the piece number, and the registered document piece to the similar document search DB registration unit 104 as a set.

Next, when the similar document search DB registration unit 104 receives the file path, the confidential category, the piece number, and the registered document piece from the document piece division registration unit 103, it calculates the feature amount of each registered document piece and stores the calculated feature amount in the similar document search DB 300 in association with the file path, the confidential category, and the piece number (step S13).

Meanwhile, when the document classification DB registration unit 105 receives the confidential category and the registered document data from the document registration processing unit 102, it calculates the classification feature amount of the registered document data and stores the calculated classification feature amount in the document classification DB 310 for the corresponding confidential category (step S14).

このように、文書登録部１００による文書登録処理では、登録文書データを分割した登録文書ピースの特徴量と、登録文書データの分類特徴量を格納する。これにより、文書ピース単位の類似検索、および登録文書データの分類を行うことができる。 As described above, in the document registration process by the document registration unit 100, the feature amount of the registered document piece obtained by dividing the registered document data and the classification feature amount of the registered document data are stored. As a result, it is possible to perform similarity search in units of document pieces and classification of registered document data.
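The registration flow of steps S12 to S14 can be condensed into one function. This is a sketch only: the two databases are modeled as in-memory dictionaries, pieces are cut every 500 characters, and word counts stand in for both the piece feature amount and the classification feature amount, none of which is prescribed by the description.

```python
from collections import Counter, defaultdict

def features(text):
    """Stand-in feature amount: whitespace-separated word counts."""
    return Counter(text.split())

def register_document(file_path, category, text,
                      similar_db, classification_db, piece_size=500):
    """Register one document and return the number of pieces stored."""
    # S12: divide into registered document pieces, numbered from 1.
    pieces = [text[i:i + piece_size] for i in range(0, len(text), piece_size)]
    # S13: store each piece's feature with file path, category, piece number.
    for number, piece in enumerate(pieces, start=1):
        similar_db[(file_path, number)] = {
            "category": category,
            "features": features(piece),
        }
    # S14: store the whole document's classification feature per category.
    classification_db[category].append(features(text))
    return len(pieces)
```

The piece-level store supports similarity search for each piece, while the per-category store supports classification of whole documents, which is the split between the similar document search DB 300 and the document classification DB 310.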

次に、文書登録解析サーバ１における文書解析処理部２００による文書解析処理について説明する。図１２は、実施の形態における文書解析処理部による文書解析処理の手順を示すフローチャートである。 Next, document analysis processing by the document analysis processing unit 200 in the document registration analysis server 1 will be described. FIG. 12 is a flowchart illustrating a procedure of document analysis processing by the document analysis processing unit in the embodiment.

文書解析受付部２０１は、ＭＦＰ５からユーザＩＤと、解析文書データと、文書処理種別とを受信することで、解析文書データの解析要求を受け付ける（ステップＳ３０）。そして、文書解析受付部２０１は、受信したユーザＩＤと解析文書データと文書処理種別とを文書解析処理部２０２に送出する。 The document analysis receiving unit 201 receives the analysis request for the analysis document data by receiving the user ID, the analysis document data, and the document processing type from the MFP 5 (step S30). Then, the document analysis reception unit 201 sends the received user ID, analysis document data, and document processing type to the document analysis processing unit 202.

On receiving the analysis document data, the user ID, and the document processing type from the document analysis reception unit 201 (step S31), the document analysis processing unit 202 sends the analysis document data to the document piece division analysis unit 203.

On receiving the analysis document data from the document analysis processing unit 202, the document piece division analysis unit 203 divides the analysis document data, and the similar document search unit 204 performs the analysis (step S32). Specifically, the document piece division analysis unit 203 divides the received analysis document data into analysis document pieces, assigns an analysis piece number to each, and sends each analysis piece number together with its analysis document piece to the similar document search unit 204. The similar document search unit 204 receives each analysis piece number and analysis document piece and searches the similar document search DB 300 for a registered document piece similar to the received analysis document piece. The similar document search unit 204 then generates a search result for each piece, and the document piece division analysis unit 203 collects all of the search results into a piece analysis result and sends it to the document analysis processing unit 202.
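
Step S32 can be pictured with the short sketch below. It is illustrative only and rests on assumptions not stated in this passage: the feature amount is taken to be a word-count vector, similarity is taken to be cosine similarity against a hypothetical threshold of 0.8, pieces are taken to be paragraphs, and the names `search_piece`, `analyze_pieces`, and the record fields are invented for the example.

```python
import math
from collections import Counter


def feature(text: str) -> Counter:
    """Assumed feature amount: lower-cased word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search_piece(piece_no: int, piece: str, registered: list[dict],
                 threshold: float = 0.8) -> dict:
    """Produce one search result for one analysis document piece."""
    vec = feature(piece)
    best = max(registered, key=lambda r: cosine(vec, r["feature"]), default=None)
    if best is not None and cosine(vec, best["feature"]) >= threshold:
        return {"piece_no": piece_no, "found": True, "match": best["id"]}
    return {"piece_no": piece_no, "found": False, "match": None}


def analyze_pieces(text: str, registered: list[dict]) -> list[dict]:
    """Split the document, number the pieces, and collect every search result."""
    pieces = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [search_piece(n, p, registered) for n, p in enumerate(pieces, start=1)]


registered = [{"id": "doc-1/1", "feature": feature("merger plan draft")}]
piece_analysis_result = analyze_pieces(
    "merger plan draft\n\nlunch menu for friday", registered
)
```

The returned list plays the role of the piece analysis result: one entry per analysis piece number, each recording whether a similar registered piece was found.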

The document analysis processing unit 202 determines whether a similar registered document piece was found for every search result included in the piece analysis result (step S33). If a similar registered document piece was found for every search result (step S33: Yes), the document analysis processing unit 202 sends the user ID, the analysis document data, the document processing type, and the piece analysis result obtained through the similar document search unit 204 to the policy processing unit 206 as the analysis result (step S36).

On the other hand, if a similar registered document piece was not found for every search result, that is, if no similar registered document piece was found for at least one search result (step S33: No), the document analysis processing unit 202 extracts from the piece analysis result the analysis piece number and the analysis document piece of each search result for which no similar registered document piece was found (step S34). The document analysis processing unit 202 then sends the extracted analysis piece numbers and analysis document pieces to the document classification unit 205.

On receiving the analysis piece numbers and analysis document pieces from the document analysis processing unit 202, the document classification unit 205 classifies the received analysis document pieces (step S35). Specifically, the document classification unit 205 calculates the feature amount of each received analysis document piece, determines from the calculated feature amount which confidential category the piece belongs to, and generates a piece classification result. The document classification unit 205 then collects the generated piece classification results into a classification result and sends the classification result to the document analysis processing unit 202.
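
One way to read step S35 is as a nearest-category decision, sketched below. The passage does not say how the category is decided, so the word-count feature amount, the cosine comparison against the classification feature amounts stored per confidential category, and the names `classify_piece` and `classify_pieces` are all assumptions made for illustration.

```python
import math
from collections import Counter


def feature(text: str) -> Counter:
    """Assumed feature amount: lower-cased word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def classify_piece(piece_no: int, piece: str,
                   classification_db: dict[str, list[Counter]]) -> dict:
    """Piece classification result: the category whose stored feature is closest."""
    vec = feature(piece)
    scores = {
        category: max((cosine(vec, f) for f in features), default=0.0)
        for category, features in classification_db.items()
    }
    category = max(scores, key=scores.get)
    return {"piece_no": piece_no, "category": category, "score": scores[category]}


def classify_pieces(unmatched: list[tuple[int, str]],
                    classification_db: dict[str, list[Counter]]) -> list[dict]:
    """Classification result: the collected piece classification results."""
    return [classify_piece(n, p, classification_db) for n, p in unmatched]


# Hypothetical contents of the document classification DB 310.
classification_db = {
    "A": [feature("merger acquisition plan board")],
    "B": [feature("salary payroll employee record")],
}
classification_result = classify_pieces([(2, "employee salary review")], classification_db)
```

In this sketch every unmatched piece is forced into some category; a real system might also need a "not confidential" outcome, which this passage does not describe.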

The document analysis processing unit 202 then sends the user ID, the analysis document data, the document processing type, the piece analysis result obtained through the similar document search unit 204, and the classification result produced by the document classification unit 205 to the policy processing unit 206 as the analysis result (step S36).
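
The branch at step S33 and the two forms of the analysis result sent at step S36 can be sketched as below. The dictionary layout, the function name `build_analysis_result`, and the injected `classify` callable (standing in for the document classification unit 205) are hypothetical and used only to show the control flow.

```python
from typing import Callable


def build_analysis_result(
    user_id: str,
    document: str,
    processing_type: str,
    piece_results: list[dict],
    pieces: dict[int, str],
    classify: Callable[[list[tuple[int, str]]], list[dict]],
) -> dict:
    """Assemble the analysis result that is passed on to policy processing."""
    # S33/S34: pick out the pieces for which no similar registered piece was found.
    unmatched = [
        (r["piece_no"], pieces[r["piece_no"]])
        for r in piece_results
        if not r["found"]
    ]
    result = {
        "user_id": user_id,
        "document": document,
        "processing_type": processing_type,
        "piece_analysis_result": piece_results,
    }
    if unmatched:
        # S35: only unmatched pieces are classified.
        result["classification_result"] = classify(unmatched)
    # S36: the caller forwards this result to the policy processing unit.
    return result


pieces = {1: "merger plan draft", 2: "lunch menu"}
piece_results = [{"piece_no": 1, "found": True}, {"piece_no": 2, "found": False}]


def stub_classify(unmatched: list[tuple[int, str]]) -> list[dict]:
    """Placeholder classifier used only for this example."""
    return [{"piece_no": n, "category": "C"} for n, _ in unmatched]


analysis_result = build_analysis_result(
    "user-e", "merger plan draft\n\nlunch menu", "copy",
    piece_results, pieces, stub_classify,
)
```

When every piece has a match, the `classification_result` key is simply absent, which corresponds to the "step S33: Yes" path.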

The policy processing unit 206 then refers to the security policy and, for the user ID, analysis document data, document processing type, piece analysis result, and classification result received from the document analysis processing unit 202, performs processing for each confidential category in accordance with the settings of the security policy.
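
As a rough illustration of per-category policy processing, the sketch below looks up an action for each confidential category and document processing type. The policy layout, the action names (`allow`, `log`, `deny`), the default of `allow`, and the rule that the strictest action wins overall are all assumptions made for this example; this passage does not specify the actual structure of the security policy.

```python
# Hypothetical ordering of actions from least to most restrictive.
SEVERITY = {"allow": 0, "log": 1, "deny": 2}

# Hypothetical security policy: category -> processing type -> action.
security_policy = {
    "A": {"copy": "deny", "scan": "deny"},
    "B": {"copy": "log", "scan": "allow"},
}


def apply_policy(categories: set[str], processing_type: str,
                 policy: dict[str, dict[str, str]]) -> tuple[dict[str, str], str]:
    """Return the action per category and the strictest action overall."""
    decisions = {
        category: policy.get(category, {}).get(processing_type, "allow")
        for category in categories
    }
    overall = max(decisions.values(), key=SEVERITY.__getitem__, default="allow")
    return decisions, overall


decisions, overall = apply_policy({"A", "B"}, "copy", security_policy)
```

Here `categories` would be gathered from the piece analysis result and the classification result; how those results carry category information is likewise an assumption.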

As described above, the document analysis process performed by the document analysis processing unit 200 searches for registered document pieces similar to each analysis document piece, using the feature amount of the analysis document pieces obtained by dividing the analysis document data. When no registered document piece similar to an analysis document piece is found, the analysis document data can be classified into one of the confidential categories.

As described above, in the document registration analysis server 1 according to the present embodiment, the feature amount of each registered document piece is stored in advance in the similar document search DB in units of document pieces, and the classification feature amount of the registered document data is stored in the document classification DB in units of whole documents (registered document data). This enables similarity search in units of analysis document pieces and also enables classification of the analysis document data into a confidential category. In other words, dividing the analysis document data, which may contain confidential information, makes the unit of similarity judgment finer, so that it can be determined whether each analysis document piece is similar to any registered document piece. Furthermore, when an analysis document piece is determined not to be similar to any registered document piece, the analysis document data can be classified into one of the confidential categories. Whether the analysis document data is confidential information can therefore be determined more accurately. In addition, because information determined to be confidential can be processed in accordance with the security policy, leakage of confidential information can be prevented and convenience can be improved.

In the present embodiment, the document monitoring unit and the document registration unit are configured separately, with cases such as monitoring the document folders of multiple file servers in mind, but the configuration is not limited to this. That is, the document monitoring unit and the document registration unit may be integrated into a single component.

FIG. 13 is a diagram showing the hardware configuration of the document registration analysis server of the present embodiment. The document registration analysis server 1 of the present embodiment includes a control device such as a CPU (Central Processing Unit) 5001, storage devices such as a ROM (Read Only Memory) 5002 and a RAM (Random Access Memory) 5003, an external storage device 5004 such as an HDD or a CD drive device, a display device 5005 such as a display, an input device 5006 such as a keyboard and a mouse, a communication I/F 5007, and a bus 5008 connecting these components. It thus has the hardware configuration of an ordinary computer.

The document registration analysis program executed by the document registration analysis server 1 of the present embodiment is provided as a file in an installable or executable format, recorded on a computer-readable recording medium such as a CD-ROM, a flexible disk (FD), a CD-R, or a DVD (Digital Versatile Disk).

The document registration analysis program executed by the document registration analysis server 1 of the present embodiment may also be stored on a computer connected to a network such as the Internet and provided by being downloaded via the network. The program may likewise be provided or distributed via a network such as the Internet.

The document registration analysis program of the present embodiment may also be provided by being incorporated in advance in a ROM or the like.

The document registration analysis program executed by the document registration analysis server 1 of the present embodiment has a module configuration that includes the units described above (the document registration reception unit 101, the document registration processing unit 102, the document piece division registration unit 103, the similar document search DB registration unit 104, the document classification DB registration unit 105, the document analysis reception unit 201, the document analysis processing unit 202, the document piece division analysis unit 203, the similar document search unit 204, the document classification unit 205, the policy processing unit 206, and the policy setting unit 207). In terms of actual hardware, the CPU (processor) reads the document registration analysis program from the storage medium and executes it, whereby each of these units is loaded onto the main storage device and generated there.

FIG. 1 is a diagram showing the overall configuration of the document registration analysis server according to the embodiment and related peripheral devices.
FIG. 2 is a configuration diagram of the document monitoring unit and the document registration unit.
FIG. 3 is a diagram showing an example of the monitoring folder setting screen.
FIG. 4 is a configuration diagram of the document analysis processing unit.
FIG. 5 is a diagram showing an example of a search result generated by the similar document search unit.
FIG. 6 is a diagram showing an example of a piece analysis result generated by the document piece division analysis unit.
FIG. 7 is a diagram showing an example of a piece classification result generated by the document classification unit.
FIG. 8 is a diagram showing an example of a classification result generated by the document classification unit.
FIG. 9 is a diagram showing an example of the security policy setting screen generated by the policy setting unit.
FIG. 10 is a diagram showing the structure of a configured security policy.
FIG. 11 is a flowchart showing the procedure of the document registration process performed by the document registration unit in the embodiment.
FIG. 12 is a flowchart showing the procedure of the document analysis process performed by the document analysis processing unit in the embodiment.
FIG. 13 is a diagram showing the hardware configuration of the document registration analysis server of the present embodiment.
FIG. 14 is an explanatory diagram of a prior-art product that automatically classifies large volumes of text information.

Explanation of symbols

1 Document registration analysis server
2 Image log DB
3 File server
4 PC
5 MFP
100 Document registration unit
101 Document registration reception unit
102 Document registration processing unit
103 Document piece division registration unit
104 Similar document search DB registration unit
105 Document classification DB registration unit
200 Document analysis processing unit
201 Document analysis reception unit
202 Document analysis processing unit
203 Document piece division analysis unit
204 Similar document search unit
205 Document classification unit
206 Policy processing unit
207 Policy setting unit
300 Similar document search DB
310 Document classification DB
400 Document monitoring unit
401 Monitoring folder setting unit
402 Document storage monitoring unit
403 Document registration request unit
A, B, C Confidential category
D Administrator
E, F General user

Claims

In the information analysis processing method executed by the information analysis processing device,
The information analysis processing device includes:
A similar information search storage unit that stores a first feature amount that is a feature amount of partial information obtained by dividing registration target information, which is information to be registered;
An information classification storage unit that stores, for each attribute information, a second feature amount that is a feature amount of the registration target information,
An information analysis accepting step for receiving an analysis request for the analysis target information by receiving analysis target information that is information to be analyzed from an external device;
An information division analysis step in which an information division analysis unit divides the analysis target information into partial analysis information that is a part of the analysis target information;
A similar information search step in which a similar information search unit calculates a third feature amount of the partial analysis information based on elements constituting the partial analysis information, and searches for the partial information similar to the partial analysis information based on the calculated third feature amount and the first feature amount stored in the similar information search storage unit;
An information classification step in which, when it is determined that the partial analysis information is not similar to any of the partial information, an information classification unit classifies the analysis target information into one of the pieces of attribute information stored in the information classification storage unit, based on the third feature amount of the partial analysis information and the second feature amount stored in the information classification storage unit;
An analysis result output step in which an information analysis processing unit outputs, as an analysis result, the analysis target information and the search result of the similar information search step when it is determined that all of the partial analysis information is similar to some of the partial information, and outputs, as an analysis result, the analysis target information, the search result, and the classification result of the information classification step when it is determined that at least one piece of the partial analysis information is not similar to any of the partial information;
An information analysis processing method comprising:

The information analysis processing method according to claim 1, further comprising a policy processing step in which a policy processing unit performs, on the analysis target information for which the analysis result has been received, processing based on a security policy that defines, in association with the attribute information, processing to be executed on the analysis target information.

The information analysis processing method according to claim 1, wherein the analysis target information is information processed by an image forming apparatus.

In the information registration analysis processing method executed by the information registration analysis processing device,
The information registration analysis processing device includes:
A similar information search storage unit that stores a first feature amount that is a feature amount of partial information obtained by dividing registration target information, which is information to be registered;
An information classification storage unit that stores, for each attribute information, a second feature amount that is a feature amount of the registration target information,
An information registration receiving step in which an information registration receiving unit receives a registration request including the registration target information and the attribute information of the registration target information;
When the information division registration unit accepts the registration request, a registration target information division step for dividing the registration target information into the partial information;
A partial information storage step in which a similar information registration unit calculates the first feature amount based on an element constituting the partial information, and stores the calculated first feature amount in the similar information search storage unit;
An information classification registration unit calculates the second feature quantity based on elements constituting the registration target information, and stores the calculated second feature quantity in the information classification storage unit according to the classification of the attribute information A classification storage step;
An information analysis accepting step for receiving an analysis request for the analysis target information by receiving analysis target information that is information to be analyzed from an external device;
An information division analysis step in which an information division analysis unit divides the analysis target information into partial analysis information that is a part of the analysis target information;
A similar information search step in which a similar information search unit calculates a third feature amount of the partial analysis information based on elements constituting the partial analysis information, and searches for the partial information similar to the partial analysis information based on the calculated third feature amount and the first feature amount stored in the similar information search storage unit;
An information classification step in which, when it is determined that the partial analysis information is not similar to any of the partial information, an information classification unit classifies the analysis target information into one of the pieces of attribute information stored in the information classification storage unit, based on the third feature amount of the partial analysis information and the second feature amount stored in the information classification storage unit;
An analysis result output step in which an information analysis processing unit outputs, as an analysis result, the analysis target information and the search result of the similar information search step when it is determined that all of the partial analysis information is similar to some of the partial information, and outputs, as an analysis result, the analysis target information, the search result, and the classification result of the information classification step when it is determined that at least one piece of the partial analysis information is not similar to any of the partial information;
An information registration analysis processing method characterized by comprising:

The information registration analysis processing method according to claim 4, further comprising an information registration sending step of sending image data formed by the image forming apparatus to the information registration receiving unit as the registration target information,
wherein, in the information registration receiving step, the information registration receiving unit accepts the registration request including the image data as the registration target information and the attribute information.

An information analysis sending step of sending image data formed by the image forming apparatus to the information analysis receiving unit as the analysis target information;
The information registration analysis processing method according to claim 4, wherein, in the information analysis reception step, the information analysis reception unit receives the image data as the analysis target information from the external device.

A program that causes a computer to execute the method according to any one of claims 1 to 6.

A similar information search storage unit that stores a first feature amount of partial information obtained by dividing registration target information, which is information to be registered;
An information classification storage unit that stores the second feature amount of the registration target information for each attribute information;
An information analysis receiving unit that receives an analysis request for the analysis target information by receiving analysis target information that is information to be analyzed from an external device;
An information division analysis unit that divides the analysis target information into partial analysis information that is a part of the analysis target information;
A similar information search unit that searches for the partial information similar to the partial analysis information, based on a third feature amount calculated from elements constituting the partial analysis information and the first feature amount stored in the similar information search storage unit;
When it is determined that the partial analysis information is not similar to any of the partial information, based on the third feature amount of the partial analysis information and the second feature amount stored in the information classification storage unit, An information classification unit for classifying the analysis target information into any of the attribute information stored in the information classification storage unit;
An information analysis processing unit that outputs, as an analysis result, the analysis target information and the search result by the similar information search step when it is determined that all of the partial analysis information is similar to some of the partial information, and outputs, as an analysis result, the analysis target information, the search result, and a classification result by the information classification step when it is determined that at least one piece of the partial analysis information is not similar to any of the partial information;
An information analysis processing apparatus comprising: