JP2006279545A

JP2006279545A - Information processor, information processing method, and program therefor

Info

Publication number: JP2006279545A
Application number: JP2005095619A
Authority: JP
Inventors: Eiichiro Toshima; 英一朗戸島
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2005-03-29
Filing date: 2005-03-29
Publication date: 2006-10-12

Abstract

PROBLEM TO BE SOLVED: To provide an information processor, an information processing method, and a program therefor that can suitably restrict copy processing and FAX processing of various kinds of documents, including general documents created and used in an office. SOLUTION: A scanner 16 reads a document and outputs image data. A copying machine 1 identifies the portions of the image data output by the scanner that contain characters, performs character recognition processing on them, and generates first character information from the recognized character data. It compares the first character information with second character information, which is information on the characters contained in a document to be controlled and is referenced from document management index data 18b, and thereby decides whether the scanned document and the document to be controlled have similar contents. In accordance with that decision, the copying machine 1 identifies the document to be controlled whose contents are similar to those of the scanned document and refers to the control information of that document in the document management index data 18b to decide whether to perform the document processing requested for the scanned document. COPYRIGHT: (C)2007,JPO&INPIT

Description

本発明は、原稿をスキャンした後にコピー処理、ＦＡＸ処理、及びＳＥＮＤ処理（電子メールでの送信処理）等の処理を行うことが可能な情報処理装置、情報処理方法及びそのプログラムに関するものである。 The present invention relates to an information processing apparatus, an information processing method, and a program thereof capable of performing processing such as copy processing, FAX processing, and SEND processing (e-mail transmission processing) after scanning a document.

近年、複写機は、単に原稿をコピー（複写）する機能だけでなく、多機能化がすすんでいる。例えば、ＰＣ（パーソナルコンピュータ）と接続して、ＰＣからの印刷指示に応じて印刷を行うプリンタ機能、原稿をスキャンしてＦＡＸ転送するＦＡＸ機能、及び原稿をスキャンしてメール転送するＳＥＮＤ機能などを備える複写機が提供されている。このように多機能な複写機は、ドキュメントのデジタル化の流れの中で、文書管理と連携したドキュメント・ソリューションを実現するポータルとして位置付けられている。 In recent years, copying machines have gone beyond simply copying originals and have become increasingly multifunctional. For example, copying machines are available that provide a printer function, which connects to a PC (personal computer) and prints in response to print instructions from the PC; a FAX function, which scans a document and transmits it by fax; and a SEND function, which scans a document and transmits it by e-mail. Amid the ongoing digitization of documents, such multifunctional copying machines are positioned as portals for document solutions linked with document management.

更に、複写機においてはセキュリティに対する考慮が重視されるようになっており、スキャン情報の漏洩防止技術としてコピー抑制機能などが提案されている。 Furthermore, security considerations have become important in copying machines, and a copy suppression function has been proposed as a technique for preventing leakage of scan information.

コピー抑制機能については、これまでにも様々な機能が提案され、いくつかは複写機に実装されてきた。例えば、商品券、有価証券などの特定原稿のコピー抑制を目的として、特定原稿に対して付加情報をいれるなど原稿を加工して出力する技術が提案されている（例えば、特許文献１を参照）。この特許文献１においては、特定原稿であるかどうか判定する手段として入力画像を特定原稿画像の画像データ同士を比較して類似度を求めることで、コピー抑制を行っている。 Various copy suppression functions have been proposed so far, and some have been implemented in copying machines. For example, to suppress the copying of specific documents such as gift certificates and securities, a technique has been proposed that processes the document before output, for instance by adding extra information to it (see, for example, Patent Document 1). In Patent Document 1, whether a document is a specific document is determined by comparing the image data of the input image with that of the specific document image and computing their similarity, and copying is suppressed on that basis.

また、機密性が高い文書の処理に対してオペレータに注意を促す技術が開示されている（例えば、特許文献２を参照）。この特許文献２において、機密性が高いかどうかは、原稿画像を文字認識して文字化した上で「極秘」などの特定の記号列が含まれているかどうかで判断している。 In addition, a technique for alerting an operator to processing a highly confidential document is disclosed (for example, see Patent Document 2). In Patent Document 2, whether or not confidentiality is high is determined based on whether or not a specific symbol string such as “top secret” is included after character recognition of the original image.

また、紙に予めバーコードなどで著作物名称を付与しておき、ホストコンピュータから著作権情報を入手しコピーを続行するかどうかオペレータに問い合わせる技術が開示されている（例えば、特許文献３を参照）。 In addition, a technique is disclosed in which the name of a copyrighted work is attached to the paper in advance, for example as a barcode, copyright information is obtained from a host computer, and the operator is asked whether to continue copying (see, for example, Patent Document 3).

他にも、複写抑制画像が読み取り画像に含まれる場合に複写出力を抑制する装置において、その判定を外部装置で行う技術が開示されている（例えば、特許文献４を参照）。 In addition, in a device that suppresses copy output when a copy-suppressed image is included in a read image, a technique is disclosed in which the determination is performed by an external device (see, for example, Patent Document 4).

Patent Document 1: 特開平５−１１０８１５号公報 Japanese Patent Laid-Open No. 5-110815
Patent Document 2: 特開２００１−２６６１１２号公報 JP 2001-266112 A
Patent Document 3: 特開平７−１２９２７０号公報 JP 7-129270 A
Patent Document 4: 特開平５−２４４４１２号公報 Japanese Patent Application Laid-Open No. 5-244412

しかし、特許文献１のような画像データを直接比較する技術は、紙幣など画像パターンが定まっている場合のコピー抑制に効果が期待できるが、一般ドキュメントのように文章内容が重要である場合には効果は薄い。例えば、単語の並びを変更しただけで容易にこのコピー抑制機能を回避することができてしまう。 However, a technique that directly compares image data, as in Patent Document 1, can be expected to suppress copying effectively when the image pattern is fixed, as with banknotes, but has little effect when it is the text content that matters, as in a general document. For example, the copy suppression function can easily be circumvented merely by changing the order of the words.

また、特許文献２は、予め機密文書に特定の文字列を文章中、あるいは表題に付加するなどの運用が必要になり、その様な特定文字列を含まない一般ドキュメントのコピー抑制を行うことができない。 Patent Document 2 requires an operational rule under which a specific character string is added in advance to the body or title of each confidential document, so it cannot suppress the copying of general documents that do not contain such a character string.

また、特許文献３の方式では、バーコードなどの付加情報を印刷時に予め付与しておく必要があり、例えば、紙文書配布後にコピー管理が必要になった場合など、臨機応変にセキュリティ管理することはできない。 In the method of Patent Document 3, additional information such as a barcode must be attached in advance at printing time, so security cannot be managed flexibly, for example when copy management becomes necessary only after a paper document has been distributed.

また、特許文献４の装置においても、複写抑制かどうかの判断は画像データの比較で行っているので、文章内容の類似する文書のコピー抑制については効果が薄い。 Also in the apparatus of Patent Document 4, since it is determined whether or not to suppress copying by comparing image data, the effect of copying suppression of documents having similar text contents is small.

このように、上記従来技術では、文書にバーコードや所定のキーワードが付与された原稿が対象であったり、紙幣のような統制のとれた図柄を含む原稿が対象であったりして、オフィスで作成や利用されている一般的なドキュメントのコピー抑制に対しては、必ずしも有効ではないという問題がある。すなわち、画像データの直接比較や、固定キーワードの比較や、バーコードなどの付与情報に基づくコピー抑制以外の方法によるコピー抑制が必要である。 As described above, the prior art targets documents to which a barcode or a predetermined keyword has been attached, or documents that contain a well-defined pattern such as a banknote, and is therefore not necessarily effective for suppressing the copying of general documents created and used in an office. In other words, copy suppression is needed that relies on a method other than direct comparison of image data, comparison against fixed keywords, or attached information such as a barcode.

本発明は、上述した事情を考慮してなされたもので、オフィスで作成及び利用されている一般文書を含む種々の種類の原稿に対して複写処理やＦＡＸ処理の適切な抑制を行うことが可能な情報処理装置、情報処理方法及びそのプログラムを提供することを目的とする。 The present invention has been made in view of the above circumstances, and its object is to provide an information processing apparatus, an information processing method, and a program therefor that can appropriately restrict copy processing and FAX processing for various types of documents, including general documents created and used in an office.

この発明は、上述した課題を解決すべくなされたもので、本発明による情報処理装置においては、入力手段により入力された処理要求に応じて原稿を読み取り、該読取られた原稿をイメージデータとして出力する読取手段と、読取手段が出力するイメージデータの中から文字掲載部分を特定して文字認識処理を行い、認識した文字データを基に第１の文字情報を出力する文字情報抽出手段と、文書処理を制御する対象となる制御対象文書に関する情報として、制御対象文書に含まれる文字に関する情報である第２の文字情報と、制御対象文書に対して一つまたは複数種類ある文書処理の内のどの処理を制御するかを定める制御情報とを少なくとも格納する情報格納手段と、文字情報抽出手段が出力する第１の文字情報と、情報格納手段から参照する制御対象文書の第２の文字情報とを基に、原稿と制御対象文書の文書内容が類似しているか否かを判断する類似判断手段と、類似判断手段の判断に応じて原稿と文書内容が類似する制御対象文書を特定し、情報格納手段から特定した制御対象文書の制御情報を参照して、原稿に対する処理要求に応じた文書処理の可否を判断する処理判断手段とを具備することを特徴とする。 The present invention has been made to solve the above problems. An information processing apparatus according to the present invention comprises: reading means for reading a document in response to a processing request input through input means and outputting the read document as image data; character information extraction means for identifying the portions containing characters in the image data output by the reading means, performing character recognition processing on them, and outputting first character information based on the recognized character data; information storage means for storing, as information on a control target document whose document processing is to be controlled, at least second character information, which is information on the characters contained in the control target document, and control information that defines which of one or more types of document processing is to be controlled for the control target document; similarity determination means for determining, based on the first character information output by the character information extraction means and the second character information of the control target document referenced from the information storage means, whether the contents of the document and of the control target document are similar; and processing determination means for identifying, in accordance with the determination by the similarity determination means, the control target document whose contents are similar to those of the document, referring to the control information of the identified control target document in the information storage means, and determining whether the document processing requested for the document may be performed.

また、本発明による情報処理方法においては、入力手段により入力された処理要求に応じて原稿を読み取り、該読取られた原稿をイメージデータとして出力する読取ステップと、読取ステップで出力するイメージデータの中から文字掲載部分を特定して文字認識処理を行い、認識した文字データを基に第１の文字情報を出力する文字情報抽出ステップと、文書処理を制御する対象となる制御対象文書に関する情報として、制御対象文書に含まれる文字に関する情報である第２の文字情報と、制御対象文書に対して一つまたは複数種類ある文書処理の内のどの処理を制御するかを定める制御情報とを少なくとも格納する情報格納手段から、制御対象文書の第２の文字情報を参照し、記文字情報抽出ステップで出力する第１の文字情報と比較することで、原稿と制御対象文書の文書内容が類似しているか否かを判断する類似判断ステップと、類似判断ステップの判断に応じて原稿と文書内容が類似する制御対象文書を特定し、情報格納手段から特定した制御対象文書の制御情報を参照して、原稿に対する処理要求に応じた文書処理の可否を判断する処理判断ステップとを有することを特徴とする。 An information processing method according to the present invention comprises: a reading step of reading a document in response to a processing request input through input means and outputting the read document as image data; a character information extraction step of identifying the portions containing characters in the image data output in the reading step, performing character recognition processing on them, and outputting first character information based on the recognized character data; a similarity determination step of referring to second character information of a control target document in information storage means and comparing it with the first character information output in the character information extraction step, thereby determining whether the contents of the document and of the control target document are similar, the information storage means storing, as information on the control target document whose document processing is to be controlled, at least the second character information, which is information on the characters contained in the control target document, and control information that defines which of one or more types of document processing is to be controlled for the control target document; and a processing determination step of identifying, in accordance with the determination in the similarity determination step, the control target document whose contents are similar to those of the document, referring to the control information of the identified control target document in the information storage means, and determining whether the document processing requested for the document may be performed.

また、本発明によるプログラムは、情報処理装置用のプログラムであって、入力手段により入力された処理要求に応じて原稿を読み取りイメージデータを出力する読取ステップと、読取ステップで出力するイメージデータの中から文字掲載部分を特定して文字認識処理を行い、認識した文字データを基に第１の文字情報を出力する文字情報抽出ステップと、文書処理を制御する対象となる制御対象文書に関する情報として、制御対象文書に含まれる文字に関する情報である第２の文字情報と、制御対象文書に対して一つまたは複数種類ある文書処理の内のどの処理を制御するかを定める制御情報とを少なくとも格納する情報格納手段から、制御対象文書の第２の文字情報を参照し、記文字情報抽出ステップで出力する第１の文字情報と比較することで、原稿と制御対象文書の文書内容が類似しているか否かを判断する類似判断ステップと、類似判断ステップの判断に応じて原稿と文書内容が類似する制御対象文書を特定し、情報格納手段から特定した制御対象文書の制御情報を参照して、原稿に対する処理要求に応じた文書処理の可否を判断する処理判断ステップとをコンピュータに実行させるためのプログラムである。 A program according to the present invention is a program for an information processing apparatus that causes a computer to execute: a reading step of reading a document in response to a processing request input through input means and outputting image data; a character information extraction step of identifying the portions containing characters in the image data output in the reading step, performing character recognition processing on them, and outputting first character information based on the recognized character data; a similarity determination step of referring to second character information of a control target document in information storage means and comparing it with the first character information output in the character information extraction step, thereby determining whether the contents of the document and of the control target document are similar, the information storage means storing, as information on the control target document whose document processing is to be controlled, at least the second character information, which is information on the characters contained in the control target document, and control information that defines which of one or more types of document processing is to be controlled for the control target document; and a processing determination step of identifying, in accordance with the determination in the similarity determination step, the control target document whose contents are similar to those of the document, referring to the control information of the identified control target document in the information storage means, and determining whether the document processing requested for the document may be performed.

本発明による情報処理装置、情報処理方法及びそのプログラムは、オフィスで作成及び利用されている一般文書を含む種々の種類の原稿に対して複写処理やＦＡＸ処理の適切な抑制を行うことができる。 The information processing apparatus, information processing method, and program therefor according to the present invention can appropriately suppress copy processing and FAX processing for various types of manuscripts including general documents created and used in an office.

以下、図面を参照して本発明の実施形態について詳細に説明する。 Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

図１は、本発明の一実施形態における複写機（情報処理装置）の概略構成を示すブロック図である。図１において、１は複写機であり、ＣＰＵ１０、ＲＯＭ１２、ＲＡＭ１３、入力装置１４、表示装置１５、スキャナ１６、プリンタ１７、ハードディスク（ＨＤ）１８、リムーバブル外部記憶装置１９、及び通信装置２０から構成される。複写機１は、原稿を読み取り複写するコピー機能（複写機能）、原稿を読み取りＦＡＸするＦＡＸ処理機能、原稿を読み取り電子メールで送信するＳＥＮＤ処理機能などを備える多機能な複写機である。 FIG. 1 is a block diagram showing a schematic configuration of a copying machine (information processing apparatus) according to an embodiment of the present invention. In FIG. 1, reference numeral 1 denotes a copying machine, which comprises a CPU 10, a ROM 12, a RAM 13, an input device 14, a display device 15, a scanner 16, a printer 17, a hard disk (HD) 18, a removable external storage device 19, and a communication device 20. The copying machine 1 is a multifunctional copying machine having a copy function for reading and copying a document, a FAX processing function for reading a document and transmitting it by fax, a SEND processing function for reading a document and sending it by e-mail, and the like.

ＣＰＵ１０は、例えばマイクロプロセッサであり、種々のプログラムを読み込み実行することで画像処理、文字処理、検索処理のための演算、論理判断等を行い、バス１１を介してバス１１に接続された各構成要素を制御する。 The CPU 10 is, for example, a microprocessor, and performs various operations such as image processing, character processing, search processing, and logical determination by reading and executing various programs, and each component connected to the bus 11 via the bus 11. Control elements.

バス１１は、バスであり、ＣＰＵ１０の制御対象である各構成要素を指示するアドレス信号、コントロール信号を転送する。また、各構成要素間のデータ転送を行う。ＲＡＭ１３は、ＣＰＵ１０が読み書き可能なランダムアクセスメモリであって、各構成要素からの各種データの一次記憶として利用されるメモリである。ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）１２は、読出し専用の不揮発性メモリである。ＲＯＭ１２は、例えばＣＰＵ１０が起動時に実行するブートプログラムを記憶する。具体的には、ＣＰＵ１０は、システム起動時にブートプログラムを実行すると、ＨＤ１８に記憶された制御プログラムをＲＡＭ１３にロードして実行する。この制御プログラムについては、後にフローチャートを参照して詳述する。 The bus 11 is a bus, and transfers an address signal and a control signal instructing each component to be controlled by the CPU 10. In addition, data transfer between each component is performed. The RAM 13 is a random access memory that can be read and written by the CPU 10, and is used as a primary storage for various data from each component. A ROM (Read Only Memory) 12 is a read-only nonvolatile memory. The ROM 12 stores, for example, a boot program that the CPU 10 executes at startup. Specifically, when the CPU 10 executes the boot program at the time of system startup, the CPU 10 loads the control program stored in the HD 18 into the RAM 13 and executes it. This control program will be described in detail later with reference to a flowchart.

入力装置１４は、タッチパネルや操作ボタン等である。尚、入力装置１４は、タッチパネルや操作ボタンに限定されるものではなく、通常のＰＣ（パーソナルコンピュータ）の様にキーボードや、マウス等で構成してもよい。表示装置１５は、例えば液晶ディスプレイやＣＲＴ等である。本実施形態では、表示装置１５の上に入力装置１４の一部が形成されることで、表示装置１５の画面に表示したボタンによる入力等が可能なタッチパネルを実現している。 The input device 14 is a touch panel, operation buttons, or the like. The input device 14 is not limited to a touch panel or operation buttons, and may be configured with a keyboard, a mouse, or the like like a normal PC (personal computer). The display device 15 is, for example, a liquid crystal display or a CRT. In the present embodiment, a part of the input device 14 is formed on the display device 15, thereby realizing a touch panel that can be input using buttons displayed on the screen of the display device 15.

スキャナ１６は、原稿である紙文書を読み取ってデジタル画像データ化する等の処理を行う装置である。プリンタ１７は、複写機１内で保持する文書データやイメージデータを印刷処理するための装置である。具体的には、プリンタ１７は、通信装置２０が通信回線（ネットワーク）経由で受信する電子文書、ＨＤ１８内に保持されている電子文書を印刷する。また、複写機１のコピー機能は、スキャナ１６から読み取られたスキャンイメージデータをそのままプリンタ１７により印刷することにより実現される。 The scanner 16 is a device that performs processing such as reading a paper document as a document and converting it into digital image data. The printer 17 is a device for printing document data and image data held in the copying machine 1. Specifically, the printer 17 prints an electronic document received by the communication device 20 via a communication line (network) and an electronic document held in the HD 18. The copying function of the copying machine 1 is realized by printing the scan image data read from the scanner 16 by the printer 17 as it is.

ＨＤ１８はハードディスクであり、ＣＰＵ１０により実行される制御プログラム１８ａ、文章内容及び文書レイアウトの類似検索を行う検索処理及び文書管理のための索引に関する情報である文書管理索引データ１８ｂ、文章内容の類似検索を行う際の各単語の重要度に関するデータである単語重要度テーブル１８ｃ等が格納されている。ここで、文章内容及び文書レイアウトの類似検索とは、スキャナ１６においてスキャンした原稿に含まれる文章の内容と類似するものを、文書管理索引データ１８ｂにおいて管理対象とする文章内容から検索する処理及び、スキャナ１６においてスキャンした原稿に含まれる文章や図柄のレイアウトと類似するものを、文書管理索引データ１８ｂにおいて管理対象とするレイアウトから検索する処理である。尚、文書管理索引データ１８ｂにおいて管理対象となる文章内容やレイアウトは、複写機１においてコピー処理、ＦＡＸ処理、及びＳＥＮＤ処理を制限する必要があるものである。 An HD 18 is a hard disk, and includes a control program 18a executed by the CPU 10, a search process for performing a similar search for text content and document layout, and document management index data 18b, which is information related to an index for document management, and a similar search for text content. A word importance degree table 18c, which is data relating to the importance degree of each word when performing, is stored. Here, the similarity search of the text content and the document layout is a process of searching for the text content included in the original scanned by the scanner 16 from the text content to be managed in the document management index data 18b, This is a process of searching the document management index data 18b for a layout similar to the layout of texts and symbols included in the document scanned by the scanner 16 from the layout to be managed. It should be noted that the text content and layout to be managed in the document management index data 18b need to restrict copy processing, FAX processing, and SEND processing in the copying machine 1.
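A content-similarity search that uses a word importance table can be sketched as follows. This is only a minimal illustration: the passage above does not give the similarity formula, so the weighted word-overlap (weighted Jaccard) measure, the function name `weighted_similarity`, and the default weight are assumptions made for this sketch, not the method of the embodiment.

```python
from typing import Dict, Iterable


def weighted_similarity(
    scanned_words: Iterable[str],
    managed_words: Iterable[str],
    importance: Dict[str, float],
    default_weight: float = 1.0,
) -> float:
    """Return a score in [0, 1] for how similar two word sets are.

    Assumption: similarity is the importance-weighted overlap of the two
    word sets (a weighted Jaccard index). `importance` plays the role of a
    word importance table; words missing from it get `default_weight`.
    """
    scanned = set(scanned_words)
    managed = set(managed_words)
    union = scanned | managed
    if not union:
        return 0.0

    def weight(word: str) -> float:
        return importance.get(word, default_weight)

    shared = sum(weight(w) for w in scanned & managed)
    total = sum(weight(w) for w in union)
    return shared / total if total > 0 else 0.0


if __name__ == "__main__":
    table = {"budget": 3.0, "the": 0.1}  # hypothetical importance values
    score = weighted_similarity(
        ["budget", "the", "plan"], ["budget", "the", "memo"], table
    )
    print(round(score, 3))
```

Weighting the overlap lets a rare, topic-bearing word count for more than a common function word, which is the kind of role a per-word importance table would serve in such a search.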

リムーバブル外部記憶装置１９は、例えばＵＳＢ（ＵｎｉｖｅｒｓａｌＳｅｒｉａｌＢｕｓ）メモリデバイス、ＩＣカード等の複写機１に着脱可能な記録媒体に対するインタフェースを有する装置である。リムーバブル外部記憶装置１９は、上記構成に限定されるものではなく、例えばフレキシブルディスクやＣＤ、ＤＶＤ等のディスク状の記録媒体を設置してアクセスするためのドライブ等を備える構成でもよい。ＣＰＵ１０は、リムーバブル外部記憶装置１９を、上記ＨＤ１８と同様に利用することができる。リムーバブル外部記憶装置１９は、着脱可能な記録媒体を通じて他の複写機とのデータ交換を実現する。尚、ＨＤ１８に記憶される制御プログラムについては、リムーバブル外部記憶装置１９から必要に応じて全部または一部を複写したものであってもよい。 The removable external storage device 19 is an apparatus having an interface for a recording medium that can be attached to and detached from the copying machine 1 such as a USB (Universal Serial Bus) memory device or an IC card. The removable external storage device 19 is not limited to the above configuration, and may be configured to include a drive for installing and accessing a disk-shaped recording medium such as a flexible disk, CD, or DVD, for example. The CPU 10 can use the removable external storage device 19 in the same manner as the HD 18. The removable external storage device 19 realizes data exchange with other copying machines through a removable recording medium. The control program stored in the HD 18 may be copied in whole or in part from the removable external storage device 19 as necessary.

通信装置２０は、ネットワークコントローラであり、通信回線を介して外部とのデータ交換を行う装置である。複写機１は、入力装置１４等から入力される各種イベントに応じて作動する。具体的には、入力装置１４等からのインタラプトが供給されると、入力信号がＣＰＵ１０に送られ、それに伴ってイベントが発生し、イベントに応じてＣＰＵ１０がＲＯＭ１２またはＲＡＭ１３内に記憶される各種命令を読み出し、その実行によって各種の制御が行われる。 The communication device 20 is a network controller, and is a device that exchanges data with the outside via a communication line. The copying machine 1 operates according to various events input from the input device 14 or the like. Specifically, when an interrupt from the input device 14 or the like is supplied, an input signal is sent to the CPU 10, an event is generated accordingly, and the CPU 10 stores various commands stored in the ROM 12 or RAM 13 according to the event. And various controls are carried out by executing them.

以上の構成により、本実施形態における複写機１は、スキャンした原稿の文章内容やレイアウトが文書管理索引データ１８ｂにおいて管理対象となる文書の文書内容やレイアウトと類似しているか否かを判断し、類似していると判断した場合に、スキャンイメージデータを基にコピー処理、ＦＡＸ処理、又はＳＥＮＤ処理を行うことを制限することができる。 With the above configuration, the copying machine 1 according to the present embodiment determines whether the scanned document text content and layout are similar to the document content and layout of the document to be managed in the document management index data 18b. When it is determined that they are similar, it is possible to restrict the copy process, the FAX process, or the SEND process based on the scan image data.

図２は、図１に示した複写機１における操作の流れの例を示した図である。 FIG. 2 is a diagram showing an example of the operation flow in the copying machine 1 shown in FIG. 1.
図２に示すように、複写機１は、文書管理索引データ１８ｂ中に文章内容及びレイアウトの類似検索を行うための索引データと、セキュリティ管理のための情報を予め格納しておく。具体的には、例えばある議事録の文書Ａと同じ形式で同じ議題となる他の議事録の文書に対してコピー処理を制限したい場合には、その文書Ａの文章内容及びレイアウトに関する情報を文書管理索引データ１８ｂに登録する。尚、登録処理の詳細については後述する。これにより文書管理索引データ１８ｂに格納されているデータを利用して、複写機１は、文書Ａと文章内容及びレイアウトが類似する文書に対してコピー処理を制限することができる。 As shown in FIG. 2, the copying machine 1 stores in advance, in the document management index data 18b, index data for similarity searches on text content and layout together with information for security management. Specifically, to restrict copy processing of other minutes documents that have the same format and the same agenda as a certain minutes document A, information on the text content and layout of document A is registered in the document management index data 18b. Details of the registration process will be described later. Using the data stored in the document management index data 18b, the copying machine 1 can then restrict copy processing of documents whose text content and layout are similar to those of document A.

例えば、複写機１の原稿台にセットされた紙文書である原稿２のコピーが指示されると、複写機１のスキャナ１６は、原稿２をスキャンしてスキャンイメージを出力する。このスキャンイメージを基に、複写機１は、原稿２のレイアウト情報や文章内容を抽出して、文書管理索引データ１８ｂを検索することにより、コピー処理、ＦＡＸ処理、及びＳＥＮＤ処理（以下、コピー処理等とする）の制限に関する情報である文書管理情報を生成する。そして、複写機１は、この文書管理情報に応じて、スキャンイメージをプリンタ１７で出力するコピー処理等を行うか否かを判断する。ここで、コピー処理等行うと判断した場合には、複写機１は、スキャンイメージを基にしたコピー処理等を行う。 For example, when an instruction is given to copy a document 2, which is a paper document set on the document table of the copying machine 1, the scanner 16 of the copying machine 1 scans the document 2 and outputs a scan image. Based on this scan image, the copying machine 1 extracts the layout information and text content of the document 2 and searches the document management index data 18b, thereby generating document management information, which is information on restrictions on copy processing, FAX processing, and SEND processing (hereinafter referred to as copy processing and the like). The copying machine 1 then determines, in accordance with this document management information, whether to perform the copy processing and the like, such as outputting the scan image on the printer 17. If it determines that the processing is to be performed, the copying machine 1 performs the copy processing and the like based on the scan image.

また、コピー処理等を行わないと判断した場合には、複写機１は、スキャンイメージを基にしたコピー処理等を抑制し、かつ、コピー処理等の抑制の解除をセキュリティ管理者等に依頼する場合に備えて、類似していると判断した文書管理索引データ１８ｂにおいて管理対象の文書に関する情報（例えば、後述する文書ＩＤ）を表示装置１５に表示する。 If it is determined that the copy process is not performed, the copying machine 1 suppresses the copy process based on the scan image and requests the security administrator to cancel the suppression of the copy process. In preparation for the case, information related to a document to be managed (for example, a document ID described later) is displayed on the display device 15 in the document management index data 18b determined to be similar.
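The flow described above (search the index for a similar managed document, then permit or suppress the requested operation and report the matching document ID) can be sketched as follows. This is a minimal sketch under stated assumptions: the class `ManagedDocument`, the function `decide`, the plain Jaccard measure, and the threshold of 0.5 are all hypothetical and chosen only for illustration; the layout comparison mentioned in the text is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class ManagedDocument:
    """One hypothetical entry of the document management index."""
    doc_id: str
    words: Set[str]        # words taken from the registered document
    prohibited: Set[str]   # operations to suppress, e.g. {"copy", "fax"}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Plain word-set overlap; an assumed stand-in for the similarity search."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def decide(
    request: str,
    scanned_words: Set[str],
    index: List[ManagedDocument],
    threshold: float = 0.5,  # assumed cut-off for "similar"
) -> Tuple[bool, Optional[str]]:
    """Return (permitted, doc_id of the managed document causing suppression)."""
    best_doc, best_score = None, 0.0
    for doc in index:
        score = jaccard(scanned_words, doc.words)
        if score > best_score:
            best_doc, best_score = doc, score
    if (
        best_doc is not None
        and best_score >= threshold
        and request in best_doc.prohibited
    ):
        # Suppress the operation and report which managed document matched,
        # so the user can ask an administrator to review the restriction.
        return False, best_doc.doc_id
    return True, None


if __name__ == "__main__":
    index = [ManagedDocument("DOC-A", {"minutes", "budget", "q3", "review"}, {"copy", "fax"})]
    print(decide("copy", {"minutes", "budget", "q3", "plan"}, index))
```

Returning the document ID together with the refusal mirrors the behavior described in the text, where the ID of the similar managed document is shown on the display device when processing is suppressed.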

図３は、図１に示した複写機１の表示装置１５の画面遷移例を示した図である。図３において画面３−１は初期状態の画面である。画面３−１には、コピー処理動作を起動するコピーボタン３１、ＦＡＸ処理動作を起動するＦＡＸボタン３２、ＳＥＮＤ処理動作を起動するＳＥＮＤボタン３３、セキュリティ情報を設定するための設定ボタン３４の各種ボタンが配置されている。 FIG. 3 is a diagram showing a screen transition example of the display device 15 of the copying machine 1 shown in FIG. In FIG. 3, a screen 3-1 is a screen in an initial state. The screen 3-1 includes various buttons such as a copy button 31 for starting a copy processing operation, a FAX button 32 for starting a FAX processing operation, a SEND button 33 for starting a SEND processing operation, and a setting button 34 for setting security information. Is arranged.

例えば、コピーボタン３１が押下された場合は、原稿台にある原稿に対するコピー処理の可否が判定され、コピー処理可と判定した場合には、複写機１は、コピー処理動作を遂行する。また、コピー処理不可と判定した場合には、複写機１は、コピー処理動作を遂行しない。この場合には、複写機１は、画面３−２に示すような画面を表示装置１５に表示することで、コピー処理動作を遂行しない旨を利用者に通知すると共に文書管理索引データ１８ｂにおいてコピー処理抑制に設定されている文書の文書ＩＤ（例えば、上述した例の文書Ａの文書ＩＤ）を示すことができる。 For example, when the copy button 31 is pressed, it is determined whether or not the copy process can be performed on the document on the document table. When it is determined that the copy process is possible, the copying machine 1 performs a copy processing operation. If it is determined that the copy process is not possible, the copying machine 1 does not perform the copy process operation. In this case, the copying machine 1 displays a screen as shown in the screen 3-2 on the display device 15, thereby notifying the user that the copy processing operation is not performed and copying the document management index data 18b. The document ID of the document set to process suppression (for example, the document ID of the document A in the above example) can be indicated.

尚、複写機１は、従来技術のように文書の画像パターンを直接比較するのではなく、文書に含まれる文章内容及びレイアウトを比較することで、コピー処理等の制限対象となる文書であるか否かを判断する。このため、偶然、原稿（コピー処理の制限の必要ないもの）の文書に含まれる文章内容が文書管理索引データ１８ｂで管理している文書（以下、管理対象文書とする）の文章内容と類似しているだけでコピー処理を制限されてしまう場合があるかもしない。そのような場合は、コピー処理抑制の根拠（どの管理対象文書を基にコピー処理の制限が行われているか）をはっきりさせるために、画面３−２に示すように、コピー処理抑制の根拠となる管理対象文書の文書ＩＤを表示するようにしている。そうすることで、利用者は、文書セキュリティ管理者に相談することによりコピー処理の制限解除や、制限設定の変更を依頼することができる。 Note that the copying machine 1 determines whether a document is subject to restrictions on copy processing and the like by comparing the text content and layout of the document, rather than by directly comparing image patterns as in the prior art. Consequently, copy processing may occasionally be restricted merely because the text content of a document that does not need copy restriction happens to resemble the text content of a document managed in the document management index data 18b (hereinafter referred to as a management target document). To make the basis for the suppression clear in such a case (that is, which management target document caused the restriction), the document ID of that management target document is displayed, as shown in screen 3-2. The user can then consult the document security administrator and request that the restriction be lifted or that the restriction settings be changed.

図３のＦＡＸボタン３２、ＳＥＮＤボタン３３についても、押下されることでコピーボタン３１と同様に、複写機１は、ＦＡＸ機能、ＳＥＮＤ機能を実行する。また、文書管理索引データ１８ｂを参照することで、ＦＡＸ処理やＳＥＮＤ処理を制限すると判断した場合には、複写機１は、図３の画面３−３、画面３−４に示すように、ＦＡＸ処理やＳＥＮＤ処理を制限した旨と共に文書管理索引データ１８ｂにおいて参照された管理対象文書の文書ＩＤを表示装置１５に表示する。 Similarly, when the FAX button 32 or the SEND button 33 in FIG. 3 is pressed, the copying machine 1 executes the FAX function or the SEND function in the same manner as for the copy button 31. If it determines, by referring to the document management index data 18b, that the FAX processing or SEND processing is to be restricted, the copying machine 1 displays on the display device 15 a notice that the processing has been restricted together with the document ID of the management target document referenced in the document management index data 18b, as shown in screens 3-3 and 3-4 in FIG. 3.

また、図３の画面３−１で設定ボタン３４を押下すると、複写機１は、図３に示す画面３−５を表示装置１５に表示する。この画面３−５においては、利用者は文書管理索引データ１８ｂにおいて管理する文書のセキュリティ情報を変更することができる。尚、セキュリティ情報が誰にでも変更できるのであればセキュリティの意味がないので、複写機１は、画面３−５に示すように、文書セキュリティ管理者のみに設定変更の権限を限定するため、入力欄３５及び３６にログイン名及びパスワードの入力を要求し、それらの情報を利用してユーザ認証を行う。 When the setting button 34 is pressed on the screen 3-1 in FIG. 3, the copying machine 1 displays a screen 3-5 shown in FIG. 3 on the display device 15. On this screen 3-5, the user can change the security information of the document managed in the document management index data 18b. If anyone can change the security information, there is no security meaning. Therefore, as shown in the screen 3-5, the copier 1 limits the authority to change the setting to only the document security administrator. The fields 35 and 36 are requested to input a login name and password, and user authentication is performed using the information.

また、画面３−５に示すように、入力欄３７は、登録対象の文書ＩＤを入力する入力欄である。また、利用者（文書セキュリティ管理者）は、入力欄３７に入力した文書ＩＤで特定される文書に対して、コピー処理、ＦＡＸ処理、ＳＥＮＤ処理に対して「許可」または「禁止」の属性を選択的に設定できる。図３の画面３−５では、全て「禁止」が選択されている。最後に設定ボタン３８を押下することで、入力したセキュリティ情報が文書管理索引データ１８ｂに設定される。また、取消ボタン３９を押下することで、画面３−５における設定入力は無効となる。 Further, as shown on the screen 3-5, the input field 37 is an input field for inputting a document ID to be registered. In addition, the user (document security manager) sets the attribute “permitted” or “prohibited” to the copy process, the FAX process, and the SEND process for the document specified by the document ID input in the input field 37. Can be set selectively. In the screen 3-5 in FIG. 3, “prohibited” is all selected. Finally, by pressing the setting button 38, the input security information is set in the document management index data 18b. Further, when the cancel button 39 is pressed, the setting input on the screen 3-5 becomes invalid.
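The settings screen described above (administrator login, a document ID, and a permit/prohibit attribute for each of copy, FAX, and SEND) can be modeled roughly as follows. This is a sketch under assumptions: the class `SecuritySettings`, its methods, and the in-memory credential table are hypothetical, and passwords are held in plain text only to keep the illustration short; a real device would store them hashed.

```python
from dataclasses import dataclass, field
from typing import Dict

OPERATIONS = ("copy", "fax", "send")


@dataclass
class SecuritySettings:
    """Hypothetical per-document control information plus administrator check."""
    administrators: Dict[str, str]  # login name -> password (illustration only)
    rules: Dict[str, Dict[str, bool]] = field(default_factory=dict)

    def set_rule(
        self, login: str, password: str, doc_id: str, permissions: Dict[str, bool]
    ) -> None:
        """Store permit (True) / prohibit (False) flags for one document ID."""
        if login not in self.administrators or self.administrators[login] != password:
            raise PermissionError("only the document security administrator may change settings")
        unknown = set(permissions) - set(OPERATIONS)
        if unknown:
            raise ValueError(f"unknown operations: {sorted(unknown)}")
        # Operations not mentioned default to permitted (an assumption).
        rule = {op: True for op in OPERATIONS}
        rule.update(permissions)
        self.rules[doc_id] = rule

    def is_permitted(self, doc_id: str, operation: str) -> bool:
        """Documents without a registered rule are treated as unrestricted."""
        return self.rules.get(doc_id, {}).get(operation, True)


if __name__ == "__main__":
    settings = SecuritySettings({"admin": "secret"})
    settings.set_rule("admin", "secret", "DOC-A", {"copy": False, "fax": False, "send": False})
    print(settings.is_permitted("DOC-A", "copy"))
```

Setting all three flags to `False`, as in the usage lines, corresponds to the state shown on screen 3-5, where "prohibited" is selected for every operation.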

FIG. 4 is a diagram illustrating an example of the document analysis function of the copying machine 1 according to the present embodiment.
As shown in the upper part of FIG. 4, the image data 4-1 is an example of a scan image produced by the scanner 16, that is, a document image obtained by reading a paper original with the scanner 16 and converting it into digital data. The copying machine 1 first performs block analysis on the image data 4-1. Block analysis divides the document image 4-1 into rectangular blocks according to the nature of their content. As a result of the block analysis, the copying machine 1 divides the document image 4-1 into three blocks, a text block 4-2, an image block 4-3, and an image block 4-4, as shown in the lower part of FIG. 4. The text block 4-2 is a block judged to be a text block because it was detected to contain text. The remaining image blocks 4-3 and 4-4 are blocks judged to be image blocks because they were detected to contain information other than text (graphs, photographs, and the like).

The present embodiment does not describe how text blocks and image blocks are detected; any of the various methods commonly used in commercially available OCR (Optical Character Reader) software may be used. The copying machine 1 performs character recognition and text extraction on the text block 4-2, but performs neither on the image blocks 4-3 and 4-4.

Based on the text information obtained from the character recognition performed on the text block 4-2, the copying machine 1 also extracts keywords that characterize that text information. FIG. 5 is a diagram showing an example of text information extracted from the text block 4-2 by the copying machine 1 and an example of keyword data extracted from that text information.

First, the copying machine 1 performs character recognition on the text block 4-2 of the scan image 4-1 shown in FIG. 4 and outputs text information 5-2 as OCR text information. Because this is character recognition, recognition is not guaranteed to be 100% accurate, and the output contains a certain amount of misrecognized data. In the figure, the character string that should read "BX series" has become "8○ series", and the string that should read "super photo image quality" (超写真画質) has become 超写真白質. Such misrecognized characters cannot be matched, so they are removed in advance. The character recognition rate for word-processor text is currently close to 100%, so misrecognized characters are a small minority of the whole. Even with the misrecognized words excluded, the effect on the matching process of the present embodiment stays within the margin of error, and documents with similar text can still be extracted overall.

Many methods of removing misrecognitions are conceivable; the example shown here removes them through keyword extraction. A list of analyzable keywords (a keyword dictionary) is prepared in the copying machine 1 in advance, and the keywords from this list that appear in the text information 5-2 are listed as the extracted keyword data 5-3. Because only keywords found in the keyword dictionary are listed, unknown words drop out, and most misrecognitions are removed at this stage. Only words of specific parts of speech (nouns, proper nouns, and sa-hen verbal nouns) are registered in the keyword dictionary, so that the characteristics of a document are easy to capture. In the example of FIG. 5, "photograph", "pursuit", and so on are picked up as shown in the keyword data 5-3, while "8○", which is not in the keyword dictionary, is excluded.
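The dictionary-based filtering described above can be sketched as follows. This is only an illustration under stated assumptions: the patent does not specify how the text is scanned, so the longest-match scan, the function name `extract_keywords`, the sample dictionary, and the sample OCR string are all hypothetical.

```python
def extract_keywords(ocr_text, keyword_dictionary):
    """Return the dictionary keywords found in ocr_text, in order of appearance."""
    # Assumption: try longer entries first so a long keyword wins over a shorter one.
    entries = sorted(keyword_dictionary, key=len, reverse=True)
    found = []
    i = 0
    while i < len(ocr_text):
        for word in entries:
            if ocr_text.startswith(word, i):
                found.append(word)
                i += len(word)
                break
        else:
            # No dictionary word starts here (unknown or misrecognized text): skip it.
            i += 1
    return found


# Hypothetical dictionary and OCR output containing misrecognized characters.
dictionary = {"写真", "追求", "モデル", "画質"}
ocr_text = "8○シリーズは超写真白質を追求したモデル"
keywords = extract_keywords(ocr_text, dictionary)
```

In this sketch the misrecognized fragments never match a dictionary entry, so they simply do not appear in the result.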

Next, based on the extracted keyword data 5-3, the copying machine 1 determines whether the text content is similar to the text content of a management target document managed in the document management index data 18b. For this purpose, the copying machine 1 stores in the document management index data 18b, as information on document content, information on the keywords extracted from the text information of each management target document.

The document management index data 18b will now be described in detail. FIG. 6 is a diagram showing a configuration example of the document management index data 18b shown in FIG. 1. As shown in FIG. 6, the document management index data 18b stores a "layout feature amount" 6-2, a "text content feature amount" 6-3, and "document control information" 6-4 in association with a "document ID" 6-1 that specifies a management target document.

The "document ID" 6-1 is identification information that uniquely identifies the document (the management target document). The "layout feature amount" 6-2 is information on the layout of the document specified by the "document ID" 6-1; it is index information on the layout of each document, used for the similarity search of document layouts described above. The copying machine 1 determines layout similarity based on the "layout feature amount" 6-2. For example, the copying machine 1 converts the image data of a document into a bitmap image, divides the bitmap image into a grid of n rectangles vertically by m rectangles horizontally, and stores the average luminance information and color information of each rectangle in the document management index data 18b as the "layout feature amount" 6-2. An example of an image feature amount for similarity search is also proposed in, for example, Japanese Patent Laid-Open No. 10-260983, and that may be used instead.
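The grid-based layout feature described above can be sketched as follows. This is a minimal illustration, assuming a grayscale image given as a list of rows of luminance values; the color information mentioned in the text is omitted, and the function name `layout_features` is hypothetical. The image is assumed to be at least n pixels high and m pixels wide so that no grid cell is empty.

```python
def layout_features(luma, n_rows, m_cols):
    """Average luminance of each cell of an n_rows x m_cols grid laid over the image."""
    height, width = len(luma), len(luma[0])
    features = []
    for r in range(n_rows):
        top, bottom = r * height // n_rows, (r + 1) * height // n_rows
        for c in range(m_cols):
            left, right = c * width // m_cols, (c + 1) * width // m_cols
            cell = [luma[y][x] for y in range(top, bottom) for x in range(left, right)]
            features.append(sum(cell) / len(cell))
    return features


# Tiny 4x4 "image" split into a 2x2 grid.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [100, 100, 50, 50],
    [100, 100, 50, 50],
]
features = layout_features(image, 2, 2)
```

The resulting list is ordered row by row, so two documents can be compared cell by cell.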

The "layout feature amount" 6-2 is created from the raster image that the copying machine 1 receives over the network from a PC or the like at print time, or from the scan image digitized at copy, FAX, or SEND time. When performing a similarity search of document layouts, the copying machine 1 thus extracts the layout feature amount of the original and compares it with the "layout feature amount" 6-2 of each management target document stored in the document management index data 18b to obtain a layout similarity.

Next, the "text content feature amount" 6-3 and the "document control information" 6-4 shown in FIG. 6 will be described in detail with reference to FIGS. 7 and 8. FIG. 7 is a diagram showing a configuration example of the "text content feature amount" 6-3 shown in FIG. 6. As shown in FIG. 7, the "text content feature amount" 6-3 stores, as index information for similarity search of text content, a document vector corresponding to the text content of each management target document identified by a document ID. In this document vector, each word (keyword) in the keyword dictionary is a dimension, and the value of each dimension is the appearance frequency of that word. As shown in FIG. 7, the dimensions are given identification numbers 1, 2, 3, and so on; for example, dimension "2" corresponds to the word "photograph", dimension "5" to "pursuit", and dimension "8" to "model". These are words contained in the keyword dictionary, as shown in the keyword data 5-3 of FIG. 5.

In the "text content feature amount" 6-3 of the present embodiment, however, one word does not correspond strictly to one dimension; instead, a group of identical or similar words forms one dimension of the document vector. In FIG. 7, for example, dimension "2" covers the word フォト ("photo") in addition to 写真 ("photograph"). For each dimension, the "text content feature amount" 6-3 stores the number of times the words of that dimension appear in the document specified by the document ID.

The document shown in FIG. 4 contains only one text block 4-2. When one document contains a plurality of text blocks, the copying machine 1 aggregates the keyword data extracted from all the text blocks, creates information on a single document vector (the text content feature amount), and stores it in the document management index data 18b. When searching for a document, the copying machine 1 also creates, from the scanned document that serves as the search query, vector data (a query vector) in the same format as the document vectors stored in the document management index data 18b. The copying machine 1 then performs a similarity search of text content by comparing the vector data of the scanned original with the document vector of each management target document referenced from the "text content feature amount" 6-3 in the document management index data 18b and obtaining a similarity.
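Building a document vector (or a query vector in the same format) from extracted keywords can be sketched as follows. The mapping `WORD_TO_DIMENSION` and the function name `build_vector` are hypothetical; only the dimension numbers 2, 5, and 8 and the grouping of 写真 and フォト into one dimension are taken from the FIG. 7 example.

```python
from collections import Counter

# Hypothetical keyword-to-dimension mapping; similar words share one dimension.
WORD_TO_DIMENSION = {"写真": 2, "フォト": 2, "追求": 5, "モデル": 8}


def build_vector(keywords, n_dimensions):
    """Count keyword occurrences per dimension; dimensions are numbered from 1."""
    counts = Counter(
        WORD_TO_DIMENSION[word] for word in keywords if word in WORD_TO_DIMENSION
    )
    return [counts.get(k, 0) for k in range(1, n_dimensions + 1)]


# Keywords gathered from all text blocks of one document are counted together.
block_1 = ["写真", "追求"]
block_2 = ["フォト", "写真", "未知語"]
vector = build_vector(block_1 + block_2, 8)
```

Words outside the mapping are ignored, which mirrors the dictionary filtering applied before vectorization.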

Next, the "document control information" 6-4 shown in FIG. 6 will be described. FIG. 8 is a diagram showing a configuration example of the "document control information" 6-4 shown in FIG. 6. For each management target document identified by a document ID, the "document control information" 6-4 stores security information indicating how that document is to be managed. Specifically, as shown in FIG. 8, it stores, for each document ID, copy processing control information 8-1, FAX processing control information 8-2, and SEND processing control information 8-3 as information controlling three types of operation of the copying machine 1.

In the present embodiment, the "document control information" 6-4 stores "0" or "1" for each of the copy processing control information 8-1, the FAX processing control information 8-2, and the SEND processing control information 8-3 as the information controlling each operation of the copying machine 1: "1" if the operation is permitted and "0" if it is suppressed. An embodiment that varies permission and suppression per user can be realized by, for example, adding a user dimension to the structure of the "document control information" 6-4, making it three-dimensional.

In the example of FIG. 8, the document with document ID 6947 is permitted for copy, FAX, and SEND processing. The document with document ID 6948 is permitted for copy processing only; FAX and SEND processing are suppressed. The document with document ID 6949 is permitted for SEND processing only; copy and FAX processing are suppressed.
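The control table of the FIG. 8 example can be sketched as a simple lookup structure. The class name `DocumentControl`, the field names, and the helper `is_permitted` are hypothetical; the document IDs and the 0/1 values follow the example described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentControl:
    copy: int  # 1 = permitted, 0 = suppressed
    fax: int
    send: int


# Values as described for FIG. 8.
CONTROL_TABLE = {
    6947: DocumentControl(copy=1, fax=1, send=1),
    6948: DocumentControl(copy=1, fax=0, send=0),
    6949: DocumentControl(copy=0, fax=0, send=1),
}


def is_permitted(document_id, operation):
    """operation is one of 'copy', 'fax', 'send'."""
    return getattr(CONTROL_TABLE[document_id], operation) == 1
```

A per-user variant, as the text suggests, could key the table on a (user, document ID) pair instead of the document ID alone; that extension is not shown here.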

Next, the word importance table will be described. The copying machine 1 refers to this table when it compares the vector data of a scanned original with the document vector of each management target document referenced from the "text content feature amount" 6-3 in the document management index data 18b and obtains a similarity (the text content similarity). FIG. 9 is a diagram showing an example of the word importance table in the present embodiment. In FIG. 9, the word importance table 91 is a table showing the appearance frequency of each word in the documents. As shown in FIG. 9, each word is identified by the identification number of the word (dimension) shown in the "text content feature amount" 6-3. The copying machine 1 creates the word importance table 91 from the frequency with which each word appears across all management target documents managed in the document management index data 18b, and uses it in determining the similarity of text content.

The copying machine 1 calculates the importance of each word as the reciprocal of its appearance frequency in the word importance table 91. Specifically, the importance w_k of word k is obtained by the following equation.
$$ w_k = \frac{1}{\text{appearance frequency of word } k} $$
Here, k = 1, 2, 3, ..., n is the identification number of each word (dimension) shown in the "text content feature amount" 6-3.

However, when the appearance frequency is 0, the importance of the word is also set to 0, because a word that appears in none of the management target documents managed in the document management index data 18b is judged to be of no use in determining similarity. The importance is the reciprocal of the appearance frequency because common words that appear frequently in many documents are relatively unimportant in determining the similarity of text content.
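The importance rule, including the special case for a frequency of 0, can be sketched as follows. The function name `word_importance` and the sample counts are hypothetical.

```python
def word_importance(appearance_counts):
    """w_k = 1 / (appearance frequency of word k), or 0 when the word never appears."""
    return [1.0 / count if count > 0 else 0.0 for count in appearance_counts]


# Appearance frequencies for four dimensions (illustrative values).
weights = word_importance([4, 0, 1, 2])
```

A word seen four times gets a quarter of the weight of a word seen once, and a word never seen contributes nothing.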

To determine document similarity, the copying machine 1 calculates the text content similarity TS(X, Q) between a management target document A managed in the document management index data 18b and an original B by the following equation.

$$ TS(X, Q) = -\sum_{k=1}^{n} w_k \, \lvert x_k - q_k \rvert $$

Here, X = (x_1, x_2, x_3, ..., x_n) is the document vector of the management target document A, and Q = (q_1, q_2, q_3, ..., q_n) is the query vector of the original B. The numbers 1 to n are the identification numbers of the words (dimensions) shown in the "text content feature amount" 6-3, and w_k is the importance of word k.

As the above equation shows, the text content similarity TS is obtained for the two documents being compared (the management target document A and the original B) by multiplying, for every word (k = 1 to k = n), the absolute difference in appearance frequency by the importance w_k of that word, summing the products, and negating the sum. Because of the negation, the smaller the differences in appearance frequency, the larger the value of TS. In other words, a larger TS indicates a higher similarity between the two documents; the maximum value, 0, is reached when no word with nonzero importance differs in appearance frequency.
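The calculation described above, a negated weighted sum of absolute frequency differences, can be sketched as follows. The function name `text_similarity` and the sample vectors are hypothetical.

```python
def text_similarity(x, q, w):
    """TS(X, Q): negated sum over k of w_k * |x_k - q_k|."""
    if not (len(x) == len(q) == len(w)):
        raise ValueError("document vector, query vector and weights must have equal length")
    return -sum(wk * abs(xk - qk) for xk, qk, wk in zip(x, q, w))


doc_vector = [0, 3, 1]      # managed document A
query_vector = [0, 2, 1]    # scanned original B
weights = [0.5, 0.25, 1.0]  # importance w_k per dimension

ts = text_similarity(doc_vector, query_vector, weights)
ts_identical = text_similarity(doc_vector, doc_vector, weights)
```

Here the only difference is one occurrence in the second dimension, so the result is the negated weight of that dimension.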

The copying machine 1 likewise obtains the layout similarity LS between the management target document A and the original B using a formula whose value becomes larger as the similarity between the two documents becomes higher.

Further, the copying machine 1 obtains the overall similarity S from the text content similarity TS and the layout similarity LS described above by the following equation.
$$ S = \alpha \times TS + \beta \times LS $$

The overall similarity obtained by the above equation is basically the sum of the text content similarity TS and the layout similarity LS, but each is multiplied by a weight, α or β, according to its importance before the addition. α is the weight for the text content similarity, and β is the weight for the layout similarity. The values of α and β are variable and can be changed as appropriate from the viewpoint of security management and the like. If layout has little bearing on confidentiality and only the text content needs to be checked, the layout similarity weight β is made small; to ignore layout entirely, for example, set α = 1 and β = 0. Conversely, if layout also deserves security consideration, as with banknotes and securities, and is to be weighted equally with text content, set α = 1 and β = 1, for example.
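The effect of the weights can be illustrated with a short sketch. The function name `overall_similarity` and all similarity values below are made up for illustration; they are not taken from the patent.

```python
def overall_similarity(ts, ls, alpha=1.0, beta=1.0):
    """S = alpha * TS + beta * LS."""
    return alpha * ts + beta * ls


# Hypothetical (TS, LS) pairs for two managed documents compared with one original.
candidates = {"A": (-0.5, -3.0), "B": (-2.0, -0.1)}


def most_similar(alpha, beta):
    return max(candidates, key=lambda d: overall_similarity(*candidates[d], alpha, beta))


text_only = most_similar(alpha=1.0, beta=0.0)      # layout ignored
text_and_layout = most_similar(alpha=1.0, beta=1.0)  # layout weighted equally
```

With layout ignored, document A (closer in text) ranks first; with equal weights, document B (closer in layout) overtakes it.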

The operation of the copying machine 1 described above will now be explained with reference to flowcharts.

FIG. 10 is a flowchart showing the operation of the copying machine 1, more specifically the processing procedure of the CPU 10. As shown in FIG. 10, first, in step S11-1, the CPU 10 performs system initialization. Specifically, the CPU 10 initializes various parameters, displays the initial screen, and so on.

Next, in step S11-2, the CPU 10 waits for some event to occur, such as input from the input device 14 (a touch panel or the like) or a request from a device connected directly or over a network via the communication device 20. When an event occurs, the CPU 10 identifies the event in step S11-3 and, in step S11-4, branches to one of various processes according to the type of event. In FIG. 10, the multiple branch-destination processes corresponding to the various events are represented collectively as step S11-4.

The branch destinations of step S11-4 include the print processing, document management information setting processing, copy processing, FAX processing, and SEND processing described in detail with reference to FIGS. 11, 12, 13, 14, and 15. Other processes, not described in detail, are ordinary copying machine operations such as setting the number of copies and specifying a FAX or SEND destination. Next, in step S11-5, the copying machine 1 performs display processing that shows the completion of each of the above processes on, for example, the display device 15. This is commonly performed processing, such as displaying an error when one has occurred or indicating normal completion.
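The wait, identify, branch, and display cycle of steps S11-2 to S11-5 can be sketched as a small dispatch loop. Everything here is illustrative: the event format, the handler names, and the use of an in-memory queue in place of real input from the input device 14 or the communication device 20 are assumptions.

```python
from collections import deque


def handle_print(event):
    return "printed " + event["document"]


def handle_copy(event):
    return "copied " + event["document"]


# Hypothetical table of branch destinations (S11-4).
HANDLERS = {"print": handle_print, "copy": handle_copy}


def run(events):
    """Process queued events and return the messages that would be displayed."""
    shown = []
    queue = deque(events)
    while queue:
        event = queue.popleft()                # S11-2: an event has occurred
        handler = HANDLERS.get(event["type"])  # S11-3: identify the event
        if handler is None:
            message = "error: unsupported event " + event["type"]
        else:
            message = handler(event)           # S11-4: branch to the process
        shown.append(message)                  # S11-5: display the result
    return shown


messages = run([{"type": "copy", "document": "A"}, {"type": "scan", "document": "B"}])
```

A real device would block waiting for the next event rather than drain a fixed list; the loop body is the part that corresponds to the flowchart.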

Next, the print processing in the copying machine 1 will be described. FIG. 11 is a flowchart showing details of the print processing, which is part of the processing of step S11-4 shown in FIG. 10. The print processing in the copying machine 1 starts when a device such as a PC connected over the network via the communication device 20 instructs the printing of an electronic file (a document file). Together with the print instruction, the copying machine 1 receives a raster image (bitmap image) of the document to be printed.

As shown in FIG. 11, first, in step S12-1, the copying machine 1 performs ordinary print processing, printing the received raster image of the document with the printer 17. Next, in step S12-2, the copying machine 1 performs block analysis on the printed raster image and identifies text blocks and image blocks.

Next, in step S12-3, the copying machine 1 performs character recognition on the characters in the text blocks and extracts text information. In step S12-4, the copying machine 1 extracts keywords from the extracted text information based on the keyword dictionary and generates a document vector, which is the text content feature amount. In step S12-5, the copying machine 1 extracts from the raster image a layout feature amount, which includes image feature amounts and the like. In step S12-6, the copying machine 1 registers the text content feature amount extracted in step S12-4 and the layout feature amount extracted in step S12-5 in the document management index data 18b, in association with a document ID that identifies the printed document. In this way, at print time, the copying machine 1 can register document feature information (the text content feature amount and the layout feature amount) on the printed document in the document management index data 18b.

Next, the document management information setting processing in the copying machine 1 will be described.
FIG. 12 is a flowchart showing details of the document management information setting processing, which is part of the processing of step S11-4 shown in FIG. 10. As shown in FIG. 12, first, in step S13-1, the copying machine 1 displays screen 3-5 shown in FIG. 3 on the display device 15, prompting the user to enter the security administrator's login name and password (used for user authentication) and the security information to be set for each document. Next, in step S13-2, the copying machine 1 performs user authentication based on the entered login name and password.

Next, in step S13-3, the copying machine 1 determines whether the user authentication in step S13-2 succeeded, that is, whether the authentication confirmed the authority to change security information. If the user authentication failed, the copying machine 1 ends the document management information setting processing without changing the information in the document management index data 18b. If the user authentication succeeded, the copying machine 1, in step S13-4, registers the security information of the document in the "document control information" 6-4 of the document management index data 18b according to the entered information, and then ends the document management information setting processing.
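Steps S13-2 to S13-4 can be sketched as follows. The patent does not say how credentials are stored or checked; the salted PBKDF2 hash and constant-time comparison used here are assumptions chosen for illustration, as are the function names, the account name, and the in-memory dictionaries.

```python
import hashlib
import hmac


def hash_password(password, salt):
    # Assumption: credentials are stored as salted PBKDF2-HMAC-SHA256 hashes.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def set_document_control(index, admins, login, password, document_id, control):
    """Authenticate (S13-2, S13-3), then register control info (S13-4).

    Returns True on success; on failure the index is left unchanged.
    """
    record = admins.get(login)
    if record is None:
        return False
    salt, expected = record
    if not hmac.compare_digest(hash_password(password, salt), expected):
        return False
    index[document_id] = dict(control)
    return True


salt = b"demo-salt"
admins = {"secadmin": (salt, hash_password("s3cret", salt))}
index = {}

ok = set_document_control(
    index, admins, "secadmin", "s3cret", 6948, {"copy": 1, "fax": 0, "send": 0}
)
rejected = set_document_control(
    index, admins, "secadmin", "wrong", 6949, {"copy": 0, "fax": 0, "send": 0}
)
```

Only the successful call changes the index, matching the rule that a failed authentication leaves the document management index data untouched.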

Next, the copy processing in the copying machine 1 will be described.
FIG. 13 is a flowchart showing details of the copy processing, which is part of the processing of step S11-4 shown in FIG. 10. As shown in FIG. 13, first, in step S14-1, the copying machine 1 reads the paper document on the platen with the scanner 16 and converts it into a bitmap image. Next, in step S14-2, the copying machine 1 performs block analysis on the scanned bitmap image and separates it into text blocks, image blocks, and so on.

Next, in step S14-3, the copying machine 1 performs character recognition on the characters in the text blocks and extracts text information. In step S14-4, the copying machine 1 extracts keywords from the extracted text information and generates a query vector, which is the text content feature amount. In step S14-5, the copying machine 1 extracts a layout feature amount, such as image feature amounts, from the bitmap image. In step S14-6, the copying machine 1 obtains the layout similarity and the text content similarity from the extracted feature amounts as described above, calculates the overall similarity, and identifies the document ID of the most similar management target document.

Next, in step S14-7, the copying machine 1 looks up the security information (the "document control information" 6-4 shown in FIG. 8) of the identified document ID in the document management index data 18b. In step S14-8, the copying machine 1 determines whether that security information permits copy processing. If it determines that copy processing is permitted, the copying machine 1 proceeds to step S14-9 and performs ordinary copy processing, then proceeds to step S14-10 and registers the extracted document feature information (the text content feature amount and the layout feature amount) in the document management index data 18b by the document registration processing shown in FIG. 16, described later. If it determines in step S14-8 that copy processing is prohibited (suppressed), the copying machine 1 proceeds to step S14-11 and displays on the display device 15 the document ID of the management target document that is the basis for suppressing the copy processing.
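The decision part of this flow (steps S14-6 to S14-11) can be sketched as follows. This is a simplified illustration: layout similarity is ignored (equivalent to β = 0), registration in step S14-10 is omitted, and the index layout, function names, and sample values are hypothetical. The treatment of an empty index is an assumption, since the text does not cover that case.

```python
def text_similarity(x, q, w):
    """Negated sum of importance-weighted absolute frequency differences."""
    return -sum(wk * abs(xk - qk) for xk, qk, wk in zip(x, q, w))


def handle_request(operation, query_vector, index, weights):
    """Return ('executed', doc_id) or ('restricted', doc_id) for 'copy', 'fax' or 'send'."""
    if not index:
        # Assumption: with no managed documents, nothing restricts the request.
        return ("executed", None)
    # S14-6: identify the most similar managed document.
    best_id = max(
        index,
        key=lambda doc_id: text_similarity(index[doc_id]["vector"], query_vector, weights),
    )
    # S14-7, S14-8: look up the control flag for the requested operation.
    if index[best_id]["control"][operation] == 1:
        return ("executed", best_id)   # S14-9
    return ("restricted", best_id)     # S14-11: this ID would be displayed


index = {
    6948: {"vector": [0, 3, 1], "control": {"copy": 1, "fax": 0, "send": 0}},
    6949: {"vector": [5, 0, 0], "control": {"copy": 0, "fax": 0, "send": 1}},
}
weights = [0.2, 0.25, 1.0]
query = [0, 2, 1]

copy_result = handle_request("copy", query, index, weights)
fax_result = handle_request("fax", query, index, weights)
```

Passing the operation as a parameter reflects that the FAX and SEND flows differ from the copy flow only in which control flag is checked and which function is executed.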

Next, the FAX processing in the copying machine 1 will be described.
FIG. 14 is a flowchart showing details of the FAX processing, which is part of the processing of step S11-4 shown in FIG. 10. The processing of steps S15-1 to S15-7 in FIG. 14 is equivalent to that of steps S14-1 to S14-7 in FIG. 13 described above, so its description is omitted.

Following step S15-7, in step S15-8, the copying machine 1 determines whether the security information permits FAX processing. If it does, the copying machine 1 performs ordinary FAX processing in step S15-9 and then, in step S15-10, registers the extracted document feature information in the document management index data 18b by the document registration processing shown in FIG. 16, described later. If FAX processing is prohibited (suppressed) in step S15-8, the copying machine 1 proceeds to step S15-11 and displays on the display device 15 the document ID of the management target document that is the basis for suppressing the FAX processing.

Next, SEND processing in the copying machine 1 will be described.
FIG. 15 is a flowchart showing details of the SEND processing that is part of the processing of step S11-4 shown in FIG. 10. The processing in steps S16-1 to S16-7 in FIG. 15 is equivalent to the processing in steps S14-1 to S14-7 in FIG. 13 described above, so its description is omitted.

Following step S16-7, in step S16-8, the copying machine 1 determines whether the security information permits SEND processing. If it is permitted, the copying machine 1 performs normal SEND processing in step S16-9 and then, in step S16-10, registers the extracted document feature information in the document management index data 18b by the document registration processing shown in FIG. 16, described later. If SEND processing is prohibited (suppressed) in step S16-8, the copying machine 1 proceeds to step S16-11 and displays, on the display device 15, the document ID of the management target document that was the basis for suppressing the SEND processing.

Next, the document registration processing in the copying machine 1, which is performed in steps S14-10, S15-10, and S16-10 shown in FIGS. 13 to 15, will be described.
FIG. 16 is a flowchart showing the processing (document registration processing) for registering the document feature information (layout feature amount and text content feature amount) extracted in the copying machine 1 in the document management index data 18b. First, in step S17-1, the copying machine 1 registers the extracted document feature information in the document management index data 18b according to a predetermined format. Next, in step S17-2, the copying machine 1 updates the word importance table 91 based on the text content feature amount included in the document feature information.
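The two registration steps can be sketched as follows. This passage only states that the word importance table 91 is updated from the text content feature amount, so the update rule here is an assumption: a smoothed inverse-document-frequency weight is swapped in for illustration. The class `DocumentIndex` and its fields are hypothetical names.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DocumentIndex:
    """Hypothetical stand-in for the document management index data 18b."""
    records: List[dict] = field(default_factory=list)
    doc_freq: Dict[str, int] = field(default_factory=dict)
    importance: Dict[str, float] = field(default_factory=dict)  # plays the role of table 91

    def register(self, layout_features: list, keywords: List[str]) -> None:
        # S17-1: store the extracted feature information in a fixed record format.
        self.records.append({"layout": list(layout_features),
                             "keywords": list(keywords)})
        # S17-2: update the word importance table from the keywords.
        # Assumed rule: words found in fewer registered documents weigh more.
        for word in set(keywords):
            self.doc_freq[word] = self.doc_freq.get(word, 0) + 1
        total = len(self.records)
        for word, df in self.doc_freq.items():
            self.importance[word] = math.log((1 + total) / (1 + df)) + 1.0
```

Recomputing every weight on each registration is quadratic over time; it is kept here only because it makes the dependence on the number of registered documents easy to see.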

With the configuration described above, the copying machine 1 of the present embodiment can decide whether copy processing and the like is permitted based on the similarity of text content. Therefore, if prohibitions of copy processing and the like for documents having given text content are comprehensively registered in the document management index data 18b, copy processing and the like of documents with similar content can be prohibited. In other words, security can be managed flexibly according to the documents registered in the document management index data 18b, so a copying machine (character processing apparatus) with high operability can be provided. Furthermore, when copy processing or the like is prohibited, the copying machine 1 can display the document ID of the document that was the basis for the prohibition, so the prohibition can be dealt with appropriately. For example, when the security administrator is asked to lift a prohibition on copy processing or the like, the request can be resolved quickly if the document ID has been identified. In this way, a flexible system that can easily reflect users' change requests can be constructed.

(Other embodiments)
The present invention is not limited to the above-described embodiment, and can be modified as appropriate without departing from the gist of the present invention.

The above-described embodiment assumes only permission or prohibition of copy processing, FAX processing, and SEND processing as the form of security control, but other forms are also conceivable. For example, whether to imprint a watermark on the document during each of these processing operations, and whether to add additional information (such as “Distribution prohibited”) to the document, may be added to the security control options.

In addition, the message displayed on the display device 15 when an operation is prohibited (for example, “This document is confidential”) may be made specifiable. In this case, ON/OFF information for the message designating operation is added to the document control information, and print processing, FAX processing, SEND processing, or message display processing is performed according to that information. This makes it possible not only to control whether a copy can be made but also to call attention to how copies should be handled, so that finer-grained security management can be performed.
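One way to picture these extended options is as extra fields on the document control information. The following sketch is an assumption about how such fields might be combined, not a layout taken from the specification; `ExtendedControlInfo`, its field names, and `plan_actions` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExtendedControlInfo:
    """Hypothetical extension of the document control information."""
    allowed: bool = True              # permit or prohibit the requested operation
    add_watermark: bool = False       # imprint a watermark on the output
    annotation: Optional[str] = None  # e.g. "Distribution prohibited"
    message_on: bool = False          # ON/OFF flag for the custom message
    message: str = ""                 # e.g. "This document is confidential"


def plan_actions(info: ExtendedControlInfo) -> List[str]:
    """Return the ordered actions the device would take for one request."""
    actions: List[str] = []
    if not info.allowed:
        actions.append("block")
        if info.message_on:
            actions.append("display:" + info.message)
        return actions
    if info.add_watermark:
        actions.append("watermark")
    if info.annotation:
        actions.append("annotate:" + info.annotation)
    actions.append("process")
    return actions
```

Returning a list of action names keeps the decision logic separate from the printing, FAX, and display hardware, which makes the options easy to check in isolation.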

In the above-described embodiment, similarity to all management target documents managed in the document management index data 18b is determined based on the overall similarity S obtained from the layout similarity LS and the text content similarity TS, and the most similar document is specified. The weight α applied to the text content similarity TS and the weight β applied to the layout similarity LS were described as changeable as appropriate; a specific example is described below.

For example, in some situations the layout information (layout feature amount) of a document is irrelevant to security control. Conversely, there are cases where the layout information is important and the text content is not. Examples of the former are ordinary confidential company documents, such as a plan for a strategic new product or an unfiled patent specification; in such cases the positions of illustrations, logos, photographs, and the like pasted into the document are often unimportant. An example of the latter is a banknote, where anything with a different layout can be ignored entirely. In such cases, similarity determination that takes security into account can be realized by changing the weighting parameters α and β for each document. Specifically, parameter information (the values of the weighting parameters α and β) is registered for each document in the document management index data 18b and used in the determination. The specific values of α and β registered in the document management index data 18b are specified by the user when the document is registered. In this way, copy processing, FAX processing, SEND processing, and the like can be controlled appropriately for documents of differing nature, from banknotes to general documents, and finer-grained security management can be realized.
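Per-document weights can be sketched as follows, assuming the overall similarity is the weighted sum S = α·TS + β·LS. The passage only says that α is applied to TS and β to LS, so the additive form is an assumption here; `ManagedDoc`, `overall_similarity`, and `most_similar` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ManagedDoc:
    """Weights registered for one management target document."""
    alpha: float  # weight on the text content similarity TS
    beta: float   # weight on the layout similarity LS


def overall_similarity(ts: float, ls: float, doc: ManagedDoc) -> float:
    # Assumed form of the overall similarity: S = alpha * TS + beta * LS
    return doc.alpha * ts + doc.beta * ls


def most_similar(scores: Dict[str, Tuple[float, float]],
                 docs: Dict[str, ManagedDoc]) -> Optional[str]:
    """Pick the registered document with the highest S for one scanned original.

    `scores` maps document ID to (TS, LS) measured against the original.
    """
    best_id, best_s = None, float("-inf")
    for doc_id, (ts, ls) in scores.items():
        s = overall_similarity(ts, ls, docs[doc_id])
        if s > best_s:
            best_id, best_s = doc_id, s
    return best_id
```

With α = 1, β = 0 for an internal memo and α = 0, β = 1 for a banknote, a scan is compared to the memo on its text alone and to the banknote on its layout alone, which mirrors the two cases described above.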

In the above-described embodiment, each process of the copying machine 1 shown in FIGS. 10 to 16 is realized by the CPU 10 reading a program for realizing the function of that process from a memory (such as the ROM 12 or the RAM 13) and executing it.

The means for realizing the processes shown in FIGS. 10 to 16 in the copying machine 1 is not limited to the configuration described above; all or some of the functions of each process of the copying machine 1 may be realized by dedicated hardware. The memory described above is not limited to the ROM 12 or the RAM 13, and may be constituted by a nonvolatile memory such as a magneto-optical disk device or a flash memory, a read-only recording medium such as a CD-ROM, a volatile memory other than a RAM, or a computer-readable and writable recording medium formed from a combination of these.

A program for realizing the function of each process of the copying machine 1 may also be recorded on a computer-readable recording medium, and each process may be performed by causing a computer system to read and execute the program recorded on that medium. The “computer system” here includes an OS and hardware such as peripheral devices (the scanner 16, the printer 17, and the like). Specifically, this also includes the case where the program read from the storage medium is written to a memory provided in a function expansion board inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided in the function expansion board or function expansion unit performs part or all of the actual processing based on the instructions of the program, and the functions of the above-described embodiment are realized by that processing.

The “computer-readable recording medium” refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into a computer system. The “computer-readable recording medium” also includes media that hold a program for a certain period of time, such as the volatile memory (RAM) inside a computer system that serves as a server or a client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.

The program may also be transmitted from a computer system that stores the program in a storage device or the like to another computer system via a transmission medium or by a transmission wave in the transmission medium. Here, the “transmission medium” that transmits the program refers to a medium having a function of transmitting information, such as a network (communication network) like the Internet or a communication line (communication wire) like a telephone line.
The program may also be one that realizes only part of the functions described above. Furthermore, it may be a so-called difference file (difference program), which realizes the functions described above in combination with a program already recorded in the computer system.

A program product such as a computer-readable recording medium on which the above program is recorded can also be applied as an embodiment of the present invention. The above program, recording medium, transmission medium, and program product are included in the scope of the present invention.
The embodiment of the present invention has been described in detail above with reference to the drawings. However, the specific configuration is not limited to this embodiment, and designs and the like that do not depart from the gist of the present invention are also included.

FIG. 1 is a block diagram showing a schematic configuration of a copying machine (information processing apparatus) according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the flow of operations in the copying machine 1 shown in FIG. 1.
FIG. 3 is a diagram showing an example of screen transitions on the display device 15 of the copying machine 1 shown in FIG. 1.
FIG. 4 is a diagram showing an example of the document analysis function provided in the copying machine 1 of the present embodiment.
FIG. 5 is a diagram showing an example of text information extracted from a text block 4-2 in the copying machine 1 and an example of keyword data extracted from the text information.
FIG. 6 is a diagram showing a configuration example of the document management index data 18b shown in FIG. 1.
FIG. 7 is a diagram showing a configuration example of the “text content feature amount” 6-3 shown in FIG. 6.
FIG. 8 is a diagram showing a configuration example of the “document control information” 6-4 shown in FIG. 6.
FIG. 9 is a diagram showing an example of the word importance table in the present embodiment.
FIG. 10 is a flowchart showing the operation of the copying machine 1, more specifically the processing procedure of the CPU 10.
FIG. 11 is a flowchart showing details of the print processing that is part of the processing of step S11-4 shown in FIG. 10.
FIG. 12 is a flowchart showing details of the document management information setting processing that is part of the processing of step S11-4 shown in FIG. 10.
FIG. 13 is a flowchart showing details of the copy processing that is part of the processing of step S11-4 shown in FIG. 10.
FIG. 14 is a flowchart showing details of the FAX processing that is part of the processing of step S11-4 shown in FIG. 10.
FIG. 15 is a flowchart showing details of the SEND processing that is part of the processing of step S11-4 shown in FIG. 10.
FIG. 16 is a flowchart showing the processing (document registration processing) for registering the document feature information (layout feature amount and text content feature amount) extracted in the copying machine 1 in the document management index data 18b.

Explanation of symbols

1 Copying machine
10 CPU
11 Bus
12 ROM
13 RAM
14 Input device
15 Display device
16 Scanner
17 Printer
18 HD (hard disk)
19 Removable external storage device
20 Communication device

Claims

An information processing apparatus comprising: reading means for reading an original in response to a processing request input by input means, and outputting the read original as image data;
character information extraction means for specifying a character-bearing portion in the image data output by the reading means, performing character recognition processing on it, and outputting first character information based on the recognized character data;
information storage means for storing, as information on a control target document that is subject to control of document processing, at least second character information, which is information on the characters included in the control target document, and control information for determining which of one or more types of the document processing is to be controlled for the control target document;
similarity determination means for determining whether the document contents of the original and the control target document are similar, based on the first character information output by the character information extraction means and the second character information of the control target document referenced from the information storage means; and
processing determination means for specifying, in accordance with the determination by the similarity determination means, a control target document whose document content is similar to that of the original, and determining, by referring to the control information of the specified control target document in the information storage means, whether the document processing according to the processing request for the original may be performed.

The information processing apparatus according to claim 1, further comprising layout information extraction means for extracting information on the layout of the original from the image data output by the reading means and outputting first layout information,
wherein the information storage means further stores second layout information, which is information on the layout of the control target document, and
the similarity determination means, together with the similarity determination of the document contents, also determines the similarity of the layouts based on the first layout information output by the layout information extraction means and the second layout information referenced from the information storage means.

The information processing apparatus according to claim 1 or claim 2, wherein the first character information and the second character information are information on specific keywords extracted from the character data of the original and of the control target document.

The information processing apparatus according to claim 3, wherein the similarity determination means assigns a weight to each of the specific keywords and determines, based on the first character information and the second character information, whether the document contents of the original and the control target document are similar.

The information processing apparatus according to claim 1, further comprising: display means; and
display control means for displaying on the display means, when the processing determination means determines that the document processing according to the processing request is prohibited, document identification information, which is information identifying the control target document specified by the processing determination means in accordance with the determination by the similarity determination means.

The information processing apparatus according to claim 5, wherein the display control means displays a control information change screen including at least input means for inputting the document identification information of the control target document and setting change means capable of changing the control information of the control target document.

An information processing method comprising: a reading step of reading an original in response to a processing request input by input means, and outputting the read original as image data;
a character information extraction step of specifying a character-bearing portion in the image data output in the reading step, performing character recognition processing on it, and outputting first character information based on the recognized character data;
a similarity determination step of referring to the second character information of a control target document in information storage means that stores, as information on the control target document that is subject to control of document processing, at least second character information, which is information on the characters included in the control target document, and control information for determining which of one or more types of the document processing is to be controlled for the control target document, and determining whether the document contents of the original and the control target document are similar based on the referenced second character information and the first character information output in the character information extraction step; and
a processing determination step of specifying, in accordance with the determination in the similarity determination step, a control target document whose document content is similar to that of the original, and determining, by referring to the control information of the specified control target document in the information storage means, whether the document processing according to the processing request for the original may be performed.

A program for an information processing apparatus, the program causing a computer to execute:
a reading step of reading an original in response to a processing request input by input means, and outputting image data;
a character information extraction step of specifying a character-bearing portion in the image data output in the reading step, performing character recognition processing on it, and outputting first character information based on the recognized character data;
a similarity determination step of referring to the second character information of a control target document in information storage means that stores, as information on the control target document that is subject to control of document processing, at least second character information, which is information on the characters included in the control target document, and control information for determining which of one or more types of the document processing is to be controlled for the control target document, and determining whether the document contents of the original and the control target document are similar based on the referenced second character information and the first character information output in the character information extraction step; and
a processing determination step of specifying, in accordance with the determination in the similarity determination step, a control target document whose document content is similar to that of the original, and determining, by referring to the control information of the specified control target document in the information storage means, whether the document processing according to the processing request for the original may be performed.