JP2010026987A

JP2010026987A - Network document management system

Info

Publication number: JP2010026987A
Application number: JP2008191006A
Authority: JP
Inventors: Koji Inose; 康二猪瀬
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2008-07-24
Filing date: 2008-07-24
Publication date: 2010-02-04

Abstract

PROBLEM TO BE SOLVED: To reduce the burden of the user's final visual check during retrieval by first narrowing down similar business form images, in a system that stores a log in association with a printed image and performs image retrieval to obtain a desired log.

SOLUTION: The system, which manages images in association with print logs, includes: a means for extracting the name of a business form template, data unique to the business form, and the coordinates of that data; a means for transmitting this information to a retrieval server; and a means for printing the business form template. In particular, the retrieval server includes a means for identifying the image by collating the data unique to the business form with data obtained by recognizing the image.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to history management, and more particularly to a network document management system in which an image processing apparatus efficiently stores and manages history information, image data, and text data on a document management server.

With the spread of digital multi-function peripherals, anyone can now easily print, copy, and transmit documents. While this improves convenience for users, it has also created new information leakage problems, such as the printing, copying, and transmission of confidential documents. As a countermeasure, there are document management systems that store in a storage device all image data and text data read during printing, copying, FAX transmission, e-mail transmission, and the like. By checking the stored data, an administrator can determine which printer or digital multi-function peripheral processed a leaked document, and can thus trace when, where, and how it was processed.

Meanwhile, when printed matter is found and an image search is performed with an image reading device to investigate the associated “when, where, and how”, a known technique identifies the image, that is, when, where, and how the image was printed, by using OCR to extract the image at a specific position in the image data as a character string and collating the result with a character string associated in advance at image registration. In particular, to improve accuracy, there is a technique that raises the hit rate by associating a plurality of character strings with an image to allow for OCR misrecognition. Patent Document 1 is a representative example.
JP 10-31727 A

For images in which variable data is superimposed on a specific business-dependent template, such as business forms, it is not unusual for the stored image data to differ very little in the variable data. For example, a quotation for a particular product may be printed many times on different days by the same person in charge, so that quotation images accumulate in which only the character string representing the creation date and time differs while all other character strings and images are identical. When a desired image is searched for among many such similar images using only image features as a clue, many images are presented as candidates, and the final task of visually confirming which image matches becomes extremely difficult.

In this situation, the prior art increases the probability of hitting the desired image data, but it also increases the probability of hitting other images, so the problem remains that the match among similar images must ultimately be confirmed visually. Even if one tries to narrow the candidates by increasing the number of locations on the printed matter that are read by OCR, the position information must be registered in advance for each form image, which is laborious.

To solve the above problems, the present invention provides a system in which content data (image data / text data) of a job of a printer or digital multi-function peripheral is stored on a server together with log information so that the job can be traced later, the system comprising:
form generation operation related information extracting means for extracting as text data, for all image data finally stored on the server, information that each image data uniquely has and that is associated with the form generation operation, such as the date on which the form was generated;
unique text position extracting means for extracting the position, on the image data, of the text obtained by the form generation operation related information extracting means;
template information printing means for printing information about the form template from which the image data originates at a fixed position in the image data;
association means for associating the text obtained by the form generation operation related information extracting means, the position information obtained by the unique text position extracting means, the original image data, and the information about the form template from which the image data originates; and
image data specifying means that, based on the template information printed by the template information printing means, which is part of the result of reading printed matter with an image reading device, extracts from the association means the corresponding form-specific text information (first text information) and the position of the first text information on the image data, further extracts, from the result of reading the printed matter with the image reading device, the text information (second text information) located at the same position as the position of the first text information on the image data, and specifies the image data for which the first text information and the second text information match.

The present invention makes it possible to narrow down, from among similar form images, the desired image, that is, to identify when, where, and how that image was printed.

Next, the present invention will be described in detail by way of an embodiment.

FIG. 1 is a diagram showing a system configuration according to an embodiment of the present invention. Here, a digital multi-function peripheral 101, an image processing server 102, a data server 103, a user PC 104, a printer driver 105, a print server 106, a printer 107, and an image search server 108 are connected to a network.

The digital multi-function peripheral 101 has scanner and printer functions, an application execution environment, and the like. For a job in which form generation is executed, the digital multi-function peripheral 101 simultaneously generates electronic image data of that job. The digital multi-function peripheral 101 transfers this image data to the image processing server 102 either immediately or after temporarily saving it in its own storage device.

The user PC 104 is a general client PC. When printing is executed on the user PC 104, the printer driver 105 simultaneously generates electronic image data. At this time, the printer driver 105 may also extract the character strings sent as character drawing commands as the text data of the executed job, in which case the image data and the text data are associated with each other. The user PC 104 transfers these data to the image processing server 102 either immediately or after temporarily saving them in its own storage device or in the storage device of the print server 106.

The image processing server 102 converts the transferred image data into a format that can be stored in the data server 103 and transfers it to the data server 103.

The data server 103 performs the processing characteristic of the present invention: storing the form-specific text data, its coordinates, and the form template name in association with the image data. The image search server 108 extracts the desired image data based on the template information extracted by OCR.

With reference to FIGS. 2, 3, 4, and 5, the following describes the process of extracting and printing the information needed for a search on the search server (108), and registering it in the data server (103), when a form is generated on the digital multi-function peripheral (101) or the PC (104).

S201 represents a form. S202 and S203 represent fields on the form template into which values are inserted that may be either the same or different from form to form. S204 represents a field that is known in advance to receive a form-specific value related to the form generation operation. S205 is a field in which information identifying the form template is printed, and shows the template name printed as its value. S206 and S207 represent the position of S204, that is, the print position of the form-specific value.
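The information extracted at form generation time can be pictured as a small message sent toward the data server. The following is a minimal sketch, not the patented implementation. It assumes that the position of the form-specific field is given as two corner points (the source only says that S206 and S207 represent the position of S204), and the names `FormField`, `build_registration_message`, and the sample template and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (x, y); the unit and origin are assumptions


@dataclass(frozen=True)
class FormField:
    """One field rendered on a form (hypothetical structure)."""
    name: str
    value: str
    top_left: Point      # assumed to play the role of S206
    bottom_right: Point  # assumed to play the role of S207
    form_specific: bool  # True for a field like S204, whose value is unique per form


def build_registration_message(template_name: str, fields: List[FormField]) -> Dict:
    """Collect the template name plus each form-specific field's name, coordinates, and value."""
    unique = [f for f in fields if f.form_specific]
    return {
        "template_name": template_name,
        "unique_fields": [
            {"name": f.name, "coords": (f.top_left, f.bottom_right), "value": f.value}
            for f in unique
        ],
    }


# Illustrative data only: one ordinary field and one form-specific field.
fields = [
    FormField("customer", "ACME", (40, 100), (300, 130), False),
    FormField("created_at", "2008-07-24 10:15", (400, 40), (600, 70), True),
]
message = build_registration_message("quotation_template", fields)
```

Only the form-specific field ends up in `message["unique_fields"]`, which mirrors the idea that fields like S202 and S203 are not useful for telling similar forms apart.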

S301 represents a form. It differs from S201 only in the value of S304, which corresponds to S204. S302 corresponds to S202, S303 to S203, S305 to S205, S306 to S206, and S307 to S207. FIG. 4 shows the management table on the data server to which this information is sent. The form in FIG. 2 corresponds to “01” in S401, and the form in FIG. 3 corresponds to “002” in S401. S402 corresponds to S205 and S305; S403 to S204 and S304; S404 to the specific coordinates of S206 and S207 and of S306 and S307; and S405 to the values inserted in S204 and S304. S406 is the information actually sought, namely when, where, and how the form was printed, but because it is not directly related to the features of this proposal, a detailed description is omitted.
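One way to picture the management table of FIG. 4 is as a list of records with one column per item S401 to S406. The sketch below is illustrative only: the class name `FormImageRecord`, the column types, the two-corner coordinate format, and all sample values are assumptions, not taken from the figure.

```python
from dataclasses import dataclass
from typing import List, Tuple

Coords = Tuple[Tuple[int, int], Tuple[int, int]]  # assumed: two corner points


@dataclass(frozen=True)
class FormImageRecord:
    """One row of the management table (hypothetical layout)."""
    image_id: str           # S401: identifier of the stored form image
    template_name: str      # S402: name printed in S205 / S305
    unique_field_name: str  # S403: name of the field S204 / S304
    coords: Coords          # S404: position of the form-specific field
    unique_value: str       # S405: value inserted in S204 / S304
    print_log: str          # S406: "when, where, and how" (format assumed)


# Two made-up rows for forms that differ only in the form-specific value.
table: List[FormImageRecord] = [
    FormImageRecord("A", "quotation_template", "created_at", ((400, 40), (600, 70)),
                    "2008-07-24 10:15", "printed 2008-07-24 on printer A"),
    FormImageRecord("B", "quotation_template", "created_at", ((400, 40), (600, 70)),
                    "2008-07-25 09:30", "printed 2008-07-25 on printer B"),
]


def rows_for_template(rows: List[FormImageRecord], template_name: str) -> List[FormImageRecord]:
    """Return every stored image that was generated from the given template."""
    return [r for r in rows if r.template_name == template_name]


candidates = rows_for_template(table, "quotation_template")
distinct_values = {r.unique_value for r in candidates}
```

Both rows share the template name, field name, and coordinates, so only the value column (S405) separates them, which is the property the search relies on.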

FIG. 5 is a flowchart of the process of registration in the data server (103). First, processing starts (S501). The necessary information is received (S502), and it is determined whether the form template has already been registered (S503). If the template is not yet registered and is being received for the first time, the template name is registered and associated with the names and coordinates of the fields that hold form-specific values (S504). In the examples of FIGS. 2, 3, and 4 there is only one such field, but in general a single form may have several. Finally, the image data is stored in association with the template name and the form-specific values (S505).
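The registration steps S502 to S505 can be sketched roughly as follows. This is an in-memory stand-in written under assumptions: the class `DataServer`, its method `register`, the dictionary layout, and the sample data are all hypothetical, and a real data server would persist the data rather than hold it in Python objects.

```python
from typing import Dict, List, Tuple

Coords = Tuple[Tuple[int, int], Tuple[int, int]]  # assumed: two corner points


class DataServer:
    """In-memory stand-in for the data server (103); hypothetical structure."""

    def __init__(self) -> None:
        # template name -> {form-specific field name: coordinates}
        self.templates: Dict[str, Dict[str, Coords]] = {}
        # one entry per stored image
        self.images: List[dict] = []

    def register(self, template_name: str,
                 unique_fields: Dict[str, Tuple[Coords, str]],
                 image_data: bytes, print_log: str) -> bool:
        """Handle one received registration (S502); return True if the template was new."""
        is_new = template_name not in self.templates  # S503: already registered?
        if is_new:
            # S504: register the template name with field names and coordinates
            self.templates[template_name] = {
                name: coords for name, (coords, _value) in unique_fields.items()
            }
        # S505: store the image with its template name and form-specific values
        self.images.append({
            "template_name": template_name,
            "unique_values": {name: value for name, (_coords, value) in unique_fields.items()},
            "image_data": image_data,
            "print_log": print_log,
        })
        return is_new


server = DataServer()
region = ((400, 40), (600, 70))
first = server.register("quotation_template",
                        {"created_at": (region, "2008-07-24 10:15")}, b"img1", "log 1")
second = server.register("quotation_template",
                         {"created_at": (region, "2008-07-25 09:30")}, b"img2", "log 2")
```

In this sketch the second registration reuses the template entry created by the first, so field names and coordinates are stored once per template while values are stored once per image.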

The process of narrowing down to the desired form during an image search will be described with reference to FIG. 6.

Search processing starts (S601). The printed matter is placed on the image reading device, and the name of the form template printed at the fixed position is extracted by OCR (S602). It is determined whether the form template has already been registered (S603); if it has not, the process ends. If it has, the registered names and coordinates of the fields into which form-specific values are inserted, together with the registered form-specific values, are extracted (S604). OCR is then applied to the scanned image at the extracted coordinates to extract the form-specific data, which is compared with the values obtained in S604 (S605). In S604, values associated with a plurality of images are generally extracted. In S606, the image whose form-specific value matches is identified, the print log, which is the original purpose of this system, is presented to the searcher, and the process ends.
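The search steps S602 to S606 can be sketched as below. The sketch makes several assumptions that are not in the source: OCR is replaced by a stand-in function `fake_ocr` that looks up text by region in a dictionary, the fixed print position of the template name is the made-up constant `TEMPLATE_NAME_REGION`, comparison is by exact string equality, and the data layout and sample values are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Coords = Tuple[Tuple[int, int], Tuple[int, int]]  # assumed: two corner points
Scan = Dict[Coords, str]  # stand-in for a scanned image: region -> printed text

TEMPLATE_NAME_REGION: Coords = ((0, 0), (200, 30))  # assumed fixed print position


def fake_ocr(scan: Scan, region: Coords) -> str:
    """Stand-in for OCR: return the text 'printed' in the given region."""
    return scan.get(region, "")


def search_print_logs(scan: Scan,
                      ocr: Callable[[Scan, Coords], str],
                      templates: Dict[str, Dict[str, Coords]],
                      images: List[dict]) -> List[str]:
    """Return the print logs of stored images whose form-specific values match the scan."""
    template_name = ocr(scan, TEMPLATE_NAME_REGION)  # S602: read the template name
    if template_name not in templates:               # S603: unknown template, stop
        return []
    # S604: registered field coordinates and the candidate images for this template
    field_coords = templates[template_name]
    candidates = [img for img in images if img["template_name"] == template_name]
    # S605: OCR the scan at each registered coordinate
    scanned = {name: ocr(scan, coords) for name, coords in field_coords.items()}
    # S606: keep only images whose stored values equal the scanned values
    return [img["print_log"] for img in candidates if img["unique_values"] == scanned]


date_region = ((400, 40), (600, 70))
templates = {"quotation_template": {"created_at": date_region}}
images = [
    {"template_name": "quotation_template",
     "unique_values": {"created_at": "2008-07-24 10:15"}, "print_log": "log 1"},
    {"template_name": "quotation_template",
     "unique_values": {"created_at": "2008-07-25 09:30"}, "print_log": "log 2"},
]
scan = {TEMPLATE_NAME_REGION: "quotation_template", date_region: "2008-07-25 09:30"}
hits = search_print_logs(scan, fake_ocr, templates, images)
unknown = search_print_logs({TEMPLATE_NAME_REGION: "other"}, fake_ocr, templates, images)
```

Here two stored images share the same template, and the scanned date narrows the result to one print log. Exact equality is a simplification; how OCR misrecognition is tolerated is not specified in this description.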

FIG. 1: System configuration diagram
FIG. 2: Form example 1
FIG. 3: Form example 2
FIG. 4: Form image management table on the data server
FIG. 5: Data server registration flowchart
FIG. 6: Image search flowchart

Claims

A network document management system in which content data (image data / text data) of a job of a printer or digital multi-function peripheral is saved on a server together with log information so that the job can be traced later, the job being composed of a form template and form data, the system comprising:
form generation operation related information extracting means for extracting as text data, for all image data finally stored on the server, information that each image data uniquely has and that is associated with the form generation operation;
unique text position extracting means for extracting the position, on the image data, of the text obtained by the form generation operation related information extracting means;
template information printing means for printing information about the form template from which the image data originates at a fixed position in the image data; and
association means for associating the text obtained by the form generation operation related information extracting means, the position information obtained by the unique text position extracting means, the original image data, and the information about the form template from which the image data originates.

The network document management system according to claim 1, further comprising image data specifying means that, based on the template information printed by the template information printing means, which is part of the result of reading printed matter with an image reading device, extracts from the association means the corresponding form-specific text information (first text information) and the position of the first text information on the image data, further extracts, from the result of reading the printed matter with the image reading device, the text information (second text information) located at the same position as the position of the first text information on the image data, and specifies the image data for which the first text information and the second text information match.