JP6492622B2 - Character image processing system, information processing apparatus, and control program for information processing apparatus

Info

Publication number: JP6492622B2
Application number: JP2014257750A
Authority: JP
Inventors: 鷲尾　宏司
Original assignee: Konica Minolta Inc
Current assignee: Konica Minolta Inc
Priority date: 2014-12-19
Filing date: 2014-12-19
Publication date: 2019-04-03
Anticipated expiration: 2034-12-19
Also published as: JP2016118909A

Description

The present invention relates to a character image processing system, an information processing apparatus, and a control program for an information processing apparatus. More specifically, the present invention relates to a character image processing system that includes an information processing apparatus and an OCR (Optical Character Recognition) device, to an information processing apparatus, and to a control program for an information processing apparatus.

An MFP (Multifunction Peripheral), which is one type of image forming apparatus, has a scanner function, a facsimile function, a copying function, a printer function, a data communication function, and a server function.

Some recent MFPs have a function for creating a searchable PDF (Portable Document Format) (registered trademark) from scanned image data. A searchable PDF is a PDF file obtained by converting the characters contained in a scanned document image into text data by OCR processing and combining that text data with the document image. A searchable PDF contains a base layer and a transparent layer on top of it. The base layer consists of image data in a format such as JPEG. The transparent layer consists of the text data obtained by OCR processing.

A searchable PDF allows characters (keywords) in the document to be searched. Characters in the document can also be copied and pasted into other digital documents. This eliminates the need to type in the characters of a paper document in order to digitize it.

Techniques for creating a searchable PDF are disclosed in, for example, Patent Document 1 below.

As described above, creating a searchable PDF requires OCR processing. OCR processing places a heavy load on the CPU (Central Processing Unit). Consequently, when OCR processing is performed on the MFP, other operations of the MFP (for example, copying, scanning, printing, or facsimile transmission and reception) may be hindered.

Of the processes for creating a searchable PDF, the MFP performs the creation of scanned image data, the determination of character areas, and the creation of the PDF image in hardware. The MFP can therefore perform these processes instantaneously. OCR processing, on the other hand, is performed by the MFP in software. As a result, OCR processing tends to occupy the CPU of the MFP for a long time.

A technique for preventing OCR processing from occupying the CPU for a long time is disclosed in, for example, Patent Document 2 below. Patent Document 2 discloses a technique that manages the computational load of OCR processing and controls the OCR processing while the MFP is operating. However, this technique lowers the priority of OCR processing so that the other operations of the MFP take precedence. As a result, creating a searchable PDF takes a long time.

To address this, a technique that uses an OCR site on an external server on the Internet has been proposed. This technique is disclosed in, for example, Patent Document 3 below. The external server makes a storage area available to individuals as one of the services it provides. The OCR site transfers image data received from a client (such as an MFP, a PC (Personal Computer), or a mobile terminal) to the external server. The external server performs OCR processing on the transferred image data and stores the resulting text data. The client can view and retrieve the text data. Using such an OCR site avoids long occupation of the CPU by OCR processing.

Patent Document 1: JP 2012-73749 A
Patent Document 2: JP 2013-161268 A
Patent Document 3: JP 2010-9131 A

However, performing OCR processing on an external OCR site makes confidential information prone to leakage. Such leakage is caused by a third party gaining unauthorized access to the OCR site.

An OCR site usually manages login IDs and passwords using SSL (Secure Sockets Layer) or the like. However, if a third party steals these IDs and passwords by a method such as packet sniffing, the third party can view or download the personal information and document images stored on the external server of the OCR site. The third party can also view or download data exchanged between the external server of the OCR site and the individual's terminal.

The present invention has been made to solve the above problem, and its object is to provide a character image processing system, an information processing apparatus, and a control program for an information processing apparatus that can suppress leakage of confidential information.

A character image processing system according to one aspect of the present invention includes a first information processing unit and a second information processing unit that has an OCR function and can communicate with the first information processing unit via a network. The first information processing unit includes: image block creating means for dividing a character area in image data into a plurality of image blocks; arrangement order changing means for changing the arrangement order of the plurality of image blocks; inserting means for inserting a connection image between each of the plurality of image blocks whose arrangement order has been changed by the arrangement order changing means; and first transmitting means for transmitting, to the second information processing unit, an encrypted image that is created based on the rearranged image blocks and that contains the plurality of image blocks with the connection images inserted. The second information processing unit includes: OCR processing means for creating first text data by performing OCR processing on the encrypted image; and second transmitting means for transmitting post-OCR data containing the first text data to the first information processing unit. The first information processing unit further includes: creating means for creating second text data based on the post-OCR data by decomposing the post-OCR data into a plurality of character strings based on the connection image and deleting the characters corresponding to the connection image from the post-OCR data; and pasting means for pasting the second text data at the respective corresponding positions of the character area in the image data.
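The flow described in this aspect can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the implementation of the invention: each image block is modelled as the string an OCR engine would read from it, the connection image is assumed to be recognized as the single character `#`, and `encrypt_blocks`, `fake_ocr`, and `decrypt_text` are hypothetical names introduced only for this illustration.

```python
import random

# Assumption: the connection image is recognized by OCR as this one character,
# and the character never occurs inside a block.
SEPARATOR = "#"


def encrypt_blocks(blocks, rng):
    """Change the block order and join the blocks with the separator.

    Returns the scrambled "image" and an order table in which entry k holds
    the original index of the block placed at position k.
    """
    order = list(range(len(blocks)))
    rng.shuffle(order)
    scrambled = SEPARATOR.join(blocks[i] for i in order)
    return scrambled, order


def fake_ocr(scrambled_image):
    """Stand-in for the remote OCR device: returns the text unchanged."""
    return scrambled_image


def decrypt_text(post_ocr, order):
    """Split the post-OCR data at the separator and restore the original order."""
    pieces = post_ocr.split(SEPARATOR)  # the separator characters are dropped here
    if len(pieces) != len(order):
        raise ValueError("separator count does not match the number of blocks")
    restored = [""] * len(order)
    for position, original_index in enumerate(order):
        restored[original_index] = pieces[position]
    return "".join(restored)


blocks = ["The qu", "ick br", "own fo", "x jumps"]
scrambled, order = encrypt_blocks(blocks, random.Random(0))
recovered = decrypt_text(fake_ocr(scrambled), order)
print(recovered)
```

In this sketch the OCR side only ever sees the scrambled text, while the order table stays on the sending side, which is what lets the sender alone reassemble the original text.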

In the above character image processing system, preferably, the first information processing unit further includes character area specifying means for specifying a character area in the image data and the coordinates of that character area in the image data, and the pasting means pastes the second text data based on the coordinates.

In the above character image processing system, preferably, the pasting means pastes the second text data at a position corresponding to the character area in a transparent layer of the image data.

In the above character image processing system, preferably, the first information processing unit further includes level receiving means for receiving a security level setting, and the image block creating means divides the character area into a plurality of image blocks of a size determined according to the level received by the level receiving means.

In the above character image processing system, preferably, the second information processing unit includes a first OCR device and a second OCR device separate from the first OCR device, and the first transmitting means transmits a first portion of the encrypted image to the first OCR device and transmits a second portion of the encrypted image, different from the first portion, to the second OCR device.
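As a rough illustration of sending different portions to two OCR devices, the following Python sketch cuts an encrypted image, modelled as a list of rows, into two non-overlapping parts. The source does not state how the two portions are chosen; cutting in the middle and the name `split_for_two_devices` are assumptions made only for this example.

```python
def split_for_two_devices(rows):
    """Return two different, non-overlapping parts of the encrypted image rows.

    Assumption: the image is cut once, near the middle. The first part would
    go to the first OCR device and the second part to the second OCR device.
    """
    middle = (len(rows) + 1) // 2
    return rows[:middle], rows[middle:]


encrypted_rows = ["row-0", "row-1", "row-2", "row-3", "row-4"]
first_part, second_part = split_for_two_devices(encrypted_rows)
print(first_part, second_part)
```

With such a split, neither OCR device holds the complete encrypted image.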

In the above character image processing system, preferably, the first information processing unit further includes image reading means for creating the image data by reading an image of a document.

In the above character image processing system, preferably, the image block creating means includes: first distribution extracting means for extracting, for a rectangular character area, a distribution of the cumulative number of white pixels present in a first direction, which is the direction of one side of the character area, the distribution running along a second direction perpendicular to the first direction; second distribution extracting means for extracting a distribution of the cumulative number of white pixels present in the second direction in the rectangular character area, the distribution running along the first direction; and dividing means for creating the plurality of image blocks by dividing the character area in the image data at positions determined based on the distributions extracted by the first and second distribution extracting means.
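The two white-pixel distributions can be sketched in Python as projection profiles of a binary image. This is a minimal illustration, assuming the character area is a list of rows in which 0 is a white pixel and 1 is a black pixel, with the first direction taken as x and the second as y; the function names are hypothetical.

```python
WHITE = 0  # assumption: 0 marks a white (background) pixel, 1 a black (ink) pixel


def white_counts_per_row(image):
    """Count white pixels along the x direction; one value per y position."""
    return [sum(1 for px in row if px == WHITE) for row in image]


def white_counts_per_column(image):
    """Count white pixels along the y direction; one value per x position."""
    return [sum(1 for px in col if px == WHITE) for col in zip(*image)]


def blank_runs(counts, full):
    """Return (start, end) index pairs, end exclusive, where every pixel is white."""
    runs, start = [], None
    for i, count in enumerate(counts):
        if count == full and start is None:
            start = i
        elif count != full and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(counts)))
    return runs


image = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
]
rows = white_counts_per_row(image)      # distribution along y
cols = white_counts_per_column(image)   # distribution along x
print(blank_runs(rows, len(image[0])))  # fully white rows: candidate line spacings
print(blank_runs(cols, len(image)))     # fully white columns: candidate character gaps
```

Positions where a count equals the full width or height are entirely white, so they are natural candidates for the dividing positions.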

In the above character image processing system, preferably, the dividing means includes: line spacing specifying means for specifying line spacings based on the distribution extracted by the first distribution extracting means; and line dividing means for dividing the character area into a plurality of lines by dividing the character area at the line spacings specified by the line spacing specifying means. The second distribution extracting means extracts, for each of the plurality of lines divided by the line dividing means, a distribution of the cumulative number of white pixels present in the second direction, the distribution running along the first direction. The dividing means further includes: gap specifying means for specifying gap positions between characters based on the distribution extracted by the second distribution extracting means; boundary determining means for determining boundary positions based on the gap positions specified by the gap specifying means; and column direction dividing means for dividing each of the plurality of lines at the boundary positions determined by the boundary determining means.

In the above character image processing system, preferably, the boundary determining means determines, as a boundary position, a gap position whose distance from an adjacent gap position is equal to or greater than a threshold, from among the gap positions specified by the gap specifying means.
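One possible reading of this threshold rule is sketched below in Python. It assumes that gap positions are x coordinates within one line and that "adjacent" means the immediately preceding gap, so the first gap, which has no preceding neighbor, is never selected; the passage leaves both points open, and `choose_boundaries` is a hypothetical name.

```python
def choose_boundaries(gap_positions, threshold):
    """Keep the gap positions that lie at least `threshold` after the preceding gap.

    Assumption: only the distance to the preceding gap is checked.
    """
    boundaries = []
    previous = None
    for position in sorted(gap_positions):
        if previous is not None and position - previous >= threshold:
            boundaries.append(position)
        previous = position
    return boundaries


gaps = [3, 5, 12, 14, 25]
print(choose_boundaries(gaps, threshold=6))
```

Under this reading, closely spaced gaps such as those inside a single character are skipped, so a line is not cut into fragments narrower than the threshold.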

In the above character image processing system, preferably, the connection image is an image whose character recognition result is known and which is held in advance by the first information processing unit.

In the above character image processing system, preferably, the connection image is an image of a symbol that is not a character.

In the above character image processing system, preferably, the first information processing unit further includes arrangement information holding means for holding relation information that indicates the relationship between the order of each of the plurality of image blocks before the arrangement order is changed by the arrangement order changing means and the order of each of the plurality of image blocks after the arrangement order is changed.
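The relation information can be pictured as a small lookup table. The Python sketch below assumes 1-based numbering and a random permutation, neither of which is specified in this passage, and the names `make_number_table` and `invert` are hypothetical.

```python
import random


def make_number_table(block_count, rng):
    """Map each original order number to its order number after rearrangement.

    Assumption: both numberings are 1-based and the new order is a random permutation.
    """
    new_numbers = list(range(1, block_count + 1))
    rng.shuffle(new_numbers)
    return {original: new for original, new in enumerate(new_numbers, start=1)}


def invert(table):
    """Build the reverse lookup: order after rearrangement -> original order."""
    return {new: original for original, new in table.items()}


table = make_number_table(5, random.Random(1))
inverse = invert(table)
print(table)
print(inverse)
```

Holding the table on the first information processing unit is what allows the order to be restored later; the inverse lookup gives the original position of each rearranged block.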

In the above character image processing system, preferably, the network is the Internet.

In the above character image processing system, preferably, the first information processing unit includes an image forming apparatus capable of optically reading a document.

In the above character image processing system, preferably, the first information processing unit further includes a terminal separate from the image forming apparatus, and the first transmitting means transmits the encrypted image from the terminal to the second information processing unit.

In the above character image processing system, preferably, the first information processing unit generates the encrypted image based on optically read image data.

An information processing apparatus according to another aspect of the present invention is an information processing apparatus that communicates with an OCR device, and includes: image block creating means for dividing a character area in image data into a plurality of image blocks; arrangement order changing means for changing the arrangement order of the plurality of image blocks; inserting means for inserting a connection image between each of the plurality of image blocks whose arrangement order has been changed by the arrangement order changing means; transmitting means for transmitting, to the OCR device, an encrypted image that is created based on the rearranged image blocks and that contains the plurality of image blocks with the connection images inserted; receiving means for receiving, from the OCR device, post-OCR data containing first text data created by performing OCR processing based on the encrypted image; creating means for creating second text data based on the post-OCR data by decomposing the post-OCR data into a plurality of character strings based on the connection image and deleting the characters corresponding to the connection image from the post-OCR data; and pasting means for pasting the second text data at the respective corresponding positions of the character area in the image data.

A control program for an information processing apparatus according to still another aspect of the present invention is a control program for an information processing apparatus that communicates with an OCR device, and causes a computer to execute: an image block creating step of dividing a character area in image data into a plurality of image blocks; an arrangement order changing step of changing the arrangement order of the plurality of image blocks; an inserting step of inserting a connection image between each of the plurality of image blocks whose arrangement order has been changed in the arrangement order changing step; a transmitting step of transmitting, to the OCR device, an encrypted image that is created based on the rearranged image blocks and that contains the plurality of image blocks with the connection images inserted; a receiving step of receiving, from the OCR device, post-OCR data containing first text data created by performing OCR processing based on the encrypted image; a creating step of creating second text data based on the post-OCR data by decomposing the post-OCR data into a plurality of character strings based on the connection image and deleting the characters corresponding to the connection image from the post-OCR data; and a pasting step of pasting the second text data at the respective corresponding positions of the character area in the image data.

According to the present invention, it is possible to provide a character image processing system, an information processing apparatus, and a control program for an information processing apparatus that can suppress leakage of confidential information.

FIG. 1 is a block diagram conceptually showing the configuration of the document image processing system in the first embodiment of the present invention.
FIG. 2 is a sequence diagram showing an outline of the operation of the character image processing system in the first embodiment of the present invention.
FIG. 3 is a diagram schematically showing a screen SR displayed on the operation panel of the tablet terminal in the first embodiment of the present invention.
FIG. 4 is a diagram schematically showing character areas L1, L2, and L3 included in read image data IM in the first embodiment of the present invention.
FIG. 5 is a diagram schematically showing the distribution of white pixels in each of the x direction and the y direction in the image of the character area L1 in the first embodiment of the present invention.
FIG. 6 is a diagram schematically showing the gap positions of characters specified based on the distribution of the number w2 in the first embodiment of the present invention.
FIG. 7 is a diagram schematically showing a plurality of image blocks BL obtained by dividing the image of the character area L1 in the first embodiment of the present invention.
FIG. 8 is a diagram schematically showing a division table held by the tablet terminal.
FIG. 9 is a table schematically showing a number table created by the tablet terminal in the first embodiment of the present invention.
FIG. 10 is a diagram schematically showing the first number assigned to each of the plurality of image blocks BL in the first embodiment of the present invention.
FIG. 11 shows the character strings included in each of the plurality of image blocks BL shown in FIG. 10.
FIG. 12 is a table schematically showing a number sequence table for the plurality of image blocks BL in the first embodiment of the present invention.
FIG. 13 is a diagram schematically showing each of the plurality of image blocks BL after the arrangement order has been changed according to the second number in the first embodiment of the present invention.
FIG. 14 is a diagram schematically showing a state in which a connection image has been inserted between each of the plurality of image blocks BL after the arrangement order has been changed according to the second number in the first embodiment of the present invention.
FIG. 15 is a diagram schematically showing an example of an encrypted image created based on the image of the character area L1 in the first embodiment of the present invention.
FIG. 16 is a diagram schematically showing another example of an encrypted image created based on the image of the character area L1 in the first embodiment of the present invention.
FIG. 17 is a diagram schematically showing still another example of an encrypted image created based on the image of the character area L1 in the first embodiment of the present invention.
FIG. 18 is a diagram schematically showing post-OCR data created by the OCR terminal in the first embodiment of the present invention.
FIG. 19 is a diagram schematically showing a plurality of character strings obtained by dividing the post-OCR data in the first embodiment of the present invention.
FIG. 20 is a diagram schematically showing the text data in the character area L1 created in the first embodiment of the present invention.
FIG. 21 is a diagram schematically showing the method of pasting text data in the first embodiment of the present invention.
FIG. 22 is a flowchart showing the operation of the character image processing system in the first embodiment of the present invention.
FIG. 23 is a sequence diagram showing an outline of the operation of the character image processing system in the second embodiment of the present invention.
FIG. 24 is a diagram schematically showing the configuration of the encryption matrix in the second embodiment of the present invention.
FIG. 25 is a number table showing, in numerals, the relationship between the first number and the second number indicated by the encryption matrix of FIG. 24.
FIG. 26 is a diagram schematically showing the image of the character area L1 in the second embodiment of the present invention.
FIG. 27 is a diagram schematically showing a plurality of image blocks BL obtained by dividing the image of the character area L1 in the first embodiment of the present invention.
FIG. 28 is a diagram schematically showing the encrypted image created in the second embodiment of the present invention.
FIG. 29 is a diagram schematically showing the post-OCR data generated in the second embodiment of the present invention.
FIG. 30 is a diagram schematically showing a binary editor screen displaying the characters included in the post-OCR data in the second embodiment of the present invention.
FIG. 31 is a flowchart showing the operation of the character image processing system in the second embodiment of the present invention.
FIG. 32 is a sequence diagram showing an outline of the operation of the character image processing system in a modification of the present invention.

Embodiments of the present invention are described below with reference to the drawings.

The following embodiments describe a case in which the information processing apparatus (first information processing unit) is a tablet terminal. The information processing apparatus may instead be an MFP, a PC (Personal Computer), a mobile phone, a facsimile machine, a printer, a copier, or the like. The OCR device may be any device that performs OCR processing, such as a server, a PC, or a mobile phone.

[First Embodiment]

(Configuration of document image processing system)

FIG. 1 is a block diagram conceptually showing the configuration of the document image processing system in the first embodiment of the present invention.

Referring to FIG. 1, the document image processing system in the present embodiment includes an MFP 100 and a tablet terminal 200 (an example of the first information processing unit), and OCR terminals 300-1 and 300-2 (an example of the second information processing unit). The MFP 100 and the tablet terminal 200 are connected to each other through, for example, an intranet 401 in an office. The intranet 401 is connected to the Internet (external network) 402. Each of the MFP 100 and the tablet terminal 200 is connected to each of the OCR terminals 300-1 and 300-2 through the intranet 401 and the Internet 402. The tablet terminal 200 can also connect wirelessly to the Internet 402 through, for example, a repeater (not shown), such as when a user takes it out of the office.

The intranet 401 uses a dedicated line such as a wired or wireless LAN. The intranet 401 connects various devices using the TCP/IP (Transmission Control Protocol/Internet Protocol) protocol. Devices connected to the intranet 401 can communicate with each other.

The Internet 402 uses a WAN (Wide Area Network). Devices connected to the Internet 402 can communicate with each other. In addition, devices connected to the intranet 401 can communicate with devices connected to the Internet 402.

Each of the OCR terminals 300-1 and 300-2 provides an OCR processing service to users via the Internet 402. A user receives the service provided by each of the OCR terminals 300-1 and 300-2 through the tablet terminal 200 or the like.

The MFP 100 includes a CPU 110, a ROM (Read Only Memory) 120, a RAM (Random Access Memory) 130, a storage unit 140, a network I/F 150, an image reading unit 160, a PDF creation unit 170, a character area extraction unit 180, an operation panel 190, and an image forming unit 195. The CPU 110 is interconnected with each of the ROM 120, the RAM 130, the storage unit 140, the network I/F 150, the image reading unit 160, the PDF creation unit 170, the character area extraction unit 180, the operation panel 190, and the image forming unit 195.

The CPU 110 controls the MFP 100 as a whole. The ROM 120 stores the control program executed by the CPU 110. The RAM 130 is working memory for the CPU 110. The storage unit 140 stores (holds) various kinds of information. The network I/F 150 communicates with external devices via the intranet 401 and the Internet 402. The image reading unit 160 optically reads the image of a document. The PDF creation unit 170 creates a PDF file of the image read by the image reading unit 160. The character area extraction unit 180 extracts, from the read image, the image of each character area, that is, each area in which characters appear. The operation panel 190 includes a display unit, software keys, hardware keys, and the like. The operation panel 190 displays various kinds of information and accepts various operations.

画像形成部１９５は、プリントジョブを実行する。画像形成部１９５は、おおまかに、トナー像形成部、定着装置、および用紙搬送部などで構成される。画像形成部１９５は、たとえば電子写真方式で用紙に画像を形成する（プリントする）。画像形成部１９５は、いわゆるタンデム方式で４色の画像を合成し、用紙にカラー画像を形成可能に構成される。トナー像形成部は、Ｃ（シアン）、Ｍ（マゼンタ）、Ｙ（イエロー）、Ｋ（ブラック）の各色について設けられた感光体と、感光体からトナー像が転写（１次転写）される中間転写ベルトと、中間転写ベルトから用紙に画像を転写（２次転写）する転写部などで構成される。定着装置は、加熱ローラーおよび加圧ローラーを有する。定着装置は、加熱ローラーと加圧ローラーとでトナー像が形成された用紙を挟みながら搬送し、その用紙に加熱及び加圧を行なう。これにより、定着装置は、用紙に付着したトナーを溶融させて用紙に定着させ、用紙に画像を形成する。用紙搬送部は、給紙ローラー、搬送ローラー、およびそれらを駆動するモーターなどで構成されている。用紙搬送部は、用紙を給紙カセットから給紙して、ＭＦＰ１００の筐体の内部で搬送する。また、用紙搬送部は、画像が形成された用紙をＭＦＰ１００の筐体から排紙トレイなどに排出する。 The image forming unit 195 executes print jobs. The image forming unit 195 is roughly composed of a toner image forming unit, a fixing device, a paper transport unit, and the like. The image forming unit 195 forms (prints) an image on a sheet by, for example, electrophotography. The image forming unit 195 is configured to combine images of four colors by a so-called tandem method so that it can form a color image on a sheet. The toner image forming unit includes a photoconductor provided for each of the colors C (cyan), M (magenta), Y (yellow), and K (black), an intermediate transfer belt onto which the toner images are transferred from the photoconductors (primary transfer), and a transfer unit that transfers the image from the intermediate transfer belt to the sheet (secondary transfer). The fixing device has a heating roller and a pressure roller. The fixing device conveys the sheet on which the toner image has been formed while nipping it between the heating roller and the pressure roller, and applies heat and pressure to the sheet. The fixing device thereby melts the toner adhering to the sheet and fixes it on the sheet, forming an image on the sheet. The paper transport unit includes a paper feed roller, transport rollers, and motors that drive them. The paper transport unit feeds a sheet from the paper feed cassette and transports it inside the housing of the MFP 100. The paper transport unit also discharges the sheet on which the image has been formed from the housing of the MFP 100 to a paper discharge tray or the like.

タブレット端末２００は、ＣＰＵ２１０と、ＲＯＭ２２０と、ＲＡＭ２３０と、記憶部２４０と、ネットワークＩ／Ｆ２５０と、操作パネル２６０と、暗号化部２７０と、暗号解読部２８０と、ＰＤＦ編集部２９０とを含んでいる。ＣＰＵ２１０は、ＲＯＭ２２０、ＲＡＭ２３０、記憶部２４０、ネットワークＩ／Ｆ２５０、操作パネル２６０、暗号化部２７０、暗号解読部２８０、およびＰＤＦ編集部２９０の各々と相互に接続されている。 The tablet terminal 200 includes a CPU 210, a ROM 220, a RAM 230, a storage unit 240, a network I/F 250, an operation panel 260, an encryption unit 270, a decryption unit 280, and a PDF editing unit 290. The CPU 210 is mutually connected to each of the ROM 220, the RAM 230, the storage unit 240, the network I/F 250, the operation panel 260, the encryption unit 270, the decryption unit 280, and the PDF editing unit 290.

ＣＰＵ２１０は、タブレット端末２００全体を制御する。ＲＯＭ２２０は、ＣＰＵ２１０が実行する制御プログラムを格納する。ＲＡＭ２３０は、ＣＰＵ２１０の作業用のメモリである。記憶部２４０は、サーチャブルＰＤＦ作成のためのソフトウェアのプログラムや、後述する分割テーブル、番号テーブル、または暗号化マトリクスなどの各種情報を記憶（保持）している。ネットワークＩ／Ｆ２５０は、イントラネット４０１やインターネット４０２を介して外部機器との通信を行う。操作パネル２６０は、各種情報を表示するとともに、各種操作を受け付ける。暗号化部２７０は、ＯＣＲ端末３００−１または３００−２に送信する文字領域の画像を暗号化する。暗号解読部２８０は、ＯＣＲ端末３００−１または３００−２から受信したデータを解読してテキストデータを作成する。ＰＤＦ編集部２９０は、画像のＰＤＦファイルに対してテキストデータを追加する。 The CPU 210 controls the entire tablet terminal 200. The ROM 220 stores a control program executed by the CPU 210. The RAM 230 is a working memory for the CPU 210. The storage unit 240 stores (holds) various kinds of information such as a software program for creating a searchable PDF and a division table, a number table, or an encryption matrix, which will be described later. The network I / F 250 communicates with an external device via the intranet 401 or the Internet 402. The operation panel 260 displays various information and accepts various operations. The encryption unit 270 encrypts the image of the character area to be transmitted to the OCR terminal 300-1 or 300-2. The decryption unit 280 decrypts the data received from the OCR terminal 300-1 or 300-2 and creates text data. The PDF editing unit 290 adds text data to the image PDF file.

ＯＣＲ端末３００−１および３００−２の各々は、ＯＣＲ機能を有しており、ＯＣＲ処理のサービスを提供するウェブサイトであるＯＣＲサイトを持っている。ＯＣＲ端末３００−１および３００−２の各々は、互いに異なる装置であり、別々のＯＣＲサイトを持っている。ＯＣＲ端末３００−１および３００−２の各々は、ＣＰＵ３１０と、ＲＯＭ３２０と、ＲＡＭ３３０と、記憶部３４０と、ネットワークＩ／Ｆ３５０と、ＯＣＲ処理部３６０と、暗号化部３７０と、暗号解読部３８０とを含んでいる。ＣＰＵ３１０は、ＲＯＭ３２０、ＲＡＭ３３０、記憶部３４０、ネットワークＩ／Ｆ３５０、ＯＣＲ処理部３６０、暗号化部３７０、および暗号解読部３８０の各々と相互に接続されている。 Each of the OCR terminals 300-1 and 300-2 has an OCR function and hosts an OCR site, which is a website that provides an OCR processing service. The OCR terminals 300-1 and 300-2 are separate devices, and each hosts its own OCR site. Each of the OCR terminals 300-1 and 300-2 includes a CPU 310, a ROM 320, a RAM 330, a storage unit 340, a network I/F 350, an OCR processing unit 360, an encryption unit 370, and a decryption unit 380. The CPU 310 is mutually connected to each of the ROM 320, the RAM 330, the storage unit 340, the network I/F 350, the OCR processing unit 360, the encryption unit 370, and the decryption unit 380.

ＣＰＵ３１０は、ＯＣＲ端末全体を制御する。ＲＯＭ３２０は、ＣＰＵ３１０が実行する制御プログラムを格納する。ＲＡＭ３３０は、ＣＰＵ３１０の作業用のメモリである。記憶部３４０は、後述する暗号化マトリクスなどの各種情報を記憶（保持）している。また記憶部３４０は、ＯＣＲ処理のユーザーのための記憶領域（個人フォルダ）を有している。ネットワークＩ／Ｆ３５０は、インターネット４０２を介して外部機器との通信を行う。ＯＣＲ処理部３６０は、タブレット端末２００から受信した文字領域の画像に対してＯＣＲ処理を行うことにより、テキストデータを作成する。暗号化部３７０は、ＯＣＲ処理によって得られたデータを暗号化する。暗号解読部３８０は、タブレット端末２００から受信したデータを解読して元の文字領域の画像を作成する。 The CPU 310 controls the entire OCR terminal. The ROM 320 stores a control program executed by the CPU 310. The RAM 330 is a working memory for the CPU 310. The storage unit 340 stores (holds) various information such as an encryption matrix described later. The storage unit 340 has a storage area (personal folder) for the user of the OCR process. A network I / F 350 communicates with an external device via the Internet 402. The OCR processing unit 360 creates text data by performing OCR processing on the character area image received from the tablet terminal 200. The encryption unit 370 encrypts the data obtained by the OCR process. The decryption unit 380 decrypts the data received from the tablet terminal 200 and creates an image of the original character area.

なお、文字画像処理システムが備えるタブレット端末、ＭＦＰ、およびＯＣＲ端末の各々の個数は任意である。ＭＦＰは画像読取機能を有する装置であればよい。 Note that the number of tablet terminals, MFPs, and OCR terminals included in the character image processing system is arbitrary. The MFP may be an apparatus having an image reading function.

ＭＦＰ１００とタブレット端末２００との間は、イントラネット４０１で接続されている。このため、ＭＦＰ１００とタブレット端末２００との間で送受信される情報は、漏洩しにくい。一方、タブレット端末２００とＯＣＲ端末３００−１および３００−２の各々の間は、インターネット４０２で接続されている。このため、タブレット端末２００とＯＣＲ端末３００−１および３００−２の各々との間で送受信される情報は、漏洩しやすい。 The MFP 100 and the tablet terminal 200 are connected via an intranet 401. For this reason, information transmitted and received between the MFP 100 and the tablet terminal 200 is difficult to leak. On the other hand, the tablet terminal 200 and each of the OCR terminals 300-1 and 300-2 are connected by the Internet 402. For this reason, information transmitted and received between the tablet terminal 200 and each of the OCR terminals 300-1 and 300-2 is likely to leak.

（文字画像処理システムの動作の概要） (Outline of operation of character image processing system)

次に、文字画像処理システムが行うサーチャブルＰＤＦ化の動作の概要を説明する。 Next, an outline of searchable PDF operation performed by the character image processing system will be described.

図２は、本発明の第１の実施の形態における文字画像処理システムの動作の概要を示すシーケンス図である。 FIG. 2 is a sequence diagram showing an outline of the operation of the character image processing system according to the first embodiment of the present invention.

図２を参照して、タブレット端末のユーザーは、予めＭＦＰの原稿台に原稿をセットした状態で、タブレット端末を通じてサーチャブルＰＤＦの作成指示を行う。タブレット端末は、サーチャブルＰＤＦの作成指示を受け付ける（処理ＰＲ０）。 Referring to FIG. 2, the user of the tablet terminal issues a searchable PDF creation instruction through the tablet terminal in a state where the document is set in advance on the document table of the MFP. The tablet terminal accepts a searchable PDF creation instruction (process PR0).

タブレット端末は、サーチャブルＰＤＦの作成指示を受け付けると、ＭＦＰに対して原稿の画像の読み取りおよびＰＤＦファイルの送信の指示を行う（処理スタートを通知する）（処理ＰＲ１）。 Upon receiving the searchable PDF creation instruction, the tablet terminal instructs the MFP to read the image of the document and transmit the PDF file (notifies the start of processing) (processing PR1).

ＭＦＰは、タブレット端末から指示を受け付けると、ＣＣＤイメージセンサなどを用いて原稿の画像を光学的に読み取り、Ａ／Ｄ変換によってデジタル化された読取画像データを作成する（処理ＰＲ２）。次にＭＦＰは、読取画像データ内から文字領域の画像を抽出する（処理ＰＲ３）。続いてＭＦＰは、読取画像データのＰＤＦファイルを作成する（処理ＰＲ４）。次にＭＦＰは、文字領域の画像、文字領域の座標、および読取画像データのＰＤＦファイルをタブレット端末に送信する（処理ＰＲ５）。 Upon receiving an instruction from the tablet terminal, the MFP optically reads an image of the document using a CCD image sensor or the like, and creates read image data digitized by A / D conversion (process PR2). Next, the MFP extracts an image of the character area from the read image data (process PR3). Subsequently, the MFP creates a PDF file of the read image data (process PR4). Next, the MFP transmits the image of the character area, the coordinates of the character area, and the PDF file of the read image data to the tablet terminal (process PR5).

タブレット端末は、文字領域の画像、文字領域の座標、および読取画像データのＰＤＦファイルを受信すると、文字領域の画像を複数の画像ブロックに分割する（処理ＰＲ６）。複数の画像ブロックの各々は、複数の文字を含んでいる。次にタブレット端末は、複数の画像ブロックの配列順序を変更する（並び替える）。次にタブレット端末は、配列順序を変更した後の複数の画像ブロックの各々の間を、連結用画像を用いて連結する。これにより、暗号化画像が作成される（処理ＰＲ７）。暗号化画像は、暗号化された文字領域の画像である。続いてタブレット端末は、暗号化画像をＯＣＲ端末に送信する（処理ＰＲ８）。 Upon receiving the character area image, the character area coordinates, and the PDF file of the read image data, the tablet terminal divides the character area image into a plurality of image blocks (process PR6). Each of the plurality of image blocks includes a plurality of characters. Next, the tablet terminal changes (rearranges) the arrangement order of the plurality of image blocks. Next, the tablet terminal connects the plurality of image blocks after the arrangement order is changed using the connection image. As a result, an encrypted image is created (process PR7). An encrypted image is an image of an encrypted character area. Subsequently, the tablet terminal transmits the encrypted image to the OCR terminal (process PR8).

ＯＣＲ端末は、暗号化画像をタブレット端末から受信すると、暗号化画像に対してＯＣＲ処理を行うことにより、ＯＣＲ後データを作成する（処理ＰＲ９）。ＯＣＲ後データは、暗号化したテキストデータである。続いてＯＣＲ端末は、作成したＯＣＲ後データをタブレット端末に送信する（処理ＰＲ１０）。 When the OCR terminal receives the encrypted image from the tablet terminal, the OCR terminal performs OCR processing on the encrypted image to create post-OCR data (process PR9). The post-OCR data is encrypted text data. Subsequently, the OCR terminal transmits the created post-OCR data to the tablet terminal (process PR10).

タブレット端末は、ＯＣＲ後データをＯＣＲ端末から受信すると、受信したＯＣＲ後データを、複数の画像ブロックに対応する複数の文字列に分割する。次にタブレット端末は、複数の文字列の配列順序を、複数の画像ブロックの変更前の配列順序に並べ直し、複数の文字列を結合する。これにより、文字領域の画像のテキストデータが作成される（処理ＰＲ１１）。その後タブレット端末は、文字領域の座標に基づいて、得られたテキストデータを、読取画像データのＰＤＦファイルに貼り付ける（処理ＰＲ１２）。これにより、サーチャブルＰＤＦが作成される。 When the tablet terminal receives the post-OCR data from the OCR terminal, the tablet terminal divides the received post-OCR data into a plurality of character strings corresponding to a plurality of image blocks. Next, the tablet terminal rearranges the arrangement order of the plurality of character strings into the arrangement order before the change of the plurality of image blocks, and combines the plurality of character strings. Thereby, text data of the image of the character area is created (process PR11). Thereafter, the tablet terminal pastes the obtained text data on the PDF file of the read image data based on the coordinates of the character area (process PR12). As a result, a searchable PDF is created.
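The round trip of processes PR7 and PR11 can be sketched as follows. This is a minimal illustration rather than the method of the embodiment: plain strings stand in for image blocks, the function names `scramble` and `restore` are hypothetical, and it is assumed that the connection image comes back from OCR as a single delimiter character (`|` here), which this passage does not specify.

```python
def scramble(blocks, order, dummy="XXX", sep="|"):
    """Arrange blocks in the transmitted order.

    `order` lists first numbers (1-based); any number larger than
    len(blocks) stands for a dummy block (assumption for this sketch).
    """
    n = len(blocks)
    return sep.join(blocks[f - 1] if f <= n else dummy for f in order)


def restore(ocr_text, order, n_blocks, sep="|"):
    """Split the post-OCR text, drop dummies, and restore the original order."""
    pieces = ocr_text.split(sep)
    # Map each first number to its recognized string, skipping dummy numbers.
    by_first = {f: s for f, s in zip(order, pieces) if f <= n_blocks}
    return "".join(by_first[f] for f in range(1, n_blocks + 1))


blocks = ["The ", "quick ", "brown ", "fox"]
order = [3, 5, 1, 4, 2]          # 5 > 4, so it marks a dummy block
sent = scramble(blocks, order)   # what the OCR terminal would see
text = restore(sent, order, len(blocks))
```

Only the tablet terminal holds `order`, so the OCR terminal sees the blocks in a sequence that does not reveal the original sentence.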

本実施の形態においては、インターネット４０２上での情報漏洩を、主に下記の２つの方法で抑止する。 In the present embodiment, information leakage on the Internet 402 is mainly suppressed by the following two methods.

１．ＯＣＲ処理前の文字領域の画像を、複数の文字を含む複数の画像ブロックに区切り、複数の画像ブロックの配列順序を入れ替える。 1. The image of the character area before the OCR processing is divided into a plurality of image blocks including a plurality of characters, and the arrangement order of the plurality of image blocks is changed.

２．ＯＣＲ処理後のテキストデータにダミー情報が混ざるよう、ＯＣＲ端末３００−１および３００−２の各々に送る複数の画像ブロックにダミーの画像（ダミーブロック）を混ぜる。 2. A dummy image (dummy block) is mixed with a plurality of image blocks sent to each of the OCR terminals 300-1 and 300-2 so that dummy information is mixed with the text data after the OCR processing.

（サーチャブルＰＤＦの作成指示） (Instruction for creating searchable PDF)

続いて、サーチャブルＰＤＦの作成指示（図２の処理ＰＲ０）について詳細に説明する。 Next, a searchable PDF creation instruction (process PR0 in FIG. 2) will be described in detail.

図３は、本発明の第１の実施の形態において、タブレット端末の操作パネルに表示された画面ＳＲを模式的に示す図である。 FIG. 3 is a diagram schematically showing a screen SR displayed on the operation panel of the tablet terminal in the first embodiment of the present invention.

図３を参照して、本実施の形態では、ユーザーがタブレット端末で行う操作が、サーチャブルＰＤＦ化の動作のトリガーとなる。タブレット端末の画面ＳＲは、サーチャブルＰＤＦの作成指示と、セキュリティーレベルの設定とを受け付ける画面である。画面ＳＲは、「サーチャブルＰＤＦ作成」キーＫＹ１と、「矢印」キーＫＹ２およびＫＹ３とを含んでいる。 Referring to FIG. 3, in the present embodiment, an operation that the user performs on the tablet terminal triggers the searchable-PDF conversion. The screen SR of the tablet terminal is a screen that accepts a searchable PDF creation instruction and a security level setting. The screen SR includes a “searchable PDF creation” key KY1 and “arrow” keys KY2 and KY3.

タブレット端末は、キーＫＹ２が押下される度に、４→３→２→１という順序で、設定されているセキュリティーレベルを下げる。またタブレット端末は、キーＫＹ３が押下される度に、１→２→３→４という順序で、設定されているセキュリティーレベルを上げる。 Each time the key KY2 is pressed, the tablet terminal lowers the set security level in the order of 4 → 3 → 2 → 1. Each time the key KY3 is pressed, the tablet terminal increases the set security level in the order of 1 → 2 → 3 → 4.

タブレット端末は、キーＫＹ１が押下された場合に、設定されているセキュリティーレベルでのサーチャブルＰＤＦの作成を開始する。タブレット端末は、ＭＦＰに対して原稿の読み取りおよびＰＤＦの送信の指示を行う。 When the key KY1 is pressed, the tablet terminal starts creating a searchable PDF at the set security level. The tablet terminal instructs the MFP to read a document and transmit a PDF.

（文字領域の画像の抽出方法） (Text region image extraction method)

続いて、文字領域の画像の抽出方法（図２の処理ＰＲ３）について詳細に説明する。 Next, a method for extracting an image of a character area (process PR3 in FIG. 2) will be described in detail.

図４は、本発明の第１の実施の形態において、読取画像データＩＭに含まれる文字領域Ｌ１、Ｌ２、およびＬ３を模式的に示す図である。 FIG. 4 is a diagram schematically showing the character regions L1, L2, and L3 included in the read image data IM in the first embodiment of the present invention.

図４を参照して、文字領域の画像の抽出において、ＭＦＰは、読取画像データＩＭに対して領域判別処理を行う。これにより、読取画像データＩＭが、網点領域Ｎ１と、写真領域Ｐ１と、文字領域Ｌ１、Ｌ２、およびＬ３と、その他の領域Ｚ１とに分類分けされる。そしてＭＦＰは、読取画像データＩＭ内の文字領域Ｌ１、Ｌ２、およびＬ３の各々の画像を特定する。なお、文字領域の形状は任意であるが、ここでは、１ページの読取画像データＩＭの中に矩形形状の３つの文字領域Ｌ１、Ｌ２、およびＬ３が特定されたものとする。 Referring to FIG. 4, in extracting an image of a character area, the MFP performs an area determination process on read image data IM. Thereby, the read image data IM is classified into a halftone dot area N1, a photograph area P1, character areas L1, L2, and L3, and other areas Z1. Then, the MFP specifies each image in the character areas L1, L2, and L3 in the read image data IM. The shape of the character area is arbitrary, but here, it is assumed that three character areas L1, L2, and L3 having a rectangular shape are specified in one page of read image data IM.

ＭＦＰは、特定した文字領域の座標を特定する。特定される座標は、文字領域の対角線の両端の頂点の座標である。具体的には、ＭＦＰは、文字領域Ｌ１の座標として、左上の頂点の座標（ｘ１，ｙ１）と、右下の頂点の座標（ｘ１１，ｙ１１）とを特定する。ＭＦＰは、文字領域Ｌ２の座標として、左上の頂点の座標（ｘ２，ｙ２）と、右下の頂点の座標（ｘ１２，ｙ１２）とを特定する。ＭＦＰは、文字領域Ｌ３の座標として、左上の頂点の座標（ｘ３，ｙ３）と、右下の頂点の座標（ｘ１３，ｙ１３）とを特定する。なお、特定される文字領域の座標は任意のものでよい。 The MFP specifies the coordinates of the specified character area. The specified coordinates are the coordinates of the vertices at both ends of the diagonal line of the character area. Specifically, the MFP specifies the coordinates (x1, y1) of the upper left vertex and the coordinates (x11, y11) of the lower right vertex as the coordinates of the character area L1. The MFP specifies the coordinates (x2, y2) of the upper left vertex and the coordinates (x12, y12) of the lower right vertex as the coordinates of the character area L2. The MFP specifies the coordinates (x3, y3) of the upper left vertex and the coordinates (x13, y13) of the lower right vertex as the coordinates of the character area L3. The coordinates of the specified character area may be arbitrary.

ＭＦＰは、文字領域Ｌ１、Ｌ２、およびＬ３の各々の画像と、文字領域Ｌ１、Ｌ２、Ｌ３の各々の座標とをタブレット端末に送信する。このとき、それぞれの文字領域の座標は、文字領域の画像ファイルのヘッダ部に格納されることが好ましい。またＭＦＰは、読取画像データＩＭ全体のＰＤＦファイルもタブレット端末に送信する。 The MFP transmits the images of the character areas L1, L2, and L3 and the coordinates of the character areas L1, L2, and L3 to the tablet terminal. At this time, the coordinates of each character area are preferably stored in the header portion of the image file of the character area. The MFP also transmits a PDF file of the entire read image data IM to the tablet terminal.

タブレット端末に送信された文字領域Ｌ１、Ｌ２、およびＬ３の各々の画像は、以降の処理において順番に１つずつ処理される。 Each image of the character areas L1, L2, and L3 transmitted to the tablet terminal is processed one by one in order in the subsequent processing.

以降の説明では、文字領域Ｌ１の画像に関する処理を取り上げるが、文字領域Ｌ２およびＬ３の画像に対する処理も、文字領域Ｌ１の画像に対する処理と同様に行われる。 In the following description, processing relating to the image of the character region L1 will be taken up, but processing for the images of the character regions L2 and L3 is performed in the same manner as the processing for the image of the character region L1.

なおＭＦＰは、画像データから文字領域の画像を抽出せずに、読取画像データＩＭの画像全体を文字領域の画像としてもよい。 Note that the MFP may treat the entire image of the read image data IM as the character area image, without extracting a character area image from the image data.

（文字領域の画像の分割方法） (Division method of character area image)

続いて、文字領域の画像を複数の画像ブロックに分割する方法（図２の処理ＰＲ６）について詳細に説明する。 Next, a method for dividing an image of a character area into a plurality of image blocks (processing PR6 in FIG. 2) will be described in detail.

図５は、本発明の第１の実施の形態における、文字領域Ｌ１の画像におけるｘ方向およびｙ方向の各々の白画素の分布を模式的に示す図である。図５では、矩形の文字領域Ｌ１の画像における横方向に延在する辺の方向をｘ方向（第１の方向の一例）としており、文字領域Ｌ１の画像における縦方向に延在する辺の方向をｙ方向（第２の方向の一例）としている。ここでは、文字領域に日本語の文字が含まれている場合について説明するが、文字領域に日本語以外の言語の文字が含まれている場合でも同様の方法で分割することができる。 FIG. 5 is a diagram schematically showing the distribution of white pixels in each of the x direction and the y direction in the image of the character region L1 in the first embodiment of the present invention. In FIG. 5, the direction of the sides of the rectangular character region L1 image that extend horizontally is the x direction (an example of the first direction), and the direction of the sides that extend vertically is the y direction (an example of the second direction). The case where the character area contains Japanese characters is described here, but the image can be divided in the same manner when the character area contains characters of a language other than Japanese.

図５を参照して、文字領域Ｌ１の画像が漏洩した場合にも、第三者によって漏洩した内容が把握されないようにするために、タブレット端末は、文字領域Ｌ１の画像を複数の画像ブロックに分割する。文字領域Ｌ１の画像は、１つの画像ブロックに含まれる文字数が、文字領域Ｌ１に記載された内容を推測することができない程度の文字数の範囲内となるように分割される。また、文字領域Ｌ１の画像は、複数の画像ブロックの各々の境界位置が文字の内部とならないように分割される。境界位置が文字の内部に決定されると、その文字が２つに途切れ、途切れた文字はＯＣＲ処理において正しく認識されないためである。 Referring to FIG. 5, the tablet terminal divides the image of the character region L1 into a plurality of image blocks so that, even if the image of the character region L1 leaks, a third party cannot grasp the leaked content. The image of the character region L1 is divided so that the number of characters in one image block stays within a range small enough that the content written in the character region L1 cannot be inferred. The image of the character region L1 is also divided so that no boundary position between image blocks falls inside a character. This is because, if a boundary position is set inside a character, the character is cut in two, and a cut character is not recognized correctly in the OCR processing.

ここでは、文字領域Ｌ１の画像は、白地に対して白以外の色の文字が表示された画像であるものとする。タブレット端末は、矩形の文字領域Ｌ１の画像から、ｘ方向に存在する白画素を積算した個数ｗ１の分布であって、ｙ方向に沿った分布（以降、個数ｗ１の分布と記すことがある）を抽出する。またタブレット端末は、矩形の文字領域Ｌ１の画像から、ｙ方向に存在する白画素を積算した個数ｗ２の分布であって、ｘ方向に沿った分布（以降、個数ｗ２の分布と記すことがある）を抽出する。タブレット端末は、個数ｗ１およびｗ２の各々の分布に基づいて決定した境界位置で、文字領域Ｌ１の画像を分割することにより、複数の画像ブロックを作成する。 Here, the image of the character region L1 is assumed to be an image in which characters of a color other than white are displayed on a white background. From the image of the rectangular character region L1, the tablet terminal extracts the distribution, along the y direction, of the number w1 obtained by counting the white pixels lying in the x direction (hereinafter sometimes referred to as the distribution of the number w1). The tablet terminal also extracts the distribution, along the x direction, of the number w2 obtained by counting the white pixels lying in the y direction (hereinafter sometimes referred to as the distribution of the number w2). The tablet terminal creates a plurality of image blocks by dividing the image of the character region L1 at boundary positions determined based on the distributions of the numbers w1 and w2.
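The two distributions described above are ordinary projection profiles. A minimal sketch follows, assuming the image is given as a list of pixel rows and that white is the value 255; the function name `white_pixel_profiles` is hypothetical.

```python
def white_pixel_profiles(image, white=255):
    """Return (w1, w2) for an image given as a list of pixel rows.

    w1[y] counts the white pixels in row y (summed along x, indexed by y).
    w2[x] counts the white pixels in column x (summed along y, indexed by x).
    """
    w1 = [sum(1 for p in row if p == white) for row in image]
    w2 = [sum(1 for p in col if p == white) for col in zip(*image)]
    return w1, w2


# Tiny 4x4 example: 255 is background, 0 is ink.
image = [
    [255, 255, 255, 255],
    [0, 255, 0, 255],
    [255, 255, 255, 255],
    [0, 0, 255, 0],
]
w1, w2 = white_pixel_profiles(image)
```

In this example rows 0 and 2 are entirely white, so w1 peaks there, which is how the gap between lines of text shows up in the profile.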

タブレット端末は、個数ｗ１およびｗ２の各々の分布に基づいて文字領域Ｌ１に描かれた文字が縦書きか横書きかを推定する。すなわち、個数ｗ１の分布において個数ｗ１の極大値（ピーク）が周期的に現れる場合には、タブレット端末は、文字領域Ｌ１に描かれた文字が横書きであると推定する。文字領域Ｌ１に描かれた文字が横書きである場合、行間の隙間が、個数ｗ１の分布において周期的な極大値をもたらすためである。一方、個数ｗ２の分布において個数ｗ２の極大値が周期的に変動する場合には、タブレット端末は、文字領域Ｌ１に描かれた文字が縦書きであると推定する。文字領域Ｌ１に描かれた文字が縦書きである場合、列間の隙間が、個数ｗ２の分布において周期的な極大値をもたらすためである。個数ｗ１およびｗ２の各々の分布において個数ｗ１および個数ｗ２の各々が周期的に変動する場合には、タブレット端末は、文字領域Ｌ１に描かれた文字が縦書きであると推定してもよいし、横書きであると推定してもよい。 The tablet terminal estimates, based on the distributions of the numbers w1 and w2, whether the characters drawn in the character region L1 are written vertically or horizontally. That is, when local maxima (peaks) of the number w1 appear periodically in the distribution of the number w1, the tablet terminal estimates that the characters drawn in the character region L1 are written horizontally. This is because, when the characters are written horizontally, the gaps between lines produce periodic local maxima in the distribution of the number w1. Conversely, when the local maxima of the number w2 vary periodically in the distribution of the number w2, the tablet terminal estimates that the characters drawn in the character region L1 are written vertically. This is because, when the characters are written vertically, the gaps between columns produce periodic local maxima in the distribution of the number w2. When both the number w1 and the number w2 vary periodically in their respective distributions, the tablet terminal may estimate either that the characters drawn in the character region L1 are written vertically or that they are written horizontally.

ここでは、個数ｗ１の分布において個数ｗ１の極大値が周期的に現れている。このため、タブレット端末は、文字領域Ｌ１に描かれた文字は横書きであると推定する。この場合、タブレット端末は、個数ｗ１の分布に基づいて、個数ｗ１が極大値（ピーク）となる位置を文字領域Ｌ１の画像の行間位置ＹＰとして特定する。そしてタブレット端末は、行間位置ＹＰで文字領域Ｌ１の画像を分割することにより、文字領域Ｌ１の画像を複数の行に分割する。 Here, the maximum value of the number w1 appears periodically in the distribution of the number w1. For this reason, the tablet terminal estimates that the character drawn in the character area L1 is horizontal writing. In this case, based on the distribution of the number w1, the tablet terminal specifies the position where the number w1 has a maximum value (peak) as the interline position YP of the image of the character region L1. The tablet terminal divides the image of the character region L1 into a plurality of lines by dividing the image of the character region L1 at the line spacing position YP.
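A rough sketch of locating the line spacing positions YP from the w1 distribution and cutting the image into lines is given below. It makes simplifying assumptions that the text does not state: a peak is any run of rows whose count equals the global maximum of the profile, the centre of each run is taken as the gap position, and runs touching the top or bottom edge are treated as margins rather than gaps. The names `gap_positions` and `split_at` are hypothetical.

```python
def gap_positions(profile):
    """Centre index of each interior run where the profile is at its peak."""
    peak = max(profile)
    gaps, start = [], None
    # A trailing None sentinel closes a run that reaches the last index.
    for i, v in enumerate(list(profile) + [None]):
        if v == peak and start is None:
            start = i
        elif v != peak and start is not None:
            if start > 0 and i < len(profile):  # skip top/bottom margins
                gaps.append((start + i - 1) // 2)
            start = None
    return gaps


def split_at(seq, cuts):
    """Split a sequence (for example, the list of pixel rows) at cut indices."""
    bounds = [0] + list(cuts) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


w1 = [10, 4, 5, 10, 10, 3, 6, 10, 2, 10]   # width 10; value 10 = blank row
yp = gap_positions(w1)
lines = split_at(list(range(len(w1))), yp)  # row indices grouped per line
```

The same helper could be applied to the w2 distribution of a single line to obtain candidate gap positions between characters.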

次にタブレット端末は、分割した複数の行の各々について個数ｗ２の分布を抽出し、個数ｗ２の分布に基づいて文字の隙間位置を特定する。隙間位置において個数ｗ２は極大値（ピーク）となる。このため、タブレット端末は、個数ｗ２の分布に基づいて、個数ｗ２が極大値となる位置を隙間位置として特定する。隙間位置は、画像ブロックの境界位置の候補となる。 Next, the tablet terminal extracts the distribution of the number w2 for each of the plurality of divided lines, and specifies the character gap position based on the distribution of the number w2. In the gap position, the number w2 has a maximum value (peak). For this reason, the tablet terminal specifies the position where the number w2 becomes the maximum value as the gap position based on the distribution of the number w2. The gap position is a candidate for the boundary position of the image block.

図６は、本発明の第１の実施の形態において、個数ｗ２の分布に基づいて特定された文字の隙間位置を模式的に示す図である。図６では、文字の隙間位置を三角形の先端で示している。 FIG. 6 is a diagram schematically showing the gap positions of characters specified based on the distribution of the number w2 in the first embodiment of the present invention. In FIG. 6, the position of the gap between characters is indicated by the tip of a triangle.

図６を参照して、「い」、「り」、「こ」、「ふ」、または「川」などの文字は、互いに離れた複数の線によって構成されている。これらの文字では、文字の内部の位置で個数ｗ２が極大値となり、文字の内部が隙間位置として特定される。文字の内部の隙間位置が画像ブロックの境界位置となると、文字が２つに途切れ、途切れた文字はＯＣＲ処理することができなくなる。加えて、文字の内部の隙間位置が画像ブロックの境界位置となると、画像ブロックの切片から、連結されるべき他の画像ブロックが判明し、セキュリティーレベルが低下するおそれもある。画像ブロックの境界位置としては、文字の内部の隙間位置ではなく、文字同士の隙間位置が決定される必要がある。 Referring to FIG. 6, characters such as 「い」, 「り」, 「こ」, 「ふ」, and 「川」 are composed of a plurality of strokes that are separated from each other. For these characters, the number w2 reaches a local maximum at a position inside the character, so the inside of the character is identified as a gap position. If a gap position inside a character becomes a boundary position between image blocks, the character is cut in two, and the cut character can no longer be processed by OCR. In addition, if a gap position inside a character becomes a boundary position, the cut edge of an image block may reveal which other image block it should be joined to, which could lower the security level. The boundary position of an image block therefore needs to be set at a gap between characters, not at a gap inside a character.

そこでタブレット端末は、特定した隙間位置のうち、隣接する他の隙間位置との間隔が閾値以上である隙間位置を、境界位置（図６における「分割ＯＫ」の位置）として決定する。閾値としては、隣接する隙間位置同士の間隔の平均値または標準値などを採用することができる。 Therefore, the tablet terminal determines, as the boundary position (the position of “divided OK” in FIG. 6), the gap position in which the gap between the adjacent gap positions is equal to or greater than the threshold among the specified gap positions. As the threshold value, an average value or a standard value of intervals between adjacent gap positions can be employed.

具体的には、タブレット端末は、複数の文字の隙間位置の各々の間隔（距離）Ｄ１を計算する。次にタブレット端末は、閾値よりも短い間隔Ｄ１が２つ以上連続している場所をマークする。タブレット端末は、マークした場所を構成する隙間位置を避けて、それ以外の部分に存在する隙間位置の中から、画像ブロックの境界位置を決定する。 Specifically, the tablet terminal calculates an interval (distance) D1 of each of the gap positions of a plurality of characters. Next, the tablet terminal marks a place where two or more intervals D1 shorter than the threshold value are continuous. The tablet terminal avoids the gap positions that make up the marked location, and determines the boundary position of the image block from among the gap positions that exist in other portions.
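The selection rule above can be sketched as follows. The threshold is taken to be the mean interval, which the text names as one option. The sketch also assumes that every gap position bordering a marked run, including the two at its ends, is avoided; the text could also be read as excluding only the interior ones. The function name `boundary_candidates` is hypothetical.

```python
def boundary_candidates(gaps):
    """Return the gap positions that remain usable as block boundaries."""
    intervals = [b - a for a, b in zip(gaps, gaps[1:])]  # the intervals D1
    if not intervals:
        return list(gaps)
    threshold = sum(intervals) / len(intervals)  # mean interval as threshold
    short = [d < threshold for d in intervals]

    avoid = set()
    i = 0
    while i < len(short):
        if short[i]:
            j = i
            while j + 1 < len(short) and short[j + 1]:
                j += 1
            if j - i + 1 >= 2:                 # two or more consecutive short D1
                avoid.update(range(i, j + 2))  # gap indices i..j+1 form the run
            i = j + 1
        else:
            i += 1
    return [g for k, g in enumerate(gaps) if k not in avoid]


# Characters are about 10 px wide; the gap at x=25 lies inside one character.
gaps = [0, 10, 20, 25, 30, 40, 50]
candidates = boundary_candidates(gaps)
```

A single isolated short interval is not marked, which matches the requirement of two or more consecutive short intervals.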

図７は、本発明の第１の実施の形態において、文字領域Ｌ１の画像を分割することにより得られた複数の画像ブロックＢＬを模式的に示す図である。 FIG. 7 is a diagram schematically showing a plurality of image blocks BL obtained by dividing the image of the character region L1 in the first embodiment of the present invention.

図７を参照して、タブレット端末は、決定した境界位置ＸＰで、複数の行の各々を分割する。これにより、複数の画像ブロックＢＬが得られる。本実施の形態では、複数の画像ブロックＢＬの各々が文字を含むレベルで分割されているため、複数の画像ブロックＢＬの各々がＯＣＲ処理を受けた際に、画像ブロックＢＬに含まれる文字が正しく認識され易くなる。 Referring to FIG. 7, the tablet terminal divides each of the plurality of rows at the determined boundary position XP. Thereby, a plurality of image blocks BL are obtained. In the present embodiment, since each of the plurality of image blocks BL is divided at a level including characters, when each of the plurality of image blocks BL is subjected to OCR processing, the characters included in the image block BL are correct. It becomes easy to be recognized.

図８は、タブレット端末が保持する分割テーブルを模式的に示す図である。 FIG. 8 is a diagram schematically illustrating a division table held by the tablet terminal.

図８を参照して、分割テーブルは、セキュリティーレベルと、被分割文字数およびダミーブロック数との関係を示すテーブルである。タブレット端末は、分割テーブルを参照して、設定されたセキュリティーレベルに応じた被分割文字数およびダミーブロックの数を決定する。セキュリティーレベルとは、図３に示す画面を通じて設定されたセキュリティーレベルである。被分割文字数とは、画像ブロックのサイズ（１つの画像ブロックに含まれる文字数）である。ダミーブロックとは、複数の画像ブロックの配列順序を変更する際に挿入される、文字領域の画像とは無関係なブロックである。ダミーブロックについては後述する。 Referring to FIG. 8, the division table is a table showing the relationship between the security level and the number of characters to be divided and the number of dummy blocks. The tablet terminal refers to the division table and determines the number of characters to be divided and the number of dummy blocks according to the set security level. The security level is a security level set through the screen shown in FIG. The number of characters to be divided is the size of an image block (the number of characters included in one image block). The dummy block is a block that is inserted when the arrangement order of a plurality of image blocks is changed and is not related to the image of the character area. The dummy block will be described later.

分割テーブルでは、セキュリティーレベルが高くなるほど被分割文字数が少なくなり、ダミーブロック数が多くなるように規定されている。たとえば、セキュリティーレベルが１の場合には、被分割文字数が９個であり、ダミーブロックの数が０個である。セキュリティーレベルが３の場合には、被分割文字数は５個であり、ダミーブロック数は１個である。 The division table stipulates that the higher the security level, the smaller the number of characters to be divided and the larger the number of dummy blocks. For example, when the security level is 1, the number of characters to be divided is nine and the number of dummy blocks is zero. When the security level is 3, the number of characters to be divided is 5, and the number of dummy blocks is 1.
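As an illustration, the division table can be held as a simple mapping. Only the two rows quoted in the text are included; the values for the other levels appear in FIG. 8 and are deliberately not guessed here. The names `DIVISION_TABLE`, `lookup`, and `is_consistent` are hypothetical.

```python
# Rows quoted in the text: level -> (characters per block, dummy blocks).
# Other levels are omitted because the text does not state their values.
DIVISION_TABLE = {
    1: (9, 0),
    3: (5, 1),
}


def lookup(level):
    """Return (characters per block, number of dummy blocks) for a level."""
    return DIVISION_TABLE[level]


def is_consistent(table):
    """Check the stated rule: a higher level never has more characters per
    block, and never has fewer dummy blocks, than a lower level."""
    levels = sorted(table)
    return all(
        table[a][0] >= table[b][0] and table[a][1] <= table[b][1]
        for a, b in zip(levels, levels[1:])
    )


chars_per_block, dummy_blocks = lookup(3)
```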

（複数の画像ブロックの配列順序の変更方法および連結方法） (How to change the order of multiple image blocks and how to connect them)

続いて、複数の画像ブロックの配列順序の変更方法および連結方法（図２の処理ＰＲ７）について詳細に説明する。 Next, a method for changing the arrangement order of a plurality of image blocks and a connection method (processing PR7 in FIG. 2) will be described in detail.

図９は、本発明の第１の実施の形態においてタブレット端末が作成する番号テーブルを模式的に示す表である。 FIG. 9 is a table schematically showing a number table created by the tablet terminal according to the first embodiment of the present invention.

図９を参照して、番号テーブルは、第１の番号と第２の番号との関係を模式的に示すテーブルである。ここでは、説明の便宜のため、文字領域の画像を９個（分割数９）の画像ブロックに分割した場合を想定する。 Referring to FIG. 9, the number table is a table schematically showing the relationship between the first number and the second number. Here, for convenience of explanation, it is assumed that the image of the character area is divided into nine (9 divisions) image blocks.

タブレット端末は、複数の画像ブロックの各々に第１の番号を付与する。第１の番号は、配列順序を変更する前の複数の画像ブロックの各々の順序を示すものである。具体的には、タブレット端末は、複数の画像ブロックの配列順序に従って、複数の画像ブロックの各々に、第１の番号として「１」、「２」、「３」、「４」、「５」、「６」、「７」、「８」、「９」という第１の番号を付与する。 The tablet terminal assigns a first number to each of the plurality of image blocks. The first number indicates the position of each image block in the order before the arrangement order is changed. Specifically, following the arrangement order of the image blocks, the tablet terminal assigns “1”, “2”, “3”, “4”, “5”, “6”, “7”, “8”, and “9” to the image blocks as their first numbers.

次にタブレット端末は、第１の番号の配列順序をランダムに並び替える。並び替えの方法としては、たとえば、１〜９の乱数を発生させた後、前に使われていない番号に限って順次採用する方法などがある。ダミーブロックを挿入する場合、次にタブレット端末は、分割数よりも大きな数字（ここでは「１０」）を任意の位置に挿入する。この数字はダミーブロックに相当するものである。ダミーブロックを示す数字は分割数よりも大きいため、複数の画像ブロックと容易に区別することができる。 Next, the tablet terminal randomly rearranges the arrangement order of the first numbers. As a rearrangement method, for example, after random numbers 1 to 9 are generated, only numbers that have not been used before are sequentially adopted. When inserting a dummy block, the tablet terminal next inserts a number larger than the number of divisions (here, “10”) at an arbitrary position. This number corresponds to a dummy block. Since the number indicating the dummy block is larger than the number of divisions, it can be easily distinguished from a plurality of image blocks.

タブレット端末は、得られた番号列に従って、複数の画像ブロックの各々に、第２の番号を付与する。ここでは、タブレット端末は、第１の番号が「１」である画像ブロックに「３」という第２の番号が付与され、第１の番号が「２」である画像ブロックに「５」という番号が付与され、第１の番号が「３」である画像ブロックに「８」という第２の番号が付与されている。第２の番号は、配列順序を変更した後の複数の画像ブロックの配列順序（ＯＣＲ端末に送信する暗号化画像における複数の画像ブロックの配列順序）を示すものである。 The tablet terminal assigns a second number to each of the plurality of image blocks according to the number sequence obtained. Here, the image block whose first number is “1” is assigned the second number “3”, the image block whose first number is “2” is assigned the second number “5”, and the image block whose first number is “3” is assigned the second number “8”. The second number indicates the arrangement order of the image blocks after the order is changed (that is, their arrangement order in the encrypted image transmitted to the OCR terminal).
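One way to build such a number table is sketched below. It differs from the text in two labelled respects: it uses the standard library's `random.shuffle` instead of repeatedly drawing random numbers and discarding those already used, and it defines the second number of a block as its 1-based position in the transmitted sequence, which is an assumption about how FIG. 9 is laid out. The function name `make_number_table` is hypothetical.

```python
import random


def make_number_table(n_blocks, n_dummies=0, rng=None):
    """Return (order, table).

    order: first numbers in transmission order; numbers larger than
           n_blocks stand for dummy blocks.
    table: maps each number in `order` to its second number, i.e. its
           1-based position in the transmitted sequence.
    """
    rng = rng or random.Random()
    order = list(range(1, n_blocks + 1))
    rng.shuffle(order)                      # random rearrangement
    for k in range(n_dummies):
        # Dummy numbers exceed the division count, so they are easy to tell
        # apart from real blocks; each is inserted at an arbitrary position.
        order.insert(rng.randint(0, len(order)), n_blocks + 1 + k)
    table = {first: pos for pos, first in enumerate(order, start=1)}
    return order, table


order, table = make_number_table(9, n_dummies=1, rng=random.Random(0))
```

Passing a seeded `random.Random` makes the table reproducible for testing; a real deployment would presumably use a fresh source of randomness for each table.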

タブレット端末は、複数の画像ブロックの配列順序をランダムに変更するたびに、新たな番号テーブルを作成してもよいし、予め保持していた番号テーブルを用いて、複数の画像ブロックの配列順序を変更してもよい。タブレット端末は、少なくとも文字領域の画像のテキストデータを作成するまで、番号テーブルを保持する。 Each time the tablet terminal randomly changes the arrangement order of the plurality of image blocks, it may create a new number table, or it may change the arrangement order using a number table that it already holds. The tablet terminal holds the number table at least until the text data of the character area image has been created.

図１０は、本発明の第１の実施の形態において、複数の画像ブロックＢＬの各々に付けられた第１の番号を模式的に示す図である。図１１は、図１０に示す複数の画像ブロックＢＬの各々に含まれる文字列を表記したものである。なお以降の図では、便宜上、画像ブロックの外周に黒枠を付けていることがあるが、実際には黒枠は存在しない。 FIG. 10 is a diagram schematically showing a first number assigned to each of a plurality of image blocks BL in the first embodiment of the present invention. FIG. 11 shows a character string included in each of the plurality of image blocks BL shown in FIG. In the following drawings, a black frame may be attached to the outer periphery of the image block for convenience, but no black frame actually exists.

図１０を参照して、ここでの分割数は５４である。タブレット端末は、文字領域Ｌ１の画像を分割することにより得られた複数の画像ブロックＢＬの各々に１〜５４の各々の第１の番号を付ける。タブレット端末は、横書きであることを想定して複数の画像ブロックＢＬの各々に第１の番号を付ける。複数の画像ブロックＢＬの各々を第１の番号に従って配列させると、図１１に示すように、文字領域Ｌ１の画像が得られ、元の意味を持つ文章が得られる。 Referring to FIG. 10, the number of divisions here is 54. The tablet terminal assigns the first numbers 1 to 54 to the plurality of image blocks BL obtained by dividing the image of the character region L1. The first numbers are assigned on the assumption that the text is written horizontally. When the image blocks BL are arranged according to their first numbers, the image of the character region L1 is obtained as shown in FIG. 11, and the sentence with its original meaning is recovered.

なお、図１１中の「（２２）」、「（２３）」、「（２４）」、「（３６）」、「（５３）」、および「（５４）」と表記された画像ブロックＢＬは、いずれも空白（文字を含まない）のブロックである。これらの画像ブロックＢＬのうち、「（５３）」および「（５４）」と表記されたものは、文章の終わりの空白である。タブレット端末は、複数の画像ブロックＢＬが空白の画像ブロックを含む場合に、一部または全部の空白の画像ブロックにダミーの文字画像を挿入してもよい。 In FIG. 11, the image blocks BL labeled “(22)”, “(23)”, “(24)”, “(36)”, “(53)”, and “(54)” are all blank blocks (blocks that contain no characters). Among these image blocks BL, those labeled “(53)” and “(54)” are the blank space at the end of the sentence. When the plurality of image blocks BL include blank image blocks, the tablet terminal may insert dummy character images into some or all of the blank image blocks.

図１２は、本発明の第１の実施の形態において、複数の画像ブロックＢＬに関する番号テーブルを模式的に示す表である。図１３は、本発明の第１の実施の形態において、第２の番号に従って配列順序を変更した後の複数の画像ブロックＢＬの各々を模式的に示す図である。図１３において、複数の画像ブロックＢＬの各々には、第１の番号が表示されている。 FIG. 12 is a table schematically showing a number table related to a plurality of image blocks BL in the first embodiment of the present invention. FIG. 13 is a diagram schematically showing each of the plurality of image blocks BL after the arrangement order is changed according to the second number in the first embodiment of the present invention. In FIG. 13, a first number is displayed in each of the plurality of image blocks BL.

図１２を参照して、複数の画像ブロックＢＬの各々には、第１の番号および第２の番号が付与されている。たとえば、第１の番号が「１」である画像ブロックに「２６」という第２の番号が付与されており、第１の番号が「２」である画像ブロックに「３６」という番号が付与されており、第１の番号が「３」である画像ブロックに「１６」という第２の番号が付与されている。 Referring to FIG. 12, a first number and a second number are assigned to each of the plurality of image blocks BL. For example, a second number “26” is assigned to an image block whose first number is “1”, and a number “36” is assigned to an image block whose first number is “2”. The second number “16” is assigned to the image block whose first number is “3”.

図１３を参照して、タブレット端末は、第１および第２の番号を付与した後で、第２の番号に従って、複数の画像ブロックＢＬの配列順序を変更する（この際、必要に応じてダミーブロックを挿入する）。その結果、第１の番号が「１」である画像ブロックＢＬ１は、２６番目の位置に配置される。第１の番号が「２」である画像ブロックＢＬ２は、３６番目の位置に配置される。第１の番号が「３」である画像ブロックＢＬ３は、１６番目の位置に配置される。 Referring to FIG. 13, after assigning the first and second numbers, the tablet terminal changes the arrangement order of the plurality of image blocks BL according to the second numbers (inserting dummy blocks at this time as necessary). As a result, the image block BL1 having the first number “1” is placed at the 26th position. The image block BL2 having the first number “2” is placed at the 36th position. The image block BL3 having the first number “3” is placed at the 16th position.
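The rearrangement according to the second numbers can be sketched as below. The function name `rearrange` and the five-entry table are hypothetical and chosen only to keep the example small; they are not the 54-entry table of FIG. 12. Strings stand in for image blocks.

```python
def rearrange(blocks, second_numbers):
    """blocks[i] is the block whose first number is i + 1;
    second_numbers[i] is the 1-based position it moves to."""
    if sorted(second_numbers) != list(range(1, len(blocks) + 1)):
        raise ValueError("second numbers must be a permutation of 1..n")
    out = [None] * len(blocks)
    for block, second in zip(blocks, second_numbers):
        out[second - 1] = block        # place the block at its new position
    return out

blocks = ["A", "B", "C", "D", "E"]     # first numbers 1..5
table = [3, 5, 1, 4, 2]                # hypothetical second numbers
scrambled = rearrange(blocks, table)   # ["C", "E", "A", "D", "B"]
```

In the same way, a table entry of 26 for the first number 1 would place block BL1 at the 26th position.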

図１４は、本発明の第１の実施の形態において、第２の番号に従って配列順序を変更した後の複数の画像ブロックＢＬの各々の間に、連結用画像を挿入した状態を模式的に示す図である。なお図１４では、図１３に示す複数の画像ブロックＢＬの各々に含まれる文字列が表記されている。 FIG. 14 is a diagram schematically showing a state in which a connecting image is inserted between each of the plurality of image blocks BL after the arrangement order is changed according to the second number in the first embodiment of the present invention. In FIG. 14, the character strings included in each of the plurality of image blocks BL shown in FIG. 13 are shown.

図１４を参照して、タブレット端末は、配列順序を変更した後の複数の画像ブロックＢＬの各々を互いに連結することにより、暗号化画像を作成する。タブレット端末は、連結する際に、配列順序を変更した後の複数の画像ブロックＢＬの各々の間に、たとえば「＋」などの連結用画像を挿入する。連結用画像としては、任意のものを使用することができるが、ＯＣＲ端末でのＯＣＲ処理において正しく認識されるものであり、文字認識の結果が既知であるものであることが好ましい。連結用画像は、典型的には、文字ではない記号の画像である。 Referring to FIG. 14, the tablet terminal creates an encrypted image by connecting each of the plurality of image blocks BL after the arrangement order is changed. When connecting, the tablet terminal inserts a connecting image such as “+” between each of the plurality of image blocks BL after the arrangement order is changed. Any image can be used as the concatenation image, but it is preferably one that is recognized correctly in the OCR processing at the OCR terminal and that the result of character recognition is known. The connection image is typically an image of a symbol that is not a character.

なお、第１の番号が「５３」である空白の画像ブロックＢＬ４および第１の番号が「５４」である空白の画像ブロックＢＬ５の各々には、「＃＃＃＃＃」というダミーの文字画像が挿入されている。ダミーの文字画像としては、任意のものを使用することができるが、ＯＣＲ処理の結果が既知である記号が表示されたものが用いられることが好ましい。 A dummy character image “#####” is inserted into each of the blank image block BL4 having the first number “53” and the blank image block BL5 having the first number “54”. Any image can be used as the dummy character image, but it is preferable to use an image showing symbols for which the result of OCR processing is known.

作成された暗号化画像は、文字領域Ｌ１の暗号化された画像に相当する。 The created encrypted image corresponds to the encrypted image in the character area L1.
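As a rough text-level illustration of the connection step, the sketch below joins rearranged blocks with the connecting symbol and fills blank blocks with the dummy string. It is an assumption-laden simplification: the source operates on images, whereas here strings stand in for image blocks, and the function name `build_encrypted_line` is hypothetical.

```python
CONNECTOR = "+"      # connecting symbol used in the source's example
DUMMY = "#####"      # dummy characters used in the source's example

def build_encrypted_line(blocks_in_new_order):
    """Replace blank blocks with the dummy string, then insert the
    connector between neighbouring blocks."""
    filled = [b if b.strip() else DUMMY for b in blocks_in_new_order]
    return CONNECTOR.join(filled)

line = build_encrypted_line(["world", "   ", "hello"])
```

With n blocks this produces n - 1 connectors, so the k-th block sits between the (k - 1)-th and k-th connector.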

（暗号化画像の送信方法） (Encrypted image transmission method)

次に、暗号化画像の送信方法（図２の処理ＰＲ８）について詳細に説明する。 Next, the encrypted image transmission method (process PR8 in FIG. 2) will be described in detail.

図１５は、本発明の第１の実施の形態において、文字領域Ｌ１の画像に基づいて作成された暗号化画像の一例を模式的に示す図である。図１６は、本発明の第１の実施の形態において、文字領域Ｌ１の画像に基づいて作成された暗号化画像の他の例を模式的に示す図である。図１７は、本発明の第１の実施の形態において、文字領域Ｌ１の画像に基づいて作成された暗号化画像のさらに他の例を模式的に示す図である。 FIG. 15 is a diagram schematically showing an example of an encrypted image created based on the image of the character area L1 in the first embodiment of the present invention. FIG. 16 is a diagram schematically showing another example of the encrypted image created based on the image of the character area L1 in the first embodiment of the present invention. FIG. 17 is a diagram schematically showing still another example of the encrypted image created based on the image of the character area L1 in the first embodiment of the present invention.

図１５を参照して、暗号化画像は、配列順序を変更した後の複数の画像ブロックに基づいて作成されたものであればよい。本例において、タブレット端末は、連結用画像を挿入した複数の画像ブロックを含む１枚の画像として暗号化画像ＳＤ１を作成する。この場合、タブレット端末は、１つのＯＣＲ端末３００−１（図１）に対して暗号化画像ＳＤ１を送信する。 Referring to FIG. 15, the encrypted image only needs to be created based on a plurality of image blocks after the arrangement order is changed. In this example, the tablet terminal creates the encrypted image SD1 as one image including a plurality of image blocks into which the connection images are inserted. In this case, the tablet terminal transmits the encrypted image SD1 to one OCR terminal 300-1 (FIG. 1).

図１６を参照して、本例において、タブレット端末は、連結用画像を挿入した複数の画像ブロックを行毎に分割することにより、各行の暗号化画像ＳＤ１〜ＳＤ９の各々を作成する。タブレット端末は、１つのＯＣＲ端末３００−１に対して暗号化画像ＳＤ１〜ＳＤ９を送信する。これにより、セキュリティーを向上することができる。 Referring to FIG. 16, in this example, the tablet terminal divides a plurality of image blocks into which images for connection are inserted for each row, thereby creating each of encrypted images SD1 to SD9 in each row. The tablet terminal transmits encrypted images SD1 to SD9 to one OCR terminal 300-1. Thereby, security can be improved.

図１７を参照して、本例において、タブレット端末は、連結用画像を挿入した複数の画像ブロックを行毎に分割することにより、各行の暗号化画像ＳＤ１〜ＳＤ９の各々を作成する。タブレット端末は、２つのＯＣＲ端末に対して暗号化画像ＳＤ１〜ＳＤ９を２つに分割して送信する。タブレット端末は、たとえばＯＣＲ端末３００−１に対して上部（第１の部分の一例）の暗号化画像ＳＤ１〜ＳＤ５を送信し（図１７（ａ））、ＯＣＲ端末３００−２（図１）に対して下部（第２の部分の一例）の暗号化画像ＳＤ６〜ＳＤ９を送信する（図１７（ｂ））。これにより、セキュリティーを一層向上することができる。すなわち、万が一、一方のＯＣＲ端末に送信した暗号化画像の配列順序が第三者によって入手され、正しい配列順序に戻されたとしても、他方のＯＣＲ端末に送信された暗号化画像が入手されない限り、第三者によって文字領域の画像が完全に再現されることはない。 Referring to FIG. 17, in this example, the tablet terminal creates the encrypted images SD1 to SD9, one for each row, by dividing the plurality of image blocks into which the connecting images have been inserted row by row. The tablet terminal divides the encrypted images SD1 to SD9 into two parts and transmits them to two OCR terminals. For example, the tablet terminal transmits the upper encrypted images SD1 to SD5 (an example of the first part) to the OCR terminal 300-1 (FIG. 17A), and transmits the lower encrypted images SD6 to SD9 (an example of the second part) to the OCR terminal 300-2 (FIG. 1) (FIG. 17B). Security can thereby be improved further. That is, even if a third party were to obtain the arrangement order of the encrypted images transmitted to one OCR terminal and restore the correct arrangement order, the third party could not completely reproduce the image of the character area without also obtaining the encrypted images transmitted to the other OCR terminal.

なお、タブレット端末が複数の暗号化画像を作成する場合、複数の暗号化画像の各々には、文字領域の画像を特定する情報と、文字領域の画像における暗号化画像の位置とを示すファイル名が付されることが好ましい。たとえば２つめの文字領域Ｌ２（図４）の画像における３行目の暗号化画像であれば、「ｒｅｇ０２ｌｉｎｅ０３．ｊｐｅｇ」などのファイル名が付されることが好ましい。 When the tablet terminal creates a plurality of encrypted images, each encrypted image is preferably given a file name that indicates both information specifying the image of the character area and the position of the encrypted image within that character area image. For example, the encrypted image of the third row in the image of the second character area L2 (FIG. 4) is preferably given a file name such as “reg02line03.jpeg”.
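One way to generate file names of this form is sketched below. The two-digit zero padding is inferred from the single example “reg02line03.jpeg”; the function name `encrypted_image_name` is hypothetical.

```python
def encrypted_image_name(region, line):
    """Build a name that encodes the character-area index and the row
    index, following the pattern of the source's example."""
    return f"reg{region:02d}line{line:02d}.jpeg"

name = encrypted_image_name(2, 3)   # third row of the second character area
```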

（ＯＣＲ後データの作成方法） (How to create post-OCR data)

次に、ＯＣＲ後データの作成方法（図２の処理ＰＲ９）について説明する。 Next, a method for creating post-OCR data (process PR9 in FIG. 2) will be described.

ＯＣＲ端末は、暗号化画像を受信すると、受信した暗号化画像を、ＯＣＲ端末内の個人フォルダに格納する。この個人フォルダは、タブレット端末のユーザーに事前に割り当てられたフォルダである。そしてＯＣＲ端末は、タブレット端末のソフトウェアからのコマンドを受信すると、ＯＣＲ処理を開始する。 When the OCR terminal receives the encrypted image, the OCR terminal stores the received encrypted image in a personal folder in the OCR terminal. This personal folder is a folder assigned in advance to the user of the tablet terminal. When the OCR terminal receives a command from the software of the tablet terminal, the OCR terminal starts OCR processing.

図１８は、本発明の第１の実施の形態において、ＯＣＲ端末が作成したＯＣＲ後データを模式的に示す図である。 FIG. 18 is a diagram schematically showing post-OCR data created by the OCR terminal in the first embodiment of the present invention.

図１８を参照して、ＯＣＲ端末は、テキスト形式のＯＣＲ後データＯＤを作成し、ＯＣＲ後データＯＤを、暗号化画像が格納されているのと同じ個人フォルダに格納する。ＯＣＲ後データは、文字領域Ｌ１の画像のテキストデータを暗号化したものに相当する。 Referring to FIG. 18, the OCR terminal creates post-OCR data OD in text format, and stores post-OCR data OD in the same personal folder where the encrypted image is stored. The post-OCR data corresponds to the encrypted text data of the image in the character area L1.

ＯＣＲ後データＯＤは、暗号化画像のファイル名と同じファイル名（拡張子を除く）が付与されることが好ましい。具体的には、「ｒｅｇ０２ｌｉｎｅ０３．ｊｐｅｇ」というファイル名の暗号化画像に対してＯＣＲ処理を行った場合には、ＯＣＲ後データＯＤには、「ｒｅｇ０２ｌｉｎｅ０３．ｔｘｔ」というファイル名が付与されることが好ましい。これにより、データの取り違えを抑止することができる。 The post-OCR data OD is preferably given the same file name (excluding the extension) as the encrypted image. Specifically, when OCR processing is performed on an encrypted image with the file name “reg02line03.jpeg”, the post-OCR data OD is preferably given the file name “reg02line03.txt”. This helps prevent data from being mixed up.
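Deriving the post-OCR file name from the encrypted image's file name can be sketched with Python's standard `pathlib`. The function name `post_ocr_name` is hypothetical; only the extension is swapped, so the stem that identifies the character area and row is preserved.

```python
from pathlib import PurePosixPath

def post_ocr_name(image_name):
    """Keep the stem of the encrypted image's name and replace the
    extension with .txt."""
    return str(PurePosixPath(image_name).with_suffix(".txt"))

text_name = post_ocr_name("reg02line03.jpeg")
```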

ＯＣＲ端末は、ＯＣＲ処理後、ＯＣＲ処理の対象となった暗号化画像を削除する。ＯＣＲ端末は、ＯＣＲ処理完了（変換完了）の通知とともにＯＣＲ後データＯＤをタブレット端末に送信する。 After the OCR processing, the OCR terminal deletes the encrypted image that was the target of the OCR processing. The OCR terminal transmits the post-OCR data OD to the tablet terminal together with a notification that the OCR processing (conversion) is complete.

なお、暗号化画像の受信、ＯＣＲ処理、およびＯＣＲ後データＯＤの送信という一連の処理を１つのＯＣＲ端末が同時に行うことがないよう、タブレット端末は管理する。これにより、データの取り違えを抑止することができる。 Note that the tablet terminal manages the OCR terminals so that no single OCR terminal performs more than one series of processes (reception of an encrypted image, OCR processing, and transmission of the post-OCR data OD) at the same time. This helps prevent data from being mixed up.

（文字領域の画像のテキストデータの作成方法およびテキストデータの貼り付け方法） (How to create text data for text area images and paste text data)

次に、文字領域の画像のテキストデータの作成方法（図２の処理ＰＲ１１）およびテキストデータの貼り付け方法（図２の処理ＰＲ１２）について説明する。 Next, the method for creating the text data of the character area image (process PR11 in FIG. 2) and the method for pasting the text data (process PR12 in FIG. 2) will be described.

タブレット端末は、ＯＣＲ後データを受信すると、番号テーブルに基づいてＯＣＲ後データＯＤの配列順序を変更することにより、文字領域Ｌ１内のテキストデータを作成する。テキストデータの作成は、タブレット端末のソフトウェアを用いて行われる。 When receiving the post-OCR data, the tablet terminal changes the arrangement order of the post-OCR data OD based on the number table, thereby creating text data in the character area L1. Text data is created using the software of the tablet terminal.

図１９は、本発明の第１の実施の形態において、ＯＣＲ後データを分割することによって得られた複数の文字列を模式的に示す図である。図２０は、本発明の第１の実施の形態において作成された、文字領域Ｌ１内のテキストデータを模式的に示す図である。 FIG. 19 is a diagram schematically showing a plurality of character strings obtained by dividing post-OCR data in the first embodiment of the present invention. FIG. 20 is a diagram schematically showing text data in the character area L1 created in the first embodiment of the present invention.

図１９を参照して、タブレット端末は、ＯＣＲ後データに含まれる「＋」という連結用画像に基づいて、ＯＣＲ後データを、分割数５４の複数の文字列ＣＳに分解する。次にタブレット端末は、番号テーブルに基づいて複数の文字列ＣＳの配列順序を元の配列順序に並べ直す。図１２に示す番号テーブルによれば、第１の番号が「１」である画像ブロックは、２６番目の位置に移動している（「２６」という第２の番号を有している）。したがって、タブレット端末は、２５番目の「＋」と２６番目の「＋」とに挟まれた２６番目の文字列ＣＳ１の配列順序を１番目に変更する。同様に、第１の番号が「２」である画像ブロックは、３６番目の位置に移動している。したがって、タブレット端末は、３５番目の「＋」と３６番目の「＋」とに挟まれた３６番目の文字列ＣＳ２の配列順序を２番目に変更する。 Referring to FIG. 19, the tablet terminal uses the “+” symbols in the post-OCR data, which correspond to the connecting images, to split the post-OCR data into 54 character strings CS, matching the number of divisions. Next, the tablet terminal rearranges the character strings CS into their original arrangement order based on the number table. According to the number table shown in FIG. 12, the image block whose first number is “1” has moved to the 26th position (it has the second number “26”). Therefore, the tablet terminal moves the 26th character string CS1, which lies between the 25th “+” and the 26th “+”, to the first position. Similarly, the image block whose first number is “2” has moved to the 36th position. Therefore, the tablet terminal moves the 36th character string CS2, which lies between the 35th “+” and the 36th “+”, to the second position.

タブレット端末は、配列順序を元の配列順序に戻した後、必要に応じて連結用画像およびダミーブロックを削除し、ダミーの文字画像に相当する文字（ここでは「＃＃＃＃＃」という文字）を消去する。その後タブレット端末は、複数の文字列を互いに連結し、１つの文字列とする。これにより、図２０に示すように、文字領域Ｌ１のテキストデータＴＤが作成される。 After restoring the original arrangement order, the tablet terminal deletes the connecting symbols and the dummy blocks as necessary, and erases the characters corresponding to the dummy character images (here, “#####”). The tablet terminal then concatenates the character strings into a single character string. As a result, as shown in FIG. 20, the text data TD of the character region L1 is created.
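The restoration steps above (split on the connecting symbol, reorder by the number table, drop dummy strings, concatenate) can be sketched as follows. The function name `restore_text` and the five-entry table are hypothetical. The sketch also assumes that the recognized text itself never contains “+”, which is why the source prefers a connecting symbol whose OCR result is known.

```python
CONNECTOR = "+"
DUMMY = "#####"

def restore_text(post_ocr, second_numbers):
    """second_numbers[i] is the position (1-based) to which the block
    with first number i + 1 was moved during scrambling."""
    pieces = post_ocr.split(CONNECTOR)
    if len(pieces) != len(second_numbers):
        raise ValueError("piece count does not match the number table")
    # Fetch, for each first number in order, the piece at its second number.
    ordered = [pieces[second - 1] for second in second_numbers]
    # Erase dummy strings, then concatenate into one character string.
    return "".join("" if p == DUMMY else p for p in ordered)

table = [3, 5, 1, 4, 2]                       # hypothetical number table
post_ocr = "EF+#####+AB+GH+CD"                # scrambled OCR output
text = restore_text(post_ocr, table)
```

Here the block with first number 5 was blank and carried the dummy string, so it contributes nothing to the restored text.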

図２１は、本発明の第１の実施の形態におけるテキストデータの貼り付け方法を模式的に示す図である。 FIG. 21 is a diagram schematically showing a text data pasting method according to the first embodiment of the present invention.

図２１を参照して、読取画像データＩＭのＰＤＦファイルは、読取画像が含まれるレイヤーであるレイヤーＬＲ１と、レイヤーＬＲ１上に設けられた透明レイヤーＬＲ２とにより構成されている。タブレット端末は、得られたテキストデータＴＤを、透明レイヤーＬＲ２における文字領域Ｌ１に対応する位置に貼り付ける。タブレット端末は、同様に、文字領域Ｌ２およびＬ３の各々の画像から得られたテキストデータを、透明レイヤーＬＲ２における文字領域Ｌ２およびＬ３の各々に対応する位置に貼り付ける。これにより、サーチャブルＰＤＦが作成される。 Referring to FIG. 21, the PDF file of read image data IM includes a layer LR1 that is a layer including the read image, and a transparent layer LR2 provided on layer LR1. The tablet terminal pastes the obtained text data TD at a position corresponding to the character area L1 in the transparent layer LR2. Similarly, the tablet terminal pastes the text data obtained from the images of the character areas L2 and L3 at positions corresponding to the character areas L2 and L3 in the transparent layer LR2. As a result, a searchable PDF is created.

（文字画像処理システムの動作を示すフローチャート） (Flowchart showing the operation of the character image processing system)

図２２は、本発明の第１の実施の形態における文字画像処理システムの動作を示すフローチャートである。 FIG. 22 is a flowchart showing the operation of the character image processing system according to the first embodiment of the present invention.

図２２を参照して、タブレット端末のＣＰＵは、セキュリティーレベルの設定および実行指示を受け付けると（Ｓ１）、ＭＦＰに対してスキャンの実行指示を送信する（Ｓ３）。 Referring to FIG. 22, when receiving the security level setting and execution instruction (S1), the CPU of the tablet terminal transmits a scan execution instruction to the MFP (S3).

ＭＦＰのＣＰＵは、スキャンの実行指示を受信すると、原稿をスキャンし（Ｓ５）、読取画像データから文字領域の画像を抽出する（Ｓ７）。次にＭＦＰのＣＰＵは、読取画像のＰＤＦファイルを作成し（Ｓ９）、タブレット端末に対して文書領域の画像および座標、ならびに読取画像のＰＤＦファイルを送信する（Ｓ１１）。 When receiving the scan execution instruction, the CPU of the MFP scans the document (S5), and extracts an image of the character area from the read image data (S7). Next, the CPU of the MFP creates a PDF file of the read image (S9), and transmits the image and coordinates of the document area and the PDF file of the read image to the tablet terminal (S11).

タブレット端末のＣＰＵは、文書領域の画像などを受信すると、文字領域の画像を複数の画像ブロックに分割し（Ｓ１３）、複数の画像ブロックの配列順序を変更する（Ｓ１５）。続いてタブレット端末のＣＰＵは、必要に応じて複数の画像ブロックにダミーブロックやダミーの文字画像を挿入し、複数の画像ブロックを連結用画像で連結することにより、暗号化画像を作成する（Ｓ１７）。続いてタブレット端末のＣＰＵは、ＯＣＲ端末に対して暗号化画像を送信する（Ｓ１９）。 When the CPU of the tablet terminal receives the image of the document area and the like, it divides the image of the character area into a plurality of image blocks (S13) and changes the arrangement order of the plurality of image blocks (S15). Subsequently, the CPU of the tablet terminal inserts dummy blocks or dummy character images into the plurality of image blocks as necessary, and creates an encrypted image by connecting the plurality of image blocks with connecting images (S17). Subsequently, the CPU of the tablet terminal transmits the encrypted image to the OCR terminal (S19).

ＯＣＲ端末のＣＰＵは、暗号化画像に対してＯＣＲ処理を実行し（Ｓ２１）、得られたＯＣＲ後データをタブレット端末に送信する（Ｓ２３）。 The CPU of the OCR terminal performs an OCR process on the encrypted image (S21), and transmits the obtained post-OCR data to the tablet terminal (S23).

タブレット端末のＣＰＵは、ＯＣＲ後データを複数の文字列に分割し、複数の文字列の配列順序を元に戻す（Ｓ２５）。次にタブレット端末のＣＰＵは、複数の文字列から連結用画像を除去し、必要に応じてダミーブロックやダミーの文字画像に対応する文字を削除することにより、テキストデータを作成する（Ｓ２７）。次にタブレット端末のＣＰＵは、読取画像のＰＤＦファイルにテキストデータを貼り付けることにより、サーチャブルＰＤＦを作成し（Ｓ２９）、処理を終了する。 The CPU of the tablet terminal divides the post-OCR data into a plurality of character strings, and restores the arrangement order of the plurality of character strings (S25). Next, the CPU of the tablet terminal removes the connection image from the plurality of character strings, and deletes characters corresponding to the dummy block and the dummy character image as necessary, thereby creating text data (S27). Next, the CPU of the tablet terminal creates a searchable PDF by pasting the text data into the PDF file of the read image (S29), and ends the process.

［第２の実施の形態］ [Second Embodiment]

本実施の形態では、始めに、文字画像処理システムが行うサーチャブルＰＤＦ化の動作の概要を説明する。 In this embodiment, first, an outline of the searchable PDF operation performed by the character image processing system will be described.

図２３は、本発明の第２の実施の形態における文字画像処理システムの動作の概要を示すシーケンス図である。 FIG. 23 is a sequence diagram showing an outline of the operation of the character image processing system according to the second embodiment of the present invention.

図２３を参照して、本実施の形態における文字画像処理システムの動作のうち、タブレット端末がサーチャブルＰＤＦの作成指示を受け付ける処理（図２の処理ＰＲ０）から、ＭＦＰが文字領域の画像などをタブレット端末に送信する処理（図２の処理ＰＲ５）までは、第１の実施の形態における動作（図２）と同じである。したがって、その説明は繰り返さない。 Referring to FIG. 23, among the operations of the character image processing system in the present embodiment, the operations from the process in which the tablet terminal accepts a searchable PDF creation instruction (process PR0 in FIG. 2) to the process in which the MFP transmits the character area image and the like to the tablet terminal (process PR5 in FIG. 2) are the same as those in the first embodiment (FIG. 2). Therefore, their description will not be repeated.

タブレット端末は、文字領域の画像、文字領域の座標、および読取画像データのＰＤＦファイルを受信すると、文字領域の画像を複数の画像ブロックに分割する。そしてタブレット端末は、暗号化マトリクス（第１の関係情報の一例）に基づいて、複数の画像ブロックの配列順序を変更する（並び替える）。これにより、暗号化画像が作成される（処理ＰＲ１１）。続いてタブレット端末は、暗号化画像と、暗号化マトリクスとをＯＣＲ端末に送信する（処理ＰＲ１２）。 When the tablet terminal receives the image of the character area, the coordinates of the character area, and the PDF file of the read image data, the tablet terminal divides the image of the character area into a plurality of image blocks. Then, the tablet terminal changes the order of the plurality of image blocks based on the encryption matrix (an example of the first relationship information). As a result, an encrypted image is created (process PR11). Subsequently, the tablet terminal transmits the encrypted image and the encryption matrix to the OCR terminal (process PR12).

ＯＣＲ端末は、暗号化画像をタブレット端末から受信すると、タブレット端末からのコマンドに従って、暗号化マトリクスに基づいて、暗号化画像を文字領域の画像に復元する（元に戻す）（処理ＰＲ１３）。次にＯＣＲ端末は、タブレット端末からのコマンドに従って、文字領域の画像に対してＯＣＲ処理を行うことにより、ＯＣＲ後データを作成する（処理ＰＲ１４）。続いてＯＣＲ端末は、タブレット端末からのコマンドに従って、作成したＯＣＲ後データを所定のバイト数を有する複数のデータ片に分割し、暗号化マトリクス（第２の関係情報の一例）に基づいて、複数のデータ片の配列順序を変更する（並び替える）。これにより、暗号化したテキストデータが作成される（処理ＰＲ１５）。次にＯＣＲ端末は、暗号化したテキストデータをタブレット端末に送信する（処理ＰＲ１６）。 When the OCR terminal receives the encrypted image from the tablet terminal, the OCR terminal restores (restores) the encrypted image to the character area image based on the encryption matrix in accordance with the command from the tablet terminal (process PR13). Next, the OCR terminal creates post-OCR data by performing OCR processing on the character area image in accordance with the command from the tablet terminal (processing PR14). Subsequently, the OCR terminal divides the created post-OCR data into a plurality of data pieces having a predetermined number of bytes according to a command from the tablet terminal, and based on the encryption matrix (an example of second relationship information) The arrangement order of the data pieces is changed (rearranged). As a result, encrypted text data is created (process PR15). Next, the OCR terminal transmits the encrypted text data to the tablet terminal (process PR16).

タブレット端末は、暗号化したテキストデータをＯＣＲ端末から受信すると、暗号化マトリクスに基づいて、暗号化したテキストデータにおける複数のデータ片の配列順序を元に戻す。これにより、文字領域の画像のテキストデータが作成される（処理ＰＲ１７）。その後タブレット端末は、文字領域の座標に基づいて、得られたテキストデータをＰＤＦファイルに貼り付ける（処理ＰＲ１８）。これにより、サーチャブルＰＤＦが作成される。 When the tablet terminal receives the encrypted text data from the OCR terminal, the tablet terminal returns the arrangement order of the plurality of data pieces in the encrypted text data based on the encryption matrix. Thereby, text data of the image of the character area is created (process PR17). Thereafter, the tablet terminal pastes the obtained text data on the PDF file based on the coordinates of the character area (process PR18). As a result, a searchable PDF is created.
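The scrambling of the post-OCR data into fixed-size data pieces, and its reversal on the tablet terminal, can be sketched as below. How the permutation is derived from the encryption matrix is not modeled here; the table is simply generated with a seeded shuffle, and the piece size of 4 bytes and all function names are assumptions. The pieces are kept as a list because the last piece may be shorter than the others; a single joined stream would need padding or a length field to be split again correctly.

```python
import random

def split_pieces(data, size):
    """Split bytes into pieces of `size` bytes (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def scramble(pieces, second_numbers):
    """Move the piece with first number i + 1 to position second_numbers[i]."""
    out = [None] * len(pieces)
    for piece, second in zip(pieces, second_numbers):
        out[second - 1] = piece
    return out

def unscramble(pieces, second_numbers):
    """Inverse of scramble: fetch each piece back from its second number."""
    return [pieces[second - 1] for second in second_numbers]

data = "Searchable PDF text".encode("utf-8")
pieces = split_pieces(data, 4)
table = list(range(1, len(pieces) + 1))
random.Random(1).shuffle(table)               # stand-in for the shared matrix

encrypted_pieces = scramble(pieces, table)    # done on the OCR terminal
restored = b"".join(unscramble(encrypted_pieces, table))  # done on the tablet
```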

（暗号化マトリクスの構成） (Configuration of encryption matrix)

図２４は、本発明の第２の実施の形態における暗号化マトリクスの構成を模式的に示す図である。図２４では、暗号化マトリクス内の一部の要素が拡大されており、拡大された要素の濃度が数字で示されている。 FIG. 24 is a diagram schematically showing the configuration of the encryption matrix in the second embodiment of the present invention. In FIG. 24, some of the elements in the encryption matrix are enlarged, and the density of the enlarged elements is indicated by numerals.

図２４を参照して、タブレット端末は、暗号化マトリクスを保持している。暗号化マトリクスは、２次元の乱数テーブルであり、暗号化マトリクスは、配列順序を変更する前の複数の画像ブロックの配列順序（第１の番号）を示す座標の各々に、配列順序を変更した後の複数の画像ブロックの配列順序（第２の番号）を示す濃度の画素の各々を配置することにより作成されたものである。 Referring to FIG. 24, the tablet terminal holds an encryption matrix. The encryption matrix is a two-dimensional random number table. It is created by placing, at each coordinate that indicates the arrangement order of the image blocks before the change (the first number), a pixel whose density indicates the arrangement order of the image blocks after the change (the second number).

図２４の暗号化マトリクスは、縦方向に１２８（＝Ｍ、Ｍは自然数）個、横方向に１２８（＝Ｎ、Ｎは自然数）個、合計１６３８４個の要素を含んでいる。 The encryption matrix of FIG. 24 includes 128 (= M, M is a natural number) in the vertical direction and 128 (= N, N is a natural number) in the horizontal direction, for a total of 16384 elements.

暗号化マトリクスの各要素は、１つの行で見た場合に左から右に向かって１つずつ増加する座標を有している。また暗号化マトリクスの各要素は、下の行であるほど大きい座標を有している。すなわち、ｍ行ｎ列目の要素（ｍ、ｎはｍ≦Ｍ、ｎ≦Ｎを満たす自然数）の座標は、「（ｍ−１）×Ｍ＋ｎ」と表される。暗号化マトリクスの各要素の座標は、第１の番号を示している。 Each element of the encryption matrix has coordinates that increase by one from the left to the right when viewed in one row. Each element of the encryption matrix has a larger coordinate as it is in the lower row. That is, the coordinates of the element in the m-th row and the n-th column (m and n are natural numbers satisfying m ≦ M and n ≦ N) are expressed as “(m−1) × M + n”. The coordinates of each element of the encryption matrix indicate a first number.

また、各要素は、１６３８４（＝Ｍ×Ｎ）段階に区分された互いに異なる濃度で構成されている。要素の濃度は第２の番号を示しており、濃度が薄くなるに従って第２の番号が増加する。具体的には、最も濃度が濃い画素が「１」という第２の数字を示しており、２番目に濃度が濃い画素が「２」という第２の数字を示しており、最も濃度が薄い要素が「１６３８４」という第２の数字を示している。 The elements have mutually different densities, divided into 16384 (= M × N) levels. The density of an element indicates the second number, and the second number increases as the density becomes lighter. Specifically, the darkest element indicates the second number “1”, the second darkest element indicates the second number “2”, and the lightest element indicates the second number “16384”.
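The way a matrix of distinct densities encodes a number table can be sketched as follows. Several points are assumptions: the matrix is a list of rows of numeric density values, a larger value means a darker element, and first numbers are counted row by row using the number of elements per row. For the square matrix of FIG. 24 (M = N = 128) this row-by-row count gives the same values as the source's expression (m−1)×M+n. The function name `number_table` and the 2 × 3 matrix are hypothetical.

```python
def number_table(matrix):
    """Return {first_number: second_number} for a matrix of distinct
    densities. First numbers count elements left to right, top to
    bottom; second numbers rank elements from darkest (1) to lightest."""
    cols = len(matrix[0])
    entries = [
        (density, r * cols + c + 1)            # (density, first number)
        for r, row in enumerate(matrix)
        for c, density in enumerate(row)
    ]
    ranked = sorted(entries, key=lambda e: -e[0])   # darkest first
    return {first: rank for rank, (_, first) in enumerate(ranked, start=1)}

matrix = [
    [40, 90, 10],
    [70, 20, 60],
]
table = number_table(matrix)
second_numbers = [table[i] for i in range(1, 7)]
```

In this small example the darkest element (density 90) has first number 2, so block 2 receives the second number 1.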

図２５は、図２４の暗号化マトリクスが示す第１の番号と第２の番号との関係を数字で示した番号テーブルである。 FIG. 25 is a number table showing the relationship between the first number and the second number indicated by the encryption matrix of FIG.

図２４および図２５を参照して、１行１列目の要素（第１の番号が「１」である要素）は、「５６３８」番目に濃い濃度で表されている。１行２列目の要素（第１の番号が「２」である要素）は、「１２３５」番目に濃い濃度で表されている。１行３列目の要素（第１の番号が「３」である要素）は、「７５５」番目に濃い濃度で表されている。１行４列目の要素（第１の番号が「４」である要素）は、「６１７１」番目に濃い濃度で表されている。 Referring to FIGS. 24 and 25, the element in the first row and the first column (the element whose first number is “1”) is represented by the “5638” -th darkest density. The element in the first row and the second column (the element whose first number is “2”) is represented by the “1235” -th darkest density. The element in the first row and the third column (the element whose first number is “3”) is represented by the “755” th darkest density. The element in the first row and the fourth column (the element whose first number is “4”) is represented by the “6171” th darkest density.

なお、暗号化マトリクスは、ＦＭ（ＦｒｅｑｕｅｎｃｙＭｏｄｕｌａｔｉｏｎ）スクリーンのディザマトリクスを作成する技術を用いて作成することができる。 The encryption matrix can be created by using a technique for creating a dither matrix of an FM (Frequency Modulation) screen.

タブレット端末は、Ｍ×Ｎ＝８ａ×８ａ、８ａ×１０ａ、８ａ×１２ａ、および８ａ×１４ａ（ａ＝１〜１６程度）などの要素を持つ複数の暗号化マトリクスを予め保持しておき、文字領域の画像の大きさに従って、複数の画像ブロックの配列順序を変更する際に使用する暗号化マトリクスを選択してもよい。 The tablet terminal holds in advance a plurality of encryption matrices having elements such as M × N = 8a × 8a, 8a × 10a, 8a × 12a, and 8a × 14a (a = 1 to about 16). An encryption matrix to be used when changing the arrangement order of the plurality of image blocks may be selected according to the size of the image in the region.

タブレット端末が複数の暗号化マトリクスの中から使用する暗号化マトリクスを選択する場合、ＯＣＲ端末も同様に複数の暗号化マトリクスを予め保持しており、タブレット端末は、暗号化画像を送信する際に、選択した暗号化マトリクスを特定する情報をＯＣＲ端末に通知してもよい。またタブレット端末は、選択した暗号化マトリクスを、暗号化画像とともにＯＣＲ端末に送信してもよい。 When the tablet terminal selects an encryption matrix to be used from among a plurality of encryption matrices, the OCR terminal similarly holds a plurality of encryption matrices in advance, and the tablet terminal transmits an encrypted image. The OCR terminal may be notified of information specifying the selected encryption matrix. The tablet terminal may transmit the selected encryption matrix together with the encrypted image to the OCR terminal.

またタブレット端末は、暗号化画像を作成する際に新たな暗号化マトリクスを作成し、作成した暗号化マトリクスを、暗号化画像とともにＯＣＲ端末に送信してもよい。 The tablet terminal may create a new encryption matrix when creating the encrypted image, and transmit the created encryption matrix to the OCR terminal together with the encrypted image.

（暗号化画像の作成方法） (How to create an encrypted image)

次に、暗号化画像の作成方法（図２３の処理ＰＲ１１）について説明する。 Next, a method for creating an encrypted image (process PR11 in FIG. 23) will be described.

図２６は、本発明の第２の実施の形態における文字領域Ｌ１の画像を模式的に示す図である。図２７は、本発明の第１の実施の形態において、文字領域Ｌ１の画像を分割することにより得られた複数の画像ブロックＢＬを模式的に示す図である。 FIG. 26 is a diagram schematically showing an image of the character area L1 in the second embodiment of the present invention. FIG. 27 is a diagram schematically showing a plurality of image blocks BL obtained by dividing the image of the character region L1 in the first embodiment of the present invention.

図２６および図２７を参照して、タブレット端末は、複数の画像ブロックＢＬの各々が、文字領域中の文字よりも小さいサイズを有するように、文字領域Ｌ１の画像を複数の画像ブロックＢＬに分割する。ここでは、ＳＸ個×ＳＹ個の画素よりなる文字領域Ｌ１の画像が、ＢＸ（ＢＸは自然数）個×ＢＹ（ＢＹは自然数）個の画素を持つ複数の画像ブロックＢＬに分割されるものとする。 Referring to FIGS. 26 and 27, the tablet terminal divides the image of the character region L1 into a plurality of image blocks BL such that each image block BL is smaller than a character in the character region. Here, it is assumed that the image of the character region L1, which consists of SX × SY pixels, is divided into a plurality of image blocks BL each having BX × BY pixels (BX and BY are natural numbers).

タブレット端末は、Ｍ個×Ｎ個（暗号化マトリクスの縦方向および横方向の要素の数）の画像ブロックで文字領域Ｌ１の画像全体がカバーされるように、画像ブロックのサイズ（ＢＸおよびＢＹの値）を決定する。言い換えれば、タブレット端末は、ＳＸ≦Ｍ×ＢＸ、ＳＹ≦Ｎ×ＢＹを満たす最小のＢＸおよびＢＹを決定する。 The tablet terminal determines the size of the image blocks (the values of BX and BY) so that the entire image of the character region L1 is covered by M × N image blocks (the numbers of elements in the vertical and horizontal directions of the encryption matrix). In other words, the tablet terminal determines the smallest BX and BY that satisfy SX ≦ M × BX and SY ≦ N × BY.
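The smallest BX and BY satisfying these inequalities are the ceilings of SX / M and SY / N, which can be computed with integer arithmetic as sketched below. The function name `block_size` is hypothetical; the first set of values is taken from the FIG. 27 example in the source (a 1792 × 896 pixel image and a 128 × 128 matrix).

```python
def block_size(sx, sy, m, n):
    """Smallest natural numbers BX, BY with SX <= M*BX and SY <= N*BY."""
    bx = -(-sx // m)    # ceiling division without floating point
    by = -(-sy // n)
    return bx, by

fig27 = block_size(1792, 896, 128, 128)     # image that divides evenly
uneven = block_size(1800, 900, 128, 128)    # image that needs rounding up
```

When the image does not divide evenly, the blocks along the right and bottom edges extend past the image; how that margin is filled is not specified in this passage.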

なお、画像ブロックのサイズ（ＢＸ個×ＢＹ個）は、ＢＸ＝ｘ×ｓｆ（個）、ＢＹ＝ｙ×ｓｆ（個）と表記される。この表記方法は、１つの画像ブロックが、ｘ×ｙ＝１×１、１×２、１×３、または１×４・・・という矩形に対して、スケールファクターｓｆ（ｓｆ＝自然数）を乗じることにより得られる形状を有することを意味している。 Note that the image block size (BX × BY) is expressed as BX = x × sf (pieces) and BY = y × sf (pieces). In this notation method, one image block multiplies a rectangle of x × y = 1 × 1, 1 × 2, 1 × 3, or 1 × 4... By a scale factor sf (sf = natural number). It has the shape obtained by this.

画像ブロックのサイズは、セキュリティーレベルに応じて決定される。すなわち、セキュリティーレベルが高くなるほど、画像ブロックのサイズは小さくなる。画像ブロックのサイズが小さくなるほど、画像ブロックの絵柄に基づいて画像ブロック同士をつなぎ合わせることは困難となり、セキュリティーを向上することができる。 The size of the image block is determined according to the security level. That is, the higher the security level, the smaller the image block size. As the image block size decreases, it becomes more difficult to connect the image blocks based on the design of the image block, and security can be improved.

図２７では、文字領域Ｌ１の画像が、１７９２個×８９６個の画素により構成されており、文字領域Ｌ１の画像が、１４個×７個（＝ＢＸ個×ＢＹ個）の画素よりなる複数の画像ブロックに分割されている。この場合には、１つの文字が、およそ３個×６個＝１８個の画像ブロックに分割されている。この場合には、隣接した２つの文字の部分が１つの画像ブロックに含まれる可能性が低い。したがって、１つの画像ブロックから文字同士の配列順序を推測することは不可能である。 In FIG. 27, the image of the character region L1 is composed of 1792 × 896 pixels, and the image of the character region L1 is a plurality of pixels composed of 14 × 7 (= BX × BY) pixels. It is divided into image blocks. In this case, one character is divided into approximately 3 × 6 = 18 image blocks. In this case, it is unlikely that two adjacent character portions are included in one image block. Therefore, it is impossible to estimate the arrangement order of characters from one image block.

図２８は、本発明の第２の実施の形態において作成された暗号化画像を模式的に示す図である。 FIG. 28 is a diagram schematically showing an encrypted image created in the second embodiment of the present invention.

図２８を参照して、次にタブレット端末は、暗号化マトリクスに基づいて、複数の画像ブロックＢＬの配列順序を変更し、配列順序を変更した後の数の画像ブロックの各々を結合する。これにより、暗号化画像ＳＤが作成される。暗号化画像ＳＤは、文字が含まれているか否かさえ判断することができないものになっている。タブレット端末は、暗号化画像ＳＤをＯＣＲ端末に送信する。なお、タブレット端末は、配列順序を変更した後の数の画像ブロックの各々を結合せずに、変更後の配列順序でＯＣＲ端末に順次送信してもよい。 Referring to FIG. 28, the tablet terminal next changes the arrangement order of the plurality of image blocks BL based on the encryption matrix and joins the plurality of image blocks in the changed order. The encrypted image SD is thereby created. From the encrypted image SD, it is not even possible to tell whether characters are included. The tablet terminal transmits the encrypted image SD to the OCR terminal. Note that the tablet terminal may instead transmit the image blocks to the OCR terminal one after another in the changed arrangement order, without joining them.
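The block rearrangement described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: the image is modeled as a 2-D list of pixel values, and the encryption matrix is modeled as a flat permutation `order`, where `order[k]` is the index of the original block placed at position `k`. The function names are hypothetical.

```python
def split_blocks(image, bx, by):
    """Cut a 2-D list of pixels into tiles bx wide and by high, row-major."""
    height, width = len(image), len(image[0])
    if height % by or width % bx:
        raise ValueError("image size must be a multiple of the block size")
    blocks = []
    for top in range(0, height, by):
        for left in range(0, width, bx):
            blocks.append([row[left:left + bx] for row in image[top:top + by]])
    return blocks


def join_blocks(blocks, cols):
    """Reassemble row-major tiles into one image with `cols` tiles per row."""
    image = []
    for start in range(0, len(blocks), cols):
        band = blocks[start:start + cols]
        for r in range(len(band[0])):
            image.append([px for block in band for px in block[r]])
    return image


def scramble(image, bx, by, order):
    """Rearrange the tiles of `image` so that position k holds tile order[k]."""
    blocks = split_blocks(image, bx, by)
    if sorted(order) != list(range(len(blocks))):
        raise ValueError("order must be a permutation of the block indices")
    cols = len(image[0]) // bx
    return join_blocks([blocks[i] for i in order], cols)


# A 4 x 4 test image cut into four 2 x 2 blocks.
image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
for row in scramble(image, 2, 2, [3, 0, 2, 1]):
    print(row)
```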

（暗号化したテキストデータの作成方法） (How to create encrypted text data)

次に、暗号化したテキストデータの作成方法（図２３の処理ＰＲ１５）について説明する。 Next, a method for creating encrypted text data (process PR15 in FIG. 23) will be described.

図２９は、本発明の第２の実施の形態において生成されたＯＣＲ後データを模式的に示す図である。 FIG. 29 is a diagram schematically showing post-OCR data generated in the second embodiment of this invention.

図２９を参照して、ＯＣＲ端末は、ＯＣＲ端末に予めインストールされていたソフトウェアを用いて、暗号化マトリクスに基づいて、受信した暗号化画像の複数の画像ブロックの配列順序を元に戻す。これにより、元の文字領域Ｌ１の画像が復元される。そしてＯＣＲ端末は、復元した画像に対してＯＣＲ処理を行う。これにより、ＯＣＲ後データＯＤが作成される。作成されたＯＣＲ後データは、文字領域Ｌ１の画像に含まれる文字のテキストデータである。 Referring to FIG. 29, the OCR terminal restores the arrangement order of the plurality of image blocks of the received encrypted image based on the encryption matrix, using software preinstalled in the OCR terminal. Thereby, the image of the original character area L1 is restored. Then, the OCR terminal performs OCR processing on the restored image. As a result, post-OCR data OD is created. The created post-OCR data is text data of characters included in the image of the character area L1.
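Restoring the original order amounts to applying the inverse of the permutation used for scrambling. The sketch below again models the encryption matrix as a flat permutation (an assumption), treats the blocks as opaque items, and uses hypothetical helper names.

```python
def invert(order):
    """Return the permutation that undoes `order`."""
    inverse = [0] * len(order)
    for position, original_index in enumerate(order):
        inverse[original_index] = position
    return inverse


def reorder(items, order):
    """Place items[order[k]] at position k."""
    return [items[i] for i in order]


blocks = ["B0", "B1", "B2", "B3"]   # stand-ins for image blocks
order = [3, 0, 2, 1]                # shared by both terminals

scrambled = reorder(blocks, order)              # sender side
restored = reorder(scrambled, invert(order))    # receiver side
print(scrambled, restored)
```

Both terminals need the same permutation, which matches the options described in this section: either it is held in advance on both sides or it is sent along with the encrypted image.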

なお、ＯＣＲ端末は、タブレット端末が保持している暗号化マトリクスと同一の暗号化マトリクスを予め保持していてもよい。またタブレット端末は、暗号化画像とともに暗号化マトリクスをＯＣＲ端末に送信してもよい。 Note that the OCR terminal may previously hold the same encryption matrix as the encryption matrix held by the tablet terminal. The tablet terminal may transmit the encryption matrix together with the encrypted image to the OCR terminal.

図３０は、本発明の第２の実施の形態において、ＯＣＲ後データに含まれる文字を表示したバイナリエディタの画面を模式的に示す。 FIG. 30 schematically shows a binary editor screen displaying characters included in post-OCR data in the second embodiment of the present invention.

図３０を参照して、次にＯＣＲ端末は、ＯＣＲ後データを所定のデータ量を有する複数のデータ片に分割し、暗号化マトリクスに基づいて、複数のデータ片の配列順序を変更する。 Referring to FIG. 30, next, the OCR terminal divides the post-OCR data into a plurality of data pieces having a predetermined data amount, and changes the arrangement order of the plurality of data pieces based on the encryption matrix.

具体的には、ＯＣＲ端末は、ＯＣＲ後データに含まれる文字をバイナリエディタの画面ＢＳに表示させる。画面ＢＳは、領域ＲＧ１と領域ＲＧ２とを含んでいる。領域ＲＧ１は、ＯＣＲ後データに含まれる文字が表示される領域である。領域ＲＧ２は、ＯＣＲ後データに含まれる文字に対応する、Ｓｈｉｆｔ−ＪＩＳ形式の２バイトのバイナリーコードが表示される領域である。 Specifically, the OCR terminal displays characters included in the post-OCR data on the screen BS of the binary editor. The screen BS includes a region RG1 and a region RG2. Area RG1 is an area where characters included in post-OCR data are displayed. Area RG2 is an area in which a Shift-JIS format 2-byte binary code corresponding to characters included in post-OCR data is displayed.

ＯＣＲ端末は、領域ＲＧ２に表示されたバイナリーコードを所定のデータ量を有する複数のデータ片に分割する。複数のデータ片の各々のデータ量は、奇数バイト（たとえば１バイトまたは３バイト）であることが好ましい。これにより、複数のデータ片の各々が文字単位で分割されたものとなることが回避され、データ片に含まれるバイナリーコードから文字が解読されることを抑止することができる。 The OCR terminal divides the binary code displayed in the area RG2 into a plurality of data pieces each having a predetermined data amount. The data amount of each data piece is preferably an odd number of bytes (for example, 1 byte or 3 bytes). Because each character occupies 2 bytes, an odd piece size prevents the data pieces from being split on character boundaries, which makes it harder to decode characters from the binary code contained in a data piece.

図３０では、ＯＣＲ後データに含まれる「［従来技術・・・」という文字が、「８１７９８Ｆ５Ｄ９７８８８Ｂ５Ａ・・・」というバイナリーコードに対応している。「［」という文字は「８１７９」というバイナリーコードに対応する（なお、［は図面では隅付き括弧）。「従」という文字は「８Ｆ５Ｄ」というバイナリーコードに対応する。「来」という文字は「９７８８」というバイナリーコードに対応する。「技」という文字は「８Ｂ５Ａ」というバイナリーコード対応する。「術」という文字は「８Ｆ７０」というバイナリーコードに対応する。 In FIG. 30, the characters "[従来技術..." ("[Prior Art...") included in the post-OCR data correspond to the binary code "81 79 8F 5D 97 88 8B 5A ...". The character "[" corresponds to the binary code "8179" (in the drawing, the "[" is a lenticular bracket). The character "従" corresponds to the binary code "8F5D", "来" to "9788", "技" to "8B5A", and "術" to "8F70".

１バイト単位のデータ片に分割する場合、ＯＣＲ端末は、領域ＲＧ２に表示されたバイナリーコードを「８１」、「７９」、「８Ｆ」、および「５Ｄ」・・・という複数のデータ片に分割し、バイナリーコードの配列順序に従って第１の番号を付与する。 When dividing into 1-byte data pieces, the OCR terminal divides the binary code displayed in the area RG2 into a plurality of data pieces "81", "79", "8F", "5D", and so on, and assigns first numbers to them according to the arrangement order of the binary code.

３バイト単位のデータ片に分割する場合、ＯＣＲ端末は、領域ＲＧ２に表示されたバイナリーコードを「８１７９８Ｆ」、「５Ｄ９７８８」、および「８Ｂ５Ａ８Ｆ」・・・という複数のデータ片に分割し、バイナリーコードの配列順序に従って第１の番号を付与する。 When dividing into 3-byte data pieces, the OCR terminal divides the binary code displayed in the area RG2 into a plurality of data pieces "81798F", "5D9788", "8B5A8F", and so on, and assigns first numbers to them according to the arrangement order of the binary code.
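The 1-byte and 3-byte divisions above can be reproduced with Python's built-in `shift_jis` codec. This is a sketch only: the helper name `split_pieces` is hypothetical, the sample string uses the lenticular bracket shown in the drawing, and keeping a shorter final piece as-is is an assumption, since the description does not say how a trailing remainder is handled.

```python
def split_pieces(data: bytes, size: int) -> list[bytes]:
    """Split data into consecutive pieces of `size` bytes (last may be shorter)."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [data[i:i + size] for i in range(0, len(data), size)]


# Five 2-byte Shift-JIS characters, 10 bytes in total.
encoded = "【従来技術".encode("shift_jis")
print(encoded.hex().upper())  # 81798F5D97888B5A8F70

for size in (1, 3):
    pieces = split_pieces(encoded, size)
    # The position in the list plays the role of the "first number".
    print(size, [p.hex().upper() for p in pieces])
```

With a piece size of 3, every piece boundary except the multiples of 6 bytes falls in the middle of a 2-byte character, so most pieces do not decode to whole characters on their own.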

次にＯＣＲ端末は、暗号化マトリクスに基づいて、複数のデータ片の配列順序を変更する。これにより、暗号化したテキストデータが作成される。ＯＣＲ端末は、暗号化したテキストデータをタブレット端末に送信する。 Next, the OCR terminal changes the arrangement order of the plurality of data pieces based on the encryption matrix. As a result, encrypted text data is created. The OCR terminal transmits the encrypted text data to the tablet terminal.

ＯＣＲ端末は、複数の暗号化マトリクスを予め保持しておき、ＯＣＲ後データに含まれる文字の総数に従って、複数のデータ片の配列順序を変更する際に使用する暗号化マトリクスを選択してもよい。 The OCR terminal may hold a plurality of encryption matrices in advance and select, according to the total number of characters included in the post-OCR data, the encryption matrix to be used when changing the arrangement order of the plurality of data pieces.

ＯＣＲ端末が複数の暗号化マトリクスの中から使用する暗号化マトリクスを選択する場合、タブレット端末も同様に複数の暗号化マトリクスを予め保持しており、ＯＣＲ端末は、暗号化したテキストデータを送信する際に、選択した暗号化マトリクスを特定する情報をタブレット端末に通知してもよい。またＯＣＲ端末は、選択した暗号化マトリクスを、暗号化したテキストデータとともにタブレット端末に送信してもよい。 When the OCR terminal selects an encryption matrix to be used from among a plurality of encryption matrices, the tablet terminal similarly holds a plurality of encryption matrices in advance, and the OCR terminal transmits the encrypted text data. At this time, information specifying the selected encryption matrix may be notified to the tablet terminal. The OCR terminal may transmit the selected encryption matrix to the tablet terminal together with the encrypted text data.

またＯＣＲ端末は、暗号化したテキストデータを作成する際に新たな暗号化マトリクスを作成し、作成した暗号化マトリクスを、暗号化したテキストデータとともにタブレット端末に送信してもよい。 The OCR terminal may create a new encryption matrix when creating encrypted text data, and send the created encryption matrix to the tablet terminal together with the encrypted text data.

さらにＯＣＲ端末は、複数のデータ片の配列順序を変更する際に用いる暗号化マトリクスとして、暗号化画像を文字領域Ｌ１の画像に復元する際に用いた暗号化マトリクスと同一のものを用いてもよいし、異なるものを用いてもよい。 Further, the OCR terminal may use the same encryption matrix used when restoring the encrypted image to the image of the character area L1 as the encryption matrix used when changing the arrangement order of the plurality of data pieces. A different one may be used.

なお、本実施の形態における文字画像処理システムの構成および上述以外の動作は、第１の実施の形態における文字画像処理システムの構成および動作と同様であるため、その説明は繰り返さない。 The configuration of the character image processing system in the present embodiment and operations other than those described above are the same as the configuration and operation of the character image processing system in the first embodiment, and therefore description thereof will not be repeated.

図３１は、本発明の第２の実施の形態における文字画像処理システムの動作を示すフローチャートである。 FIG. 31 is a flowchart showing the operation of the character image processing system according to the second embodiment of the present invention.

図３１を参照して、文字画像処理システムは、始めに図２２に示すフローチャートにおけるステップＳ１〜ステップＳ１３の処理を行う。 Referring to FIG. 31, the character image processing system first performs steps S1 to S13 in the flowchart shown in FIG.

ステップＳ１３の処理に続いて、タブレット端末のＣＰＵは、暗号化マトリクスに基づいて、複数の画像ブロックの配列順序を変更することにより、暗号化画像を作成する（Ｓ１０１）。次にタブレット端末のＣＰＵは、暗号化画像をＯＣＲ端末に送信する（Ｓ１０３）。 Following the processing in step S13, the CPU of the tablet terminal creates an encrypted image by changing the arrangement order of the plurality of image blocks based on the encryption matrix (S101). Next, the CPU of the tablet terminal transmits the encrypted image to the OCR terminal (S103).

ＯＣＲ端末のＣＰＵは、暗号化画像を受信すると、暗号化マトリクスに基づいて、複数の画像ブロックの配列順序を元に戻すことにより、元の文字領域の画像を復元する（Ｓ１０５）。次にＯＣＲ端末のＣＰＵは、文字領域の画像に対してＯＣＲ処理を実行し（Ｓ１０７）、得られたＯＣＲ後データを複数のデータ片に分割する（Ｓ１０９）。続いてＯＣＲ端末のＣＰＵは、暗号化マトリクスに基づいて、複数のデータ片の配列順序を変更することにより、暗号化したテキストデータを作成する（Ｓ１１１）。次にＯＣＲ端末のＣＰＵは、暗号化したテキストデータをタブレット端末に送信する（Ｓ１１３）。 Upon receiving the encrypted image, the CPU of the OCR terminal restores the original character area image by returning the arrangement order of the plurality of image blocks based on the encryption matrix (S105). Next, the CPU of the OCR terminal executes OCR processing on the image of the character area (S107), and divides the obtained post-OCR data into a plurality of data pieces (S109). Subsequently, the CPU of the OCR terminal creates encrypted text data by changing the arrangement order of the plurality of data pieces based on the encryption matrix (S111). Next, the CPU of the OCR terminal transmits the encrypted text data to the tablet terminal (S113).

タブレット端末は、暗号化したテキストデータを受信すると、暗号化マトリクスに基づいて複数のデータ片の配列順序を元に戻すことにより、テキストデータを復元（作成）する（Ｓ１１５）。次にタブレット端末のＣＰＵは、読取画像のＰＤＦファイルにテキストデータを貼り付けることにより、サーチャブルＰＤＦを作成し（Ｓ１１７）、処理を終了する。 Upon receiving the encrypted text data, the tablet terminal restores (creates) the text data by returning the arrangement order of the plurality of data pieces based on the encryption matrix (S115). Next, the CPU of the tablet terminal creates a searchable PDF by pasting the text data into the PDF file of the read image (S117), and ends the process.
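Steps S109 to S115 for the text data can be sketched as a round trip. The sketch makes several assumptions: the encryption matrix is modeled as a flat permutation `order`, the scrambled pieces are passed as a list rather than as one concatenated byte string (so a shorter final piece needs no special handling), and the function names are hypothetical.

```python
def scramble_text(text: str, size: int, order: list[int]) -> list[bytes]:
    """OCR terminal side: encode, split into pieces, and rearrange them."""
    data = text.encode("shift_jis")
    pieces = [data[i:i + size] for i in range(0, len(data), size)]
    if sorted(order) != list(range(len(pieces))):
        raise ValueError("order must be a permutation of the piece indices")
    return [pieces[i] for i in order]


def restore_text(scrambled: list[bytes], order: list[int]) -> str:
    """Tablet terminal side: undo the rearrangement and decode."""
    restored = [b""] * len(order)
    for position, original_index in enumerate(order):
        restored[original_index] = scrambled[position]
    return b"".join(restored).decode("shift_jis")


order = [2, 0, 3, 1]  # 10 bytes in 3-byte pieces gives 4 pieces
sent = scramble_text("【従来技術", 3, order)
print([p.hex().upper() for p in sent])
print(restore_text(sent, order))
```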

［実施の形態の効果］ [Effect of the embodiment]

上述の実施の形態によれば、ＯＣＲサイトを利用してＯＣＲ処理を行う場合に、タブレット端末は、ＯＣＲ端末に対して、ＯＣＲ処理の対象となる文字領域の画像の暗号化画像をＯＣＲサイトに送信する。またＯＣＲ端末は、タブレット端末に対して、文字領域の画像の暗号化したテキストデータを送信する。これにより、機密情報の漏洩を防止することができ、外部のＯＣＲサイトでのＯＣＲ処理のセキュリティー性を高めることができる。 According to the above-described embodiments, when OCR processing is performed using an OCR site, the tablet terminal transmits to the OCR terminal at the OCR site an encrypted image of the image of the character area to be subjected to OCR processing. The OCR terminal in turn transmits encrypted text data of the image of the character area to the tablet terminal. This can prevent leakage of confidential information and improve the security of OCR processing at an external OCR site.

また、ＯＣＲサイトを利用してサーチャブルＰＤＦを作成する場合には、画像データ内の文字領域の座標に基づいて、テキストデータを貼り付けることにより、サーチャブルＰＤＦを作成することができる。 In addition, when creating a searchable PDF using an OCR site, a searchable PDF can be created by pasting text data based on the coordinates of a character area in image data.

特に第１の実施の形態によれば、文字同士の隙間位置で画像データを分割することにより、複数の画像ブロックが作成され、作成した複数の画像ブロックの配列順序を変更することにより、暗号化画像が作成される。また、ＯＣＲ処理によって得られたテキストデータは、暗号化されてタブレット端末に送信される。これにより、万が一、暗号化画像が第三者によって不正に入手された場合であっても、第三者は元の画像データに含まれる文字列を把握することが困難になる。また、複数の画像ブロックの各々には文字が含まれているため、ＯＣＲ処理の正確性が向上する。 In particular, according to the first embodiment, a plurality of image blocks are created by dividing image data at a gap position between characters, and encryption is performed by changing the arrangement order of the created image blocks. An image is created. The text data obtained by the OCR process is encrypted and transmitted to the tablet terminal. As a result, even if the encrypted image is illegally obtained by a third party, it is difficult for the third party to grasp the character string included in the original image data. In addition, since each of the plurality of image blocks includes characters, the accuracy of the OCR process is improved.

特に第２の実施の形態によれば、画像データ中の文字よりも小さいサイズを有する複数の画像ブロックに画像データを分割することにより、複数の画像ブロックが作成され、作成した複数の画像ブロックの配列順序を変更することにより、暗号化画像が作成される。また、ＯＣＲ処理によって得られたテキストデータは、暗号化されてタブレット端末に送信される。これにより、万が一、暗号化画像が第三者によって不正に入手された場合であっても、第三者は画像データに含まれる文字さえも把握することが困難になる。 In particular, according to the second embodiment, by dividing the image data into a plurality of image blocks having a size smaller than the characters in the image data, a plurality of image blocks are created. By changing the arrangement order, an encrypted image is created. The text data obtained by the OCR process is encrypted and transmitted to the tablet terminal. Thereby, even if the encrypted image is illegally obtained by a third party, it becomes difficult for the third party to grasp even the characters included in the image data.

［その他］ [Others]

図３２は、本発明の変形例における文字画像処理システムの動作の概要を示すシーケンス図である。 FIG. 32 is a sequence diagram showing an outline of the operation of the character image processing system in the modification of the present invention.

図３２を参照して、本変形例においては、上述の第１の実施の形態におけるタブレット端末が行う各処理をＭＦＰが行う。文字画像処理システムは、タブレット端末を備えておらず、ＭＦＰ（情報処理装置の一例）とＯＣＲ端末とのみを備えている。本変形例における文字画像処理システムの動作について、以下に説明する。 Referring to FIG. 32, in this modification, the MFP performs each process performed by the tablet terminal in the first embodiment described above. The character image processing system does not include a tablet terminal, but includes only an MFP (an example of an information processing apparatus) and an OCR terminal. The operation of the character image processing system in this modification will be described below.

ＭＦＰのユーザーは、予めＭＦＰの原稿台に原稿をセットした状態で、ＭＦＰの操作パネルを通じてサーチャブルＰＤＦの作成指示を行う。ＭＦＰは、サーチャブルＰＤＦの作成指示を受け付ける（処理ＰＲ０）。 The user of the MFP issues a searchable PDF creation instruction through the operation panel of the MFP in a state where the document is set in advance on the document table of the MFP. The MFP accepts a searchable PDF creation instruction (process PR0).

ＭＦＰは、サーチャブルＰＤＦの作成指示を受け付けると、原稿の画像を光学的に読み取り、読取画像データを作成する（処理ＰＲ２）。次にＭＦＰは、読取画像データから文字領域の画像を抽出する（処理ＰＲ３）。続いてＭＦＰは、読取画像データのＰＤＦファイルを作成する（処理ＰＲ４）。次にＭＦＰは、文字領域の画像を複数の画像ブロックに分割する（処理ＰＲ６）。次にＭＦＰは、複数の画像ブロックの配列順序を変更する。次にＭＦＰは、配列順序を変更した後の複数の画像ブロックの各々の間を、連結記号を用いて連結し、暗号化画像を作成する（処理ＰＲ７）。続いてＭＦＰは、暗号化画像をＯＣＲ端末に送信する（処理ＰＲ８）。 When the MFP receives a searchable PDF creation instruction, the MFP optically reads an image of a document and creates read image data (process PR2). Next, the MFP extracts a character area image from the read image data (process PR3). Subsequently, the MFP creates a PDF file of the read image data (process PR4). Next, the MFP divides the image of the character area into a plurality of image blocks (process PR6). Next, the MFP changes the arrangement order of the plurality of image blocks. Next, the MFP connects the plurality of image blocks after the arrangement order is changed using a connection symbol to create an encrypted image (process PR7). Subsequently, the MFP transmits the encrypted image to the OCR terminal (process PR8).

ＯＣＲ端末は、暗号化画像をＭＦＰから受信すると、暗号化画像に対してＯＣＲ処理を行うことにより、ＯＣＲ後データを作成する（処理ＰＲ９）。続いてＯＣＲ端末は、作成したＯＣＲ後データをＭＦＰに送信する（処理ＰＲ１０）。 When the OCR terminal receives the encrypted image from the MFP, the OCR terminal performs OCR processing on the encrypted image to create post-OCR data (processing PR9). Subsequently, the OCR terminal transmits the created post-OCR data to the MFP (process PR10).

ＭＦＰは、ＯＣＲ後データをＯＣＲ端末から受信すると、受信したＯＣＲ後データを、画像ブロックの単位の複数の文字列に分割する。次にＭＦＰは、複数の文字列の配列順序を、複数の画像ブロックの変更前の配列順序に並べ直し、複数の文字列を結合する。これにより、文字領域の画像のテキストデータが作成される（処理ＰＲ１１）。その後ＭＦＰは、文字領域の座標に基づいて、得られたテキストデータをＰＤＦファイルに貼り付ける（処理ＰＲ１２）。これにより、サーチャブルＰＤＦが作成される。 When the MFP receives the post-OCR data from the OCR terminal, the MFP divides the received post-OCR data into a plurality of character strings in units of image blocks. Next, the MFP rearranges the arrangement order of the plurality of character strings into the arrangement order before the change of the plurality of image blocks, and combines the plurality of character strings. Thereby, text data of the image of the character area is created (process PR11). Thereafter, the MFP pastes the obtained text data on the PDF file based on the coordinates of the character area (process PR12). As a result, a searchable PDF is created.
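Process PR11 above can be sketched as follows. The connecting symbol `#`, the sample strings, and the representation of the recorded block order as a flat permutation are all assumptions made for illustration; the function name is hypothetical.

```python
CONNECTOR = "#"  # hypothetical stand-in for the recognized connecting symbol


def restore_from_ocr(post_ocr: str, order: list[int],
                     connector: str = CONNECTOR) -> str:
    """Split post-OCR data at the connector and undo the block rearrangement.

    order[k] is the original index of the block that was sent at position k.
    """
    strings = post_ocr.split(connector)
    if len(strings) != len(order):
        raise ValueError("number of character strings does not match the order")
    restored = [""] * len(order)
    for position, original_index in enumerate(order):
        restored[original_index] = strings[position]
    return "".join(restored)


# Blocks "conf", "iden", "tial" were sent in the order [2, 0, 1].
print(restore_from_ocr("tial#conf#iden", [2, 0, 1]))  # confidential
```

Splitting at the connector also removes it, which corresponds to deleting the characters that correspond to the connecting image. The sketch assumes the connector never appears in the document text itself.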

同様に、上述の第２の実施の形態におけるタブレット端末が行う各動作をＭＦＰが行うことにより、タブレット端末が省略されてもよい。 Similarly, the tablet terminal may be omitted by the MFP performing each operation performed by the tablet terminal in the second embodiment.

本発明の処理の対象となる画像データは、ＰＤＦ形式のものに限られるものではない。本発明は、あらゆる形式の画像データに対してＯＣＲ処理を行う際に適用することができる。ＯＣＲ処理の対象となる画像データは、原稿を読み取った読取画像データである場合の他、情報処理装置が保持している画像データであってもよい。 The image data to be processed by the present invention is not limited to the PDF format. The present invention can be applied when OCR processing is performed on image data of any format. The image data to be subjected to the OCR process may be image data held by the information processing apparatus in addition to the read image data obtained by reading a document.

上述の実施の形態は互いに組み合わせることができる。たとえば、第１の実施の形態において、第２の実施の形態と同様の方法で、ＯＣＲ処理によって得られたＯＣＲ後データを、第２の実施の形態と同様の方法で複数のデータ片に分割して、タブレット端末に送信してもよい。また、第１の実施の形態において、第２の実施の形態のような暗号化マトリクスを用いて第１の番号と第２の番号との関係を記録してもよいし、第２の実施の形態において、第１の実施の形態のような番号テーブルを用いて第１の番号と第２の番号との関係を記録してもよい。 The above-described embodiments can be combined with each other. For example, in the first embodiment, the post-OCR data obtained by OCR processing may be divided into a plurality of data pieces by the same method as in the second embodiment and transmitted to the tablet terminal. Also, in the first embodiment, the relationship between the first numbers and the second numbers may be recorded using an encryption matrix as in the second embodiment, and in the second embodiment, the relationship between the first numbers and the second numbers may be recorded using a number table as in the first embodiment.

上述の実施の形態における処理は、ソフトウェアにより行なっても、ハードウェア回路を用いて行なってもよい。また、上述の実施の形態における処理を実行するプログラムを提供することもできるし、そのプログラムをＣＤ−ＲＯＭ、フレキシブルディスク、ハードディスク、ＲＯＭ、ＲＡＭ、メモリカードなどの記録媒体に記録してユーザーに提供することにしてもよい。プログラムは、ＣＰＵなどのコンピューターにより実行される。また、プログラムはインターネットなどの通信回線を介して、装置にダウンロードするようにしてもよい。 The processing in the above-described embodiments may be performed by software or by a hardware circuit. A program that executes the processing in the above-described embodiments may also be provided, and the program may be recorded on a recording medium such as a CD-ROM, flexible disk, hard disk, ROM, RAM, or memory card and provided to the user. The program is executed by a computer such as a CPU. The program may also be downloaded to the apparatus via a communication line such as the Internet.

上述の実施の形態は、すべての点で例示であって制限的なものではないと考えられるべきである。本発明の範囲は上記した説明ではなくて特許請求の範囲によって示され、特許請求の範囲と均等の意味および範囲内でのすべての変更が含まれることが意図される。 The above-described embodiment is to be considered as illustrative in all points and not restrictive. The scope of the present invention is defined by the terms of the claims, rather than the description above, and is intended to include any modifications within the scope and meaning equivalent to the terms of the claims.

１００ＭＦＰ（ＭｕｌｔｉｆｕｎｃｔｉｏｎＰｅｒｉｐｈｅｒａｌ）
１１０，２１０，３１０ＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）
１２０，２２０，３２０ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）
１３０，２３０，３３０ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）
１４０，２４０，３４０記憶部
１５０，２５０，３５０ネットワークＩ／Ｆ
１６０画像読取部
１７０ＰＤＦ（ＰｏｒｔａｂｌｅＤｏｃｕｍｅｎｔＦｏｒｍａｔ）作成部
１８０文字領域抽出部
１９０操作パネル
１９５画像形成部
２００タブレット端末
２６０操作パネル
２７０，３７０暗号化部
２８０，３８０暗号解読部
２９０ＰＤＦ編集部
３００−１，３００−２ＯＣＲ（ＯｐｔｉｃａｌＣｈａｒａｃｔｅｒＲｅｃｏｇｎｉｔｉｏｎ）端末
３６０ＯＣＲ処理部
４０１イントラネット
４０２インターネット
ＢＬ，ＢＬ１，ＢＬ２，ＢＬ３，ＢＬ４，ＢＬ５画像ブロック
ＢＳバイナリエディタの画面
ＣＳ，ＣＳ１，ＣＳ２文字列
Ｄ１複数の文字の隙間位置の各々の間隔（距離）
ＩＭ読取画像データ
ＫＹ１，ＫＹ２，ＫＹ３キー
Ｌ１，Ｌ２，Ｌ３文字領域
ＬＲ１レイヤー
ＬＲ２透明レイヤー
Ｎ１網点領域
ＯＤＯＣＲ後データ
Ｐ１写真領域
ＲＧ１，ＲＧ２バイナリエディタの画面内の領域
ＳＤ，ＳＤ１，ＳＤ２，ＳＤ３，ＳＤ４，ＳＤ５，ＳＤ６，ＳＤ７，ＳＤ８，ＳＤ９暗号化画像
ＳＲタブレットの操作パネルに表示された画面
ＴＤテキストデータ
ＸＰ境界位置
ＹＰ行間位置
Ｚ１その他の領域
100 MFP (Multifunction Peripheral)
110, 210, 310 CPU (Central Processing Unit)
120, 220, 320 ROM (Read Only Memory)
130, 230, 330 RAM (Random Access Memory)
140, 240, 340 Storage unit
150, 250, 350 Network I/F
160 Image reading unit
170 PDF (Portable Document Format) creation unit
180 Character area extraction unit
190 Operation panel
195 Image forming unit
200 Tablet terminal
260 Operation panel
270, 370 Encryption unit
280, 380 Decryption unit
290 PDF editing unit
300-1, 300-2 OCR (Optical Character Recognition) terminal
360 OCR processing unit
401 Intranet
402 Internet
BL, BL1, BL2, BL3, BL4, BL5 Image block
BS Binary editor screen
CS, CS1, CS2 Character string
D1 Interval (distance) between gap positions of a plurality of characters
IM Read image data
KY1, KY2, KY3 Key
L1, L2, L3 Character area
LR1 Layer
LR2 Transparent layer
N1 Halftone dot area
OD Post-OCR data
P1 Photo area
RG1, RG2 Area in the binary editor screen
SD, SD1, SD2, SD3, SD4, SD5, SD6, SD7, SD8, SD9 Encrypted image
SR Screen displayed on the operation panel of the tablet
TD Text data
XP Boundary position
YP Line spacing position
Z1 Other area

Claims

A character image processing system comprising a first information processing unit and a second information processing unit that has an OCR (Optical Character Recognition) function and is capable of communicating with the first information processing unit via a network, wherein
The first information processing unit includes:
Image block creating means for dividing a character area in the image data into a plurality of image blocks;
Arrangement order changing means for changing the arrangement order of the plurality of image blocks;
Inserting means for inserting a connecting image between each of the plurality of image blocks after the arrangement order is changed by the arrangement order changing means; and
First transmission means for transmitting, to the second information processing unit, an encrypted image that is created on the basis of the plurality of image blocks after the arrangement order is changed by the arrangement order changing means and that includes the plurality of image blocks between which the connecting image is inserted,
The second information processing unit includes:
OCR processing means for creating first text data by performing OCR processing on the encrypted image; and
Second transmission means for transmitting post-OCR data including the first text data to the first information processing unit, and
The first information processing unit further includes:
Creating means for creating second text data, based on the post-OCR data, by decomposing the post-OCR data into a plurality of character strings and deleting characters corresponding to the connecting image from the post-OCR data; and
Pasting means for pasting the second text data at a position corresponding to each character area in the image data.

The character image processing system according to claim 1, wherein
The first information processing unit further includes character area specifying means for specifying a character area in the image data and specifying coordinates of the character area in the image data, and
The pasting means pastes the second text data based on the coordinates.

The character image processing system according to claim 2, wherein the pasting means pastes the second text data at a position corresponding to the character area in a transparent layer of the image data.

The character image processing system according to claim 1, wherein
The first information processing unit further includes level receiving means for receiving a security level setting, and
The image block creating means divides the character area into the plurality of image blocks having a size determined according to the level received by the level receiving means.

The character image processing system according to claim 1, wherein
The second information processing unit includes a first OCR device and a second OCR device different from the first OCR device, and
The first transmission means transmits a first portion of the encrypted image to the first OCR device and transmits a second portion of the encrypted image, different from the first portion, to the second OCR device.

The character image processing system according to claim 1, wherein the first information processing unit further includes an image reading unit that creates the image data by reading an image of a document.

The character image processing system according to claim 1, wherein the image block creating means includes:
First distribution extracting means for extracting a distribution, along a second direction perpendicular to a first direction, of the number of white pixels accumulated in the first direction, the first direction being the direction of one side of the rectangular character area;
Second distribution extracting means for extracting a distribution, along the first direction, of the number of white pixels existing in the second direction in the rectangular character area; and
Dividing means for creating the plurality of image blocks by dividing the character area in the image data at positions determined based on the distributions extracted by the first and second distribution extracting means.

The character image processing system according to claim 7, wherein
The dividing means includes:
Line spacing specifying means for specifying line spacings based on the distribution extracted by the first distribution extracting means; and
Line dividing means for dividing the character area into a plurality of lines at the line spacings specified by the line spacing specifying means,
The second distribution extracting means extracts, for each of the plurality of lines divided by the line dividing means, a distribution along the first direction of the number of white pixels existing in the second direction, and
The dividing means further includes:
Gap specifying means for specifying gap positions between characters based on the distribution extracted by the second distribution extracting means;
Boundary determining means for determining boundary positions based on the gap positions specified by the gap specifying means; and
Column direction dividing means for dividing each of the plurality of lines at the boundary positions determined by the boundary determining means.

The character image processing system according to claim 8, wherein the boundary determining means determines, as a boundary position, a gap position whose distance from another adjacent gap position is equal to or greater than a threshold, among the gap positions specified by the gap specifying means.

The character image processing system according to claim 1, wherein the connecting image is an image that has a known character recognition result and is held in advance by the first information processing unit.

The character image processing system according to claim 10, wherein the connecting image is an image of a symbol that is not a character.

The character image processing system according to any one of claims 1 to 11, wherein the first information processing unit further includes sequence information holding means for holding relation information indicating the relation between the arrangement order of each of the plurality of image blocks before the arrangement order is changed by the arrangement order changing means and the arrangement order of each of the plurality of image blocks after the arrangement order is changed by the arrangement order changing means.

The character image processing system according to any one of claims 1 to 12, wherein the network is the Internet.

The character image processing system according to any one of claims 1 to 13, wherein the first information processing unit includes an image forming apparatus capable of optically reading a document.

The character image processing system according to claim 14, wherein
The first information processing unit further includes a terminal separate from the image forming apparatus, and
The first transmission means transmits the encrypted image from the terminal to the second information processing unit.

The character image processing system according to any one of claims 1 to 15, wherein the first information processing unit generates the encrypted image based on optically read image data.

An information processing apparatus that communicates with an OCR (Optical Character Recognition) apparatus, the information processing apparatus comprising:
Image block creating means for dividing a character area in the image data into a plurality of image blocks;
Arrangement order changing means for changing the arrangement order of the plurality of image blocks;
Inserting means for inserting a connecting image between each of the plurality of image blocks after the arrangement order is changed by the arrangement order changing means;
Transmitting means for transmitting, to the OCR apparatus, an encrypted image that is created on the basis of the plurality of image blocks after the arrangement order is changed by the arrangement order changing means and that includes the plurality of image blocks between which the connecting image is inserted;
Receiving means for receiving, from the OCR apparatus, post-OCR data including first text data created by performing OCR processing based on the encrypted image;
Creating means for creating second text data, based on the post-OCR data, by decomposing the post-OCR data into a plurality of character strings and deleting characters corresponding to the connecting image from the post-OCR data; and
Pasting means for pasting the second text data at a position corresponding to each character area in the image data.

A control program for an information processing apparatus that communicates with an OCR (Optical Character Recognition) apparatus, the control program causing a computer to execute:
An image block creating step of dividing a character area in the image data into a plurality of image blocks;
An arrangement order changing step of changing the arrangement order of the plurality of image blocks;
An inserting step of inserting a connecting image between each of the plurality of image blocks after the arrangement order is changed in the arrangement order changing step;
A transmitting step of transmitting, to the OCR apparatus, an encrypted image that is created on the basis of the plurality of image blocks after the arrangement order is changed in the arrangement order changing step and that includes the plurality of image blocks between which the connecting image is inserted;
A receiving step of receiving, from the OCR apparatus, post-OCR data including first text data created by performing OCR processing based on the encrypted image;
A creating step of creating second text data, based on the post-OCR data, by decomposing the post-OCR data into a plurality of character strings and deleting characters corresponding to the connecting image from the post-OCR data; and
A pasting step of pasting the second text data at a position corresponding to each character area in the image data.