JP2017199288A

JP2017199288A - Image processing device, image processing method and program

Info

Publication number: JP2017199288A
Application number: JP2016091623A
Authority: JP
Inventors: Makoto Enomoto (榎本 誠)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2016-04-28
Filing date: 2016-04-28
Publication date: 2017-11-02

Abstract

PROBLEM TO BE SOLVED: To address the problem that correcting character recognition results obtained from a document places a heavy operational burden on the person making the corrections, who cannot easily indicate which part is to be corrected and how.
SOLUTION: An image processing device captures an image of a paper document, performs character recognition on the regions of the captured image in which characters are arranged, extracts a target character string from the recognized character strings, displays the data of the extracted character string together with attribute data of that character string, acquires an indicated position on at least one of the displayed data and the paper document, and, when the displayed character string data is corrected, performs different correction processing depending on the acquired indicated position.
SELECTED DRAWING: Figure 3

Description

The present invention relates to an image processing apparatus, an image processing method, and a program that support the correction of character recognition results.

Information extraction, in which a paper document is recognized using OCR (optical character recognition) to obtain electronic character data, is a very important technology that enables paper documents to be processed electronically. For example, linking paper documents such as invoices and delivery notes with a company's systems makes it possible to automate accounting and tax-refund processing. However, it is difficult for recognition technologies such as OCR to guarantee that the recognized content is correct, so recognition results have had to be checked and corrected manually. Methods for reducing the burden of correction have therefore been proposed. For example, Patent Document 1 proposes a method that reduces the user's burden by applying a correction the user makes to one misrecognized character to other occurrences of the same character as well.

Meanwhile, it has been common to read a document with a scanner and to operate the system with a pointing device such as a mouse or a keyboard. In recent years, however, apparatuses have been proposed that photograph the document with a camera or the like instead of a contact-type reading device such as a scanner, and project images with a projector. Patent Document 2 proposes an apparatus configuration in which a document is placed face up on a platen, read without contact by an imaging device mounted above the platen, and the read electronic data is projected by a projection device. With such a configuration, reading a document and confirming and correcting the result can be performed as one continuous task, which widens the range of applications of OCR technology. For example, in a face-to-face business negotiation, a document placed on a desk can be read and digitized and the result immediately projected for confirmation, which shortens the time the negotiation takes.

Patent Document 1: JP-A-5-20492
Patent Document 2: JP-A-2015-41233

However, the method of Patent Document 1 requires the corrected characters to be entered with an input device such as a keyboard. When a hardware keyboard cannot be provided because of limits on work space or hardware configuration, and entering characters on a software keyboard is itself burdensome, the operational burden on the person making corrections is heavy. Moreover, that person cannot easily indicate which part is to be corrected and how.

The present invention has been made in view of the above problems, and its object is to provide an image processing apparatus with which character recognition results can be corrected easily. A further object is to provide a corresponding method and program.

An image processing apparatus according to the present invention has the following configuration: imaging means for capturing an image of a paper document; recognition means for performing character recognition on a region in which characters are arranged in the image captured by the imaging means; extraction means for extracting a target character string from the character strings in the recognition result of the recognition means; display means for displaying the data of the character string extracted by the extraction means and attribute data of that character string; acquisition means for acquiring an indicated position on at least one of the data displayed by the display means and the paper document; and correction means for performing, when the character string data displayed by the display means is corrected, different correction processing depending on the indicated position acquired by the acquisition means.

According to the present invention, a configuration including an imaging device and a projection device makes it possible to help the person correcting a character recognition result perform the correction easily.

Example of the external appearance when the image processing apparatus of the embodiment is in use
Example of the hardware configuration in the embodiment
Flowchart showing the overall processing flow in the embodiment
Flowchart showing the confirmation and correction processing in the embodiment
Example of a document image and an extraction result in the embodiment
Example of the UI display in the embodiment
Example showing the angle of view during re-photographing in the embodiment
Example showing the effect of re-photographing in the embodiment
Example of the UI display in the embodiment
Example illustrating touch determination for UI operations in the embodiment

Embodiments of the present invention will be described below with reference to the drawings.

<First Embodiment>
First, an outline of the processing of the image processing apparatus in the present embodiment (hereinafter also referred to as the "document processing apparatus" or the "photographing/projection apparatus") will be described with reference to FIG. 1.

The work table 101 is a work surface, such as a table or a counter, at which the user works.
The photographing/projection apparatus 102 is an image processing apparatus that photographs the work table 101 and objects on it, such as the document 103, and projects the photographed content and a UI onto the work table 101. The document 103 on the work table 101 is a paper document, such as an identity verification document or an application form, and is the physical document on which this image processing apparatus performs information extraction.

FIG. 2 shows an example of the hardware configuration of the photographing/projection apparatus 102 in the embodiment of the present invention.

The CPU (Central Processing Unit) 201 performs the calculations and logical decisions for the various processes and controls the elements connected to the bus 202.

The memory 203 holds a program memory and a data memory. The program memory stores programs for control by the CPU, including the processing procedures described later with reference to the flowcharts. The memory may be a ROM (Read-Only Memory), a RAM (Random Access Memory) into which programs are loaded from an external storage device or the like, or a combination of the two.

The storage device 204 is a device, such as a hard disk, for storing the data and programs used in the present embodiment.

The imaging device 205 has a lens and an image sensor; it captures the light entering through the lens and develops it into image data. In the present embodiment, it is used mainly to photograph the work table 101 and acquire the image data from which the information in the document 103 is extracted.

The input device 206 is a device for inputting instructions from the user, who instructs the photographing/projection apparatus through it. The input device 206 can be realized by, for example, a distance sensor that detects a touch on the desk from the distance between the desk and the user's finger, or a visible-light sensor that detects the user's gestures. Other sensors, or hardware such as a touch panel, a pen, or a mouse, may also be used.

The projection device 207 is a device that displays the UI. It can be realized by, for example, a projector that uses a liquid crystal display or a laser display.

The configuration described with reference to FIGS. 1 and 2 is an example and may be realized in other forms. For example, the imaging device 205 and the projection device 207 may be separate pieces of hardware, and the pieces of hardware may operate in cooperation over USB or a network cable.

FIG. 3 is a flowchart describing the processing flow of the photographing/projection apparatus in the present embodiment. In the following, the flowcharts are assumed to be realized by the CPU executing a control program.

In step S301, the imaging device 205 photographs the work table 101, including the document 103, generates a captured image, and the process proceeds to step S302.
In step S302, the CPU 201 detects the region of the document 103 in the captured image generated in step S301, and the process proceeds to step S303. The document contains regions in which characters are arranged.

The document region can be detected, for example, by extracting the edges of the sheet as straight lines and estimating its four sides. For the straight-line extraction, the captured image is converted to grayscale, a known edge enhancement such as a Sobel or Laplacian filter is applied, and line segments are extracted by a known technique such as the Hough transform or a minimum approximation method. Alternatively, if shooting conditions such as the angle of view and the illumination are fixed, the work table 101 may be photographed in advance without the document 103, and the pixel difference from that image may be taken. The region may also be detected by a dedicated sensor device and mapped to coordinates on the captured image.
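The background-difference alternative mentioned above can be sketched as follows. This is only a minimal illustration under stated assumptions: images are plain 2D lists of grayscale values, and the function name `detect_document_bbox` and the threshold of 30 are hypothetical choices, not taken from the publication. It returns an axis-aligned bounding box rather than the four sides of the sheet.

```python
def detect_document_bbox(background, frame, threshold=30):
    """Return (x, y, w, h) of the pixels that differ from the empty-table image.

    background, frame: 2D lists of grayscale values with identical dimensions.
    Returns None when no pixel differs by more than the threshold.
    """
    xs, ys = [], []
    for y, (bg_row, fr_row) in enumerate(zip(background, frame)):
        for x, (bg, fr) in enumerate(zip(bg_row, fr_row)):
            if abs(fr - bg) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


# Toy 6x5 images: a dark table (value 20) with a bright sheet (value 220) placed on it.
background = [[20] * 6 for _ in range(5)]
frame = [row[:] for row in background]
for y in range(1, 4):
    for x in range(2, 5):
        frame[y][x] = 220

bbox = detect_document_bbox(background, frame)
```

A real system would still need to estimate the four corners of the sheet from the changed pixels, since the sheet is generally not axis-aligned in the captured image.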

In step S303, the CPU 201 corrects the region of the document 103 obtained in step S302 into a rectangular document image, and the process proceeds to step S304. Because of the positional relationship between the imaging device 205 and the work table 101, the region obtained in step S302 is subject to perspective (keystone) distortion and appears as an irregular quadrilateral. A so-called inverse perspective transformation can be used to correct the irregular quadrilateral into a rectangle. As disclosed in Japanese Patent Laid-Open No. 2003-288588 (Patent Document 3), the parameters of the transformation matrix can be obtained by substituting the coordinates of the four vertices into the inverse perspective transformation formula and solving the resulting simultaneous equations. If the imaging device 205 is always at a fixed angle to the table, the inverse perspective transformation parameters may be given in advance.
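As an illustration of solving the simultaneous equations from four vertices, the sketch below estimates the eight parameters of a planar perspective transformation with the ninth fixed to 1. It is a generic textbook formulation, not the specific method of Patent Document 3; the function names and the sample corner coordinates are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_corners(src, dst):
    """Return h0..h7 such that each src corner (x, y) maps to its dst corner (u, v)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b)


def warp_point(h, x, y):
    """Apply the perspective transformation to a single point."""
    d = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / d, (h[3] * x + h[4] * y + h[5]) / d)


# Hypothetical corners of the skewed document and of the target rectangle.
src = [(10, 10), (90, 20), (95, 80), (5, 70)]
dst = [(0, 0), (100, 0), (100, 60), (0, 60)]
h = homography_from_corners(src, dst)
```

To produce the corrected image, each pixel of the output rectangle would be filled by looking up the corresponding source pixel through the inverse mapping, which can be obtained by calling `homography_from_corners(dst, src)`.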

In step S304, the CPU 201 performs character recognition on the document image obtained in step S303 to obtain the recognized character strings, and the process proceeds to step S305.

In step S305, the CPU 201 extracts character string information having the desired attributes from the character recognition result obtained in step S304 according to predetermined rules, and the process proceeds to step S306. Character string information of a desired attribute is information that carries a specific meaning, such as the individual content of attributes like the name and address on a driver's license or resident record, or the content entered on an application form.

To obtain the character string information corresponding to an attribute in a document with a fixed layout, it suffices to take the character string located at specific coordinates in the document. When the layout cannot be determined, candidate character strings may be narrowed down from the recognition result using regular expressions or dictionaries, or the character string of the attribute's name may be searched for and the value identified in its vicinity. When the character recognition result contains multiple candidates, or the neighborhood search around the attribute name finds multiple strings, each is extracted as a candidate character string. The order and reliability of multiple candidates may be determined by rules, or the order may be determined by the character recognition confidence. This method is an example; any other method may be used as long as the target character string and its coordinates can be obtained. In the example of the document 103, the attribute “氏名” (name) is found to have the character string information “山田太郎” (Taro Yamada).
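The neighborhood search around an attribute name might look like the following sketch. The OCR output format (a list of dictionaries with `text`, `x`, `y`, `w`, `h`), the function name `find_value_right_of`, the `max_gap` limit, and all coordinates are assumptions made for illustration; the publication does not specify them.

```python
def find_value_right_of(words, label, max_gap=300):
    """Return the words on the same row to the right of `label`, nearest first."""
    candidates = []
    for lab in (w for w in words if w["text"] == label):
        lab_right = lab["x"] + lab["w"]
        lab_mid = lab["y"] + lab["h"] / 2
        for w in words:
            if w is lab:
                continue
            gap = w["x"] - lab_right
            # "Same row" here means the label's vertical midpoint falls inside the word box.
            same_row = w["y"] <= lab_mid <= w["y"] + w["h"]
            if same_row and 0 <= gap <= max_gap:
                candidates.append((gap, w))
    return [w for _, w in sorted(candidates, key=lambda c: c[0])]


# Hypothetical OCR output for part of a resident record.
words = [
    {"text": "世帯主", "x": 50, "y": 100, "w": 60, "h": 20},
    {"text": "山田大郎", "x": 130, "y": 100, "w": 80, "h": 20},
    {"text": "住所", "x": 50, "y": 140, "w": 40, "h": 20},
    {"text": "東京都大田区下丸子ＸＸＸ", "x": 130, "y": 140, "w": 200, "h": 20},
]

name_candidates = find_value_right_of(words, "世帯主")
address_candidates = find_value_right_of(words, "住所")
```

Returning every match in order of distance keeps the multiple-candidate behavior described above; a rule-based or confidence-based ranking could replace the distance sort.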

In step S306, the CPU 201 presents a correction support UI to the user for the character string information obtained in step S305, makes corrections based on the user's input, and finalizes the character string information for each desired attribute. As described later with reference to FIG. 6, the result of character recognition and extraction is corrected and finalized. The details of step S306 are described later with reference to the flowchart of FIG. 4.

The document image 501 in FIG. 5A is an example of the document image obtained by placing a resident record on the work table 101 as the document 103 and performing steps S301 to S303.
The aim in this embodiment is to extract the name of the head of household (hereinafter, the name attribute) and the address (hereinafter, the address attribute) from the resident record. In the example of the document image 501, the expected correct values are the character string “山田太郎” (Taro Yamada) of the name attribute 502 and the character string “東京都大田区下丸子ＸＸＸ” (Shimomaruko XXX, Ota-ku, Tokyo) of the address attribute 503.

As an example of the information extraction in step S305, the character string “世帯主” (head of household) is located in the document, and the character string in the neighboring region to its right is extracted as the name attribute. Likewise, the character string “住所” (address) is located in the document, and the character string in the neighboring region to its right is extracted as the address attribute.

The information extraction result 504 in FIG. 5B shows, as a tree structure, the extraction result obtained by applying steps S304 and S305 to the document image 501. The extraction result is the root; below it are the name and address attributes; below each attribute are one or more candidates; and below each candidate are the character string and its value, and the rectangle coordinates and their values, stored as X and Y coordinates with the origin at the upper-left corner of the document together with the sizes W (width) and H (height).
In the extraction result 504, the character string “山田大郎” (a misrecognition of “山田太郎”) and the rectangle coordinates X, Y, W, and H are extracted as the first candidate 505 of the name attribute. Similarly, three candidates 506, 507, and 508 are extracted for the address attribute.
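One way to hold such a tree in memory is sketched below. The class names, field names, rectangle values, and the placeholder texts of the address candidates are all hypothetical; the publication gives only the shape of the tree and the text of the first name candidate.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    text: str    # recognized character string
    rect: tuple  # (x, y, w, h), origin at the upper-left corner of the document


@dataclass
class Attribute:
    name: str
    candidates: list = field(default_factory=list)

    def first(self):
        """Return the highest-ranked candidate, or None if there is none."""
        return self.candidates[0] if self.candidates else None


# Root of the tree: attribute name -> Attribute. Coordinates are made-up placeholders.
extraction_result = {
    "name": Attribute("name", [Candidate("山田大郎", (130, 100, 80, 20))]),
    "address": Attribute("address", [
        Candidate("<candidate 506>", (130, 140, 200, 20)),
        Candidate("<candidate 507>", (130, 140, 120, 20)),
        Candidate("<candidate 508>", (130, 170, 200, 20)),
    ]),
}
```

Keeping the rectangle next to each string is what later allows the apparatus to highlight the string on the paper and to re-photograph exactly that region.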

In the following, the processing is described with reference to the flowchart of FIG. 4, using the document image 501 and the information extraction result 504 as an example of the input to step S306.

In step S401, the CPU 201 obtains the coordinates of the document 103 relative to the projected image area so that the photographing/projection apparatus 102 can display the UI, and the process proceeds to step S402. For ease of explanation, this embodiment assumes that the surface of the work table 101, the projection area of the projection device 207, and the area photographed in step S301 coincide. That is, the two-dimensional coordinates on the surface of the work table 101, the coordinates on the captured image acquired in step S301, and the coordinates on the projection screen projected by the projection device 207 are identical.

Therefore, if the position of the document 103 on the work table 101 has not changed since step S301, the document region coordinates on the captured image obtained by document detection in step S302 can be used as they are. If the document 103 has moved on the work table 101 since step S301, the position of the document is determined by performing the same processing as in steps S301 and S302 again.

In practice, the area photographed by the imaging device 205 differs from the projection area of the projection device 207, so coordinate conversion is required. If the positional relationship between the imaging device 205 and the projection device 207 of the photographing/projection apparatus 102 is known in advance, geometric transformation parameters can be stored and used to map coordinates as needed. Alternatively, the projection device 207 may project alignment markers, which the imaging device 205 photographs and recognizes to align the coordinates. These methods are examples, and other methods may be used.
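The marker-based alignment can be illustrated with a deliberately simplified sketch. It assumes that the camera and projector images differ only by a per-axis scale and offset (no rotation or perspective), so two marker correspondences are enough; a real setup would more likely need a full perspective mapping. The function names and the marker coordinates are hypothetical.

```python
def fit_axis_aligned_mapping(cam_pts, proj_pts):
    """Fit x' = sx * x + ox, y' = sy * y + oy from two marker correspondences."""
    (cx0, cy0), (cx1, cy1) = cam_pts
    (px0, py0), (px1, py1) = proj_pts
    sx = (px1 - px0) / (cx1 - cx0)
    sy = (py1 - py0) / (cy1 - cy0)
    return (sx, px0 - sx * cx0, sy, py0 - sy * cy0)


def camera_to_projector(mapping, x, y):
    """Convert a camera-image coordinate to a projector-screen coordinate."""
    sx, ox, sy, oy = mapping
    return (sx * x + ox, sy * y + oy)


# Two projected markers at opposite projector corners, as seen by the camera.
mapping = fit_axis_aligned_mapping(
    cam_pts=[(100, 50), (1100, 650)],
    proj_pts=[(0, 0), (1280, 768)],
)
center = camera_to_projector(mapping, 600, 350)
```

With the mapping in hand, a rectangle detected in the captured image can be converted corner by corner before its highlight is projected.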

In step S402, the CPU 201 displays or updates the UI that supports confirmation and correction, and the process proceeds to step S403.

In the basic UI configuration, the document 103 itself is used as a UI component for display, and the extraction results with their correction fields and a completion button for indicating that confirmation and correction are finished are placed in the remaining space of the work table 101. This is an example; other UI components may be present, or UI components may be omitted and the input made through user operations such as gestures.

FIG. 6A shows an example of the UI displayed at this point. The document 601 is the physical document 103. In the space to its right, the projection device 207 projects the character string of the first candidate 505 of the name attribute in the information extraction result 504 as the UI display 602. The character string of the first candidate 506 of the address attribute is likewise projected as the UI display 603. The UI displays 602 and 603 also serve as input fields for entering corrections. The completion button 604 is projected as a UI component for signaling that confirmation and correction are finished. The broken-line rectangle 605 is a highlight indicating that the name attribute is currently selected for confirmation and correction. Here the name attribute value is shown as preselected for correction because the reliability in the information extraction result 504 marks it as an item needing confirmation, but nothing need be selected initially. The broken-line rectangle 606 is a highlight showing the rectangle coordinates on the paper of the currently selected name attribute, that is, the rectangle coordinates of the first candidate 505, and is projected superimposed on the paper of the document 601.

In step S403, the CPU 201 waits for input from the user; when the user performs an operation, the CPU 201 acquires its content, and the process proceeds to step S404. In the description of this embodiment, a user operation is a touch with a finger at any location on the work table, and the touch coordinates (indicated position) are acquired. This is an example; the operation may instead be a mouse operation, a press detected by a separate input sensor, or another gesture-based operation.

In step S404, the CPU 201 branches the UI processing according to the operation content obtained in step S403. If an input field other than that of the currently selected attribute is touched and selected, here the UI display 603 of the address attribute, the process proceeds to step S405, where the attribute to be corrected is changed to the address attribute, and then returns to step S402. If a specific character string on the document 601 is touched and selected as the corrected string for the currently selected attribute, the process proceeds to step S412, where the character string at the touched position is acquired; in step S413, the character string information of the attribute being corrected is replaced with the character string acquired in step S412, and the process returns to step S402. Steps S412 and S413 are described in detail later.
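The branching on the touched position can be sketched as a simple hit test. The rectangle layout, the function names, and the action labels are hypothetical; only the branch targets noted in the comments (S405, S406, S412/S413) come from the description, which also covers the case where the currently selected field itself is touched.

```python
def contains(rect, point):
    """True if point (px, py) lies inside rect (x, y, w, h)."""
    x, y, w, h = rect
    px, py = point
    return x <= px < x + w and y <= py < y + h


def dispatch_touch(point, fields, selected, document_rect, done_button):
    """Decide which correction branch a touch at `point` should trigger."""
    if contains(done_button, point):
        return ("finish", None)
    for attr, rect in fields.items():
        if contains(rect, point):
            if attr == selected:
                return ("enter_text_input", attr)        # leads to step S406
            return ("select_attribute", attr)            # leads to step S405
    if contains(document_rect, point):
        return ("pick_string_from_document", selected)   # leads to steps S412, S413
    return ("ignore", None)


# Hypothetical layout on the work table, in projector coordinates.
fields = {"name": (700, 100, 300, 40), "address": (700, 160, 300, 40)}
document_rect = (50, 50, 600, 400)
done_button = (700, 400, 120, 40)

action = dispatch_touch((750, 180), fields, "name", document_rect, done_button)
```

The point of the design is that the same gesture, a single touch, selects a different correction procedure depending only on where it lands.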

If the input field of the currently selected attribute is touched, here the UI display 602 of the name attribute, the process proceeds to step S406.

In this example, the desired value of the name attribute is “山田太郎” (Taro Yamada). For the first candidate 505 of the name attribute, the extracted rectangle coordinates are correct, but the character string has been misrecognized as “山田大郎”, with “太” read as “大”. The following description assumes that the user, in order to correct this, has touched the input field of the name attribute extraction result 602.

In step S406, the CPU 201 switches to character input mode, and the process proceeds to step S407. Entering character input mode when an input field is touched is common behavior on PCs and the like, and on mobile terminals without a hardware keyboard a software keyboard is displayed to enable character input. This embodiment likewise uses a software keyboard. This is an example; characters may be entered by other means, such as voice.

In steps S407 to S410, the user is presumed to intend to correct the text. That is, the touch is interpreted as meaning that the character recognition result is wrong and needs correcting, so the character string is photographed and recognized again.
In step S407, the CPU 201 acquires the rectangle coordinates of the first candidate 505 of the name attribute, which is the correction target, and the process proceeds to step S408. These correspond to the region indicated by the highlight 606 in FIG. 6A.

In step S408, the CPU 201 sets the shooting conditions of the imaging device 205, and the process proceeds to step S409. The character region acquired in step S407 is to be photographed under conditions more suitable than those of step S301. In this embodiment, optical zoom is used to lengthen the focal length, that is, to zoom in toward telephoto, which increases the resolution of the target character string.

FIG. 7 illustrates the shooting angles of view. In this embodiment, as described above, the angle of view in step S301 coincides with the work table 101, so the surface 701, which is the same as the work table, is that angle of view. The angle of view 702, shown with a broken line, is the angle of view at the focal length set in step S408. Ignoring factors such as lens extension, doubling the focal length quadruples the number of pixels covering the character string (twice as many along each axis).
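The relationship between focal length and resolution can be checked with a small calculation. This sketch assumes an idealized lens in which image magnification is proportional to focal length, and it assumes the camera can be aimed so that the target rectangle is centered; the function names, the sample focal lengths, and the `max_zoom` limit are hypothetical.

```python
def pixel_gain(f_wide_mm, f_tele_mm):
    """Return (linear gain, area gain) in pixels on the target when zooming in."""
    linear = f_tele_mm / f_wide_mm
    return linear, linear ** 2


def zoom_to_fit(view_w, view_h, rect_w, rect_h, max_zoom=4.0):
    """Largest zoom factor at which the target rectangle still fits in the frame."""
    return min(view_w / rect_w, view_h / rect_h, max_zoom)


linear_gain, area_gain = pixel_gain(24.0, 48.0)   # doubling the focal length
zoom = zoom_to_fit(1920, 1080, 480, 120)          # limited by width, then by max_zoom
```

In this example the character region is wide and short, so the achievable zoom is bounded by its width rather than its height.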

In step S409, the imaging device 205 photographs the scene so that the angle of view 702 includes the region 606, acquires a captured image, and the process proceeds to step S410.
In step S410, the CPU 201 extracts the character region from the captured image, and the process proceeds to step S411. The character region acquired in step S407 is aligned with the coordinates of the image captured in step S409, and the character region is extracted. Any distortion caused by the shooting angle is also corrected here.

FIG. 8 shows examples of the extracted character region images. The character region image 801 was extracted from the image captured in step S301, and the character region image 802 is an example of the character region image acquired in step S410. In the image 801, some of the pixels forming the dot of the character “太” are missing because of insufficient resolution, whereas in the image 802 they are captured properly.

In step S411, the CPU 201 performs character recognition on the character area image 802 acquired in step S410 and proceeds to step S402. Because the character area image 802 has sufficient resolution for character recognition, the recognition succeeds and the character string "山田太郎" (Taro Yamada) is obtained.

In step S402, the CPU 201 updates the correction UI based on the character recognition result acquired in step S411 and proceeds to step S403.
Although the present embodiment is described, for the sake of explanation, as sequential processing that follows a flowchart, input from the user may also be accepted in parallel while steps S407 to S411 are executed.

FIG. 6B shows an example of the correction UI displayed at this point.
The software keyboard 607 has been displayed by step S406. The correction candidate 608 is the new character recognition result acquired in step S411. If this candidate is correct, the user can complete the correction operation by touching it. If the recognition result acquired in step S411 is not correct, the user enters the correct characters with the software keyboard 607 to complete the correction operation. If the user continues to enter characters with the software keyboard 607 after the candidate has been presented, the shooting settings may be changed again and steps S407 to S411 repeated to present additional correction candidates.
When all correction operations are finished, the user touches the completion button 604 to complete the confirmation and correction process.

In addition, when the imaging device 205 further includes a pan/tilt mechanism and the shooting direction can be controlled from the photographing and projection apparatus, moving the rectangle to be corrected to the center of the angle of view allows the focal length to be extended further, which raises the resolution around the target character string. Settings other than the focal length may also be changed so that the image is re-shot under conditions better suited to character recognition. For example, the focus may be set on the target rectangle, or the exposure may be adjusted so that the contrast within the rectangle is maximized. The settings may further be changed in consideration of information such as the lens distortion of the imaging device 205 and its optical performance with respect to aperture.
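To see why re-centering helps, the sketch below compares the largest zoom factor that still keeps a target rectangle inside the frame with and without pan/tilt. It assumes an idealized camera whose field of view shrinks about its center by the zoom factor, with coordinates given in the units of the initial full view. The function names and the numbers are hypothetical.

```python
import math
from typing import Tuple

RectT = Tuple[float, float, float, float]  # (left, top, right, bottom)


def max_zoom_fixed_axis(rect: RectT, view_w: float, view_h: float) -> float:
    """Largest zoom keeping rect in frame when the optical axis cannot move."""
    left, top, right, bottom = rect
    cx, cy = view_w / 2.0, view_h / 2.0
    # Farthest extent of the rectangle from the view center on each axis.
    dx = max(abs(left - cx), abs(right - cx))
    dy = max(abs(top - cy), abs(bottom - cy))
    # After zooming by z the visible half-width is cx / z, so dx <= cx / z.
    zx = cx / dx if dx > 0 else math.inf
    zy = cy / dy if dy > 0 else math.inf
    return min(zx, zy)


def max_zoom_centered(rect: RectT, view_w: float, view_h: float) -> float:
    """Largest zoom when pan/tilt first centers the rectangle in the frame."""
    left, top, right, bottom = rect
    w, h = right - left, bottom - top
    zx = view_w / w if w > 0 else math.inf
    zy = view_h / h if h > 0 else math.inf
    return min(zx, zy)


if __name__ == "__main__":
    target = (700.0, 100.0, 900.0, 150.0)   # off-center rectangle
    print(max_zoom_fixed_axis(target, 1000.0, 600.0))
    print(max_zoom_centered(target, 1000.0, 600.0))
```

For an off-center rectangle the fixed-axis limit is set by the corner farthest from the center, whereas after centering it is set only by the size of the rectangle itself.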

As described above, when information is extracted from a form using this photographing and projection apparatus, the document can easily be re-shot and character recognition performed again when a misrecognized result is corrected. This raises the likelihood that the correct character string is presented as a candidate in the UI, so the user can complete the correction with fewer operations.

Steps S412 and S413 will now be described in detail.
In this photographing and projection apparatus, when the location of the candidate character string for an attribute is wrong, the user can touch the correct character string, and the processing of steps S412 and S413 changes the character string of the attribute to the one at the touched location.

FIG. 9 shows an example of the correction UI after the user has touched the UI display 603 of the address attribute in the correction UI of FIG. 6 and the correction target has been changed to the address attribute by the processing of step S405 of the flowchart. A highlight display 901, which indicates that the current correction target is the address attribute, is displayed by the projection device 207. As the character string of the address attribute, the character string of the first candidate 506, "神奈川件川崎市中原区ＹＹＹ", is shown in the UI display 603, and its rectangular coordinates are projected by the projection device 207 as the highlight display 902.

In the processing of step S305 described above, a character string located close to the right of the character string "住所" (address) is acquired as the address attribute.

For the address attribute candidates 506, 507, and 508 of the extraction result 504, the character strings "神奈川県川崎市中原区ＹＹＹ" (YYY, Nakahara-ku, Kawasaki-shi, Kanagawa), "東京都大田区下丸子ＸＸＸ" (XXX, Shimomaruko, Ota-ku, Tokyo), and "神奈川県川崎市多摩区ＺＺＺ" (ZZZ, Tama-ku, Kawasaki-shi, Kanagawa) have each been acquired as satisfying this condition. Because no priority is defined for the case where several strings match, the order of the candidates is undefined, and the address attribute 503 that was actually wanted has become the second candidate 507.
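The extraction rule and its ambiguity can be sketched as a simple geometric filter over recognized text boxes. The data layout, the `max_gap` threshold, and the vertical-overlap test are assumptions made for illustration; the description only states that the string lies near the right of the label. The sketch returns matches in input order, which mirrors the absence of any priority rule.

```python
from typing import List, NamedTuple


class TextBox(NamedTuple):
    text: str
    left: float
    top: float
    right: float
    bottom: float


def strings_right_of_label(boxes: List[TextBox], label: str,
                           max_gap: float = 200.0) -> List[str]:
    """Collect strings that sit just to the right of any box reading `label`."""
    candidates = []
    for lab in boxes:
        if lab.text != label:
            continue
        for box in boxes:
            if box is lab:
                continue
            gap = box.left - lab.right                   # horizontal distance
            same_row = box.top < lab.bottom and box.bottom > lab.top
            if 0 <= gap <= max_gap and same_row:
                candidates.append(box.text)
    return candidates


if __name__ == "__main__":
    # Hypothetical layout with two "住所" labels on one form.
    page = [
        TextBox("住所", 10, 10, 60, 30),
        TextBox("東京都大田区下丸子ＸＸＸ", 70, 10, 300, 30),
        TextBox("氏名", 10, 40, 60, 60),
        TextBox("山田太郎", 70, 40, 200, 60),
        TextBox("住所", 10, 70, 60, 90),
        TextBox("神奈川県川崎市中原区ＹＹＹ", 70, 70, 300, 90),
    ]
    print(strings_right_of_label(page, "住所"))
```

Because every label occurrence yields a match, a form with several address fields produces several candidates, and the one the user wanted need not come first.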

The user can therefore perform a correction operation by touching the character string area 903 on the document 601, and the processing of steps S412 and S413 changes the attribute to the correct value. The correct value here is the character string that was meant to be acquired as the address attribute from the resident card 601.

However, when the characters of the document 601 are small or the line spacing is narrow, the apparatus may determine that an adjacent area, rather than the character area 903 the user intended, has been touched. In the present embodiment, the touch determination areas of the rectangular areas that are candidates for the target attribute are enlarged to support the correction operation.

For example, FIG. 10A shows, with broken lines, the touch determination area of each character string on the document 601 when the present embodiment is not used. When the user intends to touch the character area 903 and the touch coordinates are 1001, the touch is judged to have landed in the touch determination area 1003 rather than in the touch determination area 1002 of the character area 903.

FIG. 10B shows, with broken lines, the touch determination area of each character string in the present embodiment. The touch determination areas 1006, 1005, and 1007 are the enlarged touch determination areas of the character areas corresponding to the address attribute candidates 506, 507, and 508, respectively. With these areas, a touch at the same position 1001 as before is judged to be a touch on the touch determination area 1004 of the intended character area 903, and the character string of the target attribute is corrected to the right one.
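The effect of the enlarged touch determination areas can be sketched as a two-pass hit test. The fixed `margin`, the tie-break by distance to the original rectangle, and all names are assumptions for illustration; the embodiment does not state how overlapping areas are resolved.

```python
import math
from typing import NamedTuple, Optional, Sequence, Set, Tuple


class Area(NamedTuple):
    ident: str
    left: float
    top: float
    right: float
    bottom: float


def _contains(a: Area, x: float, y: float, margin: float = 0.0) -> bool:
    return (a.left - margin <= x <= a.right + margin
            and a.top - margin <= y <= a.bottom + margin)


def _distance(a: Area, x: float, y: float) -> float:
    # Distance from the point to the unexpanded rectangle (0 if inside).
    dx = max(a.left - x, 0.0, x - a.right)
    dy = max(a.top - y, 0.0, y - a.bottom)
    return math.hypot(dx, dy)


def hit_test(point: Tuple[float, float], areas: Sequence[Area],
             candidate_ids: Set[str], margin: float) -> Optional[str]:
    """Return the id of the touched area, favoring attribute candidates."""
    x, y = point
    # Pass 1: candidate areas are tested with their enlarged rectangles.
    hits = [a for a in areas
            if a.ident in candidate_ids and _contains(a, x, y, margin)]
    if hits:
        return min(hits, key=lambda a: _distance(a, x, y)).ident
    # Pass 2: every area is tested with its original rectangle.
    for a in areas:
        if _contains(a, x, y):
            return a.ident
    return None


if __name__ == "__main__":
    lines = [Area("903", 0, 100, 200, 110), Area("neighbour", 0, 112, 200, 122)]
    touch = (50.0, 113.0)   # slightly below the intended line
    print(hit_test(touch, lines, set(), 5.0))       # no enlargement
    print(hit_test(touch, lines, {"903"}, 5.0))     # "903" is a candidate
```

With no candidates the touch falls on the neighbouring line, as in FIG. 10A, while marking the intended line as a candidate lets the same touch select it, as in FIG. 10B.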

In this description, the touch determination areas of all candidate character strings for the target attribute are enlarged uniformly, but the enlargement may be weighted, for example by candidate order. The touch determination area may also be enlarged using separate dictionary information or analysis results. For example, even for a value that was not acquired as a candidate character string in the extraction result, an address-likeness score may be calculated using a general place-name dictionary when the attribute is an address, and the touch determination area may be enlarged when that score is high.
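One conceivable form of such weighting is sketched below. Both functions are hypothetical: the geometric decay by candidate rank and the character-coverage score against a tiny place-name set are assumptions chosen for simplicity, not methods stated in the embodiment.

```python
from typing import Iterable


def margin_for_rank(rank: int, base: float = 8.0, decay: float = 0.5) -> float:
    """Margin for the candidate at `rank` (0 is the first candidate).

    Assumption: higher-ranked candidates get larger touch areas.
    """
    if rank < 0:
        raise ValueError("rank must be non-negative")
    return base * (decay ** rank)


def address_likeness(text: str, place_names: Iterable[str]) -> float:
    """Crude score in [0, 1]: share of characters covered by known place names."""
    if not text:
        return 0.0
    covered = sum(len(name) for name in set(place_names) if name and name in text)
    return min(1.0, covered / len(text))


if __name__ == "__main__":
    gazetteer = {"東京都", "大田区", "下丸子"}   # hypothetical dictionary
    print([margin_for_rank(r) for r in range(3)])
    print(address_likeness("東京都大田区下丸子ＸＸＸ", gazetteer))
    print(address_likeness("山田太郎", gazetteer))
```

A string whose score exceeds some chosen threshold could then be given a margin even though it was not extracted as a candidate.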

Furthermore, the technique is not limited to enlarging the hit area for touch operations; the click area for mouse operations or the selection threshold for gesture operations may be enlarged instead.

As described above, when information is extracted from a form using this photographing and projection apparatus, weighting the candidate areas during correction of an extraction position error makes the likely correct character string easier to select and reduces input mistakes by the user.

<Other embodiments>
The preferred embodiments of the present invention have been described in detail above, but the present invention is not limited to these specific embodiments, and various modifications and changes are possible within the scope of the gist of the present invention described in the claims.

The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

201 CPU
202 Bus
203 Memory
204 Storage device
205 Imaging device
206 Input device
207 Projection device

Claims

An image processing apparatus comprising:
Imaging means for imaging a paper document;
Recognition means for performing character recognition on a region where characters are arranged in an image captured by the imaging means;
Extraction means for extracting a target character string from the character strings recognized by the recognition means;
Display means for displaying data of the character string extracted by the extraction means and attribute data of the character string;
Acquisition means for acquiring an indicated position on at least one of the data displayed by the display means and the paper document; and
Correction means for performing different correction processing, depending on the indicated position acquired by the acquisition means, when the data of the character string displayed by the display means is corrected.

The image processing apparatus according to claim 1, wherein, when the indicated position acquired by the acquisition means is a position at which the display means displays character string data, the correction means causes:
The imaging means to change a setting so as to increase the resolution of the captured image and to re-image the paper document;
The recognition means to perform character recognition again on at least the area of the re-captured image from which the character string was recognized; and
The display means to display the character string obtained by the repeated character recognition in place of the character string displayed at the indicated position.

The image processing apparatus according to claim 2, wherein the imaging means changes a focal length setting.

The image processing apparatus according to claim 2, wherein the imaging means changes a setting of a shooting direction.

The image processing apparatus according to claim 1, wherein, when the indicated position acquired by the acquisition means is a position in a character area of the paper document, the display means displays the character string recognized by the recognition means in the area at the indicated position of the paper document in place of the character string displayed alongside the corresponding attribute data.

The image processing apparatus according to claim 1, wherein the display means performs display by projection.

An image processing method comprising:
An imaging step of causing imaging means to image a paper document;
A recognition step of performing character recognition on a region where characters are arranged in the image captured in the imaging step;
An extraction step of extracting a target character string from the character strings recognized in the recognition step;
A display step of displaying data of the extracted character string and attribute data of the character string on display means;
An acquisition step of acquiring an indicated position on at least one of the data displayed by the display means and the paper document; and
A correction step of performing different correction processing, depending on the indicated position acquired in the acquisition step, when the data of the character string displayed by the display means is corrected.

A program that causes a computer to execute:
An imaging step of imaging a paper document;
A recognition step of performing character recognition on a region where characters are arranged in the image captured in the imaging step;
An extraction step of extracting a target character string from the character strings recognized in the recognition step;
A display step of displaying data of the extracted character string and attribute data of the character string on display means;
An acquisition step of acquiring an indicated position on at least one of the data displayed by the display means and the paper document; and
A correction step of performing different correction processing, depending on the indicated position acquired in the acquisition step, when the data of the character string displayed by the display means is corrected.