JP2019153053A - Image processing device, image processing method and program

Info

Publication number: JP2019153053A
Application number: JP2018037491A
Authority: JP
Inventors: 洋次郎登内; Yojiro Touchi
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2018-03-02
Filing date: 2018-03-02
Publication date: 2019-09-12

Abstract

To enable re-detection of a character string by a simpler input operation when a desired character string cannot be detected from an image.

SOLUTION: An image processing apparatus according to an embodiment includes a reception unit, a determination unit, a display control unit, and a display unit. The reception unit receives input information input to an image. The determination unit identifies the input trajectory of the input information from the start of reception of the input information to the current time, and determines a first character string region containing the input trajectory. While the reception unit is receiving the input information, the display control unit displays, on the display unit, information indicating the first character string region output by the determination unit at each unit time.

SELECTED DRAWING: Figure 1

Description

Embodiments of the present invention relate to an image processing apparatus, an image processing method, and a program.

Techniques are known for detecting character strings written on signboards, signs, restaurant menus, and the like from images captured by a camera built into a smartphone, tablet, or similar device. For example, techniques are known that detect a character string from an image based on a user's input operation on the image.

[Patent Document 1] JP 2016-004553 A
[Patent Document 2] JP 2016-045877 A
[Patent Document 3] US Patent Application Publication No. 2011/0090253
[Patent Document 4] JP 2006-119942 A
[Patent Document 5] International Publication No. WO 2017/029737

[Non-Patent Document 1] Tonouchi Y., Suzuki K., Osada K., "A Hybrid Approach to Detect Texts in Natural Scenes by Integration of a Connected-Component Method and a Sliding-Window Method", IWRR 2014 (Singapore)
[Non-Patent Document 2] Shigeru Nagasawa, "It's practically a visual Translation Konjac: the 'Google Translate' app powers up with 'Word Lens'", [online], January 15, 2015, [retrieved September 4, 2017], Internet <URL: http://internet.watch.impress.co.jp/docs/news/683829.html>
[Non-Patent Document 3] "How to use the newly powerful Google Translate app, and trying it out!", [online], January 16, 2015, [retrieved September 4, 2017], Internet <URL: http://diy-ilands.com/2015/01/16/google-translate-usage/>
[Non-Patent Document 4] "Discriminant analysis method (Otsu's binarization)", [online], February 9, 2009, [retrieved September 4, 2017], Internet <URL: http://imagingsolution.blog107.fc2.com/blog-entry-113.html>

However, with the conventional techniques, when a desired character string could not be detected from an image, the character string could not be re-detected with a simple operation. For example, when the desired character string was not detected from the image, the user had to redo the input operation from the beginning.

The image processing apparatus according to the embodiment includes a reception unit, a determination unit, a display control unit, and a display unit. The reception unit receives input information input to an image. The determination unit identifies the input trajectory of the input information from the start of reception of the input information to the current time, and determines a first character string region containing the input trajectory. While the input information is being received by the reception unit, the display control unit displays, on the display unit, information indicating the first character string region output by the determination unit at each unit time.

FIG. 1 is a diagram illustrating an example of the functional configuration of the image processing apparatus according to the embodiment.
FIG. 2 is a diagram illustrating an example of the functional configuration of the determination unit according to the embodiment.
FIG. 3 is a diagram illustrating an example of the input trajectory and the initial determination region according to the embodiment.
FIG. 4 is a diagram illustrating an example of the result of the binarization process on the initial determination region according to the embodiment.
FIG. 5 is a diagram illustrating an example of the peripheral region according to the embodiment.
FIG. 6 is a diagram illustrating an example of the result of the binarization process on the peripheral region according to the embodiment.
FIG. 7 is a diagram illustrating Example 1 of display information according to the embodiment.
FIG. 8 is a diagram illustrating Example 2 of display information according to the embodiment.
FIG. 9 is a diagram illustrating an example of the functional configuration of the determination unit according to a modification of the embodiment.
FIG. 10 is a diagram illustrating an example of the hardware configuration of the image processing apparatus according to the embodiment.

Embodiments of an image processing apparatus, an image processing method, and a program are described in detail below with reference to the accompanying drawings.

In conventional character recognition, a scanner captures a document printed on paper as image data, and character strings that are arranged almost horizontally or vertically, at right angles and parallel without distortion, are detected. In a camera image, by contrast, a character string may be strongly tilted or distorted depending on the relative position of the camera and the character string included as a subject. Character strings in camera images therefore look very different from those in images captured by a conventional scanner, and it is difficult to detect them automatically using image information alone.

Furthermore, even if the regions of all character strings could be detected from the camera image, when the image contains multiple character strings it is difficult to identify, among the detected regions, the character string the user is interested in. For this reason, an instruction indicating the position of the character string of interest is often received from the user by some means. For example, on a smartphone or tablet with a touch panel, the user traces with a finger, along the direction of the character string, roughly the area where the character string appears on the display screen showing the camera image. After the user's instruction, the region of the character string of interest is detected from the camera image based on the instruction information. When the instruction is a tracing operation, the position and line direction of the character string of interest can be roughly estimated.

Even in this case, however, the detection process runs and outputs its result only after the user has finished the designation. If the detection process fails to detect the character string the user wants, the user has to indicate the position of the character string again.

The following describes embodiments of an image processing apparatus, an image processing method, and a program that allow a character string to be re-detected with a simpler input operation even when the desired character string could not be detected from the image.

[Example of functional configuration]
FIG. 1 is a diagram illustrating an example of the functional configuration of the image processing apparatus 10 according to the embodiment. The image processing apparatus 10 according to the embodiment includes a reception unit 1, a determination unit 2, a display control unit 3, and a display unit 4.

The reception unit 1 receives input information input to an image. The image may be any image, for example an image captured by a camera. The input method may also be arbitrary; examples include a tap by the user and a mouse operation. When input information is obtained in time series from a touch pad, a tablet, or the like, the reception unit 1 acquires the indicated position (x(k), y(k)) on the display unit 4, where k denotes the current time. The input trajectory from time 0, when the user's input started, to the current time k is the polyline connecting (x(i), y(i)) (i = 0, ..., k).
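
As a minimal sketch of the description above, the time-series positions could be accumulated into a polyline as follows. The class name `InputTrajectory` and its methods are hypothetical names chosen for illustration and do not appear in the source.

```python
from dataclasses import dataclass, field


@dataclass
class InputTrajectory:
    """Polyline through the indicated positions (x(i), y(i)), i = 0..k.

    Hypothetical helper for illustration; not part of the source.
    """
    points: list = field(default_factory=list)

    def add(self, x, y):
        # Called once per received sample; the newest point is (x(k), y(k)).
        self.points.append((x, y))

    @property
    def k(self):
        # Index of the current time; -1 before any input has been received.
        return len(self.points) - 1

    def segments(self):
        # Consecutive point pairs that make up the polyline.
        return list(zip(self.points, self.points[1:]))


traj = InputTrajectory()
for x, y in [(10, 20), (14, 21), (19, 21)]:
    traj.add(x, y)
```

Keeping the raw points rather than a rasterised line leaves the later steps free to derive a bounding region or to redraw the trajectory at any time k.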

The determination unit 2 identifies the input trajectory of the input information from the start of reception of the input information to the current time, and determines a character string region (first character string region) containing the input trajectory. A character string region contains, for example, letters, digits, and symbols. Details of the determination unit 2 are described later with reference to FIG. 2.

While input information is being received by the reception unit 1, the display control unit 3 displays on the display unit 4 information indicating the character string region output by the determination unit 2 at each unit time. The unit time may be arbitrary, for example 0.1 second, 0.5 second, or 1 second. The display unit 4 may also be arbitrary; it is, for example, a liquid crystal touch panel.

FIG. 2 is a diagram illustrating an example of the functional configuration of the determination unit 2 according to the embodiment. The determination unit 2 according to the embodiment includes a setting unit 21, an initial determination region determination unit 22, and a peripheral region determination unit 23. The determination unit 2 may also be implemented as a single unit without being divided into the functional blocks shown in FIG. 2. In the embodiment, the determination unit 2 is described as divided into functional blocks to make its operation easier to follow.

The setting unit 21 sets an initial determination region containing the input trajectory.

FIG. 3 is a diagram illustrating an example of the input trajectory 101 and the initial determination region 102 according to the embodiment. The setting unit 21 sets an initial determination region 102 that contains the input trajectory 101. The input trajectory 101 is the input trajectory from the start of input to the current time k. The initial determination region 102 may have any shape. In the example of FIG. 3, it is a rectangular region containing the input trajectory 101. For example, the initial determination region 102 may be the region inside a rectangle obtained by adding a margin to a rectangle circumscribing at least one of the start point and the end point of the input trajectory 101.
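
One way to set such a rectangular region is sketched below. This is only an illustration under assumptions: the function name `initial_region`, the use of the bounding box of all trajectory points, the inclusive `(x0, y0, x1, y1)` representation, and the clipping to the image bounds are choices made here, not details given in the source.

```python
def initial_region(points, margin, width, height):
    """Return an inclusive rectangle (x0, y0, x1, y1) around the trajectory.

    Assumption: the rectangle is the bounding box of all trajectory points,
    grown by `margin` pixels on every side and clipped to the image.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0 = max(min(xs) - margin, 0)
    y0 = max(min(ys) - margin, 0)
    x1 = min(max(xs) + margin, width - 1)
    y1 = min(max(ys) + margin, height - 1)
    return x0, y0, x1, y1
```

For a trajectory through (10, 20), (14, 21), (19, 21) with a margin of 5 in a 100 x 24 image, the lower edge is clipped to the last image row.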

Returning to FIG. 2, the initial determination region determination unit 22 binarizes the pixels included in the initial determination region 102 based on their luminance values (pixel values). Specifically, the initial determination region determination unit 22 first converts the image into a grayscale image. Next, it calculates a threshold t using a binarization method such as that of Non-Patent Document 4. It then compares the luminance value of each pixel in the initial determination region 102 with the threshold t. When the pixel value is greater than the threshold t, the unit determines that the pixel is a black pixel, that is, a character, and sets the pixel value to 1. When the pixel value is less than or equal to the threshold t, it determines that the pixel is a white pixel, that is, not a character, and sets the pixel value to 0.
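
The thresholding step can be sketched in plain Python as follows. The threshold search follows the discriminant analysis (Otsu) idea named in Non-Patent Document 4. The function names, the 8-bit value range, and the dictionary output are assumptions made for illustration. The comparison direction (value greater than t gives 1) follows the description as written; which side of the threshold corresponds to characters depends on how the pixel value is defined for a given image.

```python
def otsu_threshold(values):
    """Return the threshold t that maximises between-class variance.

    Assumes integer values in the range 0..255.
    """
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of values <= t
    sum0 = 0  # sum of values <= t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize_region(gray, region, t):
    """Map each pixel (x, y) of the inclusive region to 1 (character) or 0."""
    x0, y0, x1, y1 = region
    return {(x, y): 1 if gray[y][x] > t else 0
            for y in range(y0, y1 + 1)
            for x in range(x0, x1 + 1)}
```

Here `gray` is assumed to be a list of rows of grayscale values, indexed as `gray[y][x]`.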

FIG. 4 is a diagram illustrating an example of the result of the binarization process on the initial determination region 102 according to the embodiment. In the example of FIG. 4, the character string region (second character string region) included in the initial determination region 102 is shown in black pixels, and the rest of the region is shown in white pixels. Pixels in the area that has not been binarized (the area outside the initial determination region 102) keep the pixel values of the image captured by the camera or the like.

Returning to FIG. 2, the peripheral region determination unit 23 binarizes the pixels of a peripheral region containing the initial determination region 102 by executing Processes 1 to 4 below.

[Process 1] The peripheral region determination unit 23 takes all pixels in the initial determination region 102 as the determined set C. The procedure then proceeds to Process 2.

[Process 2] The peripheral region determination unit 23 takes the pixels of the character string region in the initial determination region 102 as the character pixel set M to be determined. A pixel included in the character pixel set M is hereinafter called a character pixel p. The procedure then proceeds to Process 3.

[Process 3] The peripheral region determination unit 23 takes a character pixel p out of the character pixel set M. Among the eight pixels surrounding the character pixel p, it adds every pixel q not yet included in the determined set C to the determined set C, and executes the pixel determination process of Process 3.1 below on each such pixel q. The character pixel p taken out of M is deleted from M as a determined pixel. If all eight pixels surrounding the character pixel p are already included in the determined set C, so that no pixel q is newly added to C, the peripheral region determination unit 23 executes Process 4 below.

[Process 3.1] The peripheral region determination unit 23 determines whether the pixel q is a character pixel by comparing the luminance of the pixel q with the threshold t. If the pixel q is determined to be a character pixel, the peripheral region determination unit 23 adds it to the character pixel set M as a new character pixel p. The procedure then proceeds to Process 4.

[Process 4] If the character pixel set M is empty, the peripheral region determination unit 23 ends the procedure; otherwise it executes Process 3 above.
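
Processes 1 to 4 amount to growing the character region outward from the initial determination region over 8-connected neighbours. The following is a minimal sketch under assumptions: the function name, the list used as the work set M, the extra set `chars` that records every pixel judged to be a character, and the image-bounds check are illustrative choices, and the comparison `> t` follows the binarization rule as described in the text.

```python
def grow_character_pixels(gray, region, t):
    """Return (chars, C): character pixels found and all determined pixels.

    gray is a list of rows (gray[y][x]); region is an inclusive rectangle.
    """
    height, width = len(gray), len(gray[0])
    x0, y0, x1, y1 = region

    def is_char(x, y):
        return gray[y][x] > t

    # Process 1: every pixel of the initial region is already determined.
    C = {(x, y) for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)}
    # Process 2: character pixels of the initial region form the work set M.
    M = [p for p in C if is_char(*p)]
    chars = set(M)
    # Process 4: repeat until M is empty.
    while M:
        # Process 3: take p out of M and visit its 8 neighbours.
        px, py = M.pop()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                qx, qy = px + dx, py + dy
                if (qx, qy) in C:
                    continue  # already determined (this also skips p itself)
                if not (0 <= qx < width and 0 <= qy < height):
                    continue  # outside the image
                C.add((qx, qy))
                # Process 3.1: a neighbour judged to be a character joins M.
                if is_char(qx, qy):
                    M.append((qx, qy))
                    chars.add((qx, qy))
    return chars, C
```

Because growth only continues through character pixels, the determined set C ends up as the character pixels plus a one-pixel rim of non-character pixels around them, which is why the binarized area stays local to the traced text.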

FIG. 5 is a diagram illustrating an example of the peripheral region 103 according to the embodiment. By binarizing the pixels of the peripheral region 103 of the initial determination region 102 through Processes 1 to 4 above, the peripheral region determination unit 23 determines whether the peripheral region 103 contains pixels representing a character string region (third character string region) other than the character string region (second character string region) overlapping the initial determination region 102.

FIG. 6 is a diagram illustrating an example of the result of the binarization process on the peripheral region 103 according to the embodiment. In the example of FIG. 6, the character string region is shown in black pixels, and the rest of the region is shown in white pixels. Pixels in the area that has not been binarized (the area outside the peripheral region 103) keep the pixel values of the image captured by the camera or the like.

The example of FIG. 6 shows the result of the binarization process at the current time k. The determination process at the current time k may reuse the determination results obtained up to time k-1. In that case, the peripheral region determination unit 23 retains the character pixel set N(k-1), which holds the pixels of the character string region obtained by the determination process at time k-1, and the determined set C(k-1) as of the end of the determination process at time k-1. In the determination process at the current time k, for pixels of the peripheral region 103 that are included in the determined set C(k-1), the unit applies the result from time k-1: pixels included in N(k-1) are determined to be character pixels, and pixels not included in N(k-1) are determined to be non-character pixels. Pixels of the peripheral region 103 that are not included in C(k-1) are binarized by Processes 1 to 4 above to determine whether the peripheral region 103 contains pixels representing a character string region.
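
The reuse of results from time k-1 can be sketched as an incremental update that carries C(k-1) and N(k-1) forward and compares luminance only for pixels not yet determined. This is an illustrative sketch, not the source's implementation: the function name, the set-based representation, and the way the new initial region seeds the work set are assumptions.

```python
def update_determination(gray, region, t, C_prev, N_prev):
    """Return (C(k), N(k)) given the region at time k and C(k-1), N(k-1)."""
    height, width = len(gray), len(gray[0])
    x0, y0, x1, y1 = region
    C, N = set(C_prev), set(N_prev)
    M = []
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            p = (x, y)
            if p not in C:
                # Not determined at time k-1: compare with the threshold now.
                C.add(p)
                if gray[y][x] > t:
                    N.add(p)
            if p in N:
                M.append(p)  # character pixels seed the growth
    while M:
        px, py = M.pop()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                qx, qy = px + dx, py + dy
                if (qx, qy) in C or not (0 <= qx < width and 0 <= qy < height):
                    continue  # earlier result is reused, or outside the image
                C.add((qx, qy))
                if gray[qy][qx] > t:
                    N.add((qx, qy))
                    M.append((qx, qy))
    return C, N


gray = [[200, 200, 0, 200, 200]]
C1, N1 = update_determination(gray, (0, 0, 0, 0), 100, set(), set())
C2, N2 = update_determination(gray, (0, 0, 3, 0), 100, C1, N1)
```

In the usage lines, the second call extends the region across the gap at x = 2, so the character pixels on the far side are picked up without re-examining the pixels determined in the first call.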

With the configuration of the image processing apparatus 10 described above, while the user is tracing the region of the character string of interest in the image, the local region containing the tracing position at the current time k can be binarized into a character string region (for example, value 1) and a background region (for example, value 0).

<Example of display information>
FIG. 7 is a diagram illustrating Example 1 of display information according to the embodiment. The display control unit 3 displays the processing result of the determination unit 2 on the display unit 4 by using different display colors for the character string region and for the non-character region, based on the binary image output by the determination unit 2. The binary image is, for example, an image in which pixels in the character string region have the pixel value 1 and pixels outside it have the pixel value 0. In the example of FIG. 7, the character string region is the region of pixels representing "kil". The display control unit 3 displays the character string region in a first color (for example, black) and the part of the peripheral region 103 outside the character string region in a second color (for example, white) on the display unit 4.

FIG. 8 is a diagram illustrating Example 2 of display information according to the embodiment. In the example of FIG. 8, the display control unit 3 additionally displays the input trajectory 101 in a third color (for example, red) on the display unit 4.

The first to third colors may be chosen arbitrarily. The display control unit 3 may also display, on the display unit 4, display information in which the color of only part of the region is changed. For example, it may display display information in which only the color of the character string region is changed, or display information in which only the color of the part of the peripheral region 103 outside the character string region is changed.
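
A small sketch of how such display information could be composed from the determination result is given below. The function name `render`, the RGB tuples, and the sets `determined` and `chars` are assumptions for illustration. For brevity the trajectory is drawn only at its sampled points rather than as a rasterised polyline.

```python
BLACK, WHITE, RED = (0, 0, 0), (255, 255, 255), (255, 0, 0)


def render(rgb, determined, chars, trajectory=(),
           first=BLACK, second=WHITE, third=RED):
    """Return a recoloured copy of rgb (a list of rows of RGB tuples).

    determined: pixels that have been binarized; chars: those judged to be
    characters. Pixels outside `determined` keep their camera colours.
    """
    out = [row[:] for row in rgb]  # leave the camera image untouched
    for (x, y) in determined:
        out[y][x] = first if (x, y) in chars else second
    for (x, y) in trajectory:
        out[y][x] = third  # trajectory is drawn on top (as in FIG. 8)
    return out
```

Passing the original colour for `first` or `second` is not needed to change only part of the region; restricting `determined` to the pixels that should change has the same effect.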

As described above, in the image processing apparatus 10 according to the embodiment, the reception unit 1 receives input information input to an image. The determination unit 2 identifies the input trajectory 101 of the input information from the start of reception of the input information to the current time k, and determines a character string region (first character string region) containing the input trajectory 101. While the input information is being received by the reception unit 1, the display control unit 3 displays on the display unit 4 information indicating the character string region output by the determination unit 2 at each unit time.

Thus, with the image processing apparatus 10 according to the embodiment, when a desired character string could not be detected from an image, the character string can be re-detected with a simpler input operation. Specifically, the user can check the progress of the processing in real time while still tracing. If the user notices that the desired character string region is not being detected correctly, the user can make corrections partway through the operation, for example by adjusting the tracing position as they go or by interrupting the operation and starting over, so that the desired detection is performed.

(Modification)
A modification of the embodiment is described next. Descriptions that are the same as in the embodiment above are omitted, and only the differences are described.

The modification addresses the case where, unlike the normal case, the image may contain inverted characters. In a region containing inverted characters, pixels with low luminance values (for example, black pixels) represent the background, and pixels with high luminance values (for example, white pixels) represent the characters (inverted characters).

FIG. 9 is a diagram illustrating an example of the functional configuration of the determination unit 2-2 according to the modification of the embodiment. The determination unit 2-2 includes a setting unit 21, an initial determination region determination unit 22, a peripheral region determination unit 23, and an inversion determination unit 24. That is, in the modification, the inversion determination unit 24 is added to the configuration of the embodiment described above (see FIG. 2).

The inversion determination unit 24 determines whether the pixels whose luminance values are higher than those of the surrounding pixels are the pixels that represent the character string region. If they are, the inversion determination unit 24 inverts the luminance values of the image before the binarization process is executed.

The specific processing of the inversion determination unit 24 is described for the case where low-luminance pixels are black pixels and high-luminance pixels are white pixels. That is, the example describes how the inversion determination unit 24 determines whether the low-luminance pixels represent black characters on a white background formed by the high-luminance pixels, or the high-luminance pixels represent white characters on a black background formed by the low-luminance pixels (inverted characters).

Specifically, the inversion determination unit 24 first determines whether the region containing the initial determination region 102 contains inverted characters. If it does, the inversion determination unit 24 inverts the luminance values of the image. Inverting the luminance values makes brighter pixels darker and darker pixels brighter.

The method for determining whether the text is black characters on a white background or inverted characters (white characters on a black background) is as follows. The inversion determination unit 24 obtains the ratio of white pixels (number of white pixels / total number of pixels) in the initial determination region 102 and in the surrounding region of the initial determination region 102. The surrounding region here is, for example, the region obtained by enlarging the initial determination region 102 by a fixed factor about its centroid and then excluding the initial determination region 102 itself. If the ratio of white pixels is higher in the initial determination region 102 than in the surrounding region, the inversion determination unit 24 determines that the region containing the initial determination region 102 contains inverted characters.
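
The white-pixel ratio test and the luminance inversion can be sketched as follows. Several details are assumptions made for illustration and are not specified in the source: a pixel counts as white when its value exceeds a given threshold `t`, the enlargement factor defaults to 2.0, the enlarged rectangle is clipped to the image, and luminance is 8-bit. All function names are hypothetical.

```python
def white_ratio(gray, pixels, t):
    """Fraction of the given pixels whose value exceeds t (treated as white)."""
    pixels = list(pixels)
    return sum(1 for x, y in pixels if gray[y][x] > t) / len(pixels)


def surrounding_pixels(region, scale, width, height):
    """Pixels of the region enlarged about its centre, minus the region."""
    x0, y0, x1, y1 = region
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    hw, hh = (x1 - x0) / 2 * scale, (y1 - y0) / 2 * scale
    ex0, ey0 = max(int(cx - hw), 0), max(int(cy - hh), 0)
    ex1 = min(int(cx + hw), width - 1)
    ey1 = min(int(cy + hh), height - 1)
    return [(x, y)
            for y in range(ey0, ey1 + 1)
            for x in range(ex0, ex1 + 1)
            if not (x0 <= x <= x1 and y0 <= y <= y1)]


def contains_inverted_text(gray, region, t, scale=2.0):
    """True if the white ratio is higher inside the region than around it."""
    height, width = len(gray), len(gray[0])
    x0, y0, x1, y1 = region
    inner = [(x, y) for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]
    outer = surrounding_pixels(region, scale, width, height)
    if not outer:
        return False  # no surrounding pixels to compare against
    return white_ratio(gray, inner, t) > white_ratio(gray, outer, t)


def invert(gray):
    """Invert 8-bit luminance so that bright pixels become dark."""
    return [[255 - v for v in row] for row in gray]
```

When `contains_inverted_text` returns True, the image would be passed through `invert` before the binarization steps, so that the rest of the pipeline can treat inverted and normal text the same way.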

実施形態の変形例の画像処理装置１０によれば、画像に反転文字が含まれる場合でも、上述の実施形態と同様の効果を得ることができる。 According to the image processing apparatus 10 of the modified example of the embodiment, the same effect as that of the above-described embodiment can be obtained even when an inverted character is included in the image.

最後に、実施形態の画像処理装置１０のハードウェア構成の例について説明する。 Finally, an example of a hardware configuration of the image processing apparatus 10 according to the embodiment will be described.

[Example of hardware configuration]
FIG. 10 is a diagram illustrating an example of the hardware configuration of the image processing apparatus 10 according to the embodiment. The image processing apparatus 10 according to the embodiment includes a control device 301, a main storage device 302, an auxiliary storage device 303, a display device 304, an input device 305, a communication device 306, and an imaging device 307. The control device 301, the main storage device 302, the auxiliary storage device 303, the display device 304, the input device 305, the communication device 306, and the imaging device 307 are connected to one another via a bus 310.

The control device 301 executes a program read from the auxiliary storage device 303 into the main storage device 302. The control device 301 is, for example, one or more processors such as a CPU. The reception unit 1, the determination unit 2, and the display control unit 3 described above are realized by, for example, the control device 301. The main storage device 302 is memory such as a ROM (Read Only Memory) and a RAM (Random Access Memory). The auxiliary storage device 303 is a memory card, an HDD (Hard Disk Drive), or the like.

The display device 304 displays information. The display device 304 is, for example, a liquid crystal display. The display unit 4 described above is realized by, for example, the display device 304. The input device 305 receives input of information. The input device 305 is, for example, a keyboard and a mouse. The display device 304 and the input device 305 may instead be a single device, such as a liquid crystal touch panel, that provides both the display function and the input function. The communication device 306 communicates with other devices. The imaging device 307 captures images such as scene images.

The program executed by the image processing apparatus 10 according to the embodiment is stored, as a file in an installable or executable format, on a computer-readable storage medium such as a CD-ROM, a memory card, a CD-R, or a DVD (Digital Versatile Disc), and is provided as a computer program product.

The program executed by the image processing apparatus 10 according to the embodiment may also be stored on a computer connected to a network such as the Internet and provided by being downloaded via the network. Alternatively, the program may be provided via a network such as the Internet without being downloaded.

The program executed by the image processing apparatus 10 according to the embodiment may also be provided by being incorporated in advance in a ROM or the like.

The program executed by the image processing apparatus 10 according to the embodiment has a module configuration that includes those functions, among the functional configuration of the image processing apparatus 10, that can be realized by a program.

The functions realized by the program are loaded into the main storage device 302 when the control device 301 reads the program from a storage medium such as the auxiliary storage device 303 and executes it. That is, the functions realized by the program are generated on the main storage device 302.

Some of the functions of the image processing apparatus 10 according to the embodiment may be realized by hardware such as an IC (Integrated Circuit). The IC is, for example, a processor that executes dedicated processing.

When the functions are realized using a plurality of processors, each processor may realize one of the functions or two or more of the functions.

The image processing apparatus 10 according to the embodiment may operate in any form. For example, the image processing apparatus 10 may be operated as a cloud system on a network.

Although several embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents.

Description of symbols
1 Reception unit
2 Determination unit
3 Display control unit
4 Display unit
10 Image processing apparatus
21 Setting unit
22 Initial determination region determination unit
23 Peripheral region determination unit
24 Inversion determination unit
301 Control device
302 Main storage device
303 Auxiliary storage device
304 Display device
305 Input device
306 Communication device
307 Imaging device
310 Bus

Claims

An image processing apparatus comprising:
a reception unit that receives input information input to an image;
a determination unit that identifies an input locus of the input information from the start of reception of the input information to the current time, and determines a first character string region including the input locus; and
a display control unit that, while the input information is being received by the reception unit, displays on a display unit information indicating the first character string region output by the determination unit every unit time.

The image processing apparatus according to claim 1, wherein the determination unit distinguishes the first character string region from regions other than the first character string region based on a binarization process that represents each pixel included in the image as a binary value according to the luminance value of the pixel.

The image processing apparatus according to claim 2, wherein the determination unit sets an initial determination region including the input locus, determines a second character string region included in the initial determination region by executing the binarization process on the initial determination region, determines a third character string region included in a peripheral region of the second character string region by executing the binarization process on the peripheral region, and determines the first character string region from the second character string region and the third character string region.

The image processing apparatus according to claim 2, wherein the determination unit determines whether pixels whose luminance values are higher than those of surrounding pixels are pixels indicating a character string region and, if they are, inverts the luminance values of the image before executing the binarization process.

The image processing apparatus according to claim 2, wherein the display control unit displays the first character string region on the display unit in a first color.

The image processing apparatus according to claim 5, wherein the display control unit displays, on the display unit in a second color, the part of a peripheral region including the first character string region that lies outside the first character string region.

The image processing apparatus according to claim 6, wherein the display control unit displays the input locus on the display unit in a third color.

An image processing method comprising:
receiving input information input to an image;
identifying an input locus of the input information from the start of reception of the input information to the current time, and determining a first character string region including the input locus; and
displaying on a display unit, while the input information is being received, information indicating the first character string region output every unit time.

A program for causing a computer to function as:
a reception unit that receives input information input to an image;
a determination unit that identifies an input locus of the input information from the start of reception of the input information to the current time, and determines a first character string region including the input locus; and
a display control unit that, while the input information is being received by the reception unit, displays on a display unit information indicating the first character string region output by the determination unit every unit time.