JP2017091455A

JP2017091455A - Image processing device, image processing method and image processing program

Info

Publication number: JP2017091455A
Application number: JP2015224895A
Authority: JP
Inventors: 洋次郎登内; Yojiro Touchi
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2015-11-17
Filing date: 2015-11-17
Publication date: 2017-05-25

Abstract

PROBLEM TO BE SOLVED: To provide an image processing device capable of reliably selecting a desired character string, and an image processing method and an image processing program. SOLUTION: An image processing device includes first and second acquisition sections and a processing section. The first acquisition section acquires, in time series, data related to a plurality of images including character strings. The second acquisition section receives an input. The processing section executes a detection operation and a selection operation. The detection operation includes detecting a plurality of image regions from the plurality of images. Each of the plurality of image regions includes one of the character strings included in the plurality of images. Each of the plurality of image regions is detected at a detection time and has an image region position. The selection operation includes selecting at least one of the plurality of image regions. The selection is performed on the basis of a first difference between the detection time of each of the plurality of image regions and the reception time at which the input is received, and a second difference between the image region position of each of the plurality of image regions and the position of the input. SELECTED DRAWING: Figure 1

Description

Embodiments described herein relate generally to an image processing apparatus, an image processing method, and an image processing program.

Some image processing apparatuses capture a character string existing in real space with a camera and detect the character string from the captured image. Such an apparatus may display the image being captured in real time, let the user select a desired character string in the image, and display various information related to that character string. In this case, a time lag occurs between the user recognizing the desired character string in the image and selecting it. The position of the character string in the image changes during the time lag, so the user may be unable to select the desired character string. Such an image processing apparatus should allow a desired character string to be selected reliably.

Japanese Patent Laying-Open No. 2015-88046

Embodiments of the present invention provide an image processing apparatus, an image processing method, and an image processing program capable of reliably selecting a desired character string.

According to an embodiment of the present invention, an image processing apparatus including a first acquisition unit, a second acquisition unit, and a processing unit is provided. The first acquisition unit acquires, in time series, data related to a plurality of images including character strings. The second acquisition unit receives an input. The processing unit performs a detection operation and a selection operation. The detection operation includes detecting a plurality of image regions from the plurality of images. Each of the plurality of image regions includes one of the character strings included in the plurality of images. Each of the plurality of image regions is detected at a detection time, and each of the plurality of image regions has an image region position. The selection operation includes selecting at least one of the plurality of image regions. The selection is performed based on a first difference between the detection time of each of the plurality of image regions and the reception time at which the second acquisition unit received the input, and a second difference between the image region position of each of the plurality of image regions and the position of the input.

FIG. 1 is a block diagram illustrating an image processing apparatus according to a first embodiment.
FIG. 2 is a diagram illustrating a plurality of images.
FIG. 3 is a schematic view illustrating an operation example of the detection unit according to the first embodiment.
FIG. 4(a) to FIG. 4(c) are schematic views illustrating methods for acquiring the designated position.
FIG. 5 is a schematic view illustrating an operation example of the selection unit according to the first embodiment.
FIG. 6(a) and FIG. 6(b) are schematic views illustrating images of an image processing apparatus according to a reference example.
FIG. 7 is a block diagram illustrating an image processing apparatus according to a second embodiment.
FIG. 8(a) and FIG. 8(b) are schematic views illustrating an operation example of the grouping unit according to the second embodiment.
FIG. 9 is a block diagram illustrating an image processing apparatus according to a third embodiment.
FIG. 10 is a schematic view illustrating a tilt correction method according to the third embodiment.
FIG. 11 is a block diagram illustrating an image processing apparatus according to a fourth embodiment.

Embodiments of the present invention will be described below with reference to the drawings.
The drawings are schematic or conceptual, and the relationship between the thickness and width of each part, the size ratio between parts, and the like are not necessarily the same as the actual ones. Even when the same part is shown, its dimensions and ratios may differ from drawing to drawing.
In the present specification and the drawings, elements similar to those already described with reference to an earlier drawing are denoted by the same reference numerals, and detailed description thereof is omitted as appropriate.

(First embodiment)
FIG. 1 is a block diagram illustrating an image processing apparatus according to the first embodiment.
The image processing apparatus 110 according to the embodiment includes a first acquisition unit 11, a second acquisition unit 12, a processing unit 20, and a display unit 30. For example, an input/output terminal is used for the first acquisition unit 11. The acquisition unit 10 includes an input/output interface that communicates with the outside via a wired or wireless connection. For the processing unit 20, for example, an arithmetic device including a CPU (Central Processing Unit) and a memory is used. An integrated circuit such as an LSI (Large Scale Integration) or an IC (Integrated Circuit) chipset can be used for some or all of the blocks of the processing unit 20. An individual circuit may be used for each block, or a circuit in which some or all of the blocks are integrated may be used. The blocks may be provided as one unit, or some of the blocks may be provided separately. A part of each block may also be provided separately. The integration is not limited to LSI; a dedicated circuit or a general-purpose processor may be used.

The processing unit 20 is provided with a detection unit 21 and a selection unit 22. Each of these units is realized, for example, as an image processing program. That is, the image processing apparatus 110 can also be realized by using a general-purpose computer apparatus as basic hardware. The functions of the units included in the image processing apparatus 110 can be realized by causing a processor mounted on the computer apparatus to execute the image processing program. In this case, the image processing apparatus 110 may be realized by installing the image processing program on the computer apparatus in advance. Alternatively, the image processing program may be stored on a storage medium such as a CD-ROM, or distributed via a network, and installed on the computer apparatus as appropriate. The processing unit 20 can be realized by appropriately using a memory or hard disk built into or externally attached to the computer apparatus, or a storage medium such as a CD-R, CD-RW, DVD-RAM, or DVD-R.

The display unit 30 is, for example, a liquid crystal display. In this example, the display unit 30 includes the second acquisition unit 12. The second acquisition unit 12 includes, for example, a touch panel provided integrally with the display unit 30.

In the embodiment, the first acquisition unit 11 acquires, in time series, data related to a plurality of images including character strings.
The second acquisition unit 12 receives an input from the user.
The detection unit 21 performs a detection operation. The detection operation includes detecting a plurality of image regions from the plurality of images. Each of the plurality of image regions includes one of the character strings included in the plurality of images. Each of the plurality of image regions is detected at a detection time. Each of the plurality of image regions has an image region position.
The selection unit 22 performs a selection operation. The selection operation includes selecting at least one of the plurality of image regions. The selection is performed based on a first difference between the detection time of each of the plurality of image regions and the reception time at which the second acquisition unit 12 received the input, and a second difference between the image region position of each of the plurality of image regions and the position of the input.

Consider an image processing apparatus that displays the image being captured in real time, lets the user select a desired character string in the image, and displays various information about that character string. In this case, a time lag occurs between the user recognizing the desired character string in the image and selecting it. The position of the character string in the image may change during the time lag. For this reason, the user may be unable to select the desired character string.

In contrast, the image processing apparatus 110 according to the embodiment acquires a plurality of images including character strings in time series and detects a plurality of image regions from the plurality of images. Each of the plurality of image regions includes one of the character strings. The image processing apparatus 110 receives an input from the user and, taking the reception time of the input as the reference, selects from among the plurality of image regions an image region close to the position of the input. This allows the user to select the desired character string reliably.

FIG. 2 is a diagram illustrating a plurality of images.
As shown in FIG. 2, data 40_j, 40_{j+1}, ..., 40_k related to a plurality of images (hereinafter referred to as images 40_j, 40_{j+1}, ..., 40_k) are acquired in time series. Each of the images 40_j, 40_{j+1}, ..., 40_k includes character strings. The images 40_j, 40_{j+1}, ..., 40_k may be any images that include character strings existing in real space, such as signboards, signs, and restaurant menus. The first acquisition unit 11 acquires the image 40_j at time t_j, the image 40_{j+1} at time t_{j+1}, and the image 40_k at time t_k. The first acquisition unit 11 may acquire the images 40_j, 40_{j+1}, ..., 40_k from an imaging device such as a digital still camera. In this case, the imaging device may be provided integrally with the image processing apparatus 110 or separately from it. The display unit 30 displays the images 40_j, 40_{j+1}, ..., 40_k in the order in which they were acquired.

The detection unit 21 detects a plurality of image regions from the images 40_j, 40_{j+1}, ..., 40_k. In this example, image regions R1_j to R4_j are detected from the image 40_j. The image region R1_j includes a character string c1_j. The image region R2_j includes a character string c2_j. The image region R3_j includes a character string c3_j. The image region R4_j includes a character string c4_j. Similarly, image regions R1_{j+1} to R4_{j+1} are detected from the image 40_{j+1}. The image region R1_{j+1} includes a character string c1_{j+1}. The image region R2_{j+1} includes a character string c2_{j+1}. The image region R3_{j+1} includes a character string c3_{j+1}. The image region R4_{j+1} includes a character string c4_{j+1}. Image regions R1_k to R3_k are detected from the image 40_k. The image region R1_k includes a character string c1_k. The image region R2_k includes a character string c2_k. The image region R3_k includes a character string c3_k. Any one of the images 40_j, 40_{j+1}, ..., 40_k is also simply referred to as an image 40.

Each of the image regions R1_j to R4_j, R1_{j+1} to R4_{j+1}, ..., R1_k to R3_k is represented, for example, by a rectangular region. The detection unit 21 acquires the detection time and the image region position of each of the image regions R1_j to R4_j, R1_{j+1} to R4_{j+1}, ..., R1_k to R3_k. The processing unit 20 has a clock function for acquiring the time. In this example, the detection time of the image regions R1_j to R4_j is t_j (detection time t_j). The detection time of the image regions R1_{j+1} to R4_{j+1} is t_{j+1} (detection time t_{j+1}). The detection time of the image regions R1_k to R3_k is t_k (detection time t_k). Any one of the image regions R1_j to R4_j, R1_{j+1} to R4_{j+1}, ..., R1_k to R3_k is also simply referred to as an image region R. The display unit 30 may display the image regions R so that the user can identify them, for example with a frame surrounding each character string.

FIG. 3 is a schematic view illustrating an operation example of the detection unit 21 according to the first embodiment.
In the figure, the X direction is along the horizontal direction of the image. The Y direction is along the vertical direction of the image. The T direction represents the time direction.

As shown in FIG. 3, the image 40 and the image 40_k are acquired in time series. Each of the image 40 and the image 40_k includes a character string (not shown). The image 40 is acquired at an arbitrary time t (t_j ≦ t < t_k). The image 40 is any one of the images 40_j, 40_{j+1}, .... The image 40_k is acquired at time t_k. Two images are illustrated here, but there may be three or more.

For example, an image region R_i^t including a character string is detected from the image 40. The notation R_i^t denotes the i-th detection result at time t. When the character string in the image region R_i^t is arranged in a straight line, the image region position p_t of the image region R_i^t can be expressed using the two-dimensional coordinates of four points: (x_i^t(1), y_i^t(1)), (x_i^t(2), y_i^t(2)), (x_i^t(3), y_i^t(3)), and (x_i^t(4), y_i^t(4)).

In this example, the image region position p_t of the i-th image region R_i^t at time t is expressed as (x_i^t(1), y_i^t(1), x_i^t(2), y_i^t(2), x_i^t(3), y_i^t(3), x_i^t(4), y_i^t(4), t).

That is, the image region position p_t of each of the plurality of image regions R can be expressed using the two-dimensional coordinates of four points, in the same way as for the image region R_i^t. In this way, the detection time t and the image region position p_t of each of the plurality of image regions R can be obtained.
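As an illustration of the representation just described, the following minimal sketch stores one detected region as four corner points plus a detection time and flattens it into the nine-element form used for p_t. The class name `ImageRegion`, its field names, and the use of seconds as the time unit are assumptions made for this example and do not appear in the text.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class ImageRegion:
    """One detected image region R_i^t (hypothetical container for illustration)."""

    index: int                                   # i: position of this detection within its frame
    corners: Tuple[Point, Point, Point, Point]   # (x(1), y(1)) ... (x(4), y(4))
    t: float                                     # detection time; seconds is an assumed unit

    def position(self) -> Tuple[float, ...]:
        """Flatten to (x1, y1, x2, y2, x3, y3, x4, y4, t), the form used for p_t."""
        flat = [value for corner in self.corners for value in corner]
        return tuple(flat) + (self.t,)


# A rectangular region detected at t = 0.25.
region = ImageRegion(index=0, corners=((10, 20), (110, 20), (110, 50), (10, 50)), t=0.25)
print(region.position())
```

Keeping the detection time inside the same record as the corner coordinates makes it straightforward to compare a region against an input in both position and time.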

A specific example of a character string detection method is described here. For example, among the pixels in an image, adjacent pixels that have similar features, such as pixel color, are connected to generate one or more connected components. Specifically, the pixels in the image are binarized into white and black, and when two or more black pixels are adjacent and continuous among the binarized pixels, the set of continuous black pixels is generated as a connected component.
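The binarization and connected-component step above can be sketched as follows. This is only an illustration under stated assumptions: the fixed threshold of 128, the use of 4-adjacency, and the function names `binarize` and `connected_components` are choices made for this example, since the text does not specify them.

```python
from collections import deque
from typing import List, Tuple


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[bool]]:
    """Map a grayscale image (rows of 0-255 values) to True for black pixels."""
    return [[value < threshold for value in row] for row in gray]


def connected_components(black: List[List[bool]], min_size: int = 2) -> List[List[Tuple[int, int]]]:
    """Group 4-adjacent black pixels and keep groups of at least min_size pixels."""
    height = len(black)
    width = len(black[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    components = []
    for y in range(height):
        for x in range(width):
            if not black[y][x] or seen[y][x]:
                continue
            # Breadth-first search from an unvisited black pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            component = []
            while queue:
                cx, cy = queue.popleft()
                component.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < width and 0 <= ny < height and black[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(component) >= min_size:
                components.append(component)
    return components


gray = [
    [255, 0, 0, 255, 255],
    [255, 0, 255, 255, 0],
    [255, 255, 255, 255, 0],
    [0, 255, 255, 255, 255],
]
components = connected_components(binarize(gray))
print([len(c) for c in components])
```

The isolated black pixel in the bottom-left corner is discarded because the text requires two or more adjacent black pixels.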

Connected components that lie on substantially the same straight line are combined according to their positional relationship and similarity, and are detected as a character string. Specifically, for example, a feature vector is generated for each connected component, and the positional relationship and feature similarity between two connected components are defined by the distance between their feature vectors. If the distance between the feature vectors is less than a threshold, the two connected components are regarded as similar and as lying on the same straight line. In this case, the two connected components are joined. The elements of the feature vector include, for example, the x coordinate and y coordinate of the center point of the connected component, the average color of the connected component, and the size of the connected component (height, width, perimeter, and so on). The center point may be, for example, the center of the rectangle circumscribing the connected component. A rectangular region circumscribing the character string detected in this way is detected as an image region. The character string detection method is not limited to this example.
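A minimal sketch of this grouping step is shown below, assuming a Euclidean distance between feature vectors, equal weights for all elements, and a union-find merge so that chains of neighboring characters end up in one string. None of these choices is stated in the text, and the names `Component`, `feature_distance`, and `group_into_strings` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class Component:
    box: Box       # rectangle circumscribing the connected component
    color: float   # average gray level of the component


def feature(c: Component) -> Tuple[float, ...]:
    """Feature vector: center x, center y, average color, height, width."""
    x0, y0, x1, y1 = c.box
    return ((x0 + x1) / 2, (y0 + y1) / 2, c.color, y1 - y0, x1 - x0)


def feature_distance(a: Component, b: Component) -> float:
    """Euclidean distance between feature vectors (the metric is an assumption)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(feature(a), feature(b))))


def group_into_strings(components: List[Component], threshold: float) -> List[Box]:
    """Join components whose feature distance is below threshold; return string boxes."""
    parent = list(range(len(components)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(components)):
        for j in range(i + 1, len(components)):
            if feature_distance(components[i], components[j]) < threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[Component]] = {}
    for i, c in enumerate(components):
        groups.setdefault(find(i), []).append(c)

    # The image region is the rectangle circumscribing each joined group.
    return sorted(
        (
            min(m.box[0] for m in members),
            min(m.box[1] for m in members),
            max(m.box[2] for m in members),
            max(m.box[3] for m in members),
        )
        for members in groups.values()
    )


letters = [
    Component((0, 0, 10, 12), 0),
    Component((12, 0, 22, 12), 0),
    Component((24, 0, 34, 12), 0),
    Component((100, 80, 110, 92), 0),  # far away: stays a separate region
]
rects = group_into_strings(letters, threshold=15)
print(rects)
```

In practice the feature elements have different units, so some per-element scaling would likely be needed; equal weights are used here only to keep the sketch short.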

The second acquisition unit 12 receives an input of a designated position p specified by the user. The second acquisition unit 12 includes, for example, a touch panel. Using the clock function, the processing unit 20 acquires the reception time t_p (> detection time t_k) at which the second acquisition unit 12 received the input of the designated position p. The designated position p is, for example, a position on the display screen that the user indicates by performing a tap operation or the like on the touch panel. The designated position p can be expressed, for example, using two-dimensional coordinates on the display screen at the reception time t_p. The designated position p is expressed, for example, as (x_k, y_k, t_p). The method of acquiring the designated position p is not limited to a touch panel. The designated position p may be acquired using, for example, a pointing device such as a mouse.

FIG. 4(a) to FIG. 4(c) are schematic views illustrating methods for acquiring the designated position p.
The display unit 30 according to the embodiment is integrally provided with a touch panel as the second acquisition unit 12. The display unit 30 displays the image 40_k. As shown in FIG. 4(a), the designated position p is input as a single point on the display screen of the display unit 30. The designated position p is input, for example, by the user performing a tap operation. In this case, the designated position p is expressed by a single coordinate. In this way, the coordinate point (x_k, y_k) of the designated position p and the reception time t_p are acquired.

As shown in FIG. 4(b), the designated position p may be input as a line segment on the display screen of the display unit 30. The designated position p is input, for example, by the user drawing a line segment with a drag operation. The shape of the line segment is not limited, and the line segment may be bent. In this case, the designated position p is expressed by a coordinate group. The coordinate group consists of a plurality of coordinates specified consecutively on the display screen. In this way, the coordinate points p_k (x_k(k), y_k(k)) (k = 1, 2, ..., K) constituting the coordinate group of the designated position p and the reception time t_p are acquired.

As shown in FIG. 4(c), the designated position p may be input as a region on the display screen of the display unit 30. The designated position p is input, for example, by the user drawing an enclosing line with a drag operation. The shape of the enclosing line is not limited, and the enclosing line may be bent. In this case, the designated position p is expressed by a coordinate region. The coordinate region is enclosed by a coordinate group consisting of a plurality of coordinates specified consecutively on the display screen. In this way, the coordinate points p_k (x_k(k), y_k(k)) (k = 1, 2, ..., K) constituting the coordinate region of the designated position p and the reception time t_p are acquired.

FIG. 5 is a schematic view illustrating an operation example of the selection unit 22 according to the first embodiment.
In the figure, the X direction is along the horizontal direction of the image. The Y direction is along the vertical direction of the image. The T direction represents the time direction.

The selection unit 22 performs a selection operation that selects at least one of the plurality of image regions R. The selection is made based on the first difference and the second difference. The first difference is the difference between the detection time t of each of the plurality of image regions R and the reception time t_p. The second difference is the difference between the image region position p_t of each of the plurality of image regions R and the designated position p. Specifically, the first difference of the at least one image region R that is selected is smaller than the first difference of another image region R that is not selected. Likewise, the second difference of the at least one image region R that is selected is smaller than the second difference of another image region R that is not selected.

For example, the selection unit 22 extracts, from among the plurality of image regions R, the image regions R that were detected within a fixed time δ_t of the reception time t_p. The fixed time δ_t is a value determined in advance, for example by experiment.

As shown in FIG. 5, image regions R1a to R1c, image regions R2a to R2c, image regions R3a and R3b, and image regions R4a and R4b have been detected. The detection time t has been acquired for each of these image regions R.

Based on the detection time t of each image region R, the selection unit 22 extracts, from among the image regions R1a to R1c, R2a to R2c, R3a, R3b, R4a, and R4b, the image regions R that were detected within the fixed time δ_t of the reception time t_p of the designated position p. In other words, an image region R is extracted when the difference (first difference) between its detection time t and the reception time t_p is within the fixed time δ_t. Hereinafter, an extracted image region R is referred to as an extraction region RA.

In this example, the image regions R1a to R1c, R2a to R2c, R3a, R3b, and R4a become extraction regions RA. The extraction regions RA1a to RA1c correspond to the image regions R1a to R1c. The extraction regions RA2a to RA2c correspond to the image regions R2a to R2c. The extraction regions RA3a and RA3b correspond to the image regions R3a and R3b. The extraction region RA4a corresponds to the image region R4a. The image region R4b was not detected within the fixed time δ_t and is therefore not extracted.

The average reaction time of human vision is said to be, for example, 180 ms or more and 200 ms or less. This reaction time is the time lag between the user recognizing the desired character string in the image and touching the image with a finger or the like. For this reason, taking the time lag into account, the fixed time δ_t is preferably set to, for example, 100 milliseconds (ms) or more and 500 ms or less. Thus, in the embodiment, the image regions R detected in the past within the fixed time δ_t before the reception time t_p, at which the user's operation was received, are the candidates for selection.
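The time-window extraction can be sketched as a simple filter. In the following example the names `Detection` and `extract_recent`, the use of integer milliseconds, and the specific timestamps are assumptions for illustration; the default window of 300 ms is just one value inside the 100 ms to 500 ms range mentioned above.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    name: str
    t_ms: int  # detection time in milliseconds (integer to avoid rounding issues)


def extract_recent(detections: List[Detection], t_p_ms: int, delta_t_ms: int = 300) -> List[Detection]:
    """Keep detections made no more than delta_t_ms before the reception time t_p."""
    return [d for d in detections if 0 <= t_p_ms - d.t_ms <= delta_t_ms]


detections = [
    Detection("R4b", 600),   # 400 ms before the tap: outside the window
    Detection("R4a", 750),
    Detection("R1b", 850),
    Detection("R1c", 950),
]
extracted = extract_recent(detections, t_p_ms=1000)
print([d.name for d in extracted])
```

Here the oldest detection falls outside the window and is dropped, mirroring how the image region R4b is excluded in the example of FIG. 5.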

The selection unit 22 calculates the second difference (distance) between the image region position p_t of each of the plurality of extraction regions RA (here, the extraction regions RA1a to RA1c, RA2a to RA2c, RA3a, RA3b, and RA4a) and the designated position p, and selects at least one of the plurality of extraction regions RA based on the calculated distances. Specifically, for example, a distance measure dist(p, R_i^t) is calculated as shown in FIG. 3. Here, x and y represent the horizontal and vertical coordinates of the image, and t represents the detection time. The distance measure dist(p, R_i^t) represents the distance between the image region R_i^t and the designated position p. The distance measure dist(p, R_i^t) can be calculated by the following equations. This example shows the case where the designated position p is a coordinate point, as shown in FIG. 4(a).
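Equations 1 and 2 are not reproduced in this text. A form consistent with the surrounding description is given below as a reconstruction. Equation 1 (a minimum over points of the region) follows directly from the description; the Euclidean form of Equation 2 and the absence of any scaling between the spatial axes and the time axis are assumptions.

```latex
\mathrm{dist}\left(p, R_i^t\right) = \min_{p' \in R_i^t} D\left(p, p'\right) \tag{1}

D\left(p, p'\right) = \sqrt{\left(x_k - x'\right)^2 + \left(y_k - y'\right)^2 + \left(t_p - t'\right)^2} \tag{2}
```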

点ｐ′（ｘ′、ｙ′、ｔ′）は、画像領域Ｒ_ｉ ^ｔに含まれる任意の座標点を表す。つまり、式２により、距離Ｄ（ｐ、ｐ′）を求める。距離Ｄ（ｐ、ｐ′）は、指示位置ｐの座標点（ｘ_ｋ、ｙ_ｋ、ｔ_ｐ）と、画像領域Ｒ_ｉ ^ｔに含まれる複数の点ｐ′（ｘ′、ｙ′、ｔ′）のそれぞれとの間の距離である。そして、式１により、距離Ｄ（ｐ、ｐ′）の最小値を、距離尺度ｄｉｓｔ（ｐ、Ｒ_ｉ ^ｔ）として求める。なお、この例においては、検出時刻ｔを考慮し、（ｘ、ｙ、ｔ）の座標系を用いたが、検出時刻ｔを考慮せずに、（ｘ、ｙ）のみを用いて距離尺度を求めてもよい。 The point p′(x′, y′, t′) represents an arbitrary coordinate point included in the image region R_i^t. That is, Equation 2 gives the distance D(p, p′) between the coordinate point (x_k, y_k, t_p) of the designated position p and each of the plurality of points p′(x′, y′, t′) included in the image region R_i^t. Then, by Equation 1, the minimum value of the distance D(p, p′) is obtained as the distance measure dist(p, R_i^t). In this example, the (x, y, t) coordinate system is used so as to take the detection time t into account, but the distance measure may instead be obtained using only (x, y), without considering the detection time t.
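The point-to-region distance described above can be sketched as follows. This is a minimal illustration, not the formula of the embodiment: it assumes that D in Equation 2 is the Euclidean distance in (x, y, t), that each image region is an axis-aligned rectangle detected at a single time t, and that time has already been scaled to units comparable with pixels. The `Region` class and the function name are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Axis-aligned image region detected at time t (hypothetical representation)."""
    x0: float
    y0: float
    x1: float
    y1: float
    t: float


def dist_point_region(px, py, tp, region, use_time=True):
    """Minimum Euclidean distance from the input (px, py, tp) to any point of the region."""
    # The nearest point of a rectangle is found by clamping the input to its extent.
    nx = min(max(px, region.x0), region.x1)
    ny = min(max(py, region.y0), region.y1)
    # All points of the region share the detection time t (assumption).
    dt = (tp - region.t) if use_time else 0.0
    return math.sqrt((px - nx) ** 2 + (py - ny) ** 2 + dt ** 2)
```

Passing `use_time=False` corresponds to the variant that uses only (x, y).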

ここで、図４（ｂ）に表すように、指示位置ｐが線分である場合、または、図４（ｃ）に表すように、指示位置ｐが領域である場合、式２により、指示位置ｐの各座標点ｐ_ｋ（ｘ_ｋ(k)、ｙ_ｋ(k））（k=1,2,…,K）について、画像領域Ｒ_ｉ ^ｔに含まれる複数の点ｐ′（ｘ′、ｙ′、ｔ′）のそれぞれとの間の距離Ｄ（ｐ_ｋ、ｐ′）を求める。式１により、距離Ｄ（ｐ_ｋ、ｐ′）の最小値を、ｄｉｓｔ（ｐ_ｋ、Ｒ_ｉ ^ｔ）として求める。ｄｉｓｔ（ｐ_ｋ、Ｒ_ｉ ^ｔ）は、指示位置ｐの各座標点ｐ_ｋについて求められる。そして、次式により、ｄｉｓｔ（ｐ_ｋ、Ｒ_ｉ ^ｔ）の合計値を算出し、これを距離尺度とすればよい。 Here, when the designated position p is a line segment as shown in FIG. 4B, or a region as shown in FIG. 4C, Equation 2 is used to obtain, for each coordinate point p_k (x_k(k), y_k(k)) (k = 1, 2, ..., K) of the designated position p, the distance D(p_k, p′) to each of the plurality of points p′(x′, y′, t′) included in the image region R_i^t. By Equation 1, the minimum value of the distance D(p_k, p′) is obtained as dist(p_k, R_i^t). dist(p_k, R_i^t) is obtained for each coordinate point p_k of the designated position p. Then, the sum of dist(p_k, R_i^t) is calculated by the following equation and used as the distance measure.

なお、ここでは、距離尺度を、ｄｉｓｔ（ｐ_ｋ、Ｒ_ｉ ^ｔ）の合計値としたが、距離尺度は、ｄｉｓｔ（ｐ_ｋ、Ｒ_ｉ ^ｔ）の最大値または最小値としてもよい。 Here, the distance scale is the sum of dist (p _k , R _i ^t ), but the distance scale may be the maximum value or the minimum value of dist (p _k , R _i ^t ).
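When the input is a line segment or a region, the per-point distances can be combined as described above. The sketch below is illustrative only: it uses the (x, y)-only variant of the distance, treats an image region as an axis-aligned rectangle, and represents the input as a list of sampled points. The function names and the `mode` parameter are hypothetical.

```python
import math


def dist_point_rect(px, py, rect):
    """Distance from (px, py) to the nearest point of rect = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = rect
    nx = min(max(px, x0), x1)
    ny = min(max(py, y0), y1)
    return math.hypot(px - nx, py - ny)


def dist_stroke_rect(points, rect, mode="sum"):
    """Combine dist(p_k, R) over all input points p_k by sum, max, or min."""
    per_point = [dist_point_rect(x, y, rect) for x, y in points]
    reducers = {"sum": sum, "max": max, "min": min}
    return reducers[mode](per_point)
```

The sum favors regions that lie close to the whole stroke, while the minimum only requires that some part of the stroke be near the region.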

選択部２２は、例えば、次式により、距離尺度ｄｉｓｔ（ｐ、Ｒ_ｉ ^ｔ）が最も小さい抽出領域ＲＡを選択する。ここで、選択された抽出領域ＲＡの第１差（検出時刻ｔと受取時刻ｔ_ｐとの差）は、一定時間δ_ｔ内にある。 For example, the selection unit 22 selects the extraction region RA having the smallest distance measure dist(p, R_i^t) according to the following equation. Here, the first difference of the selected extraction region RA (the difference between the detection time t and the reception time t_p) is within the fixed time δ_t.

すなわち、距離尺度ｄｉｓｔ（ｐ、Ｒ_ｉ ^ｔ）を、ｔ_ｐ−δ_ｔ≦ｔ≦ｔ_ｐの範囲で、最小となる（ｔ_ｍｉｎ、ｉ_ｍｉｎ）を求める。これにより、時刻ｔ_ｍｉｎのｉ_ｍｉｎ番目の抽出領域ＲＡが選択される。ここで、表示部３０は、選択された抽出領域ＲＡを、ユーザが画像内で識別可能な状態で表示してもよい。例えば、抽出領域ＲＡの枠の色や、抽出領域ＲＡ内の文字列の色を変化させるなどの方法が考えられる。また、選択された抽出領域ＲＡは、位置情報（座標及び時刻）のみをユーザに提供するようにしてもよい。画像は、例えば、外部のサーバ装置などに記憶される。この場合、位置情報と画像とを対応付けておき、位置情報から画像を読み出せるようにしておくとよい。 That is, the pair (t_min, i_min) that minimizes the distance measure dist(p, R_i^t) over the range t_p - δ_t ≤ t ≤ t_p is obtained. As a result, the i_min-th extraction region RA at time t_min is selected. Here, the display unit 30 may display the selected extraction region RA in a state in which the user can identify it within the image. For example, the color of the frame of the extraction region RA or the color of the character string in the extraction region RA may be changed. Alternatively, only the position information (coordinates and time) of the selected extraction region RA may be provided to the user. The image is stored in, for example, an external server device. In this case, it is preferable to associate the position information with the image so that the image can be read out from the position information.
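The search for (t_min, i_min) over the time window can be sketched as below. This is a hedged illustration: detections are assumed to be tuples of detection time, index, and an axis-aligned rectangle, times are in milliseconds, and the distance is the (x, y)-only variant. The function name and data layout are hypothetical.

```python
import math


def select_region(detections, p, t_p, delta_t):
    """Return (t_min, i_min) of the nearest region detected in [t_p - delta_t, t_p].

    detections: iterable of (t, i, (x0, y0, x1, y1)); p: (x, y) input position.
    Returns None when no region was detected inside the time window.
    """
    best, best_d = None, math.inf
    for t, i, (x0, y0, x1, y1) in detections:
        # Keep only the extraction regions: those detected within delta_t before t_p.
        if not (t_p - delta_t <= t <= t_p):
            continue
        nx = min(max(p[0], x0), x1)
        ny = min(max(p[1], y0), y1)
        d = math.hypot(p[0] - nx, p[1] - ny)
        if d < best_d:
            best, best_d = (t, i), d
    return best
```

Because regions from earlier frames inside the window remain candidates, a string that is no longer detected, or that has moved, at the moment of the touch can still be selected.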

選択部２２は、距離尺度ｄｉｓｔ（ｐ、Ｒ_ｉ ^ｔ）が小さい順に複数の抽出領域ＲＡを選択するようにしてもよい。例えば、距離尺度ｄｉｓｔ（ｐ、Ｒ_ｉ ^ｔ）が小さい順に５つの抽出領域ＲＡを選択するように予め設定しておくとよい。この場合、選択された５つの抽出領域ＲＡの画像がユーザに提示される。ユーザは、これら５つの抽出領域ＲＡの画像の中から、所望の抽出領域ＲＡの画像を選択することができる。 The selection unit 22 may select a plurality of extraction regions RA in ascending order of the distance scale dist (p, R _i ^t ). For example, it may be set in advance so that the five extraction regions RA are selected in ascending order of the distance scale dist (p, R _i ^t ). In this case, images of the five selected extraction areas RA are presented to the user. The user can select a desired image of the extraction area RA from the images of the five extraction areas RA.
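Selecting several candidates in ascending order of distance can be sketched with the standard library's `heapq.nsmallest`. The rectangle representation, the (x, y)-only distance, and the function names are assumptions made for illustration.

```python
import heapq
import math


def rect_distance(p, rect):
    """Distance from p = (x, y) to the nearest point of rect = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = rect
    nx = min(max(p[0], x0), x1)
    ny = min(max(p[1], y0), y1)
    return math.hypot(p[0] - nx, p[1] - ny)


def nearest_candidates(regions, p, k=5):
    """Return up to k (label, rect) pairs in ascending order of distance to p."""
    return heapq.nsmallest(k, regions, key=lambda item: rect_distance(p, item[1]))
```

The returned list can then be shown to the user, who picks the desired region from it.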

図６（ａ）及び図６（ｂ）は、参考例に係る画像処理装置の画像を例示する模式図である。
前述したように、リアルタイムで撮影している画像を表示させながら、ユーザに画像内の所望の文字列を選択させ、文字列に関する様々な情報を表示する画像処理装置を想定する。 FIG. 6A and FIG. 6B are schematic views illustrating images of the image processing apparatus according to the reference example.
As described above, an image processing apparatus is assumed in which a user selects a desired character string in an image and displays various information related to the character string while displaying an image captured in real time.

図６（ａ）の参考例において、複数の画像４０_ｊ、４０_ｊ＋１、…、４０_ｋは、文字列を含み、時系列に取得される。画像４０_ｊは時刻ｔ_ｊにおいて取得され、画像４０_ｊ＋１は時刻ｔ_ｊ＋１において取得され、画像４０_ｋは時刻ｔ_ｋにおいて取得される。複数の画像４０_ｊ、４０_ｊ＋１、…、４０_ｋから複数の画像領域Ｒが検出される。 In the reference example of FIG. 6A, the plurality of images 40_j, 40_{j+1}, ..., 40_k include character strings and are acquired in time series. The image 40_j is acquired at time t_j, the image 40_{j+1} is acquired at time t_{j+1}, and the image 40_k is acquired at time t_k. A plurality of image regions R are detected from the plurality of images 40_j, 40_{j+1}, ..., 40_k.

この例においては、画像４０_ｊから画像領域Ｒ１_ｊ〜Ｒ４_ｊが検出される。画像領域Ｒ１_ｊは文字列ｃ１_ｊを含む。画像領域Ｒ２_ｊは文字列ｃ２_ｊを含む。画像領域Ｒ３_ｊは文字列ｃ３_ｊを含む。画像領域Ｒ４_ｊは文字列ｃ４_ｊを含む。同様に、画像４０_ｊ＋１から画像領域Ｒ１_ｊ＋１〜Ｒ４_ｊ＋１が検出される。画像領域Ｒ１_ｊ＋１は文字列ｃ１_ｊ＋１を含む。画像領域Ｒ２_ｊ＋１は文字列ｃ２_ｊ＋１を含む。画像領域Ｒ３_ｊ＋１は文字列ｃ３_ｊ＋１を含む。画像領域Ｒ４_ｊ＋１は文字列ｃ４_ｊ＋１を含む。画像４０_ｋから画像領域Ｒ１_ｋ〜Ｒ３_ｋが検出される。画像領域Ｒ１_ｋは文字列ｃ１_ｋを含む。画像領域Ｒ２_ｋは文字列ｃ２_ｋを含む。画像領域Ｒ３_ｋは文字列ｃ３_ｋを含む。ここで、画像４０_ｋにおいては、画像領域Ｒ４_ｋが検出されていない。 In this example, image regions R1_j to R4_j are detected from the image 40_j. The image region R1_j includes a character string c1_j. The image region R2_j includes a character string c2_j. The image region R3_j includes a character string c3_j. The image region R4_j includes a character string c4_j. Similarly, image regions R1_{j+1} to R4_{j+1} are detected from the image 40_{j+1}. The image region R1_{j+1} includes a character string c1_{j+1}. The image region R2_{j+1} includes a character string c2_{j+1}. The image region R3_{j+1} includes a character string c3_{j+1}. The image region R4_{j+1} includes a character string c4_{j+1}. Image regions R1_k to R3_k are detected from the image 40_k. The image region R1_k includes a character string c1_k. The image region R2_k includes a character string c2_k. The image region R3_k includes a character string c3_k. Here, the image region R4_k is not detected in the image 40_k.

時刻ｔ_ｊにおいて、ユーザが画像４０_ｊ内の文字列ｃ４_ｊを認識し、これを選択するために、指ｆを動かす。時刻ｔ_ｋにおいて、ユーザは指ｆで文字列ｃ４_ｋにタッチする。しかしながら、文字列ｃ４_ｋを含む画像領域Ｒ４_ｋは未検出であるため、画像領域Ｒ４_ｋを選択することはできない。 At time t _j , the user recognizes the character string c4 _j in the image 40 _j and moves the finger f to select it. At time t _k , the user touches character string c4 _k with finger f. However, since the image region R4 _k including the character string c4 _k has not been detected, the image region R4 _k cannot be selected.

図６（ｂ）の参考例においては、画像４０_ｋから画像領域Ｒ４_ｋが検出されている。しかし、時刻ｔ_ｋにおいてカメラが上を向いてしまい、画像４０_ｋ内で文字列ｃ４_ｋの位置が下側に移動している。つまり、時刻ｔ_ｊにおいて、ユーザが画像４０_ｊ内の文字列ｃ４_ｊを認識し、これを選択するために、指ｆを動かしたときに、文字列ｃ４_ｋの移動のために、時刻ｔ_ｋにおいて、ユーザは意図に反して指ｆで文字列ｃ４_ｋの上をタッチしてしまう。この場合も、画像領域Ｒ４_ｋを選択することはできない。 In the reference example of FIG. 6B, the image region R4_k is detected from the image 40_k. However, the camera has turned upward at time t_k, and the position of the character string c4_k has moved downward within the image 40_k. That is, the user recognizes the character string c4_j in the image 40_j at time t_j and moves the finger f to select it, but because the character string c4_k has moved, at time t_k the user unintentionally touches a position above the character string c4_k with the finger f. In this case as well, the image region R4_k cannot be selected.

すなわち、ユーザが画像内の所望の文字列を認識し、選択するまでにタイムラグが生じる。タイムラグの間に画像内の文字列が検出されなかったり、タイムラグの間に画像内の文字列の位置が変化してしまうことが起こり得る。このため、ユーザは所望の文字列を選択できない可能性がある。 That is, there is a time lag until the user recognizes and selects a desired character string in the image. It is possible that the character string in the image is not detected during the time lag, or the position of the character string in the image changes during the time lag. For this reason, the user may not be able to select a desired character string.

これに対して、実施形態に係る画像処理装置１１０は、複数の画像４０_ｊ〜４０_ｋを時系列に取得し、複数の画像４０_ｊ〜４０_ｋから複数の画像領域Ｒ１_ｊ〜を検出する。複数の画像領域Ｒ１_ｊ〜のそれぞれは文字列を含む。画像処理装置１１０は、ユーザからの入力を受け取り、入力の受取時刻ｔ_ｐを基準として、複数の画像領域Ｒ１_ｊ〜のうちで指示位置ｐに近い画像領域Ｒを選択する。これにより、ユーザは、確実に所望の文字列を選択することができる。 In contrast, the image processing apparatus 110 according to the embodiment acquires the plurality of images 40_j to 40_k in time series and detects a plurality of image regions R1_j, ... from the plurality of images 40_j to 40_k. Each of the plurality of image regions R1_j, ... includes a character string. The image processing apparatus 110 receives an input from the user and, with the reception time t_p of the input as a reference, selects from among the plurality of image regions R1_j, ... an image region R close to the designated position p. This allows the user to reliably select the desired character string.

（第２の実施形態）
図７は、第２の実施形態に係る画像処理装置を例示するブロック図である。
実施形態に係る画像処理装置１１１は、処理部２０を含む。処理部２０は、検出部２１、選択部２２を含み、さらに、グループ化部２３、グループ内選択部２４を含む。 (Second Embodiment)
FIG. 7 is a block diagram illustrating an image processing apparatus according to the second embodiment.
The image processing apparatus 111 according to the embodiment includes a processing unit 20. The processing unit 20 includes a detection unit 21 and a selection unit 22, and further includes a grouping unit 23 and an in-group selection unit 24.

検出部２１は、検出動作を実施する。検出動作は、複数の画像４０_ｊ〜４０_ｋから複数の画像領域Ｒを検出すること、複数の画像領域Ｒのそれぞれの検出時刻ｔ及び画像領域位置ｐ_ｔを取得することを含む。検出部２１は、さらに、複数の文字列のそれぞれについて文字列らしさを表す評価値Ｓ（Ｒ）を算出する。評価値Ｓ（Ｒ）は、例えば、文字列を構成する黒画素の連結成分それぞれの連結成分度Ｃｉの特徴ベクトルをＶｉとしたときに、文字列に含まれる連結成分ｉ＝１〜Ｎ(ただし、１〜Ｎが文字列の片方の始端からもう片方の始端にソートされて並んでいるものとする)として、以下の式を用いて計算することができる。なお、Ｄ（Ａ、Ｂ）は、ベクトルＡとベクトルＢとのユークリッド距離を表す。 The detection unit 21 performs a detection operation. The detection operation includes detecting a plurality of image regions R from the plurality of images 40_j to 40_k, and acquiring the detection time t and the image region position p_t of each of the plurality of image regions R. The detection unit 21 further calculates, for each of the plurality of character strings, an evaluation value S(R) that represents its likelihood of being a character string. The evaluation value S(R) can be calculated, for example, using the following equation, where Vi is the feature vector of the connected component degree Ci of each connected component of black pixels constituting the character string, and the connected components included in the character string are i = 1 to N (assumed to be sorted in order from one end of the character string to the other end). Note that D(A, B) represents the Euclidean distance between the vector A and the vector B.

第２取得部１２は、ユーザからの指示位置ｐの入力を受け取る。このとき、処理部２０は、時計機能を用いて、第２取得部１２が指示位置ｐの入力を受け取った受取時刻ｔ_ｐを取得する。 The second acquisition unit 12 receives an input of the designated position p from the user. At this time, the processing unit 20 uses a clock function to acquire the reception time t_p at which the second acquisition unit 12 received the input of the designated position p.

図８（ａ）及び図８（ｂ）は、第２の実施形態に係るグループ化部２３の動作例を例示する模式図である。
図８（ｂ）は、図８（ａ）のＷ部を拡大した拡大図である。
図中、Ｘ方向は画像の水平方向に沿う。Ｙ方向は画像の垂直方向に沿う。Ｔ方向は時間方向を表す。 FIG. 8A and FIG. 8B are schematic views illustrating an operation example of the grouping unit 23 according to the second embodiment.
FIG. 8B is an enlarged view of portion W of FIG. 8A.
In the figure, the X direction is along the horizontal direction of the image. The Y direction is along the vertical direction of the image. The T direction represents the time direction.

グループ化部２３は、グループ化動作を実施する。グループ化動作は、複数の画像領域Ｒの中で同一の文字列を含む画像領域Ｒをグループ化し、複数のグループＧを生成することを含む。すなわち、グループ化部２３は、近接する複数の画像領域Ｒを１つのグループにまとめる処理を行う。より具体的には、グループ化部２３は、複数の画像４０_ｊ〜４０_ｋの中から複数の文字列を検出し、同一と判定された文字列同士を画像４０_ｊ〜４０_ｋ間で対応付ける。これにより、文字列の追跡を行い、追跡結果に基づいて、同一の文字列を含む複数の画像領域Ｒを１つのグループにまとめる処理を行う。グループ化処理には、例えば、移動体（人物や車等）の追跡などで一般的に用いられる手法を利用することができる。 The grouping unit 23 performs a grouping operation. The grouping operation includes grouping, among the plurality of image regions R, the image regions R that include the same character string, thereby generating a plurality of groups G. That is, the grouping unit 23 performs a process of collecting a plurality of neighboring image regions R into one group. More specifically, the grouping unit 23 detects a plurality of character strings from the plurality of images 40_j to 40_k and associates the character strings determined to be identical with one another across the images 40_j to 40_k. The character strings are thereby tracked, and, based on the tracking result, the plurality of image regions R that include the same character string are collected into one group. For the grouping process, for example, a technique generally used for tracking moving objects (persons, vehicles, and the like) can be used.
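One simple way to realize such a grouping is a greedy frame-to-frame association by rectangle overlap. The sketch below is an assumption for illustration, not the tracking method of the embodiment: it links each region to the group whose most recent region overlaps it most (intersection over union), and starts a new group otherwise. The function names, data layout, and threshold value are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two rectangles (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def group_regions(frames, threshold=0.3):
    """Group regions across frames.

    frames: list of (t, [rect, ...]) sorted by t.
    Returns a list of groups, each a list of (t, rect) in time order.
    """
    groups = []
    for t, rects in frames:
        used = set()  # groups already extended (or created) in this frame
        for rect in rects:
            best_idx, best_score = None, threshold
            for idx, group in enumerate(groups):
                if idx in used:
                    continue
                score = iou(group[-1][1], rect)
                if score >= best_score:
                    best_idx, best_score = idx, score
            if best_idx is None:
                groups.append([(t, rect)])
                used.add(len(groups) - 1)
            else:
                groups[best_idx].append((t, rect))
                used.add(best_idx)
    return groups
```

A practical tracker would likely also compare the recognized text or appearance features, since overlap alone can confuse neighboring strings when the camera moves quickly.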

複数の画像領域Ｒが１つのグループＧにグループ化されると、３次元の時空間上では立体的な形状として表される。この例においては、図８（ａ）に表すように、４つのグループＧ１〜Ｇ４が生成される。４つのグループＧ１〜Ｇ４のそれぞれは、例えば、直方体形状として表される。例えば、グループＧ１に含まれる画像領域Ｒの集合Ｒ^ｇは、次式のように、Ｎ（ｇ）個の画像領域Ｒで構成される。 When a plurality of image regions R are grouped into one group G, the group is represented as a solid shape in three-dimensional space-time. In this example, as shown in FIG. 8A, four groups G1 to G4 are generated. Each of the four groups G1 to G4 is represented as, for example, a rectangular parallelepiped. For example, the set R^g of image regions R included in the group G1 consists of N(g) image regions R, as in the following equation.

選択部２２は、複数のグループＧ１〜Ｇ４のうちの少なくとも１つを選択する選択動作を実施する。選択は、第３差及び第４差に基づいて行われる。第３差は、複数のグループＧ１〜Ｇ４のそれぞれのグループ検出時刻ｔと、受取時刻ｔ_ｐと、の差である。第４差は、複数のグループＧ１〜Ｇ４のそれぞれのグループ位置ｐ_ｔと、指示位置ｐと、の差である。グループ検出時刻ｔは、各画像領域Ｒの検出時刻ｔに基づく。グループ位置ｐ_ｔは、各画像領域Ｒの画像領域位置ｐ_ｔに基づく。 The selection unit 22 performs a selection operation of selecting at least one of the plurality of groups G1 to G4. The selection is made based on a third difference and a fourth difference. The third difference is the difference between the group detection time t of each of the plurality of groups G1 to G4 and the reception time t_p. The fourth difference is the difference between the group position p_t of each of the plurality of groups G1 to G4 and the designated position p. The group detection time t is based on the detection time t of each image region R. The group position p_t is based on the image region position p_t of each image region R.

複数のグループＧ１〜Ｇ４のうちで選択された少なくとも１つの第３差は、複数のグループＧ１〜Ｇ４のうちで選択されなかった別の１つの第３差よりも小さい。複数のグループＧ１〜Ｇ４のうちで選択された少なくとも１つの第４差は、複数のグループＧ１〜Ｇ４のうちで選択されなかった別の１つの第４差よりも小さい。 At least one third difference selected from the plurality of groups G1 to G4 is smaller than another third difference that has not been selected from the plurality of groups G1 to G4. At least one fourth difference selected from the plurality of groups G1 to G4 is smaller than another one fourth difference that has not been selected from the plurality of groups G1 to G4.

例えば、選択部２２は、複数のグループＧ１〜Ｇ４の中から、複数の抽出領域ＲＡを含むグループＧを抽出する。なお、抽出領域ＲＡは、指示位置ｐの受取時刻ｔ_ｐから一定時間δ_ｔ内に検出された画像領域Ｒのことを意味する。複数のグループＧ１〜Ｇ４のそれぞれのグループ検出時刻ｔと、受取時刻ｔ_ｐと、の差（第３差）が、一定時間δ_ｔ内にあるグループＧが抽出される。以下では、抽出されたグループＧを、抽出グループＧＡという。本例においては、グループＧ１〜Ｇ４が全て抽出されている。このため、グループＧ１〜Ｇ４と、抽出グループＧＡ１〜ＧＡ４と、が一致する。 For example, the selection unit 22 extracts, from among the plurality of groups G1 to G4, a group G including a plurality of extraction regions RA. Note that an extraction region RA means an image region R detected within the fixed time δ_t from the reception time t_p of the designated position p. A group G is extracted when the difference (third difference) between its group detection time t and the reception time t_p is within the fixed time δ_t. Hereinafter, an extracted group G is referred to as an extraction group GA. In this example, all of the groups G1 to G4 are extracted. Therefore, the groups G1 to G4 coincide with the extraction groups GA1 to GA4.

選択部２２は、複数の抽出グループＧＡ１〜ＧＡ４のそれぞれと、指示位置ｐと、の第４差（距離）を算出し、算出した距離に基づいて、複数の抽出グループＧＡ１〜ＧＡ４の少なくとも１つを選択する。この距離は、例えば、距離尺度ｄｉｓｔ（ｐ、Ｒ^ｇ）で表される。選択部２２は、次式により、距離尺度ｄｉｓｔ（ｐ、Ｒ^ｇ）を算出し、距離尺度ｄｉｓｔ（ｐ、Ｒ^ｇ）が最も小さい抽出グループＧＡを選択する。例えば、抽出グループＧＡ１が選択される。 The selection unit 22 calculates a fourth difference (distance) between each of the plurality of extraction groups GA1 to GA4 and the designated position p, and based on the calculated distance, at least one of the plurality of extraction groups GA1 to GA4. Select. This distance is represented by, for example, a distance scale dist (p, R ^g ). The selection unit 22 calculates the distance measure dist (p, R ^g ) according to the following equation, and selects the extraction group GA having the smallest distance measure dist (p, R ^g ). For example, the extraction group GA1 is selected.

式７において、点ｐ′（ｘ′、ｙ′、ｔ′）は、抽出グループＧＡを構成する画像領域Ｒ_１ ^ｇ、Ｒ_２ ^ｇ、…Ｒ_Ｎ（ｇ） ^ｇのそれぞれに含まれる任意の座標点を表す。つまり、前述の式２により、距離Ｄ（ｐ、ｐ′）を求める。距離Ｄ（ｐ、ｐ′）は、指示位置ｐの座標点（ｘ_ｋ、ｙ_ｋ、ｔ_ｐ）と、画像領域Ｒ_１ ^ｇ、Ｒ_２ ^ｇ、…Ｒ_Ｎ（ｇ） ^ｇに含まれる複数の点ｐ′（ｘ′、ｙ′、ｔ′）のそれぞれとの間の距離である。そして、式１により、距離Ｄ（ｐ、ｐ′）の最小値を、距離尺度ｄｉｓｔ（ｐ、Ｒ^ｇ）として求める。 In Equation 7, the point p′(x′, y′, t′) represents an arbitrary coordinate point included in each of the image regions R_1^g, R_2^g, ..., R_N(g)^g constituting the extraction group GA. That is, Equation 2 described above gives the distance D(p, p′) between the coordinate point (x_k, y_k, t_p) of the designated position p and each of the plurality of points p′(x′, y′, t′) included in the image regions R_1^g, R_2^g, ..., R_N(g)^g. Then, by Equation 1, the minimum value of the distance D(p, p′) is obtained as the distance measure dist(p, R^g).

グループ内選択部２４は、グループ内選択動作を実施する。グループ内選択動作は、選択部２２で選択された抽出グループＧＡ１に含まれる複数の抽出領域ＲＡ〜ＲＡＮの中から少なくとも１つを選択することを含む。例えば、複数の抽出領域ＲＡ１〜ＲＡＮの中で、評価値Ｓ（Ｒ）が最も高い文字列を含む抽出領域ＲＡを選択する。ある抽出グループＧＡに属するＮ（ｇ）個の抽出領域ＲＡの中から、評価値Ｓ（Ｒ_ｊ ^ｇ）が最も高い抽出領域Ｒ_ｍａｘ ^ｇを、次式により選択することができる。なお、Ｒ_ｊ ^ｇとは、ある抽出グループＧＡのｊ番目の抽出領域であることを意味する。 The intra-group selection unit 24 performs an intra-group selection operation. The intra-group selection operation includes selecting at least one from among a plurality of extraction regions RA to RAN included in the extraction group GA1 selected by the selection unit 22. For example, the extraction region RA including the character string having the highest evaluation value S (R) is selected from the plurality of extraction regions RA1 to RAN. From among N (g) extraction regions RA belonging to a certain extraction group GA, the extraction region R _max ^g having the highest evaluation value S (R _j ^g ) can be selected by the following equation. Note that R _j ^g means the j-th extraction region of a certain extraction group GA.

グループ内選択部２４は、別のグループ内選択動作を実施してもよい。この場合、選択は、第５差に基づいて行われる。第５差は、複数の抽出領域ＲＡ１〜ＲＡＮのそれぞれの画像領域位置ｐ_ｔと、指示位置ｐと、の差である。つまり、グループ内選択部２４は、複数の抽出領域ＲＡ１〜ＲＡＮのそれぞれと、指示位置ｐと、の間の第５差（距離）を算出し、算出した距離が最も短い抽出領域ＲＡを選択してもよい。この距離は、例えば、距離尺度ｄ（ｐ、Ｒ_ｊ ^ｇ）で表される。例えば、ある検出グループＧＡに属するＮ（ｇ）個の抽出領域ＲＡの中から、距離尺度ｄ（ｐ、Ｒ_ｊ ^ｇ）が最も小さい抽出領域Ｒ_ｍｉｎ ^ｇを、次式により算出することができる。なお、Ｒ_ｊ ^ｇとは、ある抽出グループＧＡのｊ番目の抽出領域であることを意味する。 The intra-group selection unit 24 may perform a different intra-group selection operation. In this case, the selection is made based on a fifth difference. The fifth difference is the difference between the image region position p_t of each of the plurality of extraction regions RA1 to RAN and the designated position p. That is, the intra-group selection unit 24 may calculate the fifth difference (distance) between each of the plurality of extraction regions RA1 to RAN and the designated position p, and select the extraction region RA for which the calculated distance is the shortest. This distance is represented by, for example, a distance measure d(p, R_j^g). For example, from among the N(g) extraction regions RA belonging to a certain detection group GA, the extraction region R_min^g having the smallest distance measure d(p, R_j^g) can be calculated by the following equation. Note that R_j^g means the j-th extraction region of a certain extraction group GA.
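The two intra-group selection strategies described above, by highest evaluation value and by shortest distance, can be sketched as follows. The dictionary layout is hypothetical, the evaluation value S(R) is assumed to be precomputed and stored under `"score"`, and the distance is the (x, y)-only variant to an axis-aligned rectangle.

```python
import math


def select_by_score(group):
    """Return the region of the group with the highest evaluation value S(R)."""
    return max(group, key=lambda region: region["score"])


def select_by_distance(group, p):
    """Return the region of the group nearest to the input position p = (x, y)."""
    def distance(region):
        x0, y0, x1, y1 = region["rect"]
        nx = min(max(p[0], x0), x1)
        ny = min(max(p[1], y0), y1)
        return math.hypot(p[0] - nx, p[1] - ny)

    return min(group, key=distance)
```

Selecting by score tends to return the frame in which the string is best captured, whereas selecting by distance returns the frame that best matches where the user actually touched.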

グループ内選択部２４は、距離尺度ｄ（ｐ、Ｒ_ｊ ^ｇ）が小さい順に複数の抽出領域ＲＡを選択するようにしてもよい。例えば、距離尺度ｄ（ｐ、Ｒ_ｊ ^ｇ）が小さい順に５つの抽出領域ＲＡを選択するように予め設定しておくとよい。この場合、選択された５つの抽出領域ＲＡの画像がユーザに提示される。ユーザは、これら５つの抽出領域ＲＡの画像の中から、所望の抽出領域ＲＡの画像を選択することができる。 The intra-group selection unit 24 may select a plurality of extraction regions RA in ascending order of the distance scale d (p, R _j ^g ). For example, it may be set in advance so that the five extraction regions RA are selected in ascending order of the distance scale d (p, R _j ^g ). In this case, images of the five selected extraction areas RA are presented to the user. The user can select a desired image of the extraction area RA from the images of the five extraction areas RA.

このように、実施形態によれば、文字列のグループ化処理を追加することで、所望の文字列の選択精度をさらに高めることが可能となる。 As described above, according to the embodiment, it is possible to further improve the accuracy of selecting a desired character string by adding a character string grouping process.

（第３の実施形態）
図９は、第３の実施形態に係る画像処理装置を例示するブロック図である。
実施形態に係る画像処理装置１１２は、処理部２０を含む。処理部２０は、検出部２１、選択部２２を含み、さらに、傾き検出部２５、位置補正部２６を含む。 (Third embodiment)
FIG. 9 is a block diagram illustrating an image processing apparatus according to the third embodiment.
The image processing apparatus 112 according to the embodiment includes a processing unit 20. The processing unit 20 includes a detection unit 21 and a selection unit 22, and further includes an inclination detection unit 25 and a position correction unit 26.

傾き検出部２５は、傾き検出動作を実施する。傾き検出動作は、指示位置ｐの入力を受け取ったときに画像処理装置１１２の傾きを検出することを含む。傾きの検出には、例えば、ジャイロセンサなどの各種のセンサが用いられる。
位置補正部２６は、位置補正動作を実施する。位置補正動作は、複数の画像領域Ｒのそれぞれの画像領域位置ｐ_ｔを、傾きに応じて補正することを含む。 The inclination detection unit 25 performs an inclination detection operation. The tilt detection operation includes detecting the tilt of the image processing apparatus 112 when the input of the designated position p is received. For detecting the tilt, for example, various sensors such as a gyro sensor are used.
The position correction unit 26 performs a position correction operation. The position correction operation includes correcting the image region position p_t of each of the plurality of image regions R according to the tilt.

図１０は、第３の実施形態に係る傾き補正方法を例示する模式図である。
画像処理装置１１２は、カメラ５０を備える。この例においては、ワールド座標系Ｗｃと、カメラ座標系Ｃｃと、を用いて説明する。 FIG. 10 is a schematic view illustrating a tilt correction method according to the third embodiment.
The image processing apparatus 112 includes a camera 50. This example will be described using the world coordinate system Wc and the camera coordinate system Cc.

カメラ５０の位置を、ワールド座標系Ｗｃで（ｘ_ｏ、ｙ_ｏ、ｚ_ｏ）と表す。カメラ５０の方向は、ローリング角ω、ピッチング角φ、ヨーイング角κで表される。ローリング角ω、ピッチング角φ及びヨーイング角κはそれぞれワールド座標系ＷｃのＸ軸、Ｙ軸、Ｚ軸における回転角を表す。 The position of the camera 50 is expressed as (x_o, y_o, z_o) in the world coordinate system Wc. The direction of the camera 50 is represented by a rolling angle ω, a pitching angle φ, and a yawing angle κ. The rolling angle ω, the pitching angle φ, and the yawing angle κ represent the rotation angles about the X axis, the Y axis, and the Z axis of the world coordinate system Wc, respectively.

カメラ５０の投影中心から撮影された画像領域Ｒへのベクトルを考える。画像領域Ｒ上のある頂点のワールド座標を（ｘ_ｐ、ｙ_ｐ、ｚ_ｐ）とし、現時刻のカメラ５０の位置が（ｘ_ｏ、ｙ_ｏ、ｚ_ｏ）、カメラ５０の方向が(ω_o、φ_o、κ_o)のときに、カメラ座標系Ｃｃでのベクトルの成分（ｕ_ｐ、ｖ_ｐ、ｗ_ｐ）と、ワールド座標系Ｗｃでのベクトルの成分（ｘ_ｐ-ｘ_ｏ、ｙ_ｐ-ｙ_ｏ、ｚ_ｐ-ｚ_ｏ）と、の間で次の関係が成り立つ。 Consider a vector from the projection center of the camera 50 to the captured image region R. Let the world coordinates of a vertex on the image region R be (x_p, y_p, z_p). When the position of the camera 50 at the current time is (x_o, y_o, z_o) and the direction of the camera 50 is (ω_o, φ_o, κ_o), the following relationship holds between the components (u_p, v_p, w_p) of the vector in the camera coordinate system Cc and the components (x_p - x_o, y_p - y_o, z_p - z_o) of the vector in the world coordinate system Wc.

式１０を単純化するために、３軸それぞれの回転行列をａ_１１〜ａ_３３を要素とする１つの行列で表すと、次式となる。 To simplify Equation 10, the rotation matrices for the three axes are combined into a single matrix whose elements are a_11 to a_33, which gives the following equation.

さらに、画像領域Ｒのカメラ座標系Ｃｃ上での位置（ｕ、ｖ）は、次式の関係が成り立つ。なお、ｃはレンズの焦点距離を表す。 Further, the position of the image region R on the camera coordinate system Cc (u, v) has the following relationship. Note that c represents the focal length of the lens.

カメラ５０の位置と方向がそれぞれ（Δｘ_ｏ、Δｙ_ｏ、Δｚ_ｏ）、（Δω_o、Δφ_o、Δκ_o）だけ変動した場合、画像領域Ｒのカメラ座標系Ｃｃ上での位置の変動（Δｕ、Δｖ）は、式１２、式１３から近似的に求めることができる。 When the position and direction of the camera 50 change by (Δx _o , Δy _o , Δz _o ) and (Δω _o , Δφ _o , Δκ _o ), the position change (Δu) of the image region R on the camera coordinate system Cc. , Δv) can be approximately obtained from Equations 12 and 13.

式１２の右辺及び式１３の右辺を、それぞれ、Ｆｕ（ｘ_ｏ、ｙ_ｏ、ｚ_ｏ、ω_o、φ_o、κ_o）、Ｆｖ（ｘ_ｏ、ｙ_ｏ、ｚ_ｏ、ω_o、φ_o、κ_o）とおくと、次式が得られる。 Denoting the right side of Equation 12 and the right side of Equation 13 as Fu(x_o, y_o, z_o, ω_o, φ_o, κ_o) and Fv(x_o, y_o, z_o, ω_o, φ_o, κ_o), respectively, the following equation is obtained.

Ｆｕ（ｘ_ｏ、ｙ_ｏ、ｚ_ｏ、ω_o、φ_o、κ_o）、Ｆｖ（ｘ_ｏ、ｙ_ｏ、ｚ_ｏ、ω_o、φ_o、κ_o）をそれぞれ（ｘ_ｏ、ｙ_ｏ、ｚ_ｏ、ω_o、φ_o、κ_o）の周りでテイラー展開すると、(Δｕ,Δｖ)は、次式により算出できる。 When Fu(x_o, y_o, z_o, ω_o, φ_o, κ_o) and Fv(x_o, y_o, z_o, ω_o, φ_o, κ_o) are each Taylor-expanded around (x_o, y_o, z_o, ω_o, φ_o, κ_o), (Δu, Δv) can be calculated by the following equation.

テイラー展開による微係数は、以下のように算出される。 The differential coefficients appearing in the Taylor expansion are calculated as follows.

図１０に表すように、ワールド座標系Ｗｃを、ワールド座標系ＷｃのＺ軸方向とカメラ５０の方向(ω_o、φ_o、κ_o)とが同じ向きになるように設定する。現時刻のカメラ５０の位置（ｘ_ｏ、ｙ_ｏ、ｚ_ｏ）は、ワールド座標系ＷｃのＺ軸上の距離Ｈにある。このとき、カメラ位置（０、０、Ｈ）、カメラ方向（０、０、０）となり、(Δｕ、Δｖ)が計算できる。実際には距離Ｈは不明であるため、予め定めた値で代用すればよい。これにより、大凡の位置を補正することができる。そして、(Δｕ、Δｖ)を、カメラ座標系Ｃｃでの位置補正値として出力する。 As shown in FIG. 10, the world coordinate system Wc is set so that the Z-axis direction of the world coordinate system Wc and the directions of the camera 50 (ω _o , φ _o , κ _o ) are the same. The position (x _o , y _o , z _o ) of the camera 50 at the current time is at a distance H on the Z axis of the world coordinate system Wc. At this time, the camera position (0, 0, H) and the camera direction (0, 0, 0) are obtained, and (Δu, Δv) can be calculated. Actually, since the distance H is unknown, a predetermined value may be used instead. As a result, the approximate position can be corrected. Then, (Δu, Δv) is output as a position correction value in the camera coordinate system Cc.
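The position correction can be illustrated numerically. The sketch below does not reproduce Equations 10 to 13, which are not shown here: it assumes a combined rotation R = Rz(κ)·Ry(φ)·Rx(ω), a pinhole projection u = -c·u_p / w_p, v = -c·v_p / w_p, and it evaluates the shift (Δu, Δv) as a direct difference of two projections instead of the first-order Taylor approximation. The rotation order, the sign convention, and all function names are assumptions.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_matrix(omega, phi, kappa):
    """Combined matrix with elements a_11..a_33; the order Rz @ Ry @ Rx is an assumption."""
    co, so = math.cos(omega), math.sin(omega)
    cp, sp = math.cos(phi), math.sin(phi)
    ck, sk = math.cos(kappa), math.sin(kappa)
    rx = [[1, 0, 0], [0, co, -so], [0, so, co]]
    ry = [[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]]
    rz = [[ck, -sk, 0], [sk, ck, 0], [0, 0, 1]]
    return matmul(rz, matmul(ry, rx))


def project(point, cam_pos, cam_dir, c):
    """Project a world point to image coordinates (u, v) for focal length c."""
    a = rotation_matrix(*cam_dir)
    d = [point[i] - cam_pos[i] for i in range(3)]
    u_p, v_p, w_p = (sum(a[r][k] * d[k] for k in range(3)) for r in range(3))
    return (-c * u_p / w_p, -c * v_p / w_p)


def position_shift(point, cam_pos, cam_dir, d_pos, d_dir, c):
    """Shift (du, dv) of the projected point when the camera pose changes."""
    u0, v0 = project(point, cam_pos, cam_dir, c)
    new_pos = [cam_pos[i] + d_pos[i] for i in range(3)]
    new_dir = [cam_dir[i] + d_dir[i] for i in range(3)]
    u1, v1 = project(point, new_pos, new_dir, c)
    return (u1 - u0, v1 - v0)
```

With the camera placed at (0, 0, H) and direction (0, 0, 0) as in the setup above, `position_shift` returns the amount by which a stored region position would be moved. As in the text, H is a predetermined value when the true distance is unknown, so the result is only an approximate correction.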

このように、実施形態によれば、ユーザからのタッチ操作等を受け取ったときに画像処理装置１１２の傾きを検出し、傾きに応じて画像領域の検出位置を補正することができる。このため、より高精度に所望の文字列を選択することが可能となる。 As described above, according to the embodiment, it is possible to detect the inclination of the image processing apparatus 112 when a touch operation or the like from the user is received, and to correct the detection position of the image area according to the inclination. For this reason, it becomes possible to select a desired character string with higher accuracy.

（第４の実施形態）
図１１は、第４の実施形態に係る画像処理装置を例示するブロック図である。
実施形態に係る画像処理装置２００は、デスクトップ型またはラップトップ型の汎用計算機、携帯型の汎用計算機、その他の携帯型の情報機器、撮像デバイスを有する情報機器、スマートフォン、その他の情報処理装置など、様々なデバイスによって実現可能である。 (Fourth embodiment)
FIG. 11 is a block diagram illustrating an image processing apparatus according to the fourth embodiment.
The image processing apparatus 200 according to the embodiment can be realized by various devices, such as a desktop or laptop general-purpose computer, a portable general-purpose computer, other portable information devices, an information device having an imaging device, a smartphone, and other information processing apparatuses.

図１１に表すように、実施形態の画像処理装置２００は、ハードウェアの構成例として、ＣＰＵ２０１と、入力部２０２と、出力部２０３と、ＲＡＭ２０４と、ＲＯＭ２０５と、外部メモリインタフェース２０６と、通信インタフェース２０７と、を含む。 As illustrated in FIG. 11, the image processing apparatus 200 according to the embodiment includes, as an example hardware configuration, a CPU 201, an input unit 202, an output unit 203, a RAM 204, a ROM 205, an external memory interface 206, and a communication interface 207.

上述の実施形態の中で示した処理手順に示された指示は、ソフトウェアであるプログラムに基づいて実行されることが可能である。汎用の計算機システムが、このプログラムを予め記憶しておき、このプログラムを読み込むことにより、上述した実施形態の画像処理装置による効果と同様な効果を得ることも可能である。上述の実施形態に記載された指示は、コンピュータに実行させることのできるプログラムとして、磁気ディスク（フレキシブルディスク、ハードディスクなど）、光ディスク（ＣＤ−ＲＯＭ、ＣＤ−Ｒ、ＣＤ−ＲＷ、ＤＶＤ−ＲＯＭ、ＤＶＤ±Ｒ、ＤＶＤ±ＲＷなど）、半導体メモリ、またはこれに類する記録媒体に記録される。コンピュータまたは組み込みシステムが読み取り可能な記録媒体であれば、その記憶形式は何れの形態であってもよい。コンピュータは、この記録媒体からプログラムを読み込み、このプログラムに基づいてプログラムに記述されている指示をＣＰＵで実行させれば、上述した実施形態の画像処理装置と同様な動作を実現することができる。もちろん、コンピュータがプログラムを取得する場合または読み込む場合はネットワークを通じて取得または読み込んでもよい。 The instructions shown in the processing procedure shown in the above-described embodiment can be executed based on a program that is software. The general-purpose computer system stores this program in advance and reads this program, so that the same effect as that obtained by the image processing apparatus according to the above-described embodiment can be obtained. The instructions described in the above-described embodiments are, as programs that can be executed by a computer, magnetic disks (flexible disks, hard disks, etc.), optical disks (CD-ROM, CD-R, CD-RW, DVD-ROM, DVD). ± R, DVD ± RW, etc.), semiconductor memory, or a similar recording medium. As long as the recording medium is readable by the computer or the embedded system, the storage format may be any form. If the computer reads the program from the recording medium and causes the CPU to execute instructions described in the program based on the program, the same operation as the image processing apparatus of the above-described embodiment can be realized. Of course, when the computer acquires or reads the program, it may be acquired or read through a network.

また、記録媒体からコンピュータや組み込みシステムにインストールされたプログラムの指示に基づきコンピュータ上で稼働しているＯＳ（オペレーティングシステム）や、データベース管理ソフト、ネットワーク等で動作するＭＷ（ミドルウェア）などが実施形態を実現するための各処理の一部を実行してもよい。 In addition, an OS (operating system) running on the computer, database management software, MW (middleware) operating on a network, or the like may execute part of each process for realizing the embodiment, based on the instructions of the program installed from the recording medium into the computer or the embedded system.

さらに、実施形態における記録媒体は、コンピュータあるいは組み込みシステムと独立した記録媒体に限らず、ＬＡＮやインターネット等により伝達されたプログラムをダウンロードして記憶または一時記憶した記録媒体も含まれる。また、記録媒体は１つに限らず、複数の記録媒体から実施形態における処理が実行される場合も、実施形態における記録媒体に含まれる。記録媒体の構成は何れの構成であってもよい。 Furthermore, the recording medium in the embodiment is not limited to a recording medium independent of a computer or an embedded system, but also includes a recording medium in which a program transmitted via a LAN or the Internet is downloaded and stored or temporarily stored. Further, the number of recording media is not limited to one, and the case where the processing in the embodiment is executed from a plurality of recording media is also included in the recording medium in the embodiment. The configuration of the recording medium may be any configuration.

なお、実施形態におけるコンピュータまたは組み込みシステムは、記録媒体に記憶されたプログラムに基づき、実施形態における各処理を実行するためのものであって、パーソナルコンピュータ、マイクロコンピュータ等の１つからなる装置、あるいは、複数の装置がネットワーク接続されたシステム等の何れの構成であってもよい。 The computer or the embedded system in the embodiment is for executing each process in the embodiment based on a program stored in a recording medium, and is a device composed of one of a personal computer, a microcomputer, or the like, or Any configuration such as a system in which a plurality of devices are network-connected may be used.

また、実施形態におけるコンピュータとは、パーソナルコンピュータに限らず、情報処理機器に含まれる演算処理装置、マイクロコンピュータ等も含み、プログラムによって実施形態における機能を実現することが可能な機器、装置を総称している。 In addition, the computer in the embodiment is not limited to a personal computer, and includes an arithmetic processing device, a microcomputer, and the like included in an information processing device, and is a generic term for devices and devices that can realize the functions in the embodiment by a program. ing.

According to the embodiment, an image processing apparatus, an image processing method, and an image processing program capable of reliably selecting a desired character string can be provided.

The embodiments of the present invention have been described above with reference to specific examples. However, the present invention is not limited to these specific examples. For example, the specific configuration of each element, such as the acquisition units and the processing unit, is included in the scope of the present invention as long as a person skilled in the art can implement the invention in the same manner and obtain the same effects by appropriately selecting the configuration from the known range.

Any combination of two or more elements of the specific examples, to the extent technically possible, is also included in the scope of the present invention as long as it includes the gist of the present invention.

In addition, all image processing apparatuses, image processing methods, and image processing programs that a person skilled in the art can implement with appropriate design changes, based on the image processing apparatus, image processing method, and image processing program described above as embodiments of the present invention, also belong to the scope of the present invention as long as they include the gist of the present invention.

Furthermore, within the scope of the idea of the present invention, a person skilled in the art can conceive of various changes and modifications, and it is understood that those changes and modifications also belong to the scope of the present invention.

Although several embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and are included in the invention described in the claims and its equivalents.

DESCRIPTION OF SYMBOLS: 11 … first acquisition unit, 12 … second acquisition unit, 20 … processing unit, 21 … detection unit, 22 … selection unit, 23 … grouping unit, 24 … intra-group selection unit, 25 … inclination detection unit, 26 … position correction unit, 30 … display unit, 40, 40_j, 40_j+1, 40_k … image (data), 50 … camera, δ_t … fixed time, κ … yawing angle, φ … pitching angle, ω … rolling angle, 110 to 112, 200 … image processing apparatus, 201 … CPU, 202 … input unit, 203 … output unit, 204 … RAM, 205 … ROM, 206 … external memory interface, 207 … communication interface, Cc … camera coordinate system, G, G1 to G4 … group, GA, GA1 to GA4 … detection group, H … distance, R, R1_j to R4_j, R1_j+1 to R4_j+1, R1_k to R4_k, R1a to R1c, R2a to R2c, R3a, R3b, R4a, R4b … image region, RA, RA1a to RA1c, RA2a to RA2c, RA3a, RA3b, RA4a … extraction region, Wc … world coordinate system, c1_j to c4_j, c1_j+1 to c4_j+1, c1_k to c4_k … character string, f … finger, p … designated position, p_t … image region position, t, t_j, t_j+1, t_k … detection time, t_p … reception time

Claims

An image processing apparatus comprising:
a first acquisition unit that acquires, in time series, data on a plurality of images including character strings;
a second acquisition unit that receives an input; and
a processing unit,
wherein the processing unit performs:
a detection operation of detecting a plurality of image regions from the plurality of images, each of the plurality of image regions including a respective one of the character strings included in the plurality of images, each of the plurality of image regions being detected at a detection time and having an image region position; and
a selection operation of selecting at least one of the plurality of image regions, the selection being performed based on a first difference between the detection time of each of the plurality of image regions and a reception time at which the input is received by the second acquisition unit, and a second difference between the image region position of each of the plurality of image regions and a position of the input.

The image processing apparatus according to claim 1, wherein the first difference of the at least one image region selected from the plurality of image regions is smaller than the first difference of another image region that was not selected from the plurality of image regions.

The image processing apparatus according to claim 1, wherein the second difference of the at least one image region selected from the plurality of image regions is smaller than the second difference of another image region that was not selected from the plurality of image regions.

The image processing apparatus according to claim 1, wherein the first difference of the at least one image region selected from the plurality of image regions is not less than 100 milliseconds and not more than 500 milliseconds.
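
The selection operation recited in the claims above can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the names `Region` and `select_region`, the parameter `time_weight`, and the way the two differences are combined (position difference plus weighted time difference) are assumptions made only for illustration. The 100 ms to 500 ms window follows the range stated above and is applied here as a filter on candidates, which is also an assumption.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A detected image region (hypothetical structure for illustration)."""
    text: str   # character string contained in the region
    x: float    # horizontal coordinate of the image region position (pixels)
    y: float    # vertical coordinate of the image region position (pixels)
    t: float    # detection time (seconds)


def select_region(regions, input_xy, t_p,
                  min_lag=0.1, max_lag=0.5, time_weight=200.0):
    """Pick the region that best matches an input at input_xy received at t_p.

    Returns None when no region falls inside the time window.
    """
    px, py = input_xy
    best, best_cost = None, math.inf
    for r in regions:
        # First difference: reception time minus detection time.
        first_diff = t_p - r.t
        if not (min_lag <= first_diff <= max_lag):
            continue  # assumed filter: keep only regions seen 100-500 ms earlier
        # Second difference: distance between region position and input position.
        second_diff = math.hypot(r.x - px, r.y - py)
        # Assumed combination: smaller differences give a smaller cost.
        cost = second_diff + time_weight * first_diff
        if cost < best_cost:
            best, best_cost = r, cost
    return best
```

With this sketch, a region that moved under the finger only 20 ms before the touch is skipped, while a region that was near the touched position 300 ms earlier can still be selected, which reflects the time lag between recognizing a character string and touching it.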

An image processing apparatus comprising:
a first acquisition unit that acquires, in time series, data on a plurality of images including character strings;
a second acquisition unit that receives an input; and
a processing unit,
wherein the processing unit performs:
a detection operation of detecting a plurality of image regions from the plurality of images, each of the plurality of image regions including a respective one of the character strings included in the plurality of images, each of the plurality of image regions being detected at a detection time and having an image region position;
a grouping operation of grouping, among the plurality of image regions, image regions that include the same character string to generate a plurality of groups, each of the plurality of groups having a group detection time based on the detection time and a group position based on the image region position; and
a selection operation of selecting at least one of the plurality of groups, the selection being performed based on a third difference between the group detection time of each of the plurality of groups and a reception time at which the input is received, and a fourth difference between the group position of each of the plurality of groups and a position of the input.

The image processing apparatus according to claim 5, wherein the third difference of the at least one group selected from the plurality of groups is smaller than the third difference of another group that was not selected from the plurality of groups.

The image processing apparatus, wherein the fourth difference of the at least one group selected from the plurality of groups is smaller than the fourth difference of another group that was not selected from the plurality of groups.

The image processing apparatus according to claim 5, wherein
the detection operation further includes calculating, for each of the character strings, an evaluation value representing character-likeness, and
the processing unit further performs an intra-group selection operation of selecting, from among the plurality of image regions included in the at least one group selected from the plurality of groups, the image region including the character string having the highest evaluation value.

The image processing apparatus according to any one of claims 5 to 7, wherein the processing unit further performs an intra-group selection operation of selecting at least one of the plurality of image regions included in the at least one group selected from the plurality of groups, the selection being performed based on a fifth difference between the image region position of each of the plurality of image regions and the position of the input.
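
The grouping, group selection, and intra-group selection recited in the preceding claims can be sketched as follows. This is a hedged illustration only: the names `ScoredRegion`, `group_by_text`, `select_group`, and `select_within_group` are hypothetical, the group detection time and group position are assumed here to be the mean over the group members, and the cost combining the third and fourth differences (with `time_weight`) is an assumption, since the claims only state that both differences are used.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class ScoredRegion:
    """A detected image region with an evaluation value (hypothetical structure)."""
    text: str     # character string contained in the region
    x: float      # horizontal coordinate (pixels)
    y: float      # vertical coordinate (pixels)
    t: float      # detection time (seconds)
    score: float  # evaluation value representing character-likeness


def group_by_text(regions):
    """Grouping operation: regions containing the same character string share a group."""
    groups = defaultdict(list)
    for r in regions:
        groups[r.text].append(r)
    return dict(groups)


def select_group(groups, input_xy, t_p, time_weight=200.0):
    """Return the key of the group with the smallest combined cost, or None."""
    px, py = input_xy

    def cost(members):
        # Assumed definitions: group time and position are member means.
        third_diff = abs(t_p - fmean(r.t for r in members))
        fourth_diff = math.hypot(fmean(r.x for r in members) - px,
                                 fmean(r.y for r in members) - py)
        return fourth_diff + time_weight * third_diff

    if not groups:
        return None
    return min(groups, key=lambda key: cost(groups[key]))


def select_within_group(members):
    """Intra-group selection: the region with the highest evaluation value."""
    return max(members, key=lambda r: r.score)
```

The intra-group step above uses the evaluation value; selecting by the fifth difference instead would replace the key in `select_within_group` with the distance between each region position and the input position.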

The image processing apparatus according to claim 1, wherein the processing unit further performs:
an inclination detection operation of detecting an inclination of the image processing apparatus when the input is received; and
a position correction operation of correcting the image region position of each of the plurality of image regions according to the inclination.

The image processing apparatus according to any one of claims 1 to 10, wherein the image region position includes horizontal and vertical coordinates in each of the plurality of images and the detection time of each of the plurality of image regions.

The image processing apparatus according to claim 1, further comprising a display unit that displays the images,
wherein the display unit displays the at least one image region selected from the plurality of image regions in an identifiable manner.

The image processing apparatus according to claim 12, wherein the second acquisition unit includes a touch panel provided on the display unit.

An image processing method comprising:
acquiring, in time series, data on a plurality of images including character strings;
receiving an input;
detecting a plurality of image regions from the plurality of images, each of the plurality of image regions including a respective one of the character strings included in the plurality of images, each of the plurality of image regions being detected at a detection time and having an image region position; and
selecting at least one of the plurality of image regions, the selection being performed based on a first difference between the detection time of each of the plurality of image regions and a reception time at which the input is received, and a second difference between the image region position of each of the plurality of image regions and a position of the input.

An image processing program for causing a computer to execute:
a step of acquiring, in time series, data on a plurality of images including character strings;
a step of receiving an input;
a step of detecting a plurality of image regions from the plurality of images, each of the plurality of image regions including a respective one of the character strings included in the plurality of images, each of the plurality of image regions being detected at a detection time and having an image region position; and
a step of selecting at least one of the plurality of image regions, the selection being performed based on a first difference between the detection time of each of the plurality of image regions and a reception time at which the input is received, and a second difference between the image region position of each of the plurality of image regions and a position of the input.