JP2016021155A

JP2016021155A - Imaging apparatus

Info

Publication number: JP2016021155A
Application number: JP2014144630A
Authority: JP
Inventors: 雅司川上; Masashi Kawakami
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2014-07-15
Filing date: 2014-07-15
Publication date: 2016-02-04

Abstract

PROBLEM TO BE SOLVED: To provide an imaging apparatus that can generate a substitute still image when character recognition or translation fails on a still image captured for that purpose during moving image recording. SOLUTION: The imaging apparatus captures a first still image while recording a moving image and inputs it to a character recognition unit. If character recognition fails, a still image cutout unit cuts a second still image out of the moving image. When the similarity between the first and second still images exceeds a predetermined threshold, an image similarity evaluation unit inputs the second still image to the character recognition unit, and character recognition is performed again. SELECTED DRAWING: Figure 2

Description

The present invention relates to an imaging apparatus having a character recognition function or a translation function.

In recent years, imaging apparatuses have been proposed that can recognize characters and translate the recognized characters into another language. Among conventional imaging apparatuses of this kind, some have a character extraction unit and a translation processing unit that translates the extracted characters into another language, and display a failure indication when the translation processing cannot produce a translation (see Patent Document 1).

Patent Document 1: JP 2008-54236 A

However, with the above prior-art configuration, the image has to be retaken whenever character recognition or translation fails. An imaging apparatus that can capture moving images, such as a video camera, can capture a still image for character recognition and translation while it is recording a moving image. Translation begins by inputting that still image to the character extraction unit for character recognition, and this recognition can fail. If the photographer was moving while recording, then by the time the notification of the translation or recognition failure arrives after recording has ended, the subject bearing the characters to be translated may no longer be nearby. Redoing the translation then requires troublesome work from the photographer, such as going back to the subject bearing the characters to be translated.

In view of the above problem, an object of the present invention is to provide an imaging apparatus that generates a substitute still image when character recognition or translation fails on a still image captured for character recognition or translation during moving image recording.

To solve the above problem, the present invention provides an imaging apparatus capable of capturing a still image and a moving image simultaneously, comprising: a character recognition unit that receives a captured image and performs character recognition; a still image cutout unit that generates a still image from a moving image; and an image similarity evaluation unit that calculates the similarity between images. A first still image is captured during moving image recording and input to the character recognition unit. If character recognition fails, the still image cutout unit cuts out and generates a second still image from the moving image. If the image similarity evaluation unit finds that the similarity between the first and second still images exceeds a predetermined threshold, the second still image is input to the character recognition unit and character recognition is performed again.

According to the present invention, even when character recognition or translation fails on a still image captured simultaneously with a moving image, a substitute still image can easily be generated, which improves user convenience.

FIG. 1 is a block diagram illustrating a configuration example of an imaging apparatus according to an embodiment.
FIG. 2 is a flowchart illustrating an operation example of the imaging apparatus according to the embodiment.
FIG. 3 is a diagram showing the frames from which images are cut out of a moving image according to an embodiment.
FIG. 4 is a diagram showing the frames from which images are cut out of a moving image according to an embodiment.
FIG. 5 is a block diagram illustrating a configuration example of an imaging apparatus according to an embodiment.

Preferred embodiments of the present invention will now be described with reference to the drawings.

FIG. 1 shows the overall configuration of the imaging apparatus. The imaging apparatus of FIG. 1 comprises an imaging unit 101, a moving image encoding unit 102, a still image encoding unit 103, a recording medium 104, a moving image decoding unit 105, a still image cutout unit 106, an image similarity evaluation unit 107, and a character recognition unit 108.

The imaging unit 101 comprises a lens, a sensor, a noise removal circuit, an A/D converter, and the like. It captures a subject, converts the light from the subject into an electrical signal, and performs A/D conversion to turn the analog signal into a digital signal.

The moving image encoding unit 102 compresses the input image from the imaging unit 101 according to a predetermined encoding scheme. Known high-efficiency encoding schemes include, for example, MPEG-2 and H.264.

The still image encoding unit 103 encodes the input image from the imaging unit 101 using a still image encoding scheme such as JPEG.

The recording medium 104 is a device that records the moving images and still images generated by the moving image encoding unit 102 and the still image encoding unit 103. Separate recording media may be provided for moving images and for still images.

The moving image decoding unit 105 decodes the moving image recorded on the recording medium 104, using the decoding process that corresponds to the encoding scheme of the moving image encoding unit 102.

The still image cutout unit 106 generates a still image from one frame of the moving image decoded by the moving image decoding unit 105.

The image similarity evaluation unit 107 compares two still images and determines whether their similarity is high. Here it compares the still image encoded by the still image encoding unit 103 with the still image generated by the still image cutout unit 106. If the similarity is high, it outputs the image generated by the still image cutout unit 106.

The character recognition unit 108 recognizes characters in an input still image. A known technique is used for character recognition.

Next, the operation when a still image for character recognition and translation is captured during moving image recording and character recognition fails will be described.

The moving image encoding unit 102 encodes the input image from the imaging unit 101. When the user takes a still image, the image is input from the imaging unit 101 to the still image encoding unit 103, which generates a still image for character recognition and translation. The generated still image is input to the character recognition unit 108, and character recognition is performed. If the recognition accuracy is low and character recognition fails, the failure is reported to a control unit such as a CPU (not shown), and the still image for which recognition failed is sent to the image similarity evaluation unit 107.

When recording of the moving image ends, the control unit such as a CPU reads the moving image from the recording medium 104, the moving image decoding unit 105 decodes it, and the still image cutout unit 106 generates a still image to substitute for the one for which character recognition failed. The generated still image is input to the image similarity evaluation unit 107, which calculates its similarity to the still image for which recognition failed. If the similarity is higher than a predetermined threshold, the generated still image is input to the character recognition unit 108, and character recognition is performed.

A known technique is used as the algorithm for calculating character recognition accuracy. The similarity is likewise calculated with a known technique such as pattern matching.
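The text leaves the similarity algorithm open ("a known technique such as pattern matching"). As one possible illustration only, the sketch below scores two equally sized grayscale images with normalized cross-correlation and compares the score against a threshold. The function names, the choice of normalized cross-correlation, and the threshold value 0.8 are assumptions made for this example and do not come from the publication.

```python
import math
from typing import List, Sequence

Pixels = Sequence[Sequence[float]]  # rows of grayscale values (assumed layout)


def normalized_cross_correlation(a: Pixels, b: Pixels) -> float:
    """Return a score in [-1, 1]; 1 means the two images match up to gain/offset."""
    flat_a: List[float] = [p for row in a for p in row]
    flat_b: List[float] = [p for row in b for p in row]
    if not flat_a or len(flat_a) != len(flat_b):
        raise ValueError("images must be non-empty and the same size")
    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)
    numerator = sum((x - mean_a) * (y - mean_b) for x, y in zip(flat_a, flat_b))
    energy_a = sum((x - mean_a) ** 2 for x in flat_a)
    energy_b = sum((y - mean_b) ** 2 for y in flat_b)
    denominator = math.sqrt(energy_a * energy_b)
    if denominator == 0:
        return 0.0  # a flat image carries no pattern to match
    return numerator / denominator


def is_similar(first_still: Pixels, cutout: Pixels, threshold: float = 0.8) -> bool:
    """Hypothetical stand-in for the decision of the image similarity evaluation unit."""
    return normalized_cross_correlation(first_still, cutout) > threshold
```

A real device would first have to bring the captured still and the decoded video frame to the same resolution, since this sketch compares pixels position by position.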

The above flow will be described in detail with reference to FIG. 2. Step 201 is the start of moving image recording. Processing starts when the user presses, for example, the record button of the imaging apparatus.

Step 202 is the capture of a still image. Because moving image recording has already started in step 201, the still image capture in step 202 takes place during moving image recording.

Step 203 performs character recognition, using the still image generated in step 202.

Step 204 determines whether character recognition succeeded. If it succeeded for the still image, the process ends; if it failed, the process proceeds to step 205. Recognition is treated as successful when its accuracy is higher than a predetermined threshold and as failed when it is lower. As noted above, a known technique is used to calculate the accuracy.

Step 205 determines whether moving image recording has ended. The process waits at step 205 until recording ends and then proceeds to step 206.

Step 206 cuts out a still image from the recorded moving image. The details of this processing are described later.

Step 207 calculates the similarity between the still image cut out from the moving image and the still image generated in step 202.

Step 208 determines whether the similarity calculated in step 207 is at or above a predetermined threshold. If the similarity is larger than the threshold, the process proceeds to step 209; if it is smaller, the process proceeds to step 210. The purpose of this step is as follows. The still image generated in step 202 contains the characters that the user wants recognized or translated. The similarity between the two images is calculated to judge whether the still image generated in step 206 also contains those characters. If the similarity is lower than the threshold, the content of the two still images differs greatly, and the cut-out image most likely does not contain the characters to be recognized or translated.

Step 209 tracks the number of times a still image has been cut out from the moving image and determines whether that number is larger than a predetermined value. If it is larger, the process ends; if it is smaller, the process returns to step 203, where character recognition is performed on the generated still image. The predetermined value may be user-configurable. This step keeps the cutout processing from repeating indefinitely.

Step 210 is reached when the similarity is smaller than the predetermined threshold. Like step 209, it tracks the number of times a still image has been cut out from the moving image and determines whether that number is larger than a predetermined value. If it is larger, the process ends; if it is smaller, the process returns to step 206. In this case a still image was generated from the moving image, but it differed too much from the still image captured in step 202 to count as a similar image, so another still image is generated from the moving image.
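The control flow of steps 203 to 210 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the publication's implementation: `recognize` and `similarity` are hypothetical callables standing in for the character recognition unit and the image similarity evaluation unit, `cutouts` stands in for the stills produced one at a time by step 206 after recording has ended (step 205), and the threshold and limit values are arbitrary.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

Image = TypeVar("Image")


def recognize_with_fallback(
    first_still: Image,
    cutouts: Iterable[Image],
    recognize: Callable[[Image], Tuple[str, float]],  # returns (text, accuracy)
    similarity: Callable[[Image, Image], float],
    accuracy_threshold: float = 0.8,
    similarity_threshold: float = 0.6,
    max_cutouts: int = 4,
) -> Optional[str]:
    """Return the recognized text, or None if every attempt fails."""
    # Steps 203-204: try the still captured during recording first.
    text, accuracy = recognize(first_still)
    if accuracy > accuracy_threshold:
        return text

    count = 0
    for candidate in cutouts:  # step 206: next still cut out of the movie
        count += 1
        # Steps 207-208: compare the cutout with the originally captured still.
        similar = similarity(first_still, candidate) > similarity_threshold
        # Steps 209 / 210: both branches stop once the cutout limit is exceeded.
        if count > max_cutouts:
            return None
        if not similar:
            continue  # step 210 -> back to step 206
        # Step 209 -> step 203: retry recognition on the similar cutout.
        text, accuracy = recognize(candidate)
        if accuracy > accuracy_threshold:
            return text
    return None
```

Passing the cutouts as a lazy iterable means frames are only decoded and cut out when an earlier attempt has failed, which matches the loop back to step 206 in the flowchart.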

Next, the still image cutout unit 106 in FIG. 1 and the still image cutout step 206 in FIG. 2 will be described in detail with reference to FIG. 3.

FIG. 3 is a conceptual diagram of a moving image laid out along the time axis. Reference numeral 301 denotes a single frame of the moving image. A set of a predetermined number of frames 301 is called a GOP (Group of Pictures).

The still image is cut out from a frame called an I picture, of which each GOP contains one. A GOP consists of frames called I pictures, P pictures, and B pictures; these are the frame types defined by the encoding scheme mentioned above. I pictures tend to be allocated more code than other pictures, because this is needed to maintain the image quality of the moving image. Cutting the still image out of an I picture is intended to raise the image quality of the generated still image and thereby improve character recognition accuracy.

As described with reference to FIG. 2, still image cutout is repeated until character recognition succeeds or the cutout limit of steps 209 and 210 is reached. For example, suppose a still image is captured at time t3 during moving image recording, and character recognition and translation fail. In that case, still images are cut out from the I pictures of the GOPs temporally adjacent to t3. A GOP generally holds about 0.5 seconds of frames. Therefore, if still images are generated from the frames at t2 and t4, they are likely to resemble the still image captured during recording, which shows the subject to be recognized and translated.
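The selection of the neighboring I pictures (the frames at t2 and t4 in the example) might be sketched as below. The `Frame` record, its field names, and the function name are hypothetical and introduced only for this illustration; a real decoder would expose picture types and timestamps through its own interface.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Frame:
    time: float        # presentation time in seconds (assumed field)
    picture_type: str  # "I", "P", or "B"


def adjacent_i_pictures(frames: Iterable[Frame], capture_time: float) -> List[Frame]:
    """Return the nearest I picture before and the nearest one after capture_time."""
    i_pictures = sorted(
        (f for f in frames if f.picture_type == "I"), key=lambda f: f.time
    )
    before = [f for f in i_pictures if f.time < capture_time]
    after = [f for f in i_pictures if f.time > capture_time]
    result: List[Frame] = []
    if before:
        result.append(before[-1])  # corresponds to t2 in the example
    if after:
        result.append(after[0])    # corresponds to t4 in the example
    return result
```

With one I picture every 0.5 seconds, both returned frames lie within half a second of the capture time, which is why they are likely to show nearly the same scene.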

As described above, according to this embodiment, character recognition is performed on a still image captured during moving image recording, and even if recognition fails, a substitute still image is generated from the moving image, which improves user convenience.

Next, a second embodiment of the present invention will be described with reference to the drawings. FIG. 5 shows the overall configuration of the imaging apparatus. The imaging unit 501, moving image encoding unit 502, still image encoding unit 503, recording medium 504, moving image decoding unit 505, still image cutout unit 506, and image similarity evaluation unit 507 of the imaging apparatus of FIG. 5 operate in the same way as the imaging unit 101, moving image encoding unit 102, still image encoding unit 103, recording medium 104, moving image decoding unit 105, still image cutout unit 106, and image similarity evaluation unit 107 of FIG. 1, so their description is omitted here.

The imaging apparatus of FIG. 5 adds a communication unit 508 and a display unit 509 to the imaging apparatus of FIG. 1. The communication unit 550, character recognition unit 551, and translation unit 552 belong to an external device outside the imaging apparatus, for example a server.

The communication unit 508 receives the image output from the image similarity evaluation unit 507 and can communicate with devices outside the imaging apparatus. Here it outputs the received image to the external device. In the other direction, it can receive a character string from the external device, as well as the result notification of character recognition and translation described below. It outputs the received character string to the display unit 509. If the result notification indicates that character recognition and translation failed, it reports "failure" to the control unit (not shown).

The communication unit 550, character recognition unit 551, and translation unit 552 of the external device are described next. The communication unit 550 receives the image from the communication unit 508 and outputs it to the character recognition unit 551 in the external device. Communication between the communication unit 508 and the communication unit 550 may be carried over, for example, the Internet, a wireless link, or a wired link.

The character recognition unit 551 recognizes characters in the input image and generates a character string. It may return the generated character string to the communication unit 550 or output it to the translation unit 552.

The translation unit 552 converts the character string input from the character recognition unit 551 into a character string in another language. The target language is set in advance by the user; the default may be based on the language in which the UI of the imaging apparatus is displayed. The translation result is output to the communication unit 550 via the character recognition unit 551.

As described above, according to this embodiment, even when character recognition and translation, which are computation-heavy and require storage such as a translation database, are delegated to a device outside the imaging apparatus, character recognition is performed on a still image captured during moving image recording, and if recognition fails, a substitute still image is generated from the moving image, which improves user convenience.

Next, a third embodiment of the present invention will be described with reference to the drawings. FIG. 4 shows the frames that the still image cutout unit uses when cutting still images out of a moving image. Suppose character recognition and translation are performed on a still image captured at t3 during moving image recording and fail. Still images are then cut out from frames 402 and 403, which are temporally close to t3. If character recognition and translation also fail for the still images cut out from frames 402 and 403, still images are next cut out from frames 401 and 404, the next closest to t3 in time.

This embodiment is effective when the user moves a large distance per unit time during moving image recording. If still images are cut out only from the single I picture in each GOP, a large movement distance can mean that the subject to be recognized and translated is not in the frame. The imaging apparatus may therefore be provided with a movement distance measurement unit that calculates, for example, the distance the apparatus moves per unit time. When that distance is large, that is, when the imaging apparatus is moving fast, frames may be selected as cutout targets in order of temporal proximity to t3, as in FIG. 4.
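One way to picture this selection rule is the sketch below, which orders candidate frames by their distance from the capture frame and widens the candidate set from I pictures only to all frames when the measured speed is high. Frame indices, the speed unit, the threshold value, and all names are assumptions for illustration; the text does not specify how the movement distance measurement unit reports its result.

```python
from typing import List, Sequence


def cutout_order(
    frame_indices: Sequence[int],      # every frame of the recorded movie
    i_picture_indices: Sequence[int],  # the subset that are I pictures
    capture_index: int,                # frame index corresponding to t3
    speed: float,                      # measured movement per unit time (assumed unit)
    speed_threshold: float = 1.0,      # arbitrary example value
) -> List[int]:
    """Return candidate frame indices, nearest to the capture frame first."""
    if speed > speed_threshold:
        # Fast movement: any frame near t3 may be used, as in FIG. 4.
        pool = frame_indices
    else:
        # Slow movement: keep to I pictures, as in the first embodiment.
        pool = i_picture_indices
    candidates = [i for i in pool if i != capture_index]
    # sorted() is stable, so of two equally distant frames the earlier one comes first.
    return sorted(candidates, key=lambda i: abs(i - capture_index))
```

For a capture at frame 10, fast movement yields frames 9 and 11 first and then 8 and 12, mirroring the order 402/403 followed by 401/404 described for FIG. 4.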

101 imaging unit, 102 moving image encoding unit, 103 still image encoding unit, 104 recording medium,
105 moving image decoding unit, 106 still image cutout unit, 107 image similarity evaluation unit,
108 character recognition unit, 201 step of starting moving image recording,
202 step of capturing a still image, 203 step of performing character recognition,
204 step of determining success or failure of character recognition,
205 step of monitoring the end of moving image recording,
206 step of cutting out a still image from the moving image,
207 step of calculating the similarity between still images,
208 step of comparing the similarity with the threshold,
209 step of comparing the number of still image cutouts with a predetermined number,
210 step of comparing the number of still image cutouts with a predetermined number,
301 one frame of the moving image,
302 one frame from which a still image is cut out

Claims

1. An imaging apparatus capable of simultaneously capturing still images and moving images, comprising:
a character recognition unit that receives a captured image and performs character recognition;
a still image cutout unit that generates a still image from a moving image; and
an image similarity evaluation unit that calculates the similarity between images,
wherein a first still image is captured during moving image recording and input to the character recognition unit; if character recognition fails, the still image cutout unit cuts out and generates a second still image from the moving image; and if the image similarity evaluation unit finds that the similarity between the first still image and the second still image exceeds a predetermined threshold, the second still image is input to the character recognition unit and character recognition is performed again.

2. The imaging apparatus according to claim 1, wherein, when character recognition fails, the character recognition unit outputs information indicating the failure and information indicating the capture time of the still image on which character recognition was performed.

3. The imaging apparatus according to claim 1, wherein the still image cutout unit cuts out and generates the still image from a moving image that includes the capture time output by the character recognition unit.

4. The imaging apparatus according to any one of claims 1 to 3, wherein the pictures of the moving image from which the still image cutout unit cuts out still images are contained in the GOP that includes the capture time output by the character recognition unit, in the N GOPs temporally preceding that GOP (N is an integer of 1 or more), and in the M GOPs following it (M is an integer of 1 or more).

5. The imaging apparatus according to claim 1, wherein the still image cutout unit cuts out and generates the still image from an I picture of the GOP.

6. The imaging apparatus according to any one of claims 1 to 5, further comprising a movement amount measurement unit capable of measuring the amount of movement of the imaging apparatus, wherein the still image cutout unit increases the number of GOPs from which still images are cut out when the imaging apparatus is moving fast.

7. The imaging apparatus according to claim 1, wherein still images are generated from the moving image and input to the character recognition unit until character recognition succeeds.

8. The imaging apparatus according to any one of claims 1 to 7, wherein, when the number of character recognition failures in the character recognition unit exceeds a specified threshold, the still image cutout unit cuts out and generates a still image from a picture other than the I picture of the GOP.

9. The imaging apparatus according to claim 8, further comprising a translation unit and a display unit, wherein the characters obtained by the character recognition unit are translated into a specified language and the translation result is displayed on the display unit.