JP2010103694A

JP2010103694A - Camera with translation function, and method of displaying text

Info

Publication number: JP2010103694A
Application number: JP2008272152A
Authority: JP
Inventors: Hisashi Matsuyama; 久松山; Aijiro Gohara; 愛二郎郷原; Sada Yasuoka; 貞安岡
Original assignee: Seiko Precision Inc
Current assignee: Seiko Precision Inc
Priority date: 2008-10-22
Filing date: 2008-10-22
Publication date: 2010-05-06

Abstract

PROBLEM TO BE SOLVED: To provide a camera with a translation function capable of quickly displaying prescribed text, and a method of displaying the text.

SOLUTION: The camera with a translation function includes a camera for capturing frame images, a character extracting means for extracting characters appearing in a frame image, a character recognizing means for recognizing the extracted characters, a translating means for translating text composed of the recognized characters, a displaying means for displaying the translated text in the frame image, and a frame image analyzing means for comparing consecutively captured frame images, calculating the difference in position and the difference in zoom of the same object shown in both frame images, and analyzing the difference between their imaging conditions. From the difference between the imaging conditions of the two frame images analyzed by the frame image analyzing means, the displaying means specifies the display position, in the current frame image, of the translated text that was displayed in the immediately preceding frame image.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a technique for translating characters in an image into another language and displaying them.

When driving or walking in a foreign country, people are often confused because they cannot understand the characters written on signs and signboards. To avoid such confusion while driving, Patent Document 1 discloses a vehicle navigation system that translates characters written on signs and other objects along the street and informs the driver of the translated characters. This vehicle navigation system captures the characters on signs and other objects with a camera, extracts and recognizes the characters in the captured image, and translates them into a predetermined language. It then displays the translated characters together with the captured image on a display device to assist the driver.

Patent Document 1: 特開２００３−３２３６９３号公報 (JP 2003-323893 A)

The vehicle navigation system disclosed in Patent Document 1 extracts characters from the captured image, translates them, and sets the display position of the translated characters for every consecutively captured frame. The load on each device constituting the vehicle navigation system is therefore heavy, and it takes time before the translated characters are displayed.

The present invention has been made to solve the above problem, and its object is to provide a camera with a translation function that can quickly display predetermined text, and a text display method.

To achieve the above object, a camera with a translation function according to the present invention comprises: image capturing means for continuously capturing a plurality of frame images; character extraction means for extracting characters from a frame image captured by the image capturing means; character recognition means for recognizing the characters extracted by the character extraction means; translation means for translating text composed of the characters recognized by the character recognition means into text of a predetermined language; display means for combining the text translated by the translation means with the frame image and displaying the result; and frame image analysis means for obtaining the difference in imaging conditions between a frame image captured by the image capturing means and the frame image captured immediately before it. The display means specifies, from the difference in imaging conditions between the two frame images analyzed by the frame image analysis means and the position of the translated text displayed in the immediately preceding frame image, the display position of the translated text in the frame image captured immediately afterward, and combines the translated text with that frame image.

The character recognition means may further specify the dimensions of the display area of the text appearing in the frame image captured by the image capturing means, and the display means may determine, from the difference in imaging conditions between the two frame images analyzed by the frame image analysis means and the dimensions of the display area specified in the immediately preceding frame image, the dimensions of the display area of the translated text in the frame image captured immediately afterward.

The display means may also adjust the display size of the text translated by the translation means so that the translated text is displayed within the display area determined by the display means.

The frame image analysis means may also detect a characteristic pattern common to the frame image captured by the image capturing means and the frame image captured immediately before it, and obtain the difference in imaging conditions between the two frame images from the difference in the display positions of the pattern detected in them.

To achieve the above object, a text display method according to the present invention comprises: an imaging step of continuously capturing a plurality of frame images; a display step of continuously displaying the frame images with predetermined text added; and a frame image analysis step of comparing a frame image captured in the imaging step with the frame image captured immediately before it, calculating the difference in position and the difference in zoom of the same subject shown in both frame images, and analyzing the difference in imaging conditions between the two frame images. In the display step, the display position of the translated text in the frame image captured immediately afterward is specified from the difference in imaging conditions between the two frame images analyzed in the frame image analysis step and the position of the translated text displayed in the immediately preceding frame image, and the translated text is added to the frame image captured immediately afterward.

The method may further comprise a character extraction step of extracting characters from the frame image captured in the imaging step, a character recognition step of recognizing the characters extracted in the character extraction step, and a translation step of translating text composed of the characters recognized in the character recognition step into a predetermined language, and the text displayed in the display step may be the text translated in the translation step.

In the character recognition step, the dimensions of the display area of the text appearing in the frame image captured in the imaging step may further be specified, and in the display step, the dimensions of the display area of the translated text in the frame image captured immediately afterward may be determined from the difference in imaging conditions between the two frame images analyzed in the frame image analysis step and the dimensions of the display area specified in the immediately preceding frame image.

In the display step, the display size of the text translated in the translation step may further be adjusted so that the translated text is displayed within the display area determined in the display step.

According to the present invention, predetermined text can be displayed quickly.

A camera with a translation function and a text display method according to an embodiment of the present invention are described below with reference to the drawings. As shown in FIG. 1, the camera with a translation function 1 according to the embodiment includes an apparatus main body 10, a camera 20, an input unit 30, and a display unit 40.

The apparatus main body 10 includes a text extraction unit 11, a text reading unit 12, a translation processing unit 13, a captured image analysis unit 14, a display processing unit 15, and a database 16. With this configuration, the apparatus main body 10 extracts and translates characters from the frame image captured by the camera 20, sets the display position and display dimensions of the translated characters, and displays them on the display unit 40. Each part of the apparatus main body 10 is described in detail below.

The text extraction unit 11 stores fonts of characters (including numerals, symbols, and codes) in advance, and extracts the characters appearing in a frame image captured by the camera 20 by a technique such as pattern matching.

The text reading unit 12 recognizes each extracted character as character information using a character recognition technique. Here, a character recognition technique is, for example, one that searches for a match between the features of the characters constituting the extracted text and the character features (or templates of the characters themselves) stored in the database 16, and recognizes each extracted character as the matching character stored in the database 16.
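The template comparison described above can be illustrated with a minimal sketch. It assumes that each extracted character is a small binary bitmap and that the database holds one bitmap template per character; the names `hamming`, `recognize`, `TEMPLATES`, and the `max_diff` threshold are hypothetical and do not appear in the publication.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation: a glyph is a tuple of rows of 0/1 pixels.
Glyph = Tuple[Tuple[int, ...], ...]


def hamming(a: Glyph, b: Glyph) -> int:
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize(glyph: Glyph, templates: Dict[str, Glyph],
              max_diff: int = 1) -> Optional[str]:
    """Return the character whose template is closest to the glyph,
    or None when no template differs by max_diff pixels or fewer."""
    best_char, best_diff = None, max_diff + 1
    for char, template in templates.items():
        diff = hamming(glyph, template)
        if diff < best_diff:
            best_char, best_diff = char, diff
    return best_char


# Toy 3x3 templates standing in for the character data in the database.
TEMPLATES: Dict[str, Glyph] = {
    "I": ((0, 1, 0), (0, 1, 0), (0, 1, 0)),
    "L": ((1, 0, 0), (1, 0, 0), (1, 1, 1)),
}
```

Returning `None` corresponds to the case in which no stored character matches, which the flow described later handles by showing the frame image unchanged.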

The translation processing unit 13 translates the text composed of the characters recognized by the text reading unit 12 into a predetermined language. For example, the translation processing unit 13 translates the text by searching the database 16, which stores a large number of Japanese words together with the corresponding words in a plurality of other languages.
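A dictionary lookup of this kind might be sketched as follows. The table layout (Japanese headwords mapped to words in other languages), the language codes, the sample entries, and the function name `translate` are all assumptions made for illustration; the publication does not specify how the database 16 is organized.

```python
from typing import Dict, Optional

# Hypothetical contents of the database: each Japanese word is stored with
# the corresponding words in other languages.
DATABASE: Dict[str, Dict[str, str]] = {
    "出口": {"en": "exit", "ko": "출구"},
    "入口": {"en": "entrance", "ko": "입구"},
}


def translate(text: str, source_lang: str, target_lang: str,
              database: Dict[str, Dict[str, str]] = DATABASE) -> Optional[str]:
    """Look up text in source_lang and return the word in target_lang,
    or None when the word or the target language is not in the database."""
    for ja_word, others in database.items():
        forms = {"ja": ja_word, **others}  # all stored forms of this entry
        if forms.get(source_lang) == text:
            return forms.get(target_lang)
    return None
```

For instance, `translate("출구", "ko", "ja")` yields `"出口"` with the sample entries above.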

The captured image analysis unit 14 uses a pattern matching method to analyze the difference in imaging conditions between different frame images captured consecutively by the camera 20. Here, the difference in imaging conditions means the difference in composition and zoom between the frame images.

The analysis performed by the captured image analysis unit 14 is as follows. First, the captured image analysis unit 14 performs a so-called pattern matching analysis to find a pattern common to the frame image to be translated (hereinafter, the target frame image) and the frame image captured immediately before it (hereinafter, the previous frame image). In this pattern matching method, for example, a characteristic pattern (such as a region where the saturation changes markedly) is detected in the previous frame image, and then the region of the target frame image that matches the detected pattern is located.
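As a rough illustration of locating a pattern from the previous frame image in the target frame image, the sketch below slides a small patch over a grayscale image and keeps the position with the lowest sum of absolute differences. This is one common matching criterion chosen here as an assumption; the publication does not name a specific one. The function name `find_pattern` is hypothetical, and the sketch does not compensate for zoom changes between frames.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale pixel values, row by row


def find_pattern(frame: Image, pattern: Image) -> Tuple[Tuple[int, int], int]:
    """Return ((row, col), cost) of the top-left corner where the pattern
    matches the frame with the smallest sum of absolute differences."""
    ph, pw = len(pattern), len(pattern[0])
    fh, fw = len(frame), len(frame[0])
    best_pos, best_cost = (0, 0), None
    for r in range(fh - ph + 1):
        for c in range(fw - pw + 1):
            cost = sum(abs(frame[r + i][c + j] - pattern[i][j])
                       for i in range(ph) for j in range(pw))
            if best_cost is None or cost < best_cost:
                best_pos, best_cost = (r, c), cost
    return best_pos, best_cost


# A distinctive 2x2 patch taken from a previous frame ...
PATTERN = [[9, 1],
           [1, 9]]
# ... and a target frame in which the same patch has moved.
TARGET = [[0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0],
          [0, 0, 0, 9, 1],
          [0, 0, 0, 1, 9]]
```

With these toy images the patch is found at row 2, column 3 with a cost of 0.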

Suppose, as shown in FIG. 2, that the captured image analysis unit 14 detects characteristic patterns a1 (4, 3) and b1 (0, 0) in the previous frame image 41, and detects in the target frame image 42 a pattern a1′ (6, 6) matching pattern a1 and a pattern b1′ (−2, 0) matching pattern b1. From the coordinate values of the patterns, the captured image analysis unit 14 calculates the distance between pattern a1 and pattern b1 as 5, and the distance between pattern a1′ and pattern b1′ as 10. It then compares the two distances and recognizes that the target frame image 42 was captured after zooming in by a factor of two relative to the state in which the previous frame image 41 was captured. The X and Y coordinates shown in the figure take the imaging center of the previous frame image 41 and of the target frame image 42 as their origin.
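The zoom calculation in this example can be reproduced with a short sketch. The function name `zoom_ratio` is hypothetical; the coordinates are the ones given for FIG. 2.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def zoom_ratio(a_prev: Point, b_prev: Point,
               a_cur: Point, b_cur: Point) -> float:
    """Ratio of the distance between two matched patterns in the target
    frame to the distance between the same patterns in the previous frame."""
    return math.dist(a_cur, b_cur) / math.dist(a_prev, b_prev)


# FIG. 2 values: a1 (4, 3), b1 (0, 0) -> a1' (6, 6), b1' (-2, 0).
ratio = zoom_ratio((4, 3), (0, 0), (6, 6), (-2, 0))
```

Here the distances are 5 and 10, so `ratio` comes out as 2, matching the factor-of-two zoom described in the text.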

The captured image analysis unit 14 also calculates the difference in position of the subject shown in both the previous frame image 41 and the target frame image 42. Using the images of FIG. 2 as an example, the captured image analysis unit 14 first doubles the coordinate values of pattern a1 and pattern b1 in the previous frame image 41 to cancel the zoom difference between the previous frame image 41 and the target frame image 42. It then compares the multiplied coordinate values with the coordinate values of pattern a1′ and pattern b1′ in the target frame image 42. That is, comparing the doubled coordinates a1 (8, 6) and b1 (0, 0) with a1′ (6, 6) and b1′ (−2, 0) in the target frame image 42 shows that the X coordinates of pattern a1′ and pattern b1′ are each smaller by 2 than those of pattern a1 and pattern b1. From this, the captured image analysis unit 14 recognizes that the subject shown in the target frame image 42 is at a position shifted by −2 in the X direction relative to the same subject shown in the previous frame image 41.
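The position difference in this example amounts to subtracting the zoom-scaled previous coordinates from the target coordinates. A minimal sketch, with the hypothetical function name `frame_shift`:

```python
from typing import Tuple

Point = Tuple[float, float]


def frame_shift(p_prev: Point, p_cur: Point, zoom: float) -> Point:
    """Shift of a matched pattern after the previous-frame coordinates
    have been scaled by the zoom ratio to cancel the zoom difference."""
    return (p_cur[0] - zoom * p_prev[0], p_cur[1] - zoom * p_prev[1])


# FIG. 2 values with a zoom ratio of 2.
shift_a = frame_shift((4, 3), (6, 6), 2)    # a1 -> a1'
shift_b = frame_shift((0, 0), (-2, 0), 2)   # b1 -> b1'
```

Both patterns give the same shift, (−2, 0), which is the −2 movement in the X direction stated above.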

Based on the difference in position of the same subject and the difference in zoom between the previous frame image 41 and the target frame image 42 obtained above, the captured image analysis unit 14 also determines the region of the target frame image 42 in which new subject matter appears.
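The publication does not spell out how the new region is computed. One possible sketch, under the assumption that a point of the previous frame maps to `zoom * p + shift` in the target frame and that both frames share the same size with the origin at the imaging center, is to project the previous frame's outline into the target frame and report the strips of the target frame that it does not cover. The function name `new_regions` and the frame sizes used below are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def new_regions(width: float, height: float, zoom: float,
                shift: Tuple[float, float]) -> List[Rect]:
    """Return the strips of the target frame not covered by the previous
    frame, with coordinates measured from the imaging center."""
    hw, hh = width / 2, height / 2
    sx, sy = shift
    # Outline of the previous frame expressed in target-frame coordinates.
    left, right = -hw * zoom + sx, hw * zoom + sx
    bottom, top = -hh * zoom + sy, hh * zoom + sy
    # Part of that outline that lies inside the target frame.
    cl, cr = max(left, -hw), min(right, hw)
    cb, ct = max(bottom, -hh), min(top, hh)
    if cl >= cr or cb >= ct:
        return [(-hw, -hh, hw, hh)]  # no overlap: the whole frame is new
    regions: List[Rect] = []
    if cl > -hw:
        regions.append((-hw, -hh, cl, hh))  # strip on the left
    if cr < hw:
        regions.append((cr, -hh, hw, hh))   # strip on the right
    if cb > -hh:
        regions.append((cl, -hh, cr, cb))   # strip at the bottom
    if ct < hh:
        regions.append((cl, ct, cr, hh))    # strip at the top
    return regions
```

Under these assumptions, zooming in by a factor of two with a shift of (−2, 0) on a 20 × 10 frame reveals no new region, whereas a pure shift of (−4, 0) without zoom exposes a strip along the right edge.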

The display processing unit 15 sets the display position and display dimensions of the translated text, and displays the translated text together with the target frame image 42 on the display unit 40.
Specifically, the display processing unit 15 identifies the position of the text extracted from the frame image and its display area, and replaces the extracted text with the translated text. In doing so, it changes the size of the translated text as appropriate, or applies processing such as line breaks, so that the translated text fits within the display area of the extracted text, and displays the result on the display unit 40.
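One way to picture the size adjustment and line breaking is the sketch below, which searches for the largest character size at which the wrapped text still fits the display area. It assumes square characters of equal size and integer pixel dimensions; the function name `fit_text` and the sample dimensions are hypothetical and are not taken from the publication.

```python
from typing import List, Optional, Tuple


def fit_text(text: str, area_w: int, area_h: int,
             max_size: int) -> Optional[Tuple[int, List[str]]]:
    """Return (character size, lines) for the largest size at which the
    text, broken into lines, fits inside an area_w x area_h display area.
    Returns None if the text does not fit even at size 1."""
    for size in range(max_size, 0, -1):
        per_line = area_w // size   # characters that fit on one line
        max_lines = area_h // size  # lines that fit in the area
        if per_line == 0 or max_lines == 0:
            continue
        lines = [text[i:i + per_line] for i in range(0, len(text), per_line)]
        if len(lines) <= max_lines:
            return size, lines
    return None
```

With a 60 × 20 area and a maximum size of 20, a two-character text keeps size 20, while the six-character "ソルロンタン" is reduced to size 10, so longer translations are rendered smaller in the same area.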

The camera 20 captures, for example, several to several tens of frame images per second, and sends each frame image to the apparatus main body 10 as it is captured. To make the captured text easy to extract and recognize, the camera 20 preferably has a fast shutter speed and a deep depth of field.

The input unit 30 consists of various switches with which the operator makes the camera with a translation function 1 perform a desired operation. For example, the input unit 30 includes an imaging switch for capturing images with the camera 20 and a language selection switch for choosing the target language of the translation.

The display unit 40 is, for example, a liquid crystal display mounted on the camera with a translation function 1, and displays the frame images continuously captured by the camera 20 in real time. The display unit 40 also displays the translated text in place of the original text.

Next, the translated-text display processing performed on captured text by the camera with a translation function 1 according to the present invention is described with reference to the flowcharts shown in FIGS. 3 and 4. To simplify the description, it is assumed that every captured frame image contains text.

First, the camera 20 captures a frame image and sends it to the apparatus main body 10 (step S110).

Next, the text extraction unit 11 extracts the text appearing in the captured frame image (step S120). The method of extracting text from a frame image is as described above.

The text reading unit 12 then checks whether the features of the characters constituting the text extracted in step S120 match the character features stored in the database 16 (step S130).

If, in step S130, the features of the characters constituting the text match character features stored in the database 16 (step S130: Yes), the text reading unit 12 recognizes those characters as the matching characters stored in the database 16 (step S140). The text reading unit 12 also determines the dimensions of the display area of the text formed by the recognized characters.

Next, the translation processing unit 13 translates the text composed of the characters recognized by the text reading unit 12 into a predetermined language (step S150). For example, the translation processing unit 13 translates the text by searching the database 16, which stores a large number of Japanese words together with the corresponding words in a plurality of other languages.

Next, the display processing unit 15 sets the display position and display dimensions of the translated text, and displays the translated text together with the frame image on the display unit 40 (step S160). As described above, the display processing unit 15 displays the translated text so that it fits within the display area of the original text.

If, in step S130, the features of the characters constituting the extracted text do not match the character features stored in the database 16 (step S130: No), the display processing unit 15 displays the captured frame image on the display unit 40 unchanged (step S180). Processing then proceeds to step S170.

The steps described so far are the translated-text display processing performed when the camera 20 captures the first frame image. As stated above, however, the camera 20 captures several to several tens of frame images per second. This continuous capture of frame images and the associated translated-text display processing are carried out in step S170, which is described below with reference to the flowchart shown in FIG. 4.

First, as in step S110, the camera 20 captures a frame image and sends it to the apparatus main body 10 (step S171).

Next, the captured image analysis unit 14 executes the pattern matching method to detect a characteristic pattern common to the previous frame image 41 (captured in step S110) and the target frame image 42 (captured in step S171). It then analyzes the difference in position of the same subject and the difference in zoom between the previous frame image 41 and the target frame image 42 (step S172). Based on the result, the captured image analysis unit 14 determines the newly captured region of the target frame image 42. The specific analysis methods are as described above.

Next, the text extraction unit 11 executes text extraction processing on the newly captured region determined in step S172 (step S173).

The text extraction unit 11 then determines whether any new text was extracted in step S173 (step S174).

If the text extraction unit 11 extracts no new text in step S174 (step S174: No), the display processing unit 15 uses the difference in imaging conditions between the consecutive frame images obtained in step S172 to set the display position and text dimensions with which the text already displayed in the previous frame image 41 is to be displayed in the target frame image 42. The display processing unit 15 then displays the translated text together with the target frame image 42 on the display unit 40 (step S175). If text already displayed in the previous frame image 41 falls outside the target frame image 42, the display processing unit 15 does not display that text.

The processing of step S175 is now described with an example. Suppose that the previous frame image 41 and the target frame image 42 are related as shown in FIG. 2, that the text display position in the previous frame image 41 is b1 (0, 0), and that the text display area 43 measures 2 × 4 in the XY coordinates.
First, to cancel the zoom difference between the previous frame image 41 and the target frame image 42, the display processing unit 15 enlarges the text display area 43 of the previous frame image 41 by a factor of two and sets the size of the display area 43′ in the target frame image 42 to 4 × 8 in the XY coordinates. Next, the display processing unit 15 doubles the coordinates of the text display position b1 (0, 0) and subtracts 2 from the X coordinate, and takes the resulting coordinates b1′ (−2, 0) as the text display position in the target frame image 42. The display processing unit 15 then doubles the size of the text displayed in the previous frame image 41 and displays the enlarged text at the display position b1′ (−2, 0) of the target frame image 42.
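The worked example for step S175 can be expressed as a small sketch. The `Overlay` record and the function name `carry_over` are hypothetical; the numbers are those of the example (zoom ratio 2, shift of −2 in X, display area 2 × 4 at position (0, 0)).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Overlay:
    """Translated text with its display position and display area size."""
    text: str
    x: float
    y: float
    width: float
    height: float


def carry_over(prev: Overlay, zoom: float,
               shift: Tuple[float, float]) -> Overlay:
    """Reposition and rescale an overlay from the previous frame for the
    target frame, without extracting or translating the text again."""
    return Overlay(
        text=prev.text,
        x=zoom * prev.x + shift[0],
        y=zoom * prev.y + shift[1],
        width=zoom * prev.width,
        height=zoom * prev.height,
    )


previous = Overlay("ソルロンタン", x=0, y=0, width=2, height=4)
current = carry_over(previous, zoom=2, shift=(-2, 0))
```

`current` ends up at position (−2, 0) with a 4 × 8 display area, as in the example, and the translated string itself is reused as is.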

It is then determined, in accordance with operator input or the like, whether another frame image is to be captured (step S176).

If no further frame image is to be captured in step S176 (step S176: No), the continuous translated-text display processing ends, and the translated-text display processing shown in FIG. 3 ends as well.

If a further frame image is to be captured in step S176 (step S176: Yes), processing returns to step S171, another frame image is captured, and the steps described above are repeated.

If the text extraction unit 11 extracts new text in step S174 (step S174: Yes), the text reading unit 12 checks, as in step S130, whether the features of the characters constituting the newly extracted text match the character features stored in the database 16 (step S177).

If, in step S177, the features of the characters constituting the text match character features stored in the database 16 (step S177: Yes), the text reading unit 12 recognizes those characters as the matching characters stored in the database 16. The translation processing unit 13 then translates the text composed of the recognized characters into a predetermined language (step S178), and processing proceeds to step S175.
For text newly translated in step S178 in this way, the display processing unit 15 sets the display position and display dimensions within the target frame image 42 in step S175, and the text is displayed on the display unit 40.

If, in step S177, the features of the characters constituting the text do not match the character features stored in the database 16 (step S177: No), processing proceeds to step S175. In this case, the untranslated text is displayed on the display unit 40 by the display processing unit 15 in step S175 exactly as captured.
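The per-frame processing of steps S173 to S178 and S175 might be sketched as follows. The key point is that recognition and translation run only on text found in the newly captured region, while overlays from the previous frame are merely repositioned. The tuple layout, the function name `next_frame_overlays`, the frame half-sizes, and the visibility test on the display position are assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# (text, x, y, width, height), with the origin at the imaging center.
OverlayT = Tuple[str, float, float, float, float]
Area = Tuple[float, float, float, float]


def next_frame_overlays(
    prev_overlays: Iterable[OverlayT],
    zoom: float,
    shift: Tuple[float, float],
    new_items: Iterable[Tuple[object, Area]],
    recognize: Callable[[object], Optional[str]],
    translate: Callable[[str], str],
    half_w: float,
    half_h: float,
) -> List[OverlayT]:
    """Build the overlays for the target frame."""
    overlays: List[OverlayT] = []
    # Steps S177 and S178: only text from the new region is processed.
    for glyphs, (x, y, w, h) in new_items:
        text = recognize(glyphs)
        if text is not None:
            overlays.append((translate(text), x, y, w, h))
    # Step S175: reuse earlier translations, adjusted for zoom and shift.
    for text, x, y, w, h in prev_overlays:
        nx, ny = zoom * x + shift[0], zoom * y + shift[1]
        if abs(nx) <= half_w and abs(ny) <= half_h:  # drop off-frame text
            overlays.append((text, nx, ny, zoom * w, zoom * h))
    return overlays


overlays = next_frame_overlays(
    prev_overlays=[("ソルロンタン", 0, 0, 2, 4), ("出口", 9, 0, 1, 1)],
    zoom=2, shift=(-2, 0),
    new_items=[("glyphs-A", (3, 3, 2, 2)), ("smudge", (1, 1, 1, 1))],
    recognize={"glyphs-A": "입구"}.get,
    translate={"입구": "入口"}.__getitem__,
    half_w=10, half_h=5,
)
```

In this run the second previous overlay moves to X = 16 and is dropped because it lies outside the assumed frame, the unrecognized item is skipped, and the remaining two overlays are displayed.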

Next, FIGS. 5 and 6 show schematic views of frame images captured by the camera with a translation function 1 according to the present invention, and are used to explain how the translated text is displayed in the frame images. In each figure, (a) to (c) show the frame images in the order in which they were captured.

As shown in FIG. 5(a), the frame image 50 shows a road 51 at its center with buildings 52 standing along both sides. Signboards 53 bearing Hangul text 55 are installed on the buildings 52. FIGS. 5(b) and 5(c) show frame images 50 captured while zooming in and moving the imaging center toward the left of the figure, starting from the state in which the frame image 50 of FIG. 5(a) was captured. Naturally, the text 55 appearing in the frame image 50 changes position with each capture and also appears larger.

If the operator of the camera with a translation function 1 turns on translation into Japanese when capturing the frame images, the text 55 shown in FIG. 5 is converted into translated text 56 and displayed as shown in FIG. 6. Take as an example the Hangul text 55 written in the display area 58 shown on the right side of FIG. 5(a), which is replaced with the translated text 56 "ソルロンタン" (seolleongtang). The translated text 56 "ソルロンタン" has more characters than the other translated texts 56 shown in FIG. 6, yet the display area 58 is not particularly large. As shown in FIG. 6, the translated text 56 "ソルロンタン" is therefore displayed smaller than the other translated texts 56.

Further, as shown in FIGS. 6(b) and 6(c), even when the position or the zoom of the same subject differs between continuously captured frame images 50, the display position and size of the translated text can be determined from these changes in imaging conditions, and the text can be displayed accordingly.

As described above, the camera with a translation function and the text display method according to the present invention specify where the text displayed in the previous frame image should be displayed in the target frame image, based on the difference in imaging conditions obtained from continuously captured frame images. Therefore, once text has been extracted from a frame image, recognized, and translated, it can continue to be displayed appropriately on the display unit simply by setting its display position and display size in each subsequently captured frame image. As a result, the translated text can be displayed quickly, and the visual jitter caused by a timing lag between the display of the frame image and the display of the translated text can be suppressed.
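The idea of carrying a text box from the previous frame image to the target frame image can be sketched as follows. This is only an illustration under stated assumptions, not the embodiment's implementation: the imaging-condition difference is modeled as one reference point matched in both frames plus a zoom ratio, rotation is ignored, and the names `Box` and `propagate_box` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned text box in frame coordinates (pixels)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def propagate_box(box: Box, ref_prev: tuple, ref_cur: tuple, zoom: float) -> Box:
    """Map a box from the previous frame into the target frame.

    ref_prev and ref_cur are the coordinates of the same subject point in
    the previous and target frames; zoom is the magnification of the target
    frame relative to the previous one. Assumes no rotation between frames.
    """
    return Box(
        x=ref_cur[0] + zoom * (box.x - ref_prev[0]),
        y=ref_cur[1] + zoom * (box.y - ref_prev[1]),
        w=box.w * zoom,
        h=box.h * zoom,
    )


# Hypothetical values: the subject point moved from (90, 40) to (120, 60)
# and the scene was magnified by a factor of 2.
previous = Box(x=100, y=50, w=60, h=20)
current = propagate_box(previous, (90, 40), (120, 60), 2.0)
```

Because only this mapping is evaluated for each new frame, extraction, recognition, and translation need not be repeated for text that has already been processed.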

In addition, the camera with a translation function according to the present invention specifies the size of the display area in which text appears in a frame image and, based on the difference in imaging conditions obtained from continuously captured frame images, determines the size in the target frame image of the display area specified in the previous frame image. The camera then displays the translated text so that it fits within the display area. This makes it clear which written characters each translated text corresponds to, and prevents translated texts from being displayed on top of one another.

The present invention is not limited to the embodiment described above, and various modifications and applications are possible.
For example, in the above description the distance between two characteristic points is calculated in order to obtain the zoom difference between the previous frame image and the target frame image. Alternatively, the region of a characteristic pattern common to the previous frame image and the target frame image may be found in each image, and the zoom difference may be obtained by comparing the sizes of the pattern.
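Both ways of obtaining the zoom difference can be sketched briefly. The sketch assumes that the matching points or pattern regions have already been found in both frames; the function names are hypothetical, and taking the square root of the area ratio is one possible way of comparing pattern sizes, not one stated in the embodiment.

```python
import math


def zoom_from_points(prev_a, prev_b, cur_a, cur_b) -> float:
    """Zoom ratio from the distance between the same two feature points
    in the previous frame (prev_a, prev_b) and target frame (cur_a, cur_b)."""
    d_prev = math.dist(prev_a, prev_b)
    if d_prev == 0:
        raise ValueError("feature points in the previous frame coincide")
    return math.dist(cur_a, cur_b) / d_prev


def zoom_from_pattern(prev_size, cur_size) -> float:
    """Zoom ratio from the (width, height) of the same pattern region in
    both frames, taken as the square root of the area ratio (an assumption)."""
    prev_area = prev_size[0] * prev_size[1]
    if prev_area == 0:
        raise ValueError("pattern region in the previous frame has zero area")
    return math.sqrt((cur_size[0] * cur_size[1]) / prev_area)


# Hypothetical measurements: both describe a magnification by a factor of 2.
ratio_points = zoom_from_points((0, 0), (3, 4), (10, 10), (16, 18))
ratio_pattern = zoom_from_pattern((10, 20), (20, 40))
```

Using a region rather than two isolated points may be less sensitive to an error in locating a single point, at the cost of having to segment the pattern in each frame.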

The above embodiment describes the case where translated text is displayed; however, as shown in FIG. 7, pronunciation text 57 in Japanese may instead be written alongside the Hangul text. In this case as well, obtaining the difference in imaging conditions between continuously captured frame images makes it possible to set the display position and display size of the pronunciation text 57 appropriately and to suppress the visual jitter described above.

FIG. 1 is a block diagram of a camera with a translation function according to an embodiment of the present invention.
FIG. 2 is an explanatory view of frame images with XY coordinates attached, in which (a) shows the previous frame image and (b) shows the target frame image.
FIG. 3 is a flowchart showing the translated-word display process.
FIG. 4 is a flowchart showing the continuous translated-word display process.
FIG. 5 is a schematic view in which images captured by the camera with a translation function are arranged in the order of capture as (a) to (c).
FIG. 6 is a schematic view in which captured images displaying translated text are arranged in the order of capture as (a) to (c).
FIG. 7 is a schematic view in which images captured by a camera with a translation function according to another embodiment are arranged in the order of capture as (a) to (c).

Explanation of symbols

1 Camera with a translation function
10 Apparatus main body
11 Text extraction unit
12 Text reading unit
13 Translation processing unit
14 Captured image analysis unit
15 Display processing unit
16 Database
20 Camera
30 Input unit
40 Display unit
41 Previous frame image
42 Target frame image
43 Display area

Claims

1. A camera with a translation function, comprising:
image capturing means for continuously capturing a plurality of frame images;
character extraction means for extracting characters from a frame image captured by the image capturing means;
character recognition means for recognizing the characters extracted by the character extraction means;
translation means for translating text composed of the characters recognized by the character recognition means into text in a predetermined language;
display means for combining the text translated by the translation means with the frame image and displaying the result; and
frame image analysis means for obtaining a difference in imaging conditions between a frame image captured by the image capturing means and the frame image captured immediately before it,
wherein the display means specifies the display position of the translated text in the frame image captured immediately after, from the difference between the imaging conditions of the two frame images analyzed by the frame image analysis means and the position of the translated text displayed in the frame image captured immediately before, and combines the translated text with the frame image captured immediately after.

2. The camera with a translation function according to claim 1, wherein the character recognition means further specifies the size of the display area of the text shown in the frame image captured by the image capturing means, and
the display means determines the size of the display area of the translated text in the frame image captured immediately after, from the difference between the imaging conditions of the two frame images analyzed by the frame image analysis means and the size of the display area specified in the frame image captured immediately before.

3. The camera with a translation function according to claim 2, wherein the display means adjusts the display size of the text translated by the translation means and displays the translated text within the display area determined by the display means.

4. The camera with a translation function according to claim 1, wherein the frame image analysis means detects a characteristic pattern common to the frame image captured by the image capturing means and the frame image captured immediately before it, and obtains the difference in imaging conditions between the two frame images from the difference in the display position of the pattern detected in the two frame images.

5. A text display method, comprising:
an imaging step of continuously capturing a plurality of frame images;
a display step of continuously displaying the frame images with predetermined text added to them; and
a frame image analysis step of comparing a frame image captured in the imaging step with the frame image captured immediately before it, calculating the difference in position and the difference in zoom of the same subject shown in both frame images, and analyzing the difference in imaging conditions,
wherein, in the display step, the display position of the translated text in the frame image captured immediately after is specified from the difference between the imaging conditions of the two frame images analyzed in the frame image analysis step and the position of the translated text displayed in the frame image captured immediately before, and the translated text is added to the frame image captured immediately after.

6. The text display method according to claim 5, further comprising:
a character extraction step of extracting characters from the frame image captured in the imaging step;
a character recognition step of recognizing the characters extracted in the character extraction step; and
a translation step of translating text composed of the characters recognized in the character recognition step into a predetermined language,
wherein the text displayed in the display step is the text translated in the translation step.

7. The text display method according to claim 6, wherein, in the character recognition step, the size of the display area of the text shown in the frame image captured in the imaging step is further specified, and
in the display step, the size of the display area of the translated text in the frame image captured immediately after is determined from the difference between the imaging conditions of the two frame images analyzed in the frame image analysis step and the size of the display area specified in the frame image captured immediately before.

8. The text display method according to claim 7, wherein, in the display step, the display size of the text translated in the translation step is further adjusted and the translated text is displayed within the display area determined in the display step.