JP2009130899A - Image playback apparatus

Info

Publication number: JP2009130899A
Application number: JP2007307108A
Authority: JP
Inventors: Naoki Kizu (木津 直樹); Miki Sugano (菅野 美樹); Kazuhiro Sugiyama (杉山 和宏)
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2007-11-28
Filing date: 2007-11-28
Publication date: 2009-06-11

Abstract

PROBLEM TO BE SOLVED: To provide an image playback apparatus that can access a telephone or the Web by extracting telephone numbers, URLs, search words, and the like as text data from a moving image such as a one-segment broadcast.

SOLUTION: The image playback apparatus includes: a character region extracting means 3 for extracting, as a character region image, a region containing characters from input image information; a character region image accumulating means 4 for storing the character region image extracted by the character region extracting means 3; a character region processing means 5 for processing characters contained in the character region image; a character recognizing means 6 for recognizing characters from the character region image processed by the character region processing means 5 to produce character text data; a character information accumulating means 7 for storing the character text data produced by the character recognizing means 6; a key input means 9 for performing input operations from outside; and a control means 8 for controlling the respective operating sections based on input instructions from the key input means 9.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to an image playback apparatus that extracts text data from a moving image and can access external destinations directly from the extracted text data.

Many recent mobile phones can receive television broadcasts such as one-segment broadcasting, and many have a full browsing function. In one-segment broadcasting, commercial messages (hereinafter, CMs) in particular often contain information such as telephone numbers (for example, toll-free numbers), Internet URLs (Uniform Resource Locators), and search words. A viewer or user interested in a CM has had to memorize this information and then, as a separate step, place a call or access the Web (World Wide Web).

As a conventional means of recording information on a mobile phone, a business card, timetable, or the like is photographed with the digital camera built into an image playback apparatus such as a mobile phone, and the information is saved and used as image data. An apparatus and method have also been proposed in which image data captured with the built-in digital camera of a mobile phone is combined with optical character recognition (OCR) and saved as a text database, which makes the saved information easy to update when it changes (see, for example, Patent Document 1).

Patent Document 1: JP 2007-164785 A

Patent Document 1 describes recording information using the digital camera built into a mobile phone, but it does not describe recording information displayed during one-segment broadcasting.

It is often difficult for a user to record a telephone number, URL, or search word accurately. Moreover, when one-segment broadcasting is watched on a mobile phone in a moving vehicle such as a train, the shaking of the vehicle and the low resolution of the information shown on the screen make the information hard to read, and it is often missed.

The present invention has been made to solve these problems, and its object is to provide an image playback apparatus that can access a telephone or the Web by extracting telephone numbers, URLs, search words, and the like as text data from a moving image such as a one-segment broadcast.

To solve the above problems, an image playback apparatus according to the present invention comprises: character region extraction means for extracting, from input image information, a region containing characters as a character region image; character region image storage means for storing the character region image extracted by the character region extraction means; character region processing means for processing the characters contained in the character region image; character recognition means for recognizing characters from the character region image processed by the character region processing means and generating character text data; character information storage means for storing the character text data generated by the character recognition means; display means for displaying the image information and an operation menu relating to the character text data or the character region image; key input means for accepting input operations from outside; and control means for controlling each operating unit based on input instructions from the key input means.

According to the present invention, the character region extraction means extracts a region containing characters from the input image information as a character region image; the character region image storage means stores the extracted character region image; the character region processing means processes the characters contained in the character region image; the character recognition means recognizes characters from the processed character region image and generates character text data; and the character information storage means stores the generated character text data. Telephone numbers, URLs, search keywords, and the like can therefore be extracted as text data from a moving image such as a one-segment broadcast, which makes it possible to access a telephone or the Web.

Embodiments of the present invention are described below with reference to the drawings.

<Embodiment 1>
FIG. 1 is a block diagram showing the configuration of an image playback apparatus 101 according to Embodiment 1 of the present invention. As shown in FIG. 1, the image playback apparatus 101 comprises: a frame memory 1 that records input image data (image information) frame by frame; a CM period detection unit 2 (CM period detection means) that detects CM periods; a character region extraction unit 3 (character region extraction means) that extracts a region containing characters from the input image data as a character region image; a character region image storage unit 4 (character region image storage means) that stores the character region image extracted by the character region extraction unit 3; a character region processing unit 5 (character region processing means) that processes the characters contained in the character region image; a character recognition unit 6 (character recognition means) that recognizes characters from the character region image processed by the character region processing unit 5 and generates character text data (text data); a character information storage unit 7 (character information storage means) that stores the character text data generated by the character recognition unit 6; a display unit 10 (display means) that displays image data and operation menus; a key input unit 9 (key input means) for input operations from outside; and a control unit 8 (control means) that controls each operating unit based on input instructions from the key input unit 9.

The feature of Embodiment 1 of the present invention is that, while image data is being input, the control unit 8 selects text data stored in the character information storage unit 7 based on an input instruction from the key input unit 9, and then either places a telephone call using a telephone number contained in the selected text data or connects to the Internet using an address or keyword contained in it.
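The choice between a telephone connection and an Internet connection depends on what the selected text data contains. The following is a minimal sketch of such a decision, not the method of the patent: the function name `classify_text` and both regular expressions are assumptions made for illustration, and the phone pattern is only a rough approximation of Japanese domestic numbers.

```python
import re

# Hypothetical patterns: a leading 0 followed by digit groups with
# optional hyphens, and a string starting with http(s):// or www.
PHONE = re.compile(r"^0\d{1,4}-?\d{1,4}-?\d{3,4}$")
URL = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)


def classify_text(text):
    """Return 'phone', 'url', or 'keyword' for one recognized string."""
    candidate = text.strip()
    if PHONE.match(candidate):
        return "phone"
    if URL.match(candidate):
        return "url"
    # Anything else is treated as a search keyword.
    return "keyword"
```

Under these assumptions, a control routine could dial when the result is `"phone"`, open a browser when it is `"url"`, and submit a Web search otherwise.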

The frame memory 1 is a memory such as SDRAM. It records, one frame at a time, the image data that is input after video content encoded with MPEG4 (Moving Picture Experts Group 4) or the like has been decoded. The image data read from the frame memory 1 is input sequentially to the CM period detection unit 2.

The CM period detection unit 2 detects CM periods, which fall between broadcast programs, by detecting the switch to stereo broadcasting that is characteristic of CMs or the silent period that occurs when a broadcast program changes over to a CM. More accurate detection is possible when this is combined with other cues, such as detection of the silent period inserted before each CM, counting of the 15-second or 30-second CM length, and the result of detecting a video switch (scene change) from the change in histograms between two frames of image data. The CM video, that is, the image data detected by the CM period detection unit 2, is read from the frame memory 1 and input to the character region extraction unit 3.
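Two of the cues named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the detector of the patent: the function names, the 0.5-second tolerance, the 16-bin histogram, and the 0.5 change threshold are all hypothetical, and frames are assumed to be equally sized lists of rows of 8-bit luminance values.

```python
def find_cm_segments(silence_times, lengths=(15.0, 30.0), tolerance=0.5):
    """Pair consecutive silent instants whose spacing matches a CM length."""
    segments = []
    for start, end in zip(silence_times, silence_times[1:]):
        gap = end - start
        if any(abs(gap - length) <= tolerance for length in lengths):
            segments.append((start, end))
    return segments


def histogram(frame, bins=16, max_value=256):
    """Count pixel values of one frame into coarse bins."""
    counts = [0] * bins
    for row in frame:
        for value in row:
            counts[value * bins // max_value] += 1
    return counts


def is_scene_change(previous, current, threshold=0.5):
    """Flag a scene change when the normalized histogram difference is large."""
    h_prev, h_cur = histogram(previous), histogram(current)
    total = sum(h_prev)
    difference = sum(abs(a - b) for a, b in zip(h_prev, h_cur))
    # difference ranges from 0 to 2 * total, so this ratio lies in [0, 1].
    return difference / (2 * total) >= threshold
```

A real detector would probably combine these cues with the stereo-switch signal rather than rely on any one of them.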

In the embodiments of the present invention, only the CM video detected by the CM period detection unit 2 is output to the character region extraction unit 3, because telephone numbers, URLs, and search words shown in moving images such as broadcast programs mostly appear during CM periods. When search words and the like displayed in programs other than CMs are also to be detected and used, the processing by the CM period detection unit 2 may be omitted.

Next, the character region extraction unit 3 is described in detail.

The character region extraction unit 3 extracts regions containing characters from the input image data as character region images. Characters in a character region image characteristically change very little or do not move. Therefore, when the frame of the input image data is compared with the frame input immediately before it, a region with little or no change is likely to be a character region image.

To extract character region images, the candidates are narrowed down using indices of character-likeness, such as the following:
- The frequency components are high; that is, many edges are detected within each region into which the input image data is divided.
- The same pixel value continues across the thickness of a character stroke; that is, the pixel values inside the outline of a character are approximately uniform.
- When bounding rectangles are fitted to the character outlines, the rectangles are aligned at regular intervals.

FIG. 2 is a block diagram showing the configuration of the character region extraction unit 3 according to Embodiment 1 of the present invention. As shown in FIG. 2, the character region extraction unit 3 comprises an edge extraction unit 301, a binarization unit 302, a binary information holding unit 303, an edge stationary detection unit 304, an edge motion detection unit 305, a character-likeness determination unit 306, a stationary region holding unit 307, a moving region holding unit 308, a character region input detection unit 309, a character region disappearance detection unit 310, a character region detection unit 311, and a labeling/rectangle shaping unit 312.

The edge extraction unit 301 extracts the edge information contained in the character region image by applying a spatial filter such as a Sobel filter or a Prewitt filter.

The edge information extracted by the edge extraction unit 301 is expressed by the binarization unit 302 as two values, white and black. For example, edges may be white (1) and everything else black (0). Binarization reduces the amount of information in the character region image and simplifies the subsequent processing.
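Edge extraction followed by fixed-threshold binarization can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, the image is assumed to be a list of rows of 8-bit luminance values, the gradient magnitude is approximated by |gx| + |gy|, and border pixels are simply left at 0.

```python
def sobel_edges(gray):
    """Return the Sobel gradient magnitude (|gx| + |gy|) for interior pixels."""
    height, width = len(gray), len(gray[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            out[y][x] = abs(gx) + abs(gy)
    return out


def binarize(values, threshold):
    """Map each value to white (1) at or above the threshold, else black (0)."""
    return [[1 if v >= threshold else 0 for v in row] for row in values]
```

For a frame `gray`, `binarize(sobel_edges(gray), 255)` yields the kind of 0/1 edge map that the later per-pixel comparisons operate on; the threshold of 255 is an arbitrary example value.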

As for the binarization method, many of the characters targeted by the present invention are emphasized characters of the kind used in CMs, and these have high luminance and high contrast, so binarization with a fixed threshold at a predetermined pixel value is feasible. When the characters are not brighter or higher in contrast than the background, results may be improved by an automatic binarization method (a variable or adaptive threshold method) that computes the mean of the pixel values around the pixel of interest and binarizes according to whether the pixel is above or below that mean. Because the best binarization method depends on the characteristics of the input image, ideally the method is chosen selectively.
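The local-mean variant can be sketched as follows. This is a minimal sketch under assumptions: the function name, the window `radius`, and the `offset` parameter are hypothetical, and the window is clipped at the image border.

```python
def adaptive_binarize(gray, radius=1, offset=0):
    """Binarize each pixel against the mean of its local neighborhood."""
    height, width = len(gray), len(gray[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            rows = range(max(0, y - radius), min(height, y + radius + 1))
            cols = range(max(0, x - radius), min(width, x + radius + 1))
            window = [gray[j][i] for j in rows for i in cols]
            mean = sum(window) / len(window)
            # White (1) only if the pixel is brighter than its surroundings.
            out[y][x] = 1 if gray[y][x] > mean + offset else 0
    return out
```

In a dim image such as `[[10, 10, 10], [10, 60, 10], [10, 10, 10]]`, a fixed threshold of 128 marks nothing, while the local-mean rule still marks the center pixel.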

The binary information holding unit 303 holds the binary information of the frame immediately preceding the input image data.

The edge stationary detection unit 304 compares the binary information of the current frame from the binarization unit 302 with the binary information of the previous frame from the binary information holding unit 303, and detects edges that are stationary. For example, for each pixel in the frame, it detects locations where the binary information from the binarization unit 302 is white (1) and that from the binary information holding unit 303 is also white (1), that is, locations that are edges in both frames. The detection result is expressed in binary, for example with a flag of 1 for stationary edge information and 0 otherwise.

The edge motion detection unit 305 compares the binary information of the current frame from the binarization unit 302 with the binary information of the previous frame from the binary information holding unit 303, and detects edges that have moved. For example, it detects pixels that were white (1) in the binary information holding unit 303 and are black (0) in the binarization unit 302. Alternatively, it detects pixels that were black (0) in the binary information holding unit 303 and are white (1) in the binarization unit 302. Or it compares the two and detects locations that changed either from white (1) to black (0) or from black (0) to white (1). The detection result is expressed in binary, for example with a flag of 1 for edge information that has moved and 0 otherwise.
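With 0/1 edge maps for the previous and current frames, the stationary test is a per-pixel AND and the two-way motion test is a per-pixel XOR. The following sketch shows both; the function names are hypothetical, and both maps are assumed to have the same size.

```python
def stationary_edges(previous, current):
    """Flag pixels that are edges (1) in both the previous and current frame."""
    return [[p & c for p, c in zip(prev_row, cur_row)]
            for prev_row, cur_row in zip(previous, current)]


def moving_edges(previous, current):
    """Flag pixels whose edge state changed in either direction."""
    return [[p ^ c for p, c in zip(prev_row, cur_row)]
            for prev_row, cur_row in zip(previous, current)]
```

The one-way variants described above (white to black only, or black to white only) would replace the XOR with `p & (1 - c)` or `(1 - p) & c` respectively.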

The character-likeness determination unit 306 divides the image into regions of, for example, 4 × 4 pixels. Using the fact that regions containing characters have many edges and high frequency components, it judges character-likeness by what percentage of each region consists of stationary edges or moving edges (whether it is at or above a predetermined threshold). To each divided region it attaches flags indicating a stationary character region image and a moving character region image.
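The block-wise judgment can be sketched as follows for one flag map (either the stationary or the moving edge map). This is a sketch under assumptions: the function name is hypothetical, and the 25% ratio is an arbitrary stand-in for the predetermined threshold, which the text does not specify.

```python
def block_flags(edge_map, block=4, min_ratio=0.25):
    """Flag each block whose share of edge pixels reaches min_ratio."""
    height, width = len(edge_map), len(edge_map[0])
    flags = []
    for top in range(0, height, block):
        flag_row = []
        for left in range(0, width, block):
            cells = [edge_map[y][x]
                     for y in range(top, min(top + block, height))
                     for x in range(left, min(left + block, width))]
            flag_row.append(1 if sum(cells) / len(cells) >= min_ratio else 0)
        flags.append(flag_row)
    return flags
```

Calling it once on the stationary edge map and once on the moving edge map gives the two per-block flags described above.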

In the embodiments of the present invention, the index of character-likeness is a large number of edges within a divided region, but the other indices of character-likeness described above may be used instead. Also, in the embodiments, stationary and moving edges are detected by comparing the binary information of the binarization unit 302 and the binary information holding unit 303, and the character-likeness determination unit 306 then judges character-likeness from the detection result. The order may be reversed: the character-likeness determination unit 306 may judge character-likeness from the information binarized by the binarization unit 302 and the binary information holding unit 303, and stationary and moving edges may then be detected within the regions judged to be character-like.

The stationary region holding unit 307 and the moving region holding unit 308 hold, respectively, the flags attached by the character-likeness determination unit 306 that indicate stationary character regions and moving character regions.

The character region input detection unit 309 detects that character information has been inserted into the input image data, based on the information that the edges are currently stationary and the information, held in the moving region holding unit 308, that the edges moved in the past. For example, in a character region image immediately after character information such as a telop (on-screen caption) is inserted, the edges show movement, and the edges then become stationary.

The character region disappearance detection unit 310 detects that character information has disappeared from the input image data, based on the information that the edges have currently moved and the information, held in the stationary region holding unit 307, that the edges were stationary in the past. For example, in a character region image just before character information such as a telop disappears, the edges are stationary, and the edges then show movement.

The character region detection unit 311 detects, as a character region image, a region whose edges remain stationary over a predetermined number of frames and which is also judged to be character-like. Background images with similar characteristics may be falsely detected as character region images. To reduce such false detections, the output of the character region input detection unit 309 is held and accumulated for use as a mask signal. With this mask signal, every region other than those in which the character region input detection unit 309 judged that a character region was inserted is excluded from character detection. The mask signal is reset by the output of the character region disappearance detection unit 310, which updates the character region image.
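The interplay of insertion detection, disappearance detection, and the mask signal can be sketched as a small per-block state machine. This is an illustrative sketch, not the circuit of the patent: the class name, the three state labels, and the default of three stationary frames are assumptions, and each block is assumed to be in exactly one of the states `"still"`, `"moving"`, or `"none"` per frame.

```python
class BlockTracker:
    """Track one block across frames and report character region events."""

    def __init__(self, required_still_frames=3):
        self.required = required_still_frames
        self.previous = "none"   # state of this block in the previous frame
        self.mask = False        # set on insertion, reset on disappearance
        self.still_count = 0     # consecutive frames with stationary edges

    def update(self, state):
        """Return (inserted, vanished, detected) for the current frame."""
        inserted = state == "still" and self.previous == "moving"
        vanished = state == "moving" and self.previous == "still"
        if inserted:
            self.mask = True
        if vanished:
            self.mask = False
        self.still_count = self.still_count + 1 if state == "still" else 0
        # Only blocks inside the mask may be reported as character regions.
        detected = self.mask and self.still_count >= self.required
        self.previous = state
        return inserted, vanished, detected
```

A block that is stationary from the first frame onward, such as a static background texture, never sets the mask and is therefore never reported, which is the false-detection case the mask is meant to suppress.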

The character region detection unit 311 outputs the address of the character region image, or the image information of the character region image, at the character region information update timing given by the output of the character region disappearance detection unit 310. This prevents identical or similar information from being output repeatedly to the downstream character region image storage unit 4.

When the character region image storage unit 4 has spare capacity, output need not be limited to this timing. The character region detection unit 311 may also use the output of the character region input detection unit 309 to detect, as a character region image, a region judged to be character-like by detecting that its edges remain stationary for a predetermined number of frames after the character region is inserted. In that case, the character region information update timing may be set to a predetermined number of frames after the output of the character region input detection unit 309.

The labeling/rectangle shaping unit 312 performs dilation and erosion to fill the gaps in the gapped rectangles output from the character region detection unit 311 and to remove isolated regions. It then applies labeling to the character region image to attach identification information, and narrows down the target character strings by, for example, adjusting the area and rectangle aspect ratio for each label.
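Gap filling and labeling can be sketched as follows. This is a minimal sketch under assumptions: the function names are hypothetical, the structuring element is a 3 × 3 window clipped at the image border, labeling uses 4-connectivity, and the area and aspect-ratio limits in `text_like` are arbitrary example values.

```python
from collections import deque


def _window(img, y, x):
    """Yield the pixels in the 3x3 neighborhood of (y, x), clipped at borders."""
    height, width = len(img), len(img[0])
    for j in range(max(0, y - 1), min(height, y + 2)):
        for i in range(max(0, x - 1), min(width, x + 2)):
            yield img[j][i]


def dilate(img):
    return [[1 if any(_window(img, y, x)) else 0
             for x in range(len(img[0]))] for y in range(len(img))]


def erode(img):
    return [[1 if all(_window(img, y, x)) else 0
             for x in range(len(img[0]))] for y in range(len(img))]


def label_boxes(img):
    """Return one bounding box (x0, y0, x1, y1) per 4-connected region."""
    height, width = len(img), len(img[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not img[y][x] or seen[y][x]:
                continue
            queue = deque([(y, x)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width and img[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes


def text_like(box, min_area=6, min_aspect=1.5):
    """Keep wide boxes of sufficient area (hypothetical screening rule)."""
    x0, y0, x1, y1 = box
    box_width, box_height = x1 - x0 + 1, y1 - y0 + 1
    return box_width * box_height >= min_area and box_width / box_height >= min_aspect
```

Applying `erode(dilate(flags))` (a morphological closing) merges neighboring fragments of one caption into a single rectangle before `label_boxes` assigns it one bounding box; applying the two operations in the opposite order would instead remove isolated specks.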

As a result of the above processing, a character region image can be specified by just a start address and an end address. The character region image is then output to the character region image storage unit 4 via the frame memory 1.

The character region image may be stored from the character region extraction unit 3 into the character region image storage unit 4 when the character region disappearance detection unit 310 detects that the character region image has disappeared. Alternatively, it may be stored when the character region input detection unit 309 detects that a character region image has been inserted, or a predetermined number of image frames after the insertion is detected.

Character recognition in the character recognition unit 6 is performed by pattern matching against a general font dictionary or the like, so high-resolution characters are required. The characters must also be easy to distinguish from the background, as with black characters on a white background, and special fonts such as outlined characters often cannot be recognized. The character region processing unit 5 therefore improves the quality of the character image by applying image processing to the character region image stored in the character region image storage unit 4.

Next, the character region processing unit 5 is described in detail.

FIG. 3 is a block diagram showing the configuration of the character region processing unit 5 according to Embodiment 1 of the present invention. As shown in FIG. 3, the character region processing unit 5 comprises a character size determination unit 501, an enlargement processing unit 502, a binarization unit 503, an equalization processing unit 504, an edge enhancement unit 505, a brightness/contrast adjustment unit 506, and a blur removal unit 507.

The character size determination unit 501 determines the actual character size by detecting that no edges exist between characters or between character strings (lines). The character size must be known because, for example, whether the character region image consists of one line of text or three lines affects the enlargement ratio used by the enlargement processing unit 502. Specifically, the divided regions are made even smaller than those used when determining the character region image, and the spaces are detected as gaps within the rectangular region.
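One simple way to find edge-free gaps between lines is a horizontal projection profile. Note that this is a swapped-in technique for illustration, not the finer block subdivision described above; the function name is hypothetical, and the input is assumed to be a 0/1 edge map of one character region.

```python
def text_line_heights(binary):
    """Return the height in pixels of each run of rows that contains edges."""
    heights = []
    run = 0
    for row in binary:
        if any(row):
            run += 1
        elif run:
            # An edge-free row ends the current text line.
            heights.append(run)
            run = 0
    if run:
        heights.append(run)
    return heights
```

The number of returned runs is the number of text lines and each value approximates a character height, which is the kind of information an enlargement ratio could be derived from.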

The enlargement processing unit 502 needs enlargement of the highest possible quality, because the character recognition unit 6 requires high resolution. As the data interpolation method, an algorithm such as third-order interpolation (bicubic, Lanczos3) is used, supplemented with countermeasures against jagged contours (jaggies), which are especially likely on diagonal strokes.

The processing labeled I in FIG. 3 is performed when the background and the characters of the character region image extracted by the character region extraction unit 3 are each a single color or close to it, so that binarization easily separates the background from the characters. Whether a character region image is easy to binarize can be judged by examining the histogram of its pixel values.
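One possible histogram test is sketched below. The text only says that the histogram is examined, so the rule used here is an assumption: the image is treated as easy to binarize when its two most populated coarse bins hold most of the pixels. The function name, the 8 bins, and the 90% share are hypothetical.

```python
def is_easy_to_binarize(gray, bins=8, min_share=0.9):
    """Return True when two histogram bins hold at least min_share of pixels."""
    counts = [0] * bins
    total = 0
    for row in gray:
        for value in row:
            counts[value * bins // 256] += 1
            total += 1
    top_two = sorted(counts, reverse=True)[:2]
    return sum(top_two) / total >= min_share
```

An image that passes this test would go through path I, and one that fails would go through paths II and III.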

The processing labeled II in FIG. 3 is performed when the background and the characters are not easy to separate. The equalization processing unit 504 flattens the histogram of the lightness of the character region image, which temporarily produces a blurred image. The edge enhancement unit 505 then emphasizes the edge portions. In the brightness/contrast adjustment unit 506, when for example the background is white and the characters are black, the image is converted to a high-contrast image and then brightened by brightness adjustment, which makes it possible to separate the characters from the background image.
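Histogram flattening is commonly implemented as cumulative-distribution equalization, sketched below. This is the textbook method and is assumed, not confirmed, to correspond to what the equalization processing unit 504 does; the function name is hypothetical and the input is assumed to be rows of integer lightness values in the range 0 to 255.

```python
def equalize(gray, levels=256):
    """Remap pixel values so that their cumulative histogram becomes linear."""
    hist = [0] * levels
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # A single-valued image cannot be stretched; return a copy.
        return [row[:] for row in gray]
    scale = (levels - 1) / (total - cdf_min)
    return [[round((cdf[v] - cdf_min) * scale) for v in row] for row in gray]
```

A low-contrast patch such as `[[100, 101], [102, 103]]` is stretched to the full range, which gives the later edge enhancement and contrast steps more to work with.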

In the processing labeled III in FIG. 3, the binarization unit 503 binarizes the image after the processing of II, separating the background and the characters of the character region image more clearly. The same applies to the binarization unit 503 in path I: because binarization often degrades the image, for example by making characters bleed, the blur removal unit 507 processes the binarized image to make the stroke width of the characters uniform.

The character region image processed in this way by the character region processing unit 5 is output to the character recognition unit 6.

The character recognition unit 6 recognizes characters by pattern matching against a font dictionary or the like. Character strings such as telephone numbers, URLs, and search words displayed during one-segment broadcast CMs have insufficient resolution, so recognition of such strings is optimized by, for example:
・Optimizing the pattern matching method.
・Limiting the target character strings to digits or alphabetic characters.
・Lowering the character level (kanji level) covered by the dictionary.
The character information recognized by the character recognition unit 6 and extracted as text data is stored in the character information storage unit 7. The apparatus may also be configured so that, as soon as the character recognition unit 6 completes recognition, a message such as "Character recognition completed!" is displayed to notify the user.
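The effect of limiting the candidate characters can be illustrated with a toy pattern matcher. This is a sketch under stated assumptions, not the recognizer of the specification: the 3x5 glyph templates, the Hamming-distance comparison, and the names `TEMPLATES`, `distance`, and `recognize` are all hypothetical.

```python
# Toy "font dictionary": 3x5 binary glyphs, '#' = ink, '.' = background.
TEMPLATES = {
    "0": ("###", "#.#", "#.#", "#.#", "###"),
    "1": (".#.", "##.", ".#.", ".#.", "###"),
    "8": ("###", "#.#", "###", "#.#", "###"),
    "B": ("##.", "#.#", "##.", "#.#", "##."),
}


def distance(a, b):
    """Number of pixels in which two glyphs of equal size differ."""
    return sum(ca != cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))


def recognize(glyph, allowed=None):
    """Return the dictionary character closest to glyph.

    allowed optionally restricts the candidates, e.g. to digits only.
    """
    if allowed is None:
        candidates = TEMPLATES
    else:
        candidates = {c: TEMPLATES[c] for c in allowed if c in TEMPLATES}
    return min(candidates, key=lambda ch: distance(glyph, candidates[ch]))


# A degraded "8" whose right-hand corners were lost at low resolution.
noisy = ("##.", "#.#", "###", "#.#", "##.")
unrestricted = recognize(noisy)             # closest template overall is "B"
digits_only = recognize(noisy, "0123456789")  # restricted to digits: "8"
```

With the full dictionary the degraded glyph is one pixel away from "B" and two pixels away from "8", so it is misread; restricting the candidates to digits, as is reasonable for a telephone number, removes the competing letter.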

Characters extracted from one-segment broadcast video may include characters that the character recognition unit 6 has misrecognized, since recognition is not 100% accurate. As described above, the character area images stored in the character area image storage unit 4 are the results of processing by the character area extraction unit 3. Therefore, if, for example, character area images are stored in the character area image storage unit 4 at the character area information update timings output by both the character area input detection unit 309 and the character area disappearance detection unit 310 of the character area extraction unit 3, the same information is stored twice. However, the two update timings fall on different frames of the image data, and the background image has changed between them, so one of the two may not have been detected accurately. Accordingly, as far as processing time and memory capacity allow, performing character recognition on a plurality of frames is desirable for improving recognition accuracy.
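The specification does not state how results from several frames would be combined. One possible approach, shown here purely as an assumption, is a per-position majority vote over the strings recognized from each frame; the function name `vote` is hypothetical.

```python
from collections import Counter


def vote(readings):
    """Combine several recognition results of the same string by majority vote."""
    if not readings:
        return ""
    # Keep only the readings that share the most common length,
    # so that character positions line up.
    length = Counter(len(r) for r in readings).most_common(1)[0][0]
    aligned = [r for r in readings if len(r) == length]
    # For each position, take the character seen most often.
    return "".join(Counter(chars).most_common(1)[0][0] for chars in zip(*aligned))


# Three frames of the same caption, two of them with one misread character.
frames = ["0120-123-456", "0120-128-456", "O120-123-456"]
combined = vote(frames)
```

A vote can only outvote an error when at least three readings are available; with just the two stored images mentioned above (insertion and disappearance timing), a disagreement would still have to be resolved by the user.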

In the embodiment of the present invention, it is assumed that, in parallel with playback of video such as one-segment broadcasting on the display unit 10, frame images are read from the frame memory 1 at regular intervals, and telephone numbers, URLs, and search words are extracted and recognized. During a CM, character area images are always extracted and recognized, and the character information is stored as text data in the character information storage unit 7. By analyzing the character information, it can therefore be classified into, for example, telephone numbers, URLs, and search words as follows:
・Telephone number: digits only, beginning with 0120-.
・URL: mostly alphabetic characters, containing "www" or ".com".
・Search word: any character string other than the above.
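The classification rules listed above can be sketched as follows. This is a minimal illustration that follows the three rules loosely (it accepts any digits-and-hyphens string as a telephone number, not only those beginning with 0120-); the function name `classify` and the category labels are hypothetical.

```python
import re


def classify(text):
    """Classify recognized text as a telephone number, a URL, or a search word."""
    s = text.strip()
    # Telephone number: digits only, with hyphens allowed as separators.
    if re.fullmatch(r"[0-9-]+", s) and any(c.isdigit() for c in s):
        return "phone number"
    # URL: contains typical markers such as "www" or ".com".
    lowered = s.lower()
    if "www" in lowered or ".com" in lowered:
        return "URL"
    # Search word: anything else.
    return "search word"


labels = [classify(t) for t in ("0120-123-456", "www.example.com", "image playback")]
```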

Next, the operation performed when a user viewing a broadcast program selects a keyword to access will be described.

FIG. 4 is a schematic diagram showing user operations according to Embodiment 1 of the present invention. The user operates the operation menu displayed on the display unit 10 via the key input unit 9. The operation menu displays information on the character text data or the character area images.

In FIG. 4(a), when the user is interested in a CM that is being viewed or was viewed and wants to access the telephone or the Web, the user presses the "Keyword Link" button A. By pressing the left/right button B, the user selects, from the categories "Telephone number", "URL", and "Search word", the one that suits the user's purpose. For example, pressing the left/right button B from the "Telephone number" state of FIG. 4(a) selects and displays "Search word" as shown in FIG. 4(b).

At this time, when the control unit 8 detects that the "Keyword Link" button A has been pressed, it reads the telephone numbers from the text data stored in the character information storage unit 7 and sets the display on the display unit 10 to "Telephone number". When it then detects that the user has pressed the left/right button B and selected "Search word", it reads the search words from the text data stored in the character information storage unit 7 and sets the display on the display unit 10 to "Search word".

In FIG. 4(b), the user selects the desired character string from the "Search word" list by pressing the up/down button C.

In FIG. 4(c), the display unit 10 displays, together with the "Search word" list, a message prompting confirmation such as "Is the selected keyword correct?". If the selected character string is correct, the user selects and confirms "Yes". When "Yes" is selected and confirmed via the key input unit 9, the control unit 8 accesses the Web through the browser's search engine. In the case of a "Telephone number", a telephone connection is made; in the case of a URL (address), the Web is accessed directly.
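The three kinds of access described above can be sketched as a function that turns a confirmed text item into an access target. This is only an illustration: the function name `access_target`, the category labels, and the search endpoint `https://search.example/?q=` are hypothetical, and the use of a `tel:` URI to represent a telephone connection is an assumption not made in the specification.

```python
from urllib.parse import quote

# Hypothetical search engine endpoint; the specification names no particular engine.
SEARCH_ENDPOINT = "https://search.example/?q="


def access_target(category, text):
    """Build the target to open for a confirmed text item."""
    if category == "phone number":
        # Telephone connection: keep the digits only.
        return "tel:" + "".join(c for c in text if c.isdigit())
    if category == "URL":
        # Direct Web access: add a scheme if the caption omitted it.
        return text if "://" in text else "http://" + text
    # Search word: go through the search engine with the word percent-encoded.
    return SEARCH_ENDPOINT + quote(text)


phone = access_target("phone number", "0120-123-456")
site = access_target("URL", "www.example.com")
query = access_target("search word", "image playback")
```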

If a telephone number or search word displayed on the display unit 10 is wrong because of misrecognition, the user selects and confirms "No" via the key input unit 9, which puts the apparatus in a state in which the user can correct the telephone number or search word.

The user then makes the correction in the correction input field D of FIG. 4(d). When the correction is complete and the user selects the "Link" key for accessing the telephone or the Web and presses the decision key E, the control unit 8 accesses the Web.

When the user corrects a search word, the original image of the character area is displayed as shown in FIG. 4(d), so that the user can check the characters to be entered while looking at the original image. When correcting a telephone number, as shown in FIG. 5(e), a mere string of digits does not tell the user whose telephone number it is, so a reduced version of the frame image in which the selected telephone number appears is displayed.

As shown in FIG. 5(f), the original image of the character area or the reduced frame image may also be displayed at the same time as the user selects the character string to access from the list. Although not illustrated, for the same purpose the user may be allowed to listen to the audio of the CM in which the selected telephone number appears.

As described above, by automatically extracting character areas and recognizing characters from broadcast programs such as CMs, telephone numbers, URLs, and search words can be stored as text data, and the telephone or the Web can be accessed by selecting text data as needed. Even when the recognized and stored text data contains errors, associating it with the original image in which the text appears makes correction possible and easy.

〈Embodiment 2〉
FIG. 6 is a block diagram showing the configuration of the image playback apparatus 101a according to Embodiment 2 of the present invention. In Embodiment 2, the processing of the control unit 8a differs from that of Embodiment 1; the other components and processing are the same, and their description is omitted here. In the image playback apparatus 101a of Embodiment 2, the character area images stored in the character area image storage unit 4 can be displayed on the display unit 10.

Embodiment 2 is characterized in that the control unit 8a selects a desired character area image from the character area image storage unit 4 based on an input instruction from the key input unit 9, obtains text data by processing that character area image with the character area processing unit 5 and the character recognition unit 6, and makes a telephone connection using a telephone number, or an Internet connection using an address or keyword, contained in the text data.

FIG. 7 is a schematic diagram showing user operations according to Embodiment 2 of the present invention. The user operates the operation menu displayed on the display unit 10 via the key input unit 9. The operation menu displays information on the character text data or the character area images.

In FIG. 7(a), when the user is interested in a CM that is being viewed or was viewed and wants to access the telephone or the Web, the user presses the "Keyword Link" button A.

At this time, when the control unit 8a detects that the "Keyword Link" button A has been pressed, it reads the character area images stored in the character area image storage unit 4, and the images read out are displayed as a list of "Keyword character strings" on the display unit 10.

From the list of "Keyword character strings" displayed on the display unit 10, the user selects the character string to access with the up/down button B and presses the decision key E of FIG. 7(b).

When the user selects the character string to access via the key input unit 9, the control unit 8a reads the corresponding character area image from the character area image storage unit 4 and outputs it to the character area processing unit 5. After image processing by the character area processing unit 5, the characters are recognized by the character recognition unit 6, and the result is stored once in the character information storage unit 7. When the control unit 8a is notified that the result has been stored, it displays on the display unit 10 a message prompting confirmation such as "Is the selected keyword correct?" (not shown). The control unit 8a also analyzes the text data stored in the character information storage unit 7 and determines whether it is a telephone number, a URL, or a search word.

The subsequent processing is the same as in Embodiment 1. That is, if "Yes" is selected, the telephone or the Web is accessed; if "No" is selected, the telephone or the Web is accessed after the user has made the correction.

As described above, processing and character recognition are performed only on the keyword that the user has selected from the character area images stored in the character area image storage unit 4, so the time required for processing can be reduced.
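The saving described above comes from deferring processing and recognition until an image is actually selected. A minimal sketch of that idea follows; the class `OnDemandRecognizer`, its methods, and the stand-in `process` and `recognize` callables are hypothetical, and the dictionary used as a cache only loosely plays the role of the character information storage unit 7.

```python
class OnDemandRecognizer:
    """Run image processing and recognition only for images the user selects."""

    def __init__(self, process, recognize):
        self._process = process      # stands in for the character area processing unit 5
        self._recognize = recognize  # stands in for the character recognition unit 6
        self._texts = {}             # recognized text, keyed by image id
        self.calls = 0               # how many images were actually recognized

    def text_for(self, image_id, image):
        """Return the text of one stored image, recognizing it on first use."""
        if image_id not in self._texts:
            self.calls += 1
            self._texts[image_id] = self._recognize(self._process(image))
        return self._texts[image_id]


# Stand-in "images" and "recognition": strings cleaned up and upper-cased.
stored_images = {1: " abc ", 2: " def ", 3: " ghi "}
recognizer = OnDemandRecognizer(process=str.strip, recognize=str.upper)
selected = recognizer.text_for(2, stored_images[2])
```

Of the three stored images, only the selected one passes through processing and recognition, which is where the reduction in processing time comes from.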

A block diagram showing the configuration of the image playback apparatus according to Embodiment 1 of the present invention.
A block diagram showing the configuration of the character area extraction unit according to Embodiment 1 of the present invention.
A block diagram showing the configuration of the character area processing unit according to Embodiment 1 of the present invention.
A schematic diagram showing user operations according to Embodiment 1 of the present invention.
A schematic diagram showing user operations according to Embodiment 1 of the present invention.
A block diagram showing the configuration of the image playback apparatus according to Embodiment 2 of the present invention.
A schematic diagram showing user operations according to Embodiment 2 of the present invention.

Explanation of symbols

101 image playback apparatus, 101a image playback apparatus, 1 frame memory, 2 CM period detection unit, 3 character area extraction unit, 4 character area image storage unit, 5 character area processing unit, 6 character recognition unit, 7 character information storage unit, 8 control unit, 8a control unit, 9 key input unit, 10 display unit, 301 edge extraction unit, 302 binarization unit, 303 binary information holding unit, 304 edge stillness detection unit, 305 edge motion detection unit, 306 character-likeness determination unit, 307 still area holding unit, 308 motion area holding unit, 309 character area input detection unit, 310 character area disappearance detection unit, 311 character area detection unit, 312 labeling/rectangle shaping unit, 501 character size determination unit, 502 enlargement processing unit, 503 binarization unit, 504 equalization processing unit, 505 edge enhancement unit, 506 brightness/contrast adjustment unit, 507 blur removal unit.

Claims

1. An image reproducing apparatus comprising:
character area extraction means for extracting a region including characters from input image information as a character area image;
character area image storage means for storing the character area image extracted by the character area extraction means;
character area processing means for processing characters included in the character area image;
character recognition means for recognizing characters from the character area image processed by the character area processing means and generating character text data;
character information storage means for storing the character text data generated by the character recognition means;
display means for displaying the image information and an operation menu relating to the character text data or the character area image;
key input means for performing an input operation from the outside; and
control means for controlling each operation unit based on an input instruction of the key input means.

2. The image reproducing apparatus according to claim 1, further comprising CM period detecting means for detecting a CM (commercial message) period, wherein extraction of the character area image by the character area extraction means is limited to the CM period.

3. The image reproducing apparatus according to claim 1, wherein, while the image information is being input, the control means selects the character text data stored in the character information storage means based on an input instruction from the key input means, and performs a telephone connection using a telephone number, and an Internet connection using an address or a keyword, included in the selected character text data.

4. The image reproducing apparatus according to claim 1 or 2, wherein the control means selects a desired character area image from the character area image storage means based on an input instruction of the key input means, obtains the character text data by processing the character area image with the character area processing means and the character recognition means, and performs a telephone connection using a telephone number, and an Internet connection using an address or a keyword, included in the character text data.

5. The image reproducing apparatus according to claim 4, wherein the character area image stored in the character area image storage means and the character text data stored in the character information storage means are associated with each other and displayed on the display means.

6. The image reproducing apparatus according to any one of the preceding claims, wherein the character area extraction means includes character area disappearance detection means for detecting that the character area image has disappeared from the input image information, and the character area image is stored in the character area image storage means when the disappearance of the character area image is detected by the character area disappearance detection means.

7. The image reproducing apparatus according to claim 1, wherein the character area extraction means includes character area input detection means for detecting that the character area image has been inserted into the input image information, and the character area image is stored in the character area image storage means when the insertion of the character area image is detected by the character area input detection means, or at a later time after the insertion of the character area image is detected.