JP2003018462A

JP2003018462A - Character inserting device and character inserting method

Info

Publication number: JP2003018462A
Application number: JP2001197285A
Authority: JP
Inventors: Masanobu Nakachi; 正亘中地
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2001-06-28
Filing date: 2001-06-28
Publication date: 2003-01-17

Abstract

PROBLEM TO BE SOLVED: To provide a device that automatically inserts characters at an appropriate position on an image, using a voice recognition device that automatically converts voice into characters. SOLUTION: The character insertion device comprises an image data extraction section 204 and a voice data extraction section 205 that extract image data 202 and voice data 203, respectively, from image data with voice 201 comprising the image data 202 and the voice data 203; a voice recognition section 206 that converts the voice data 203 into character information 208; and a character insertion section 209 that, on the basis of the character information 208 and area information indicating a character-insertable area acquired from the image data 202, calculates the optimum placement of the character information 208 in the image data 202 together with the size and arrangement of the characters, and inserts the character information 208 into the image data 202. The character insertion device outputs character-added image data 211 obtained by the character insertion section 209 inserting the character information 208 into the image data 202.

Description

Detailed Description of the Invention

[0001]

[Field of the Invention] The present invention relates to a device that converts sound effects and human voices, expressed as audio in image data with voice, into characters and inserts the characters into the image.

[0002]

[Description of the Related Art] Voice recognition technology has advanced remarkably in recent years, and it has become possible to recognize not only human voices but also sound effects.

[0003] Meanwhile, inserting characters such as telops (captions) into an image still has to be done by hand and has not been automated.

[0004] To address this, devices have been devised that automatically insert characters into video by programming the subtitle content, as in the prior art JP-A-2000-244816. In addition, the prior art JP-A-2000-307915 and similar documents have devised techniques for superimposing voice-recognized characters on an image, as well as techniques for displaying them outside the image.

[0005]

[Problems to be Solved by the Invention] However, the prior art JP-A-2000-244816 has the inconvenience that the characters to be displayed on the video must be entered and set in advance. In addition, the prior art JP-A-2000-244816 and similar documents can display voice-recognized characters on an image, but they give no consideration to the image itself and lack any setting of the character display position.

[0006] To solve the above problems, an object of the present invention is to provide a device that automatically converts voice into characters using a voice recognition device and automatically inserts the characters at an appropriate position on an image.

[0007]

[Means for Solving the Problems] To solve these problems, a character insertion device of the present invention has, for example, the following configuration. Namely, it comprises: extraction means for extracting image data and voice data from image data with voice that includes the image data and the voice data; conversion means for converting the voice data extracted by the extraction means into character information; image analysis means for analyzing the image data extracted by the extraction means to acquire area information indicating an area in which characters can be inserted; attribute acquisition means for acquiring attribute information regarding the character information converted by the conversion means; and insertion means for determining the arrangement of the character information to be inserted into the image data on the basis of the area information acquired by the image analysis means and the attribute information acquired by the attribute acquisition means, and for inserting the character information into the image data. The device is characterized by outputting character-added image data into which the character information has been inserted by the insertion means.

[0008]

[Embodiments of the Invention] Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

<Hardware Configuration of Character Insertion Device (FIG. 1)>

FIG. 1 is a block diagram showing the hardware configuration of a character insertion device according to an embodiment of the present invention. In FIG. 1, 101 is a control memory (ROM), 102 is a central processing unit (CPU), 103 is a memory (RAM), 104 is an external storage device, 105 is an input unit, 106 is a display unit, and 107 is a bus. The control program that realizes the character insertion device according to the present embodiment, and the data used by that control program, are loaded into the memory 103 via the bus 107 as needed under the control of the central processing unit 102 and are executed by the central processing unit 102.

<Outline of Character Insertion Device (FIG. 2)>

FIG. 2 is a diagram showing the functional configuration of the character insertion device of the present embodiment. Reference numeral 201 denotes image data with voice, which comprises image data 202 and voice data 203. From the image data with voice 201 input to the character insertion device via the input unit 105, the image data extraction unit 204 extracts the image data 202, and the voice data extraction unit 205 extracts the voice data 203.

[0009] The voice data 203 extracted by the voice data extraction unit 205 is analyzed by the voice recognition unit 206 using a voice dictionary 207, and the sounds that can be converted into characters are recognized. The voice data recognized by the voice recognition unit 206 is input to the character insertion unit 209 as character information 208.

[0010] Meanwhile, the image data 202 extracted by the image data extraction unit 204 is input to the character insertion unit 209, where the character information 208 input from the voice recognition unit 206 is inserted into it. Character-added image data 211 is thereby generated.

[0011] The character-added image data 211 generated by the character insertion unit 209 can be output via the image output unit 210, for example to the display unit 106 as a character-added image 212.
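The data flow of FIG. 2 described above could be sketched roughly as follows. This is a minimal illustration only: the `ImageWithVoice` container, the `run_pipeline` function, and the `recognize` and `insert` callables are hypothetical names introduced here, and the patent does not specify any data format or programming interface.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class ImageWithVoice:
    """Hypothetical container for image data with voice (201)."""
    image: bytes  # image data (202)
    voice: bytes  # voice data (203)


def run_pipeline(
    data: ImageWithVoice,
    recognize: Callable[[bytes], List[str]],
    insert: Callable[[bytes, List[str]], Any],
) -> Any:
    """Mirror the order of operations in FIG. 2 (all names are assumptions)."""
    image = data.image             # image data extraction unit 204
    voice = data.voice             # voice data extraction unit 205
    text = recognize(voice)        # voice recognition unit 206 -> character information 208
    return insert(image, text)     # character insertion unit 209 -> character-added image data 211


# Usage with stand-in callables; a real recognizer and renderer would replace them.
result = run_pipeline(
    ImageWithVoice(image=b"img", voice=b"snd"),
    recognize=lambda voice: ["hello"],
    insert=lambda image, text: (image, tuple(text)),
)
```

Passing the recognizer and the inserter in as callables keeps the sketch independent of any particular recognition engine or rendering library.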

[0012]

<Processing Flow of the Voice Recognition Unit 206 (FIG. 3)>

FIG. 3 is a detailed functional block diagram of the voice recognition unit 206 in FIG. 2.

[0013] The voice data 203 input via the voice data input unit 301 is analyzed by the voice analysis unit 302, which extracts human voices, animal voices, ambient sounds, sound effects, and the like from the voice data 203. When the voice data 203 extracted by the voice analysis unit 302 is a human voice, it is passed to the human voice character recognition unit 305 as human voice data 303; otherwise it is passed to the sound effect character recognition unit 307 as sound effect data 304. The human voice character recognition unit 305 uses the human voice dictionary 306, and the sound effect character recognition unit 307 uses the sound effect dictionary 308, to convert the input human voice data 303 and sound effect data 304 into characters, respectively. The human voice data converted into characters by the human voice character recognition unit 305 and the sound effect data converted into characters by the sound effect character recognition unit 307 are output as the character information 208 by the character information output unit 309.
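The routing between the two recognizers described above might look like the following sketch. The `Segment` type, its `kind` labels, and the recognizer callables are assumptions made for illustration; the patent does not say how the voice analysis unit 302 labels segments or how the dictionaries 306 and 308 are consulted.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Segment:
    """Hypothetical output of the voice analysis unit 302."""
    kind: str     # assumed labels: "human", "animal", "ambient", "effect"
    audio: bytes


Recognizer = Callable[[bytes], str]


def recognize_segments(
    segments: Iterable[Segment],
    human_recognizer: Recognizer,   # stands in for unit 305 with dictionary 306
    effect_recognizer: Recognizer,  # stands in for unit 307 with dictionary 308
) -> List[str]:
    """Route human voices to one recognizer and everything else to the other."""
    texts: List[str] = []
    for seg in segments:
        if seg.kind == "human":
            texts.append(human_recognizer(seg.audio))   # human voice data 303
        else:
            texts.append(effect_recognizer(seg.audio))  # sound effect data 304
    return texts  # collected as character information 208


# Usage with lookup tables standing in for the two dictionaries.
human_dict = {b"h1": "hello"}
effect_dict = {b"e1": "bang", b"a1": "woof"}
texts = recognize_segments(
    [Segment("human", b"h1"), Segment("effect", b"e1"), Segment("animal", b"a1")],
    human_recognizer=lambda audio: human_dict[audio],
    effect_recognizer=lambda audio: effect_dict[audio],
)
```

Note that, following the text, animal voices and ambient sounds take the sound effect path because they are not human voices.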

[0014] Next, the processing of the character insertion unit 209 is described in detail.

<Processing Flow of the Character Insertion Unit 209 (FIG. 4)>

FIG. 4 is a diagram showing details of the processing of the character insertion unit 209. The image data 202 input through the image data input unit 401 is analyzed by the image analysis unit 403. Using an image recognition engine that holds image patterns for backgrounds such as sky, clouds, and starry sky, the image analysis unit 403 extracts a background area into which the character information 208 can be inserted and passes it to the image insertion position calculation unit 407 as background area information 404.

[0015] Meanwhile, the character information 208 input from the character information input unit 402 is analyzed by the character information calculation unit 405, which calculates the number of utterances and the number of characters in each utterance to obtain attribute information 406. For example, when two utterances are recognized, "ごぶさたしています" (gobusata shite imasu, "It has been a long time") and "こちらこそ" (kochira koso, "Likewise"), the character information calculation unit 405 obtains the attribute information "number of utterances: 2; number of characters: V1 = 9, V2 = 5".
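The attribute calculation in the example above can be reproduced with a few lines of code. This is a sketch under the assumption that one character corresponds to one Unicode code point after NFC normalization; the `AttributeInfo` type and `compute_attributes` function are hypothetical names, not part of the patent.

```python
import unicodedata
from dataclasses import dataclass
from typing import List


@dataclass
class AttributeInfo:
    """Hypothetical representation of attribute information 406."""
    utterance_count: int
    char_counts: List[int]  # V1, V2, ... in utterance order


def compute_attributes(utterances: List[str]) -> AttributeInfo:
    """Count the utterances and the characters in each utterance."""
    # NFC normalization keeps voiced kana such as "ご" as a single code point.
    counts = [len(unicodedata.normalize("NFC", u)) for u in utterances]
    return AttributeInfo(utterance_count=len(utterances), char_counts=counts)


info = compute_attributes(["ごぶさたしています", "こちらこそ"])
print(info.utterance_count, info.char_counts)  # 2 [9, 5]
```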

[0016] The attribute information 406 acquired by the character information calculation unit 405 is passed to the image insertion position calculation unit 407. Using the area information of the image data obtained from the image analysis unit 403 (the background area information 404) and the attribute information 406 obtained from the character information calculation unit 405, the image insertion position calculation unit 407 calculates the most suitable position for inserting the characters into the image, the character size, and the character arrangement, and obtains insertion position information 408.

[0017] To calculate the insertion position information 408, for example, rectangular areas are created along the contour obtained from the background area information 404, each with a contour point as one end, and the rectangular area with the largest area among them is adopted as the character insertion area. Once the character insertion area has been obtained, the character size and arrangement are calculated so that all of the extracted character information 208 fits within the character insertion area.
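The area selection and sizing steps described above might be sketched as follows. The patent builds candidate rectangles from contour points; this sketch swaps in a different, standard technique (a largest-rectangle-in-histogram scan over a binary background mask) that also returns the largest all-background axis-aligned rectangle. The mask representation, the function names, and the sizing rule (square glyphs, one line per utterance, no line spacing) are all assumptions made for illustration.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (top, left, height, width)


def largest_background_rect(mask: List[List[bool]]) -> Rect:
    """Largest axis-aligned rectangle of True (background) cells in the mask."""
    if not mask or not mask[0]:
        return (0, 0, 0, 0)
    cols = len(mask[0])
    heights = [0] * cols          # run of background cells ending at the current row
    best, best_area = (0, 0, 0, 0), 0
    for r, row in enumerate(mask):
        for c in range(cols):
            heights[c] = heights[c] + 1 if row[c] else 0
        stack: List[int] = []     # column indices with non-decreasing heights
        for c in range(cols + 1):
            cur = heights[c] if c < cols else 0   # sentinel flushes the stack
            while stack and heights[stack[-1]] >= cur:
                h = heights[stack.pop()]
                left = stack[-1] + 1 if stack else 0
                w = c - left
                if h * w > best_area:
                    best_area = h * w
                    best = (r - h + 1, left, h, w)
            stack.append(c)
    return best


def fit_char_size(rect_w: int, rect_h: int, char_counts: List[int]) -> int:
    """Largest square glyph size (pixels) so each utterance fits on its own line."""
    if not char_counts or max(char_counts) == 0:
        return 0
    return min(rect_w // max(char_counts), rect_h // len(char_counts))


mask = [
    [True,  True,  True,  False],
    [True,  True,  True,  False],
    [False, False, True,  True],
]
rect = largest_background_rect(mask)    # (0, 0, 2, 3)
size = fit_char_size(300, 80, [9, 5])   # 33
```

In a fuller version the mask would be the per-pixel background area information 404 and the character counts would come from the attribute information 406; the sizing rule here is deliberately the simplest one that guarantees a fit.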

[0018] The character information insertion unit 409 inserts the character information 208 on the basis of the image data 202 from the image data input unit 401, the character information 208 from the character information input unit 402, and the insertion position information 408 from the image insertion position calculation unit 407, thereby generating the character-added image data 211.

[0019] When the character information insertion unit 409 changes the character size of the character information 208 on the basis of the character size contained in the insertion position information 408, it uses the character font 410.

[0020] According to the above embodiment, information expressed as voice in image data with voice can be automatically converted into characters and automatically inserted at the optimum position in the image.

<Outline of Character Insertion Device (FIG. 5)>

The character information 208 input to the character insertion unit 209 can also be edited in the character editing unit 501 on the basis of the editing command 502. FIG. 5 is a diagram showing a functional configuration in which an editing function is added to the functional configuration of the character insertion device shown in FIG. 2.

[0021] Editing in the character editing unit 501 means editing the type of character font and the type of character decoration. It also includes editing such as changing the character size of the character information 208 and selecting which uttered characters to display, while previewing the character-added image data 211 generated by the character insertion unit 209 as the character-added image 212. The character editing unit 501 can also place character information that would otherwise be arranged in the background area of the image on the subject area, or place character information at a position that straddles the subject area and the background area.

[0022] The character-added image data 211 edited by the character editing unit 501 is output to the display unit 106 as the character-added image 212 via the image output unit 210.

[0023] As described above, in addition to automatically inserting the voice of image data with voice as characters at the optimum position in the image, adding a function for editing the character information to be inserted makes it possible to easily generate character-added image data that matches the image editor's intent. Next, how characters are actually inserted into an image is described with reference to FIG. 6.

[0024]

<Example of Character Insertion into Image Data with Voice (FIG. 6)>

FIG. 6 shows an example of image data with voice processed according to the functional configuration of the character insertion device shown in FIG. 5.

[0025] Reference numeral 601 denotes image data with voice, which comprises image data 602 and voice data 603. From the image data with voice 601 input to the character insertion device, the image data extraction unit 204 extracts image data 606, and the voice data extraction unit 205 extracts voice data 607.

[0026] The voice data 607 extracted by the voice data extraction unit 205 is analyzed by the voice recognition unit 206 using the voice dictionary 207, and the sounds that can be converted into characters are recognized. The voice data 607 recognized by the voice recognition unit 206 is input to the character insertion unit 209 as character information.

[0027] Meanwhile, the background area is extracted from the image data 606 extracted by the image data extraction unit 204 (the gray portion of the image data 606 indicates the background area).

[0028] The character information is inserted into the background area of the image data 606 extracted by the image data extraction unit 204, and character-added image data 610 is generated by the character insertion unit 209.

[0029] The user who input the image data with voice 601 to the character insertion device can use the character editing unit 501 to edit the character-added image data 610 generated by the character insertion unit 209, for example by changing the position and size of the characters. As a result of this editing, an image to which characters and effects have been added, such as the character-added image 612, can be generated.

[0030] The present invention is not limited to the above embodiment, and various modifications and applications are possible without departing from the scope of the invention. For example, in the present embodiment the characters of the automatically generated character-added image can be edited; a device that can not only edit the characters to be inserted but also edit and insert drawings, frame images, and the like, as seen in video print supply devices, is of course also within the scope of application of the invention.

[0031]

[Other Embodiments] The present invention may be applied to a system composed of a plurality of devices (for example, a host computer, an interface device, a reader, and a printer) or to an apparatus consisting of a single device (for example, a copying machine or a facsimile machine).

[0032] It goes without saying that the object of the present invention is also achieved by supplying a system or apparatus with a storage medium (or recording medium) on which the program code of software that realizes the functions of the above embodiment is recorded, and having a computer (or CPU or MPU) of that system or apparatus read out and execute the program code stored in the storage medium. In this case, the program code itself read from the storage medium realizes the functions of the above embodiment, and the storage medium storing the program code constitutes the present invention. It also goes without saying that this covers not only the case where the functions of the above embodiment are realized by the computer executing the program code it has read, but also the case where an operating system (OS) or the like running on the computer performs part or all of the actual processing on the basis of the instructions of the program code, and the functions of the above embodiment are realized by that processing.

[0033] It further goes without saying that this includes the case where, after the program code read from the storage medium is written into a memory provided on a function expansion card inserted into the computer or in a function expansion unit connected to the computer, a CPU or the like provided on the function expansion card or in the function expansion unit performs part or all of the actual processing on the basis of the instructions of the program code, and the functions of the above embodiment are realized by that processing.

[0034] When the present invention is applied to the above storage medium, the storage medium stores program code corresponding to the flowcharts described above.

[0035]

[Effects of the Invention] As described above, the present invention makes it possible to provide a device that automatically converts voice into characters using a voice recognition device and automatically inserts the characters at an appropriate position on an image.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the hardware configuration of a character insertion device according to an embodiment of the present invention.

FIG. 2 is a diagram showing the functional configuration of the character insertion device according to the embodiment of the present invention.

FIG. 3 is a detailed functional block diagram of the voice recognition unit of the character insertion device according to the embodiment of the present invention.

FIG. 4 is a diagram showing details of the processing of the character insertion unit of the character insertion device according to the embodiment of the present invention.

FIG. 5 is a diagram showing the functional configuration of the character insertion device according to the embodiment of the present invention with an editing function added.

FIG. 6 is a diagram showing an example of image data with voice processed according to the functional configuration of the character insertion device according to the embodiment of the present invention.

Claims


1. A character insertion device comprising: extraction means for extracting image data and voice data from image data with voice that includes the image data and the voice data; conversion means for converting the voice data extracted by the extraction means into character information; image analysis means for analyzing the image data extracted by the extraction means to acquire area information indicating an area in which characters can be inserted; attribute acquisition means for acquiring attribute information regarding the character information converted by the conversion means; and insertion means for determining the arrangement of the character information to be inserted into the image data on the basis of the area information acquired by the image analysis means and the attribute information acquired by the attribute acquisition means, and for inserting the character information into the image data, wherein the device outputs character-added image data into which the character information has been inserted by the insertion means.

2. The character insertion device according to claim 1, further comprising an editing unit capable of editing the character information inserted in the image data.

3. The character insertion device according to claim 2, wherein the image analysis means divides the image data into a background area, which is an area in which characters can be inserted, and a subject area, which is an area in which characters cannot be inserted, and acquires information on the background area as the area information.

4. The character insertion device according to claim 3, wherein the attribute information includes the number of utterances and the number of characters.

5. A character insertion method comprising: an extraction step of extracting image data and voice data from image data with voice that includes the image data and the voice data; a conversion step of converting the voice data extracted in the extraction step into character information; an image analysis step of analyzing the image data extracted in the extraction step to acquire area information indicating an area in which characters can be inserted; an attribute acquisition step of acquiring attribute information regarding the character information converted in the conversion step; and an insertion step of determining the arrangement of the character information to be inserted into the image data on the basis of the area information acquired in the image analysis step and the attribute information acquired in the attribute acquisition step, and of inserting the character information into the image data, wherein character-added image data into which the character information has been inserted in the insertion step is output.

6. The character insertion method according to claim 5, further comprising an editing step capable of editing the character information inserted into the image data.

7. The character insertion method according to claim 6, wherein the image analysis step divides the image data into a background area, which is an area in which characters can be inserted, and a subject area, which is an area in which characters cannot be inserted, and acquires information on the background area as the area information.

8. The character insertion method according to claim 7, wherein the attribute information includes the number of utterances and the number of characters.

9. A storage medium storing a control program for realizing the character insertion method according to claim 5 by a computer.

10. A control program for implementing the character insertion method according to claim 5 by a computer.