JP4895759B2

JP4895759B2 - Voice message output device

Info

Publication number: JP4895759B2
Application number: JP2006290532A
Authority: JP
Inventors: 一朗山田
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 2006-10-25
Filing date: 2006-10-25
Publication date: 2012-03-14
Anticipated expiration: 2026-10-25
Also published as: JP2008108076A

Description

The present invention relates to a device capable of outputting voice messages, and can be used in, for example, a television broadcast receiver or a mobile phone.

Television broadcast receiving and recording devices equipped with a voice message output device that provides operation guidance by voice have been proposed. In such a device, for example, the procedure for setting a recording reservation is explained by voice guidance (see Patent Document 1 and Patent Document 2).
Patent Document 1: Japanese Patent Laid-Open No. H11-69283
Patent Document 2: Japanese Patent Laid-Open No. 2005-338665

However, conventional voice message output plays guidance in a voice (timbre) registered in advance, so the user cannot create the voice guidance he or she wants. Some devices prepare guidance in both "standard Japanese" and a "regional dialect" and let the user choose between them (see Patent Document 2), but the user can only select pre-registered guidance or (already registered) voice data available on a network; the user cannot listen to voice guidance edited to his or her own liking.

In view of the above, an object of the present invention is to provide a voice message output device that lets the user listen to voice guidance edited to the user's own liking.

To solve the above problem, a voice message output device according to the present invention selects and outputs by voice a voice message corresponding to an event that has occurred, and comprises: means for presenting to the user which voice message is currently selected for each event, or for presenting to the user the events for which a voice message can be set; means for generating character data or phrase data from voice data by speech recognition; means for associating the generated character data or phrase data with the voice data from which it was generated; means for displaying characters or phrases based on the generated character data or phrase data for selection by the user; character string generation means for generating a character string from the characters or phrases selected by the user; means for determining whether the user has performed a registration operation on the generated character string; and means for executing, when the registration operation has been performed, change registration of the voice message so that voice output for the corresponding event is produced by joining the voice data in accordance with the character string created by the user.

With this configuration, for example, the user can obtain character or phrase data from the voice data of a favorite singer, combine those characters or phrases into a character string (message text), and have voice output based on that character string played as the voice message. In other words, the user can listen to a voice message edited to the user's own liking.

The voice message output device configured as above may acquire the voice data by one or more of communication, broadcast reception, data transfer, and reading from an attached memory.

In another aspect, a voice message output device according to the present invention selects and outputs by voice a voice message corresponding to an event that has occurred, and comprises: means for presenting to the user which voice message is currently selected for each event, or for presenting to the user the events for which a voice message can be set; sound source means for generating and outputting voice data based on character data or phrase data; character string generation means for accepting characters or phrases entered by the user and generating a character string; means for determining whether the user has performed a registration operation on the generated character string; and means for executing, when the registration operation has been performed, change registration of the voice message so that voice output for the corresponding event is produced by joining the voice data in accordance with the character string created by the user.

With this configuration, for example, the user can freely combine characters and phrases into a character string (message text) and have voice output based on that character string played as the voice message. In other words, the user can listen to a voice message edited to the user's own liking.

In the voice message output device configured as above, a Japanese syllabary (gojūon) table storing voice data for at least 47 syllables may be held in advance as the sound source means, or the device may be configured so that the table can be added later by one or more of data communication, data broadcast reception, data transfer, and memory attachment; voice generation based on the character string may then be performed using the voice data in the syllabary table. With this configuration, the user can listen to a self-edited voice message rendered with the syllabary table of a favorite singer or TV personality.

The device may hold two or more syllabary tables and generate voice based on the character string using the table selected by the user.

The device may also include a numeric keypad in which one or more characters are assigned to each number key, so that characters can be entered with the keypad.

Further, the voice message output device may be configured as a mobile phone that outputs the incoming-call ringtone and/or the incoming-mail ringtone by voice generation based on the character string.

The present invention has the effect that the user can listen to a voice message edited to the user's own liking.

An embodiment of the present invention is described below with reference to FIGS. 1 to 7.

FIG. 1 is a block diagram of a mobile phone 2 (with terrestrial digital broadcast reception capability) that incorporates the voice message output device of this embodiment.

As shown in FIG. 1, the mobile phone 2 displays video on the liquid crystal display panel 202 and outputs audio from the speaker 203, using encoded audio/video data obtained through broadcast reception by the terrestrial digital tuner 230 or read from a memory card 3 or the like inserted in the slot 201. Encoded audio/video data received by the terrestrial digital tuner 230 can also be recorded on the memory card 3 or the like. In this embodiment, the encoded audio/video data is assumed to be H.264 data.

H.264 data read from the memory card 3 is supplied to the video decoder 204 via the PCMCIA interface 220 and the system bus 213. The tuner 230, which receives terrestrial digital broadcasts, extracts the bitstream data carried in the broadcast and supplies it to the video decoder 204 and the audio decoder (MPEG4-AAC) 206.

The video decoder 204 decodes the bitstream data to obtain quantized coefficients and motion vectors, performs inverse DCT, motion compensation based on the motion vectors, and so on, and supplies the resulting video data to the graphics controller 205. The graphics controller 205 applies processing such as color adjustment to the video data (for example, R, G, B data). It also performs OSD (on-screen display) processing, which displays characters and the like specified by the CPU 209 on the liquid crystal display panel 202; this OSD processing is used to generate menu screens, character input screens, and so on. The audio decoder 206 decodes the coded audio data in the bitstream to produce audio data. The SDRAM 210 is used by the video decoder 204 in the above processing.

The LCD controller 207 drives the LCD panel 202 based on the video data supplied from the graphics controller 205. The D/A converter 208 receives the audio data output from the audio decoder 206, converts it from digital to analog, and supplies the resulting analog signal to the speaker 203.

Operation information from the main-unit keys 214 is passed to the CPU 209 via the interface 215 and the system bus 213, and the CPU 209 executes the necessary processing based on it. In addition to a numeric keypad, the main-unit keys 214 include a TV viewing start key, a video (audio) recording start key, direction (arrow) keys, and so on.

A communication block 216 and an interface 217 are provided to support a short-range wireless network. A Flash ROM 218 and an SDRAM 219 are also provided. The battery 221 is a rechargeable battery that stores power supplied from a charger (not shown). The mobile phone unit 231 handles telephone calls, e-mail, and related processing. The numeric keypad of the main-unit keys 214 is used to enter telephone numbers and characters for the mobile phone unit 231. When the TV viewing mode is selected, the same numeric keypad functions as, among other things, keys for entering channel numbers.

The speech recognition unit 232 receives the audio data output from the audio decoder 206 and performs speech recognition on it, as described in detail later. The recognition result is stored in a rewritable nonvolatile memory such as the Flash ROM 218. When the audio data making up a voice message is supplied to the D/A converter 208, the sound is output through the speaker 203.

The CPU 209 also handles processing for the wireless network, control of each functional unit based on received data, and read/write control of the Flash ROM 218 and the SDRAM 219. The boot program and application programs that run the CPU 209 are stored in rewritable nonvolatile memory and are loaded into volatile memory for execution. The audio data files that serve as voice messages (voice guidance) are also stored in the rewritable nonvolatile memory and can be rewritten one audio data file (one message) at a time. In this embodiment, as described later, the currently selected guidance text is presented (displayed), so the nonvolatile memory stores not only the audio data file but also a character string file (text file) for each voice message. As also explained later, change registration of a voice message can register both the character string file of the changed message and the audio data file obtained by reading that character string aloud, but registration of the audio data file can be omitted: as long as the character string file exists, the message can be output by running the read-aloud process on the character string whenever it is needed.

In addition to basic processing such as broadcast reception, the CPU 209 performs voice guidance output processing, editing processing for voice guidance, utterance data generation processing for that editing, speech recognition processing for generating the utterance data, and so on.

[Voice guidance output processing]
When an event that requires voice guidance (voice message) output occurs, the CPU 209 obtains the identification number associated with the event and retrieves the file indicated by that number. The file may contain audio data already generated from a character string, or the character string data itself before conversion to audio. The audio data is supplied to the D/A converter 208 and output as sound. As an example of voice guidance: when the user issues a command to record a received program or to play back a music video, the CPU 209 checks whether the memory card 3 is inserted in the slot 201, and if it is not, outputs voice guidance such as "Please insert a memory card."
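The lookup described above (event, then identification number, then file, whose contents may be either ready-made audio or a character string) can be sketched as follows. This is only an illustrative sketch: the names `GuidanceFile`, `resolve_guidance`, and the `read_aloud` callback are hypothetical and do not appear in the patent, and audio is modeled simply as `bytes`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass
class GuidanceFile:
    """Contents of the file named by an identification number (hypothetical model)."""
    content: Union[bytes, str]  # bytes: ready-made audio; str: text still to be read aloud


def resolve_guidance(
    event: str,
    event_to_id: Dict[str, str],
    files: Dict[str, GuidanceFile],
    read_aloud: Callable[[str], bytes],
) -> bytes:
    """Return the audio to send to the D/A converter for the given event."""
    file_id = event_to_id[event]          # identification number tied to the event
    content = files[file_id].content
    if isinstance(content, bytes):        # audio was generated at registration time
        return content
    return read_aloud(content)            # only text was stored: synthesize on demand
```

Storing only the text keeps the registered file small, at the cost of running the read-aloud step each time the event fires.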

[Editing processing for voice guidance]
For example, the menu screen provides an item such as "Edit voice guidance." FIG. 2 shows an example of the editing screen, which lists each event together with its currently selected voice guidance. When the CPU 209 detects, for example, an up or down direction key operation in this state, it scrolls the screen, so events and their currently selected guidance text beyond those currently shown can be brought into view. When the CPU 209 detects that the Enter key has been pressed, it treats the event and currently selected voice guidance under the cursor at that moment as the ones the user wants to edit, and the display switches to the screen shown in FIG. 3.

On the display screen shown in FIG. 3, the upper half shows the selected event and its currently selected voice guidance, and the lower half provides a message text input area. In this state, the CPU 209 accepts presses of the numeric keypad of the main-unit keys 214 as character input and, just as when an e-mail is composed (edited), displays the characters entered by the user on the screen (liquid crystal display panel 202). In the example shown in FIG. 2, the event is "Warning guide when no card is inserted," the currently selected guidance text is "Please insert a memory card," and the guidance text entered by the user is "Kaado iretenka" (roughly, "Put the card in, will you?"). When the CPU 209 detects that the Enter key has been pressed in this state, it changes the guidance text for the "Warning guide when no card is inserted" event to "Kaado irete ne" ("Put the card in, okay?"); that is, it stores the guidance character string and associates the event with it. This association also amounts to changing the contents of the file with the identification number currently being edited ([0001] in the illustrated example) to the guidance character string ("Kaado irete ne").
As noted earlier, the guidance character string can be converted to audio data, so the contents of the file with the identification number being edited ([0001] in the illustrated example) may instead be replaced with the resulting audio data file. The same applies to the examples that follow. If, on the other hand, the guidance character string is stored, the file with that identification number must also hold information identifying which syllabary table (described below) is to be used when the string is read aloud.
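A minimal sketch of this change registration, assuming the file for an identification number is modeled as a small record, is shown below. `GuidanceEntry`, `register_change`, and the optional `synthesize` callback are hypothetical names introduced for illustration; the patent does not specify a file layout.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class GuidanceEntry:
    """Hypothetical record stored under one identification number, e.g. "0001"."""
    text: str                      # guidance character string shown on the edit screen
    table_name: str                # which syllabary table to use when reading aloud
    audio: Optional[bytes] = None  # pre-generated audio; may be omitted


def register_change(
    files: Dict[str, GuidanceEntry],
    file_id: str,
    text: str,
    table_name: str,
    synthesize: Optional[Callable[[str, str], bytes]] = None,
) -> None:
    """Replace the contents of file_id with the user's new guidance text.

    If a synthesize callback is supplied, the audio is generated now and stored
    alongside the text; otherwise only the text and table name are kept and the
    audio is produced later, when the event occurs.
    """
    audio = synthesize(text, table_name) if synthesize is not None else None
    files[file_id] = GuidanceEntry(text=text, table_name=table_name, audio=audio)
```

Keeping the table name in the record reflects the requirement above that a text-only entry must remember which syllabary table to read it with.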

Voice output corresponding to the guidance character string can be realized with so-called text-to-speech (read-aloud) software. Each word of the guidance text may be uttered with synthesized speech, or using a Japanese syllabary table recorded by, for example, a TV personality. The syllabary table need only contain audio data for at least the 47 Japanese syllables, but it preferably also includes voiced sounds (dakuon), semi-voiced sounds (handakuon), and contracted sounds (yōon). It is also desirable to hold variants of the same syllable, for example "de" with rising intonation and "de" with falling intonation. Further, if the table holds long-vowel audio data such as "aa" and "ii," the "kaa" in "kaado" (card) can be uttered. For example, using the syllabary table of the personality XX Hanako, the guidance sounds as if XX Hanako herself were saying "Kaado irete ne."
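Reading a character string aloud from a syllabary table can be sketched as a lookup-and-concatenate step. The sketch below is an assumption-laden illustration, not the patent's implementation: the table is modeled as a dictionary from kana units to audio clips (`bytes`), two-character units such as long vowels ("かー") or contracted sounds are tried before single characters, and the function names `split_units` and `read_aloud` are hypothetical.

```python
from typing import Dict, List


def split_units(text: str, table: Dict[str, bytes]) -> List[str]:
    """Split text into units present in the table, preferring two-character units."""
    units: List[str] = []
    i = 0
    while i < len(text):
        two = text[i:i + 2]
        if len(two) == 2 and two in table:   # e.g. a long vowel such as "かー"
            units.append(two)
            i += 2
        elif text[i] in table:               # plain single syllable
            units.append(text[i])
            i += 1
        else:
            raise KeyError(f"no audio data for {text[i]!r} in this syllabary table")
    return units


def read_aloud(text: str, table: Dict[str, bytes]) -> bytes:
    """Join the audio clips for each unit of text, in order."""
    return b"".join(table[unit] for unit in split_units(text, table))
```

With a table that holds an entry for "かー", the string "かーどいれてね" (kaado irete ne) is split into six units and the six clips are joined in order; swapping in another person's table changes the voice without changing the text.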

The syllabary table may be downloaded from an Internet site using the communication block 216 or using the file transfer function of the mobile phone unit 231, or obtained as download data provided through the data broadcasting of a digital broadcast. The table may of course be provided for a fee. A manufacturer may also partner with TV personalities and store the syllabary tables of several of them in the memory or on a memory card in advance. Alternatively, the manufacturer may make the syllabary tables of several dozen personalities available and store in the memory, when the product is handed over, the tables of about five personalities requested by the user. This can be done at the time data such as e-mail addresses is transferred when the user changes phone models. To determine whether a given file is a syllabary table file, the file name may carry a specific identifier indicating that it is a syllabary table.

FIG. 4 shows another example of the editing screen. Before the transition to the editing screen of FIG. 3 described above, the device displays, for example, the items "Edit from syllabary table" and "Edit from user recording"; the screen of FIG. 4 is displayed when the latter is selected. When "Edit from syllabary table" is selected, the device lists the people whose syllabary tables are available, for example a. XX Hanako, b. XX Taro, c. XX Goro, and moves to the screen of FIG. 3 after the user selects one of them.

On the editing screen shown in FIG. 4, the upper half shows the event and its currently selected voice guidance, and the lower half shows a screen for selecting the audio source. In this display example, two speech recognition files exist as audio sources: one titled "XX Goro / User recording / (year/month/day) / Music Song Battle" and one titled "XX Hanako / User recording / (year/month/day) / Drama XX Yamakawa." When the Enter key is pressed, the CPU 209 opens the file under the cursor at that moment.

As shown in FIG. 5, the selected file contains phrases (character strings) such as "shimasu" (will do), "dekimasen" (cannot), "sousa" (operation), "menyuu" (menu), "dijitaru" (digital), "settei" (setting), "kaado" (card), "kudasai" (please), "tegami" (letter), "kita" (has come), and "ga" (a particle), together with the audio data corresponding to each. On this editing screen, the user selects phrases one after another to compose the guidance text. For example, selecting "dekimasen," "kaado," and "kudasai" in that order produces the guidance text "Dekimasen kaado kudasai" ("Cannot, card please"), which can then be output in XX Goro's voice. Because this is guidance on a device the user uses personally, it works well enough as guidance as long as the user understands it. Phrases are selected with the direction keys, which move the cursor, and the Enter key.
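Composing guidance from recognized phrases amounts to joining the selected phrases' text for display and their audio clips for playback. The following is a minimal sketch under the assumption that each entry of the recognition file pairs a phrase with its audio clip; `Phrase` and `build_message` are hypothetical names, the phrases are written in romanized form ("dekimasen" = cannot, "kaado" = card, "kudasai" = please), and the audio bytes are placeholders.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Phrase:
    """One entry of a speech recognition file: a phrase and the audio it came from."""
    text: str
    audio: bytes


def build_message(selected: List[Phrase]) -> Tuple[str, bytes]:
    """Join the phrases in the order the user selected them.

    Returns the guidance text (for display and registration) and the joined
    audio (for output in the original speaker's voice).
    """
    text = "".join(p.text for p in selected)
    audio = b"".join(p.audio for p in selected)
    return text, audio


# Example: the user picks three phrases in order.
dekimasen = Phrase("dekimasen", b"<clip-1>")
kaado = Phrase("kaado", b"<clip-2>")
kudasai = Phrase("kudasai", b"<clip-3>")
message_text, message_audio = build_message([dekimasen, kaado, kudasai])
```

Because the clips are joined as recorded, the result keeps the speaker's voice but not necessarily natural intonation across phrase boundaries.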

FIG. 6 shows an example of the editing screen for mobile phone guidance. The upper half shows the selected event and its currently selected voice guidance, and the lower half provides a message input area. In this state, the CPU 209 accepts presses of the numeric keypad of the main-unit keys 214 as character input and, just as when an e-mail is composed (edited), displays the characters entered by the user on the screen. In the example of FIG. 6, the event is "Incoming-mail ringtone output," no message text exists, and XX is set as the ringtone melody. The user types, for example, "Meeru ga kimashita yo, sugu mite kudasai ne" ("You've got mail. Please look at it right away"). When the CPU 209 detects that the Enter key has been pressed in this state, it changes the voice message for the "Incoming-mail ringtone output" event to that text.

As described above, each word of the message text may be uttered using a syllabary table recorded by a TV personality. For example, with XX Hanako's syllabary table, it sounds as if XX Hanako herself were saying "You've got mail. Please look at it right away." In editing "mobile phone guidance" as well, a message can be composed, as shown in FIGS. 4 and 5, from the file titled "XX Goro / User recording / (year/month/day) / Music Song Battle" as the audio source. For example, if the message "Tegami ga kita" ("A letter has come") is composed, it is output in XX Goro's voice.

Different incoming voice messages may also be created per group, for example an incoming-mail ringtone for senders registered under one condition (Group 1 mail ringtone) and another for senders registered under a different condition (Group 2 mail ringtone), or per individual sender. For an individual sender, as shown in FIG. 7, a guidance text such as "Jiro-san kara meeru desu" ("Mail from Jiro") could be created. The same applies to incoming calls. When this kind of processing is performed, the audio data file indicated by the basic (common) identification number (for example, [0110]) coexists with audio data files whose identification numbers carry a branch number (for example, [0110-1] for XX Jiro). When mail arrives (when the event occurs), the CPU 209 determines who sent it from the sender's mail address and the mail list, checks whether a voice guidance identification number exists for the identified sender (for mail from XX Jiro, the identification number [0110-1] is obtained), and, if one exists, plays the audio data file with that identification number.
To make per-individual editing easier, the device may allow the user to reach the mail ringtone editing screen by first selecting a mail address (via the mail list screen).
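The per-sender selection can be sketched as a two-step lookup: sender address to identified person, then person to branch-numbered identification number. Falling back to the common identification number when no per-sender entry exists is an assumption made here (the text only says the common and branch-numbered files coexist), and `select_ringtone_id` and its parameters are hypothetical names.

```python
from typing import Dict


def select_ringtone_id(
    sender_address: str,
    mail_list: Dict[str, str],    # mail address -> registered sender name
    sender_ids: Dict[str, str],   # sender name -> branch-numbered identification number
    base_id: str = "0110",        # common identification number for the event
) -> str:
    """Pick the identification number of the audio file to play for incoming mail."""
    name = mail_list.get(sender_address)
    if name is None:
        return base_id                     # sender could not be identified
    return sender_ids.get(name, base_id)   # assumed fallback: common message
```

For example, with `mail_list = {"jiro@example.com": "Jiro"}` and `sender_ids = {"Jiro": "0110-1"}`, mail from that address resolves to "0110-1", while mail from any other address resolves to "0110".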

In a configuration with a digital camera, a voice message such as "Toru yo" ("Taking the picture") can be output when the shutter is pressed. The user can also create and edit a message such as "Toru dee" (a more casual variant) in place of "Toru yo" and have it read aloud using the syllabary table.

[Utterance data generation processing for editing]
This section describes how the speech recognition file "XX Goro / User recording / (year/month/day) / Music Song Battle" mentioned above is created. In response to a user operation, the mobile phone 2 receives the program "Music Song Battle" by digital broadcast reception and records its audio, which creates a recording file. In other words, the user records the voice of a favorite personality from a TV program. An audio file provided on the Internet may of course be downloaded instead. Alternatively, "Music Song Battle" may be received and recorded onto a memory card by another broadcast receiver, and that memory card inserted into the slot 201.

For example, an item such as "Create voice recognition file" is provided on the menu screen. When the CPU 209 detects that this item has been selected, it displays a list of recording files. The CPU 209 opens the recording file selected by the user and generates voice data, applies voice recognition processing to the voice data, and records the resulting voice recognition file in rewritable nonvolatile memory. In other words, the user records the voice of a favorite celebrity from a TV program and obtains that celebrity's speech data (character data and voice data) from the recording. Voice recognition is the process of converting voice data into character data, also known as "transcription". In addition to this, the CPU 209 associates the character data obtained by voice recognition with the portion of the voice data that corresponds to it (that is, it associates each recognition result (character data) with the voice data portion that produced it). Voice recognition in the usual sense basically requires identifying the meaning of words (identifying phrases, in units of multiple characters), but the voice recognition of the present application does not particularly require this. It is sufficient to associate voice data portions with characters one character at a time. For example, if the recognition result 「これからかえります」 ("I'm heading home now") is obtained, it suffices to establish the correspondence between each character and its voice data: the voice data portion for 「こ」, the voice data portion for 「れ」, the voice data portion for 「か」, and so on.
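The one-character-at-a-time association can be pictured as a table from characters to slices of the recorded samples. The sketch below assumes the recognizer already reports, for each recognized character, the sample range that produced it. The `CharSegment` type, the sample-index representation, and the policy of keeping only the first occurrence of a repeated character are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class CharSegment:
    char: str   # one recognized character, e.g. "こ"
    start: int  # first sample index of its voice data portion (inclusive)
    end: int    # end sample index of its voice data portion (exclusive)


def build_char_table(samples: Sequence[int],
                     segments: List[CharSegment]) -> Dict[str, List[int]]:
    """Pair each recognized character with the voice data portion behind it."""
    table: Dict[str, List[int]] = {}
    for seg in segments:
        # Assumption: when a character occurs more than once in the
        # recording, the first occurrence is the one that is kept.
        if seg.char not in table:
            table[seg.char] = list(samples[seg.start:seg.end])
    return table
```

Each entry corresponds to the paired character code and voice data that the description stores in the recognition file.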

For example, the character code for 「こ」 and the voice data for 「こ」 are stored as a pair of files within the recognition file "XX Goro / User Recording / Year X, Month X, Day X / Music Song Battle", and each file is given an identifier that indicates its data type. The character code for 「こ」 is used to present characters (or phrases; see FIG. 5) when the user composes a character string, and the voice data for 「こ」 is used for the voice output of any voice guidance whose character string contains 「こ」.

When the user designates the continuous portion 「これから」 ("from now"), the CPU 209 can combine the voice data for 「こ」, 「れ」, 「か」, and 「ら」 into a single unit, so that the phrase (character string) 「これから」 and its corresponding voice data exist in the voice recognition file. Of course, providing dictionary data makes phrase-level voice recognition possible.
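Joining per-character voice data into a phrase, or into a whole guidance sentence, amounts to concatenating the stored portions in character order. The sketch below assumes voice data is held as lists of samples in a hypothetical character table; a real device would likely also smooth the joins, which the text does not describe and which is omitted here.

```python
from typing import Dict, List


def join_voice_data(text: str, table: Dict[str, List[int]]) -> List[int]:
    """Concatenate the voice data portions for each character of `text`."""
    joined: List[int] = []
    for ch in text:
        if ch not in table:
            # No voice data was captured for this character.
            raise KeyError(f"no voice data for character {ch!r}")
        joined.extend(table[ch])
    return joined
```

For example, with entries for 「こ」, 「れ」, 「か」, and 「ら」, `join_voice_data("これから", table)` returns one sample list that can be stored alongside the phrase.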

In so-called telephone voice recognition systems, given the difficulty of voice recognition, not all of the telephone speech is converted into character data. Instead, only keywords prepared in advance are searched for (recognized). In the present invention as well, phrases (character strings) such as 「します」, 「できません」, 「そうさ」, 「めにゅー」, 「でぃじたる」, 「せってい」, 「かーど」, and 「ください」 may be registered as keywords, and only recognition results that match them may be used to store the phrases (character strings) and their corresponding voice data.
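One way to picture the keyword-only variant: scan the recognized kana string for the registered keywords and keep only the matching spans, each of which then identifies the voice data to store. This is a simplified sketch. It searches text that has already been recognized, whereas a real keyword spotter would work on the audio itself. The function name and the character-index spans are hypothetical.

```python
from typing import List, Sequence, Tuple


def extract_keywords(text: str,
                     keywords: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Return (keyword, start, end) for every keyword occurrence in `text`.

    `start` and `end` are character indices (end exclusive), listed in
    order of appearance.
    """
    hits: List[Tuple[str, int, int]] = []
    for kw in keywords:
        pos = text.find(kw)
        while pos != -1:
            hits.append((kw, pos, pos + len(kw)))
            pos = text.find(kw, pos + 1)
    return sorted(hits, key=lambda hit: hit[1])
```

Given per-character sample ranges, each returned span maps directly to the run of voice data to save with the keyword.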

Speech (a sentence) can also be divided by recognizing breaks in the audio (which can be detected in the case of analog broadcast audio). For example, instead of 「これからかえります」, the speech can be recognized in divided form as 「これから」 and 「かえります」.
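Break detection of this kind is commonly done with an amplitude threshold. The sketch below is one such approach, not necessarily the one intended by the patent: the `threshold` and `min_gap` parameters are assumptions, and a run of at least `min_gap` quiet samples is treated as a break.

```python
from typing import List, Sequence, Tuple


def split_on_silence(samples: Sequence[int],
                     threshold: int,
                     min_gap: int) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges of speech separated by silence."""
    segments: List[Tuple[int, int]] = []
    start = None  # start index of the current speech segment, if any
    quiet = 0     # length of the current run of quiet samples
    for i, s in enumerate(samples):
        if abs(s) > threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_gap:
                # The segment ended where the quiet run began.
                segments.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        # Close the final segment, trimming any short trailing quiet run.
        segments.append((start, len(samples) - quiet))
    return segments
```

Each returned range can be passed to recognition separately, yielding separate results such as 「これから」 and 「かえります」.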

The processing is not limited to applying voice recognition to (uncompressed) voice data obtained by playing back a recording file. Voice data obtained through broadcast reception or the like can also be put through voice recognition in real time. When caption text data is available, it can be consulted to improve recognition accuracy. For example, suppose the recognition result is 「今日駅で待ってて」 ("Wait at the station today") and that the caption text data output at the time the result was obtained, or within a predetermined time range before and after it, contains 「東京駅で待ってて」 ("Wait at Tokyo Station"). By performing a character match determination, the CPU 209 finds that the two agree on 「駅で待ってて」 and disagree on 「今日」 versus 「東京」. The CPU 209 then corrects the recognition result, treating the phrase given by the caption text data as correct for the mismatched portion.
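The match determination and correction can be sketched with Python's standard `difflib.SequenceMatcher`, used here as a stand-in for whatever comparison the device actually performs. Replacing only substituted spans with the caption text, and leaving pure insertions and deletions as recognized, is an assumption made for this illustration.

```python
from difflib import SequenceMatcher


def correct_with_caption(recognized: str, caption: str) -> str:
    """Replace mismatched spans of `recognized` with the caption's text."""
    matcher = SequenceMatcher(None, recognized, caption, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            # Mismatched portion: trust the caption text.
            parts.append(caption[j1:j2])
        else:
            # "equal", "delete", "insert": keep what was recognized
            # (for "insert" this span is empty).
            parts.append(recognized[i1:i2])
    return "".join(parts)
```

With the example from the text, the shared span 「駅で待ってて」 is kept and 「今日」 is replaced by 「東京」.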

The example above has a TV celebrity provide the Japanese syllabary table, but this is not a limitation. The TV celebrity may instead provide voice data for phrases such as 「します」 ("will do"), 「できません」 ("cannot"), 「操作」 ("operation"), 「メニュー」 ("menu"), 「ディジタル」 ("digital"), 「設定」 ("setting"), 「カード」 ("card"), and 「下さい」 ("please").

In the example above, the user is shown which voice message is currently selected for each event, but this presentation may be omitted. That is, the device may simply present to the user each event (event content) for which a voice message can be set.

The device may also allow adjustments such as the reading speed when the character string that forms the voice message is read aloud.

The example above uses a mobile phone, but the invention is not limited to this and can be configured as a vehicle-mounted television receiver, a car navigation system with a communication function, a recorder or player, or a stationary television receiver. The device is also not limited to digital broadcast reception and may receive analog broadcasts. In that case, the analog audio is converted into a digital signal for data registration.

In the example above, characters are input using the numeric keypad, but the invention is not limited to this. For example, a device with a relatively large display may show a keyboard on the screen and let the user specify and input characters with the direction keys and the enter key. A configuration with a physical keyboard can of course also be adopted.

Devices that record voice guide data as fixed data on a hard disk or the like are equipped with a file system that manages the voice guide data (fixed data) as files. Application software accesses the hard disk through the API (application programming interface) provided by the file system. The file system manages files in two partitions: a general partition, which application software may read and write, and a special partition, which can be written only through a special operation and which application software can normally only read. Even the special partition becomes writable once it is switched to write-permitted mode. Accordingly, voice guide data is written to the special partition by the following procedure: switch the special partition to write-permitted mode, write the data, and then make the special partition write-protected again.
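The permit, write, re-protect procedure maps naturally onto a context manager, which restores write protection even if the write fails. The sketch below uses a hypothetical in-memory `SpecialPartition` class in place of a real file system API, which the text does not specify.

```python
from contextlib import contextmanager


class SpecialPartition:
    """Hypothetical in-memory stand-in for the write-protected partition."""

    def __init__(self):
        self.writable = False
        self.files = {}

    def write(self, name, data):
        if not self.writable:
            raise PermissionError("special partition is write-protected")
        self.files[name] = data


@contextmanager
def write_permitted(partition):
    """Switch the partition to write-permitted mode for the block's duration."""
    partition.writable = True
    try:
        yield partition
    finally:
        # Always restore write protection, even if the write raised.
        partition.writable = False
```

A caller would write `with write_permitted(p) as wp: wp.write("guide.pcm", data)`. Outside the `with` block, `p.write(...)` raises `PermissionError`.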

A block diagram showing the configuration of a mobile phone (voice message output device) according to an embodiment of the present invention.
An explanatory diagram showing an example of the editing screen of the embodiment of the present invention.
An explanatory diagram showing an example of the editing screen of the embodiment of the present invention.
An explanatory diagram showing an example of the editing screen of the embodiment of the present invention.
An explanatory diagram showing an example of the editing screen of the embodiment of the present invention.
An explanatory diagram showing an example of the editing screen of the embodiment of the present invention.
An explanatory diagram showing an example of the editing screen of the embodiment of the present invention.

Explanation of symbols

2 Mobile phone
202 LCD panel
205 Graphics controller
207 LCD controller
209 CPU
230 Terrestrial digital tuner
231 Mobile phone unit
232 Voice recognition unit

Claims

1. A voice message output device that selects and outputs a voice message corresponding to an event that has occurred, comprising: means for presenting to the user which voice message is currently selected for each event, or for presenting to the user each event for which a voice message can be set; means for generating character data or phrase data from voice data by voice recognition; means for associating the generated character data or phrase data with the voice data from which it was generated; means for displaying, for user selection, characters or phrases based on the generated character data or phrase data; character string generation means for generating a character string based on the characters or phrases selected by the user; means for determining whether the user has performed a registration operation on the generated character string; and means for registering, when the registration operation has been performed, a change of the voice message for the corresponding event so that voice output is performed by joining the voice data based on the character string created by the user.

2. The voice message output device according to claim 1, wherein the voice data is acquired by one or more of communication, broadcast reception, data transfer, and reading from a mounted memory.

3. A voice message output device that selects and outputs a voice message corresponding to an event that has occurred, comprising: means for presenting to the user which voice message is currently selected for each event, or for presenting to the user each event for which a voice message can be set; sound source means for generating and outputting voice data based on character data or phrase data; means for generating character data or phrase data from voice data by voice recognition; means for associating the generated character data or phrase data with the voice data from which it was generated; means for displaying, for user selection, characters or phrases based on the generated character data or phrase data; character string generation means for generating a character string either based on the characters or phrases selected by the user or from characters or phrases input by the user; means for determining whether the user has performed a registration operation on the generated character string; and means for registering, when the registration operation has been performed, a change of the voice message for the corresponding event so that voice output is performed by joining the voice data based on the character string created by the user.

4. The voice message output device according to claim 3, wherein a Japanese syllabary table storing voice data for at least 47 syllables is held in advance as the sound source means, or the device is configured so that the Japanese syllabary table can be acquired and held later by one or more of data communication, data broadcast reception, data transfer, and memory mounting, and wherein voice generation based on the character string is performed using the voice data of the Japanese syllabary table.

5. The voice message output device according to claim 4, wherein two or more of the Japanese syllabary tables are held, and voice generation based on the character string is performed using the Japanese syllabary table selected by the user.

6. The voice message output device according to claim 4 or 5, further comprising a numeric keypad in which one or more characters are assigned to each numeric key, the device being configured so that characters can be input using the numeric keypad.

7. The voice message output device according to any one of claims 1 to 6, wherein the device is configured as a mobile phone and outputs a telephone ringtone and/or a mail ringtone by voice generation based on the character string.