JP2006127443A

JP2006127443A - E-mail transmitting terminal and e-mail system

Info

Publication number: JP2006127443A
Application number: JP2004373043A
Authority: JP
Inventors: Motoyasu Tanaka; 基康田中; Yusuke Nara; 裕介奈良
Original assignee: MegaChips LSI Solutions Inc
Current assignee: MegaChips Corp
Priority date: 2004-09-30
Filing date: 2004-12-24
Publication date: 2006-05-18

Abstract

PROBLEM TO BE SOLVED: To provide an e-mail system that uses voice for richer expression. SOLUTION: Voice input from a microphone is decomposed into recording data elements V1, V2, and V3 by noise gate processing. Timing markers are set at every bar of the BGM data (music data with markers), and the recording data elements V1, V2, and V3 are each combined at the head position of a bar of the BGM data. In addition, sound effect data is combined at arbitrary timing according to a user's specifying operation. The synthesized voice data generated in this way is compressed in a format such as MP3 and then transmitted as a voice e-mail. COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to techniques for creating and sending e-mail content.

E-mail on mobile phone terminals has the advantage that messages can be exchanged easily regardless of time and place. In addition to text characters, such e-mail makes use of pictograms and the like to enhance expressiveness. Recently, as mobile phone terminals have become more capable, it has also become possible to exchange messages that include still images and moving images.

Thus, in mobile phone mail, which is used daily as an easy means of communication, users seek ways of expressing themselves that differ from others: they exchange highly original mail that makes full use of pictograms and emoticons, or exchange mail that uses images. However, few mail services make use of audio media, and most of the services offered aim only at visual effects.

An example of a mail service using audio media is 「歌う♪メール」 ("Singing ♪ Mail"), described in Non-Patent Document 1 below. It uses an application program to convert input kana characters and the like into a singing voice, and allows the arrangement, consisting of voice quality, melody, and accompaniment, to be changed freely.

http://www.g-search.or.jp/release/2004/20040726.html

E-mail on mobile phone terminals is called on to offer still greater variety in how messages can be expressed. As mobile phones play a major role as a means of communication, strengthening the expressive power of e-mail is of great significance.

Non-Patent Document 1 uses an application program to convert input characters into an artificial voice, but an artificial voice has difficulty expressing the accents and emotions characteristic of a human speaker.

Accordingly, an object of the present invention is to provide an expressive e-mail system that uses audio media.

To solve the above problem, the invention according to claim 1 comprises: voice input means; means for recording the voice input by the voice input means and storing it as recording data; synthesis processing means for synthesizing the recording data with music data to generate synthesized voice data; and transmission means for transmitting the synthesized voice data as an e-mail.

The invention according to claim 2 is the e-mail transmitting terminal according to claim 1, wherein the music data includes sound effect data, and the synthesis processing means includes means for synthesizing the sound effect data at a designated point on the playback time axis of the synthesized voice data.

The invention according to claim 3 is the e-mail transmitting terminal according to claim 1 or claim 2, wherein the music data is stored on a recording medium removable from the e-mail transmitting terminal, or in a storage device of a server on a network to which the e-mail transmitting terminal can connect.

The invention according to claim 4 is the e-mail transmitting terminal according to claim 3, wherein the music data stored on the recording medium or in the storage device is encrypted, and the synthesis processing means includes means for decrypting the encrypted music data.

The invention according to claim 5 is the e-mail transmitting terminal according to any one of claims 1 to 4, wherein the music data includes music data with markers, in which one or more timing markers are set in advance on the playback time axis, and the synthesis processing means includes: means for treating portions of the recording data at or below a predetermined volume as silent portions and decomposing the recording data into a plurality of recording data elements delimited by the silent portions; and means for synthesizing each of the plurality of recording data elements in synchronization with a time set by a timing marker of the music data with markers.

The invention according to claim 6 is the e-mail transmitting terminal according to claim 5, wherein the music data with markers includes music data in which a timing marker is set at the head of each bar.

The invention according to claim 7 is the e-mail transmitting terminal according to any one of claims 1 to 6, wherein the synthesis processing means includes means for synthesizing video data at an arbitrary point on the time axis of the synthesized voice data in response to a designation operation by the user, thereby generating synthesized voice data with video data, and the transmission means includes means for transmitting the synthesized voice data with video data as an e-mail.

The invention according to claim 8 is the e-mail transmitting terminal according to any one of claims 1 to 6, further comprising means for applying audio compression conversion to the synthesized voice data.

The invention according to claim 9 is the e-mail transmitting terminal according to claim 7, further comprising means for applying video compression conversion to the synthesized voice data with video data.

The invention according to claim 10 is the e-mail transmitting terminal according to any one of claims 1 to 6, wherein the e-mail transmitting terminal is a mobile phone terminal, and the synthesized voice data is converted into a standard music data format defined for mobile phone terminals.

The invention according to claim 11 is the e-mail transmitting terminal according to any one of claims 1 to 10, further comprising means for generating scenario data in which the rules of the synthesis processing are recorded, wherein the transmission means includes means for transmitting the recording data and the scenario data as an e-mail.

The invention according to claim 12 is the e-mail transmitting terminal according to any one of claims 1 to 11, further comprising processing means for processing the voice input by the voice input means, wherein the processing means includes means for modulating the voice and/or means for applying special effects to the voice, and the synthesis processing means synthesizes the recording data processed by the processing means with the music data.

The invention according to claim 13 is the e-mail transmitting terminal according to claim 12, wherein the processing means includes means for executing tempo change processing and/or pitch shift processing on the voice input by the voice input means.

The invention according to claim 14 is the e-mail transmitting terminal according to claim 12, wherein the processing means includes means for applying one or more of equalizer processing, harmonizing processing, and echo processing to the voice input by the voice input means.

The invention according to claim 15 is the e-mail transmitting terminal according to any one of claims 12 to 14, wherein a plurality of pieces of setting information corresponding to a plurality of themes are prepared in advance, each piece of setting information defines the content of the processing executed by the processing means, and selecting one piece of setting information determines the content of the processing performed by the processing means.

The invention according to claim 16 is the e-mail transmitting terminal according to any one of claims 1 to 15, further comprising means for modulating the music defined by the music data and/or means for applying special effects to the music defined by the music data.

The invention according to claim 17 is a system for transferring e-mail, comprising a terminal and a synthesis server, wherein the terminal comprises: voice input means; means for recording the voice input by the voice input means and storing it as recording data; and means for transmitting the recording data to the synthesis server, and the synthesis server comprises: synthesis processing means for synthesizing the received recording data with music data to generate synthesized voice data; and means for storing the synthesized voice data in a storage device as an e-mail.

The invention according to claim 18 is a system for transferring e-mail, comprising a terminal and a synthesis server, wherein the terminal comprises: voice input means; and means for transmitting the voice input by the voice input means to the synthesis server, and the synthesis server comprises: means for recording the received voice as recording data; synthesis processing means for synthesizing the recording data with music data to generate synthesized voice data; and means for storing the synthesized voice data in a storage device as an e-mail.

The invention according to claim 19 is the e-mail system according to claim 17 or claim 18, wherein the music data includes sound effect data, and the synthesis processing means includes means for synthesizing the sound effect data at a designated point on the playback time axis of the synthesized voice data.

The invention according to claim 20 is the e-mail system according to any one of claims 17 to 19, wherein the music data includes music data with markers, in which one or more timing markers are set in advance on the playback time axis, and the synthesis processing means includes: means for treating portions of the recording data at or below a predetermined volume as silent portions and decomposing the recording data into a plurality of recording data elements delimited by the silent portions; and means for synthesizing each of the plurality of recording data elements in synchronization with a time set by a timing marker of the music data with markers.

The invention according to claim 21 is the e-mail system according to claim 20, wherein the music data with markers includes music data in which a timing marker is set at the head of each bar.

The invention according to claim 22 is the e-mail system according to any one of claims 17 to 21, further comprising means for synthesizing video data at an arbitrary point on the time axis of the synthesized voice data in response to a designation operation by the user, thereby generating synthesized voice data with video data.

The invention according to claim 23 is the e-mail system according to any one of claims 17 to 21, further comprising means for applying audio compression conversion to the synthesized voice data.

The invention according to claim 24 is the e-mail system according to claim 22, further comprising means for applying video compression conversion to the synthesized voice data with video data.

The invention according to claim 25 is the e-mail system according to any one of claims 17 to 21, wherein the terminal is a mobile phone terminal, and the synthesized voice data is converted into a standard music data format defined for mobile phone terminals.

The invention according to claim 26 is the e-mail system according to any one of claims 17 to 25, wherein the synthesis server further comprises processing means for processing the voice received from the terminal, the processing means includes means for modulating the voice and/or means for applying special effects to the voice, and the synthesis processing means synthesizes the recording data processed by the processing means with the music data.

The invention according to claim 27 is the e-mail system according to claim 26, wherein the processing means includes means for executing tempo change processing and/or pitch shift processing on the voice input by the voice input means.

The invention according to claim 28 is the e-mail system according to claim 26, wherein the processing means includes means for applying one or more of equalizer processing, harmonizing processing, and echo processing to the voice input by the voice input means.

The invention according to claim 29 is the e-mail system according to any one of claims 26 to 28, wherein a plurality of pieces of setting information corresponding to a plurality of themes are prepared in advance, each piece of setting information defines the content of the processing executed by the processing means, and selecting one piece of setting information determines the content of the processing performed by the processing means.

The invention according to claim 30 is the e-mail system according to any one of claims 17 to 29, further comprising means for modulating the music defined by the music data and/or means for applying special effects to the music defined by the music data.

In the present invention, a terminal such as a mobile phone does not merely play back recorded voice or music files; it superimposes the user's own voice on music and synthesizes the two. This makes it possible to create highly original synthesized voice mail.

In addition, the music data stored on the recording medium is encrypted and must be read by a designated program in order to be decrypted, which makes it possible to prevent unauthorized reuse of the content.

In addition, because timing marker information is added to the music data, the recording data can be synthesized with the music data in time with the rhythm of the BGM without sounding unnatural. Furthermore, performing noise gate processing during or after recording removes outdoor environmental noise, making it possible to create content that is also musically easy to listen to.

In addition, inserting video data into the synthesized voice data makes it possible to generate still more original multimedia mail.

In addition, converting the synthesized voice data into a compressed audio file such as AAC or MP3, or into a compressed video file such as MPEG4, makes it possible to generate small, high-quality content.

In addition, converting the synthesized voice data into a general-purpose music data format standardized for mobile phone terminals makes it possible to provide a versatile mail system that works across mobile phone carriers regardless of the playback environment.

In addition, because the voice data is subjected to various kinds of processing before being synthesized with the music data, more expressive voice mail can be created.

{First embodiment}
<Configuration of mobile phone device>
Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a mobile phone terminal 100 according to the first embodiment of the present invention. The mobile phone terminal 100 includes: a microphone device 101 for voice input; a microphone interface (Mic I/F) 102; a ROM card 103, which is a card medium; a card control interface (Card Cont I/F) 104 for accessing the card medium; a memory 106 used as a temporary storage area for various data and as a storage area for various application programs; an MMU (Memory Management Unit) 105 that controls signals between the CPU 112 and the memory 106; an operation unit 107 for inputting various user operations on the mobile phone terminal 100; an audio processing unit 108 that encodes and decodes audio signals; a video processing unit 109 that encodes and decodes video signals; a communication unit 110 that executes communication processing with a base station via an antenna 111 when the mobile phone terminal 100 makes a voice call or performs data communication; and a CPU 112 that controls the mobile phone terminal 100. The mobile phone terminal 100 also includes a monitor 113 and a speaker 114.

The ROM card 103 may be, for example, a CompactFlash (registered trademark) card, a SmartMedia (registered trademark) card, or an SD Memory Card (registered trademark). SDRAM or the like can be used as the memory 106. The audio processing unit 108 has a function of encoding and decoding audio signals based on standards such as MP3 and AAC, and the video processing unit 109 has a function of encoding and decoding video signals based on standards such as MPEG4.

The ROM card 103 stores BGM data BD and sound effect data ED. The BGM data BD is background music data to be synthesized with voice input from the microphone in the voice e-mail (voice mail) or multimedia mail transmitted in this embodiment. To keep the spoken content easy to hear, data without vocals is preferable as the BGM data BD, but the BGM data BD is not limited to such data; a wide variety of music data, including music with vocals, can be used. The sound effect data ED is data for adding a relatively short sound effect to the synthesized voice data obtained by synthesizing the microphone input voice with the BGM data BD. Examples include sound data such as hand claps, shouts, and cymbals.

The BGM data BD and the sound effect data ED are stored on the ROM card 103 in a compressed file format such as MP3 or AAC, or in a file format such as a ringtone melody format that is standard on mobile phone terminals.

The BGM data BD and the sound effect data ED stored on the ROM card 103 are encrypted, and are decrypted by the authoring program AP stored in the memory 106. The authoring program AP holds the decryption key information needed to decrypt the BGM data BD and the sound effect data ED; decrypting the data makes the BGM and sound effects playable.

The authoring program AP stored in the memory 106 can execute the various functions, including voice synthesis processing, needed to create the voice e-mail or multimedia mail of this embodiment. Specifically, the authoring program AP can execute: a function for recording voice input from the microphone device 101; functions for reading and decrypting the BGM data BD and the sound effect data ED stored on the ROM card 103; a noise gate processing function applied during or after recording of the microphone input voice; a function for synthesizing the recorded voice data with the BGM data BD; a function for synthesizing the synthesized voice data with the sound effect data ED; a function for synthesizing the synthesized voice data with video data; a function for saving the synthesized voice data and the synthesized voice data with video data; a function for playing back the synthesized voice data and the synthesized voice data with video data; and a function for transmitting the synthesized voice data and the synthesized voice data with video data to another terminal as an e-mail. Each of these functions is realized by the authoring program AP running on hardware resources such as the CPU 112 and RAM (not shown).

<Noise gate function and BGM data synthesis function>
Next, the noise gate function and the BGM data synthesis function are described with reference to FIG. 2. As noted above, both are processing functions of the authoring program AP. There are two noise gate functions: one that operates during voice recording and one that is executed after voice recording. The noise gate function during recording generates recording data by starting recording when the volume of the voice input through the microphone device 101 exceeds a predetermined level and stopping recording when the volume falls below that level. The noise gate function executed after recording discards, from recording data that has already been recorded, the portions whose volume is below the predetermined level, and keeps only the portions whose volume exceeds that level as recording data.

For example, if the user says
"(silence) (Hello) (silence) (It's nice weather today) (silence) (Let's go out somewhere!)"
then three pieces of data are recorded:
"Hello",
"It's nice weather today",
"Let's go out somewhere!".
Each of these three pieces of data is called a recording data element. V1, V2, and V3, shown as the recording data in FIG. 2, correspond to these three recording data elements.
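The post-recording noise gate described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `split_by_noise_gate`, the use of absolute sample amplitude as the "volume", and the `min_silence` parameter (how many consecutive quiet samples count as a silent portion) are all assumptions introduced here.

```python
from typing import List, Tuple


def split_by_noise_gate(samples: List[float], threshold: float,
                        min_silence: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of segments louder than `threshold`.

    Assumption: a run of at least `min_silence` consecutive quiet samples
    closes a segment; shorter quiet runs stay inside the segment.
    """
    segments = []
    start = None   # index where the current segment began, or None
    quiet_run = 0  # consecutive quiet samples seen inside the segment
    for i, s in enumerate(samples):
        if abs(s) > threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_silence:
                # Close the segment at the first quiet sample of this run.
                segments.append((start, i - quiet_run + 1))
                start = None
                quiet_run = 0
    if start is not None:
        # Recording ended inside a segment; drop any trailing quiet samples.
        segments.append((start, len(samples) - quiet_run))
    return segments
```

Each returned pair delimits one recording data element, so a recording of three phrases separated by pauses would yield three pairs corresponding to V1, V2, and V3.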

Next, the BGM data BD is divided into a plurality of bars, as shown in the figure. In the BGM data BD of this embodiment, timing markers are recorded that designate the head position of each bar (a point on the time axis). BGM data BD in which timing markers designating the bar head positions are set in this way is specifically called music data with markers. In the music data with markers of this embodiment, a marker is set at the head position of each bar, but this is only an example; markers may be set at points other than bar boundaries.

The authoring program AP then links each of the recording data elements decomposed as described above to a timing marker position in the music data with markers. That is, each recording data element is synthesized at the head of a bar of the music data with markers. Synthesizing the recording data with the BGM data BD in this way generates the synthesized voice data. When the synthesized voice data generated in this way is played back, regardless of the actual recording timing, "Hello" is spoken from the head of the first bar of the background music played back from the BGM data BD, "It's nice weather today" from the head of the second bar, and "Let's go out somewhere!" from the head of the third bar. The background music and the recording data are synchronized by this procedure.
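The marker-based synthesis described in this paragraph can be sketched as below. It assumes that audio is held as lists of float samples in the range -1.0 to 1.0, that timing markers are sample indices, and that mixing is a clamped sum; the name `mix_at_markers` and these representations are illustrative assumptions, not details given in the patent.

```python
from typing import List


def mix_at_markers(bgm: List[float], markers: List[int],
                   elements: List[List[float]]) -> List[float]:
    """Mix the k-th recording data element into the BGM at the k-th marker."""
    if len(elements) > len(markers):
        raise ValueError("more recording data elements than timing markers")
    out = list(bgm)  # leave the original BGM samples untouched
    for marker, element in zip(markers, elements):
        for offset, sample in enumerate(element):
            pos = marker + offset
            if pos >= len(out):
                break  # this sketch simply truncates at the end of the BGM
            # Additive mix, clamped to the assumed [-1.0, 1.0] sample range.
            out[pos] = max(-1.0, min(1.0, out[pos] + sample))
    return out
```

Because each element starts at its own marker regardless of when it was actually spoken, the pauses between phrases in the original recording have no effect on where the phrases land in the result.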

Here, as shown in the figure, the phrase of recording data element V3 runs over into the fourth bar of the BGM data BD. In that case, the third bar of the BGM data BD may be loop-played so that the background music and the recording data are synthesized without sounding unnatural.
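One possible reading of the loop playback is that the bar holding an over-long element is repeated until the element has finished, after which the remaining bars follow. The sketch below illustrates that reading only; the function `extend_by_looping`, the sample-index bar boundaries, and the choice to repeat whole bars are assumptions, since the text does not specify how the loop is carried out.

```python
import math
from typing import List


def extend_by_looping(bgm: List[float], bar_start: int, bar_end: int,
                      element_len: int) -> List[float]:
    """Repeat bgm[bar_start:bar_end] until an element placed at bar_start fits."""
    bar_len = bar_end - bar_start
    if bar_len <= 0:
        raise ValueError("bar must have positive length")
    # Number of whole-bar plays needed to cover the element (at least one).
    plays = max(1, math.ceil(element_len / bar_len))
    return bgm[:bar_start] + bgm[bar_start:bar_end] * plays + bgm[bar_end:]
```

For instance, with bars of two samples each and an element of three samples placed on the third bar, the third bar is played twice before the fourth bar begins.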

As described above, in the present embodiment, using noise gate processing together with music data with markers makes it possible to automate the process of combining the recording data with the BGM data BD. Therefore, even on a mobile phone terminal with a limited number of operation buttons and a small monitor, the audio synthesis process can be executed without complicated operations.

However, a method may also be adopted in which the user speaks in time with the rhythm while the BGM data BD is being played back, and the recording is combined as it is made. In that case, operating keys while recording makes it possible to combine the recorded voice with the BGM at any point on the time axis.

<Sound effect data synthesis function>
The sound effect data synthesis function is also a processing function of the authoring program AP. As shown in FIG. 2, the sound effect data ED can be combined at any point on the playback time axis of the synthesized voice data. The synthesis method is as follows. The authoring program AP accepts a sound effect insertion instruction from the user while playing back the BGM data BD (or while playing back the synthesized voice data). While listening to the BGM, the user operates the operation unit 107 at the point where a sound effect is to be inserted, thereby issuing the sound effect insertion instruction. In this way, synthesized voice data in which the BGM data BD, the recording data, and the sound effect data ED are combined can be generated.

<Video data synthesis function>
The authoring program AP can also combine video data, such as still images or moving images, with the generated synthesized voice data to generate synthesized voice data with video data. The synthesis method is as follows. The first method, shown in FIG. 3, links each piece of video data to a bar of the BGM data BD. As shown in the figure, one piece of video data may be linked across a plurality of bars. The second method has the user specify the switching points of the video data while the authoring program AP plays back the synthesized voice data.

By the methods described above, synthesized voice data in which the BGM data BD, the recording data, and the sound effect data ED are combined is generated, and combining video data with it generates synthesized voice data with video data.

<Data format>
The authoring program AP uses the audio processing unit 108 to compress the synthesized voice data generated by the above processing according to a standard such as MP3 or AAC. It also uses the audio processing unit 108 and the video processing unit 109 to apply moving image compression, according to a standard such as MPEG4, to the synthesized voice data with video data generated by the above processing. In this way, the synthesized voice data of the present embodiment is stored in the memory 106 as a voice e-mail compressed according to a standard such as MP3 or AAC. Alternatively, the synthesized voice data with video data is stored in the memory 106 as a multimedia mail compressed according to a standard such as MPEG4 (with the audio compressed according to a standard such as MP3 or AAC).

The authoring program AP can also convert the synthesized voice data into a general-purpose music data format that is used as standard on mobile phone terminals (for example, mmf). In this case, the authoring program AP does not use the audio processing unit 108; the conversion to the general-purpose music data format is performed by processing on the CPU 112. The data generated in this way is stored in the memory 106 as a voice e-mail.

As shown in FIG. 4, most general-purpose music data used on recent mobile phone terminals consists of FM sound source channels, which play BGM music and sound effects, and a PCM channel, which plays recorded voice. Therefore, a single piece of synthesized voice data may be generated by using the FM sound source channels for the BGM music and sound effects and the PCM channel for the recorded voice data.

When a voice e-mail or multimedia mail has been generated by the above processing, the CPU 112 specifies the destination address and other details of the e-mail and transmits the voice e-mail or multimedia mail to that address. A terminal that receives the voice e-mail can play back the voice combined with the BGM. Because this voice is not artificial speech but data actually recorded by the sender, emotional expression can be conveyed accurately. Because the recorded voice is played back in time with the BGM, the message can be presented in a variety of ways, unlike an ordinary voice call. Sound effects inserted at arbitrary points further enhance the expressiveness of the voice mail. In a multimedia mail, video is played back in addition to this expressive voice mail, enabling even more expressive communication.

<Scenario data>
As shown in FIGS. 2 and 3, the synthesized voice data, or the synthesized voice data with video data, consists of a plurality of pieces of data arranged in layers. In other words, it can be defined by the file name (data identification information) of each piece of data combined with the BGM data BD and by synchronization information (such as time information) relative to the BGM data BD.

Therefore, as a modification of the present embodiment, the data identification information and the synchronization information are treated as scenario data, and only the recording data and the scenario data are transmitted to the destination address. In other words, the scenario data includes information specifying the BGM data BD, the sound effect data ED, and the video data, together with information indicating the timing at which each of these is combined with the recording data. With this method, the receiving terminal must either already hold each data file specified by the data identification information or obtain those files from a server or the like on the network. The transmitting terminal, however, does not need to send those data files, so the amount of data transferred can be reduced.

When the receiving side plays back a voice e-mail or multimedia mail using scenario data, it executes a predetermined application program, reads the BGM data, sound effect data, video data, and so on from a recording medium according to the data identification information described in the scenario data, and plays them back according to the described synchronization information.
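To make the idea of scenario data concrete, the following Python sketch shows one hypothetical layout and a receiver-side check for assets that are not yet held locally. The JSON encoding, every field name, and every identifier (such as `bgm_rock_01`) are assumptions for illustration; the text only requires data identification information plus synchronization information.

```python
import json

# Hypothetical scenario data: identifiers plus synchronization information.
scenario = {
    "bgm": {"id": "bgm_rock_01"},
    "recording": [
        {"element": "V1", "marker": 0},
        {"element": "V2", "marker": 1},
        {"element": "V3", "marker": 2},
    ],
    "effects": [{"id": "se_clap", "time_ms": 3500}],
    "video": [{"id": "img_001", "bars": [1, 2]}],
}


def missing_assets(scenario, local_store):
    """Return the asset IDs the receiver must fetch before playback."""
    needed = [scenario["bgm"]["id"]]
    needed += [e["id"] for e in scenario.get("effects", [])]
    needed += [v["id"] for v in scenario.get("video", [])]
    return [asset for asset in needed if asset not in local_store]


payload = json.dumps(scenario)       # sent together with the recording data
received = json.loads(payload)
to_fetch = missing_assets(received, {"bgm_rock_01", "img_001"})
```

Only the recording data and a small description like this travel with the mail, which is where the reduction in transfer volume comes from.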

<Flow of synthesis processing>
The synthesis processing described above is now explained with reference to the flowcharts of FIGS. 5 to 9. As shown in FIG. 5, noise gate processing (step S10) is performed first, and the recording data is decomposed into recording data elements.

FIG. 6 is a flowchart showing the details of the noise gate processing (step S10). In step S11, the timing of the noise gate processing is selected. If the setting is to apply the noise gate after recording, microphone recording processing (step S12) is executed, and noise gate processing (step S13) is then applied to the recording data. If the setting is to apply the noise gate during recording, noise gate recording processing (step S14) is executed. The recording data elements are generated by the above processing. The user should be able to change freely whether the noise gate is applied after recording or during recording.
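The post-recording variant (steps S12 and S13) can be sketched as follows. This is only one simple way to realize a noise gate: it assumes an amplitude threshold and a minimum silence length, neither of which is specified in this section, and the function name `split_by_noise_gate` is hypothetical.

```python
def split_by_noise_gate(samples, threshold, min_silence):
    """Split a recording into elements separated by sustained silence.

    samples     -- list of numeric samples
    threshold   -- amplitude below which a sample counts as quiet (assumed)
    min_silence -- number of consecutive quiet samples that ends an element
    """
    elements, current, quiet_run = [], [], 0
    for s in samples:
        if abs(s) >= threshold:
            current.append(s)
            quiet_run = 0
        elif current:
            # Short pauses stay inside the element; long ones close it.
            quiet_run += 1
            current.append(s)
            if quiet_run >= min_silence:
                elements.append(current[:-quiet_run])  # drop trailing quiet
                current, quiet_run = [], 0
    if current:
        elements.append(current[:len(current) - quiet_run])
    return elements


recording = [0, 0, 5, 6, 0, 7, 0, 0, 0, 8, 9, 0]
parts = split_by_noise_gate(recording, threshold=3, min_silence=2)
```

In this example the single quiet sample between 6 and 7 is kept as part of the first element, while the longer pause separates it from the second element.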

Returning to FIG. 5, BGM processing (step S20) is performed next. FIG. 7 is a flowchart showing the details of the BGM processing (step S20). First, in step S21, the source from which the BGM data BD is to be read is selected. If the source is card media, the card media (ROM card 103) is read (step S22). If the source is a public server, a connection to the public server is established (step S25). Next, the BGM data BD to be used in the synthesis processing is selected from among the plural pieces of BGM data BD available (steps S23, S26). Then, the acquired BGM data BD is decoded (steps S24, S27).

Returning to FIG. 5, synthesis processing is performed next (step S30). The synthesis processing is as described above: the recording data elements are combined so as to be synchronized with the timing markers of the music data with markers. As noted above, the data after this synthesis processing has the layered structure shown in FIG. 2, and the BGM data BD and the recording data are combined in a state in which they remain separable. For example, link information between the BGM data BD and the recording data is generated. Alternatively, synthesized voice data containing a plurality of tracks (channels) is generated, and each piece of data is stored in its own track.

The recording data elements are combined at the timing marker positions of the BGM data BD, and a specific recording data element may be combined at more than one timing marker position. For example, combining a specific recording data element at several consecutive timing marker positions causes that message to be played repeatedly, which makes it possible to emphasize a particularly important phrase in the recorded message.

Next, sound effect insertion processing (step S40) is performed. The sound effect insertion method is as described above: in response to the user's designating operation, a sound effect is combined at an arbitrary point on the playback time axis of the synthesized voice data. The synthesized voice data after sound effect insertion also has the layered structure shown in FIG. 2, and the pieces of data are combined in a state in which they remain separable. The sound effect data ED is acquired in the same way as the BGM data BD, as shown in FIG. 7.

Next, in step S50, whether or not to add video is selected. If video is to be added, multimedia mail creation processing (step S60) is performed; if not, voice e-mail (voice mail) creation processing (step S70) is performed. Whether or not to add video is specified by the user.

FIG. 8 is a flowchart showing the details of the multimedia mail creation processing. First, the file format is selected (step S61). If a moving image compression format such as MPEG is set as the file format, the specified video data is added and compression is executed according to the set file format (step S62). If scenario data is set as the file format, scenario data defining the synthesized voice data with video data is generated (step S63). This scenario data also includes information designating the video data.

The multimedia mail generated in this way is stored in the memory 106 (step S64). The multimedia mail stored in the memory 106 is, after step S62, compressed synthesized voice data with video data and, after step S63, scenario data and recording data. A destination address is then specified for the multimedia mail stored in the memory 106, and it is transmitted to another terminal (step S65).

FIG. 9 is a flowchart showing the details of the voice e-mail (voice mail) creation processing. First, the file format is selected (step S71). If an audio compression format such as MP3 or AAC is set as the file format, compression is executed according to the set file format (step S72). If a general-purpose music file format standardized for mobile phone terminals is set as the file format, conversion to that file format is performed (step S73). If scenario data is set as the file format, scenario data defining the synthesized voice data is generated (step S74).

The voice e-mail generated in this way is stored in the memory 106 (step S75). The voice e-mail stored in the memory 106 is, after step S72 or S73, compressed or converted synthesized voice data and, after step S74, scenario data and recording data. A destination address is then specified for the voice e-mail stored in the memory 106, and it is transmitted to another terminal (step S76).

{Second Embodiment}
Next, a second embodiment will be described. In the first embodiment described above, the synthesis processing for generating a voice e-mail or multimedia mail is executed within the mobile phone terminal 100. That is, as shown in FIG. 10, the mobile phone terminal 100 executes all of the voice input, voice recording, synthesis, and mail transmission processing, and transmits the synthesized e-mail to the receiving terminal 200.

In the second embodiment, the synthesis processing is executed on a synthesis server 300 connected via a network. Specifically, as shown in FIG. 11, the mobile phone terminal 100 executes only the voice input processing and the voice recording processing. The recording data is then transmitted from the mobile phone terminal 100 to the synthesis server 300, and the synthesis processing is executed on the synthesis server 300. Information indicating the conditions of the synthesis processing may also be transmitted from the mobile phone terminal 100 to the synthesis server 300. This information may be of the same kind as the scenario data described in the first embodiment. When the synthesis server 300 has generated a voice e-mail or multimedia mail by executing the same synthesis processing as in the first embodiment, it stores the e-mail in a storage device. It then transmits to the mail receiving terminal 200 information specifying a URL as the access path to the e-mail stored in the storage device. By specifying this URL, the mail receiving terminal 200 can receive the voice e-mail or multimedia mail.

In the example shown in FIG. 11, the access path information (URL information) is transmitted from the synthesis server 300 to the mail receiving terminal 200. Alternatively, the synthesis server 300 may first transmit the access path information to the mobile phone terminal 100, which is the mail transmitting terminal, and the mobile phone terminal 100 may then transmit the access path information to the mail receiving terminal 200.

Thus, according to the second embodiment, the synthesis processing is not executed on the mobile phone terminal 100, which is the mail transmitting terminal, so the processing load on the terminal can be reduced.

{Third Embodiment}
Next, a third embodiment will be described. The third embodiment differs from the second embodiment in that the voice recording is also performed on the synthesis server 300. It is otherwise the same as the second embodiment. Specifically, the mobile phone terminal 100, which is the mail transmitting terminal, first establishes a telephone line connection with the synthesis server 300. The user then speaks a message into the microphone device 101. This message is transferred to the synthesis server 300 over the telephone line, and the recording processing is performed on the synthesis server 300. The subsequent processing is the same as in the second embodiment.

In the third embodiment as well, the synthesis processing is not executed on the mobile phone terminal 100, which is the mail transmitting terminal, so the processing load on the terminal can be reduced.

{Fourth Embodiment}
Next, a fourth embodiment of the present invention will be described. The fourth embodiment differs from the first embodiment in that various kinds of voice processing are applied to the voice input from the microphone device 101, and the processed voice is then combined with the BGM data BD.

The processing flow is substantially the same as that described with reference to the flowcharts of FIGS. 5 to 9, except for the synthesis processing of step S30. In the synthesis processing of step S30 in the first embodiment, the recorded voice is combined with the BGM data BD as it is; in this embodiment, the recorded voice is processed first and then combined with the BGM data BD. FIG. 13 is a flowchart of the synthesis processing (step S30) in this embodiment. This synthesis processing is also executed by the authoring program AP. In the synthesis processing flow, voice processing (step S31) is executed first. The voice processing consists of processing that modulates the recorded voice and processing that applies special effects to it.

The processing that modulates the voice includes changing the tempo (speed) of the voice and shifting its pitch. For example, the user may speak into the microphone device 101 at a slow tempo, and the modulation processing may then change the voice to a faster tempo to give the voice message a light, lively tone. Shifting the pitch of the recorded voice so that the whole message moves to a lower register can convey a heavy atmosphere or a bad mood.

The processing that applies special effects to the voice includes equalizer processing, harmonize processing, and echo processing. Equalizer processing changes the frequency characteristics of the voice message by emphasizing the high range, emphasizing the low range, or cutting a specific range. Harmonize processing adds other notes that form a chord with the pitch of the recorded voice, turning the voice message into a chord. Echo processing plays back the recorded voice with a time delay so that the sound reverberates. For example, harmonize processing can produce a message with a grand atmosphere, and echo processing can create a dreamlike atmosphere.
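Of these effects, echo is the simplest to show in code. The sketch below is a basic feed-forward delay in Python, in which delayed and attenuated copies of the input are added to the original. The parameter names `delay`, `decay`, and `repeats` are assumptions; the text does not say how the effect is implemented.

```python
def add_echo(samples, delay, decay, repeats):
    """Add `repeats` delayed copies of the signal, each quieter than the last.

    samples -- list of float samples
    delay   -- delay between copies, in samples
    decay   -- gain applied per repeat (0 < decay < 1)
    repeats -- number of echo copies to add
    """
    out = list(samples) + [0.0] * (delay * repeats)  # room for the tail
    for n in range(1, repeats + 1):
        gain = decay ** n
        for i, s in enumerate(samples):
            out[i + n * delay] += s * gain
    return out


echoed = add_echo([1.0, 0.0], delay=2, decay=0.5, repeats=2)
# echoed == [1.0, 0.0, 0.5, 0.0, 0.25, 0.0]
```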

The flow of the voice processing (step S31) is described with reference to FIG. 14. First, it is determined whether or not to change the tempo of the recorded voice (step S301). If the tempo is not to be changed, the processing proceeds to step S305.

If the tempo is to be changed (Yes in step S301), it is determined whether or not to perform synchronization processing (step S302). Two tempo change methods are provided: manual tempo change processing and automatic tempo change processing (synchronization processing). Manual tempo change processing changes the tempo of the recorded voice according to a tempo setting specified by the user. Automatic tempo change processing changes the tempo of the recorded voice so that it is synchronized with the BGM data BD.

Before the tempo change processing is described, the method of acquiring the tempo of the recorded voice at the time of recording (before the tempo change) is explained. There are two methods. One determines the tempo from a guide sound. When the user inputs voice, a guide sound that keeps a constant rhythm, like a metronome, is played, and the user records the voice while listening to it. Information on the guide sound is included with the recorded voice, and the tempo of the recorded voice is determined from the guide sound. The other method acquires the rhythm of the recorded voice automatically. Voice analysis is performed on the recorded voice to divide it into syllables, and the tempo of the recorded voice is obtained automatically from the utterance timing of each syllable. The tempo of the recorded voice at the time of recording is determined by one of these two methods.
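The automatic method can be sketched in Python as follows, assuming that syllable onset times (in seconds) have already been obtained by some voice analysis step that is outside this sketch. Treating each syllable as one beat and using the median interval are simplifying assumptions, and `estimate_bpm` is a hypothetical name.

```python
from statistics import median


def estimate_bpm(onset_times):
    """Estimate tempo in beats per minute from syllable onset times.

    onset_times -- ascending onset times in seconds, one per syllable
                   (assumed to come from a separate analysis step)
    """
    if len(onset_times) < 2:
        raise ValueError("need at least two onsets")
    intervals = [b - a for a, b in zip(onset_times, onset_times[1:])]
    # The median is less sensitive to an occasional long pause than the mean.
    return 60.0 / median(intervals)


bpm = estimate_bpm([0.0, 0.5, 1.0, 1.5, 2.5])   # -> 120.0
```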

Next, the tempo change processing is described. Manual tempo change processing modifies the tempo at the time of recording according to the tempo setting specified by the user. For example, if the tempo at the time of recording was determined from the guide sound, the tempo of the guide sound is changed to the set tempo, and the tempo of the recorded voice changes accordingly. If the tempo at the time of recording was acquired automatically, the automatically acquired tempo is changed according to the tempo setting.

Automatic tempo change processing automatically changes the tempo of the recorded voice so that it is synchronized with the BGM data BD. If the tempo of the BGM data BD is recorded in the BGM data BD in advance, that value can be used. Alternatively, the tempo can be obtained by audio analysis of the BGM data BD, for example by analyzing a rhythmic sound such as drums. If the tempo at the time of recording was determined from the guide sound, the tempo of the guide sound is changed to match the tempo of the BGM data, and the tempo of the recorded voice is changed accordingly so that it is synchronized with the BGM. If the tempo at the time of recording was acquired automatically, the automatically acquired tempo is changed automatically so as to be synchronized with the tempo of the BGM data BD.

Returning to the flowchart of FIG. 14: if synchronization processing is to be performed (Yes in step S302), the tempo of the recorded voice is changed automatically so as to be synchronized with the BGM data BD (step S303). If synchronization processing is not to be performed (No in step S302), the tempo of the recorded voice is changed according to the set value (step S304).
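The branch through steps S301 to S304 amounts to choosing a target tempo, which can be written as a short Python sketch. The function names and the use of beats per minute are assumptions; the actual time-stretching of the audio is not shown.

```python
def target_tempo(recorded_bpm, change_tempo, sync, bgm_bpm=None, manual_bpm=None):
    """Pick the tempo the recorded voice should end up with."""
    if not change_tempo:      # S301: No, keep the recorded tempo
        return recorded_bpm
    if sync:                  # S302: Yes, follow the BGM (S303)
        return bgm_bpm
    return manual_bpm         # S302: No, use the user's setting (S304)


def duration_scale(recorded_bpm, new_bpm):
    """Factor by which the recording's duration changes at the new tempo."""
    return recorded_bpm / new_bpm


new_bpm = target_tempo(90, change_tempo=True, sync=True, bgm_bpm=120)
scale = duration_scale(90, new_bpm)   # 0.75: the voice becomes 25% shorter
```

A recording made at 90 beats per minute and synchronized to BGM at 120 beats per minute therefore has to be compressed to three quarters of its original length.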

Next, it is determined whether or not to perform pitch shift processing (step S305). If pitch shift processing is enabled (Yes in step S305), it is executed according to the set value (step S306). The set value for pitch shift processing is the amount by which the pitch is shifted. The shift amount for the recorded voice can be set by the user.

Next, it is determined whether or not to perform equalizer processing (step S307). If equalizer processing is enabled (Yes in step S307), it is executed according to the set values (step S308). The set values for equalizer processing are, for example, information on the range to be emphasized or the range to be cut, and can be set by the user.

Next, it is determined whether or not to perform harmonize processing (step S309). If harmonize processing is enabled (Yes in step S309), it is executed according to the set values (step S310). The set values for harmonize processing can be set by the user. For example, it should be possible to choose settings such as adding a note a third above the base note, or adding notes a third and a fifth above the base note.
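The "third" and "fifth" settings can be related to frequencies with a small Python sketch. It assumes twelve-tone equal temperament and a major third (4 semitones) and perfect fifth (7 semitones); the text does not state the tuning or the quality of the intervals, and the names below are hypothetical.

```python
def semitone_ratio(semitones):
    """Frequency ratio for a shift of `semitones` in equal temperament."""
    return 2.0 ** (semitones / 12.0)


# Assumed interval sizes: major third and perfect fifth.
INTERVALS = {"third": 4, "fifth": 7}


def harmony_frequencies(base_hz, intervals):
    """Frequencies of the notes to add above the base note."""
    return [base_hz * semitone_ratio(INTERVALS[name]) for name in intervals]


# For a base note of 440 Hz this gives roughly 554.4 Hz and 659.3 Hz.
added = harmony_frequencies(440.0, ["third", "fifth"])
```

In a full implementation, each added note would be a pitch-shifted copy of the recorded voice mixed back with the original.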

Next, it is determined whether to perform echo processing (step S311). If echo processing is enabled (Yes in step S311), it is executed according to the set value (step S312). The set value of the echo processing is information specifying, for example, the time delay of the echo and how long the echo continues, and can be set by the user.
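A minimal echo sketch follows. It maps the time delay to `delay_samples` and the duration of the echo to a number of decaying repeats; that mapping, and the names `add_echo`, `decay`, and `repeats`, are assumptions for illustration.

```python
def add_echo(samples, delay_samples, decay, repeats):
    """Add `repeats` delayed copies of the signal, each quieter by a factor of `decay`."""
    out = list(samples) + [0.0] * (delay_samples * repeats)  # room for the echo tail
    for k in range(1, repeats + 1):
        gain = decay ** k
        offset = k * delay_samples
        for i, s in enumerate(samples):
            out[offset + i] += gain * s
    return out
```

Applied to a single impulse, `add_echo([1.0, 0.0, 0.0, 0.0], 2, 0.5, 2)` yields the impulse followed by copies at half and quarter amplitude, two samples apart.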

As described above, various kinds of processing are executed in the audio processing (step S31), and which of them are executed may be made configurable by the user. For example, the user may enable automatic tempo change and harmonization processing, or may enable manual tempo change, pitch shift processing, and echo processing; the combination of audio processing can be chosen freely. In addition, the user may be allowed to specify the points on the time axis of the voice at which audio processing is applied. This makes it possible, for example, to give the first half of the voice a dreamlike atmosphere with echo processing and to make the second half an up-tempo message full of a sense of speed.
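Applying a user-chosen combination of effects to a chosen part of the time axis can be sketched as below. The function `apply_to_range` and the representation of an effect as a function from a list of samples to a list of samples are assumptions for illustration.

```python
def apply_to_range(samples, start, end, effects):
    """Apply each effect, in order, to samples[start:end] and splice the result back.

    Effects may change the length of the region (e.g. a tempo change).
    """
    region = samples[start:end]
    for effect in effects:
        region = effect(region)
    return samples[:start] + region + samples[end:]
```

For instance, an echo-like effect could be passed for the first half of a message and a tempo-change effect for the second half, each with its own call.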

Although the user can freely choose which audio processing is executed, it is desirable to prepare several setting sets corresponding to several themes in order to free the user from the burden of making such detailed settings. Each setting set defines the combination of audio processing to be executed, the set values for each of them, and so on. For example, it is convenient to prepare setting sets matched to musical styles such as rock, ballad, and rap. Alternatively, it is convenient to prepare setting sets matched to emotions such as sadness, anger, and joy. By selecting rock-style music as the BGM data BD and selecting the rock setting set, the user can easily create a rock-style voice message.
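A setting set could be represented as a simple lookup table, as in the sketch below. The theme names come from the text; every key and value inside each set is a made-up placeholder, not a value from the source, and `resolve_settings` is a hypothetical helper.

```python
# Placeholder values only: the source names the themes but not their contents.
SETTING_SETS = {
    "rock": {"tempo": "sync", "pitch_shift": 0, "harmonize": ["fifth"], "echo": None},
    "ballad": {"tempo": "sync", "pitch_shift": 0, "harmonize": ["third"], "echo": {"delay_s": 0.3}},
    "rap": {"tempo": "sync", "pitch_shift": -2, "harmonize": [], "echo": None},
}


def resolve_settings(theme, overrides=None):
    """Start from the theme's setting set and apply any user overrides on top."""
    settings = dict(SETTING_SETS[theme])
    settings.update(overrides or {})
    return settings
```

Copying the set before applying overrides keeps the prepared sets unchanged when a user adjusts one value.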

Returning to FIG. 13: when the audio processing in step S31 is finished, the processed voice is synthesized with the BGM data BD (step S32). When the synthesis processing is finished, steps S40, S50, and S60 (or S70) shown in FIG. 5 are executed, and a multimedia mail or a voice mail is transmitted.
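The synthesis in step S32 can be pictured as sample-wise mixing. The sketch below assumes both signals are lists of floats in the range -1 to 1 at the same sample rate; the gain parameters and the clipping step are illustrative choices, not details from the source.

```python
def mix(voice, bgm, voice_gain=1.0, bgm_gain=0.5):
    """Mix the processed voice over the BGM, padding the shorter signal with silence."""
    n = max(len(voice), len(bgm))
    out = []
    for i in range(n):
        v = voice[i] if i < len(voice) else 0.0
        b = bgm[i] if i < len(bgm) else 0.0
        s = voice_gain * v + bgm_gain * b
        out.append(max(-1.0, min(1.0, s)))  # clip to the valid range
    return out
```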

As described above, according to the present embodiment, the recorded voice is modulated, or given various special effects, and then synthesized with the BGM data BD. This further enhances the expressive power of voice mail and of voice mail with video. In particular, by choosing a suitable combination of audio processing settings and BGM data BD, a message that matches the atmosphere and style of the BGM data BD can be played back.

According to the present invention, the noise gate processing and the tempo change processing make the BGM fuse with the recorded voice into a single sound, rather than play independently of it. That is, the noise gate processing links the phrases or sentences of the recorded voice to the bars of the BGM. Furthermore, the tempo change processing fuses the syllables of the recorded voice with the rhythm of the BGM within each bar, producing a unified sound.
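The linking of phrases to bars can be sketched in two steps: split the recording at silent portions, then place each resulting element at a bar-head marker. The threshold, the minimum silence length, and the representation of markers as sample indices are assumptions for illustration, as are the function names.

```python
def split_on_silence(samples, threshold, min_silence):
    """Split into elements separated by runs of at least `min_silence` quiet samples."""
    elements, current, quiet = [], [], 0
    for s in samples:
        if abs(s) < threshold:
            quiet += 1
            if current and quiet < min_silence:
                current.append(s)          # short pause: keep it inside the element
            elif current and quiet == min_silence:
                # long pause: close the element, dropping its trailing quiet samples
                elements.append(current[:len(current) - (min_silence - 1)])
                current = []
        else:
            quiet = 0
            current.append(s)
    if current:
        elements.append(current[:len(current) - quiet])
    return elements


def place_at_markers(bgm, elements, markers):
    """Add each element onto the BGM starting at its marker (a sample index)."""
    out = list(bgm)
    for element, start in zip(elements, markers):
        for i, s in enumerate(element):
            if start + i < len(out):
                out[start + i] += s
    return out
```

With markers at the head of each bar, the first element starts on bar one, the second on bar two, and so on; elements that outnumber the markers are simply not placed in this sketch.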

In the above embodiment, the processing flow is to record the voice, acquire the BGM data BD, and then perform audio processing and synthesis processing on the recorded voice. However, the audio processing may instead be executed while the voice is being recorded. In that case, voice recording and audio processing are performed in parallel, then the BGM data BD is acquired, and the processed voice and the BGM data BD are synthesized. Because the automatic tempo change by synchronization processing requires information from the BGM data BD, the automatic change should be executed after the BGM data BD has been acquired.

The above embodiment describes the case in which audio processing modulates the recorded voice or gives it special effects, but audio processing that modulates the BGM music or gives it special effects may also be executed. The same processing as that set for the recorded voice may be applied to the BGM music, or the content of the audio processing may be set separately for the BGM music.

FIG. 1 is a block diagram of a mobile phone terminal according to an embodiment.
FIG. 2 is a diagram showing the layer structure of a voice e-mail.
FIG. 3 is a diagram showing the layer structure of a multimedia mail.
FIG. 4 is a diagram showing a general-purpose music file format.
FIG. 5 is a flowchart showing the main process of the synthesis processing.
FIG. 6 is a flowchart of the noise gate processing.
FIG. 7 is a flowchart of the BGM data acquisition processing.
FIG. 8 is a flowchart showing the multimedia mail creation processing.
FIG. 9 is a flowchart showing the voice mail creation processing.
FIG. 10 is a diagram showing the flow of processing in the first embodiment.
FIG. 11 is a diagram showing the flow of processing in the second embodiment.
FIG. 12 is a diagram showing the flow of processing in the third embodiment.
FIG. 13 is a flowchart of the synthesis processing in the fourth embodiment.
FIG. 14 is a flowchart of the audio processing.

Explanation of symbols

100 Mobile phone terminal
101 Microphone device
103 ROM card
106 Memory (SDRAM)
AP Authoring program
BD BGM data
ED Sound effect data

Claims

Voice input means;
Means for recording the voice input by the voice input means and storing it as recorded data;
Synthesis processing means for synthesizing the recording data and music data and generating synthesized voice data;
Transmitting means for transmitting the synthesized voice data as an e-mail;
An e-mail transmitting terminal comprising:

In the e-mail transmission terminal according to claim 1,
The music data is
Sound effect data,
Including
The synthesis processing means includes
Means for synthesizing the sound effect data at a specified point on the playback time axis of the synthesized sound data;
An e-mail transmission terminal comprising:

In the e-mail transmission terminal according to claim 1 or 2,
The e-mail transmission terminal, wherein the music data is stored in a recording medium detachable from the e-mail transmission terminal or a storage device of a server on a network to which the e-mail transmission terminal can be connected.

In the e-mail transmission terminal according to claim 3,
The music data stored in the recording medium or the storage device is encrypted,
The synthesis processing means includes
Means for decrypting the encrypted music data;
An e-mail transmission terminal comprising:

In the electronic mail transmitting terminal according to any one of claims 1 to 4,
The music data is
Marked music data with one or more timing markers set in advance on the playback time axis,
Including
The synthesis processing means includes
Means for treating a portion of the recorded data whose volume is below a predetermined level as a silent portion, and decomposing the recorded data into a plurality of recorded data elements separated by the silent portions;
Means for synthesizing each of the plurality of recording data elements in synchronism with a time set by the timing marker of the music data with the marker;
An e-mail transmission terminal comprising:

In the e-mail transmission terminal according to claim 5,
The marker-attached music data is
Music data with timing markers set at the beginning of each measure,
An e-mail transmission terminal comprising:

In the e-mail transmission terminal according to any one of claims 1 to 6,
The synthesis processing means includes
Means for synthesizing video data at an arbitrary point on the time axis of the synthesized audio data in response to a designation operation by a user, and generating synthesized audio data with video data;
Including
The transmission means includes
Means for transmitting the synthesized audio data with video data as an e-mail;
An e-mail transmission terminal comprising:

In the e-mail transmission terminal according to any one of claims 1 to 6,
Means for compressing and converting the synthesized voice data;
An e-mail transmitting terminal comprising:

The e-mail transmission terminal according to claim 7, further comprising:
Means for compressing and converting the synthesized audio data with video data into a moving image;
An e-mail transmitting terminal comprising:

In the e-mail transmission terminal according to any one of claims 1 to 6,
The e-mail transmission terminal is a mobile phone terminal, and the synthesized voice data is converted into a standard music data format defined in the mobile phone terminal.

The electronic mail transmitting terminal according to any one of claims 1 to 10, further comprising:
Means for generating scenario data recording the rules of the synthesis process;
With
The transmission means includes
Means for transmitting the recording data and the scenario data as an e-mail;
An e-mail transmission terminal comprising:

12. The e-mail transmission terminal according to claim 1, further comprising:
Processing means for processing the voice input by the voice input means;
With
The processing means includes
Means for modulating the voice and/or giving a special effect to the voice;
Including
The e-mail transmission terminal, wherein the synthesis processing means synthesizes the recording data and the music data after the processing by the processing means.

In the e-mail transmission terminal according to claim 12,
The processing means includes
Means for executing a tempo change process and/or a pitch shift process on the voice input by the voice input means;
An e-mail transmission terminal comprising:

In the e-mail transmission terminal according to claim 12,
The processing means includes
Means for executing any one or more of equalizer processing, harmonization processing, and echo processing on the voice input by the voice input means;
An e-mail transmission terminal comprising:

In the e-mail transmission terminal according to any one of claims 12 to 14,
The e-mail transmission terminal, wherein a plurality of pieces of setting information corresponding to a plurality of themes are prepared in advance, each piece of setting information defines the contents of the processing executed by the processing means, and selecting one piece of setting information determines the contents of the processing by the processing means.

The electronic mail transmitting terminal according to any one of claims 1 to 15, further comprising:
Means for modulating the music defined by the music data and/or means for giving a special effect to the music defined by the music data;
An e-mail transmission terminal comprising:

A system for forwarding email,
A terminal,
A synthesis server;
With
The terminal
Voice input means;
Means for recording the voice input by the voice input means and storing it as recorded data;
Means for transmitting the recording data to the synthesis server;
With
The synthesis server
Synthesis processing means for synthesizing the received recording data and music data and generating synthesized voice data;
Means for storing the synthesized voice data as an e-mail in a storage device;
An e-mail system comprising:

A system for forwarding email,
A terminal,
A synthesis server;
With
The terminal
Voice input means;
Means for transmitting the voice input by the voice input means to the synthesis server;
With
The synthesis server
Means for recording the received voice as recorded data;
A synthesis processing means for synthesizing the recording data and the music data and generating synthesized voice data;
Means for storing the synthesized voice data as an e-mail in a storage device;
An e-mail system comprising:

The electronic mail system according to claim 17 or claim 18,
The music data is
Sound effect data,
Including
The synthesis processing means includes
Means for synthesizing sound effect data at a specified point on the playback time axis of the synthesized sound data;
An e-mail system characterized by including:

The electronic mail system according to any one of claims 17 to 19,
The music data is
Marked music data with one or more timing markers set in advance on the playback time axis,
Including
The synthesis processing means includes
Means for treating a portion of the recorded data whose volume is below a predetermined level as a silent portion, and decomposing the recorded data into a plurality of recorded data elements separated by the silent portions;
Means for synthesizing each of the plurality of recording data elements in synchronism with a time set by the timing marker of the music data with the marker;
An e-mail system characterized by including:

The electronic mail system according to claim 20,
The marker-attached music data is
Music data with timing markers set at the beginning of each measure,
An e-mail system characterized by including:

The e-mail system according to any one of claims 17 to 21, further comprising:
Means for synthesizing video data at an arbitrary point on the time axis of the synthesized audio data in response to a designation operation by a user, and generating synthesized audio data with video data;
An e-mail system comprising:

The e-mail system according to any one of claims 17 to 21, further comprising:
Means for compressing and converting the synthesized voice data;
An e-mail system comprising:

The e-mail system according to claim 22, further comprising:
Means for compressing and converting the synthesized audio data with video data into a moving image;
An e-mail system comprising:

The electronic mail system according to any one of claims 17 to 21,
The electronic mail system, wherein the terminal is a mobile phone terminal, and the synthesized voice data is converted into a standard music data format defined for the mobile phone terminal.

The electronic mail system according to any one of claims 17 to 25,
The synthesis server further includes:
Processing means for processing audio received from the terminal;
With
The processing means includes
Means for modulating the voice and/or giving a special effect to the voice;
Including
The e-mail system, wherein the synthesis processing means synthesizes the recording data and the music data after the processing by the processing means.

The e-mail system according to claim 26,
The processing means includes
Means for executing a tempo change process and/or a pitch shift process on the voice input by the voice input means;
An e-mail system characterized by including:

The e-mail system according to claim 26,
The processing means includes
Means for executing any one or more of equalizer processing, harmonization processing, and echo processing on the voice input by the voice input means;
An e-mail system characterized by including:

The e-mail system according to any one of claims 26 to 28,
The electronic mail system, wherein a plurality of pieces of setting information corresponding to a plurality of themes are prepared in advance, each piece of setting information defines the contents of the processing executed by the processing means, and selecting one piece of setting information determines the contents of the processing by the processing means.

30. The electronic mail system according to claim 17, further comprising:
Means for modulating the music defined by the music data and/or means for giving a special effect to the music defined by the music data;
An e-mail system characterized by including: