JP3066452B2

JP3066452B2 - Sound characteristic conversion device, sound / label association device, and methods thereof

Info

Publication number: JP3066452B2
Application number: JP8510060A
Authority: JP
Inventors: 株式会社アルカディア; 株式会社エィ・ティ・アール人間情報通信研究所
Original assignee: 株式会社アルカディア; 株式会社エィ・ティ・アール人間情報通信研究所
Priority date: 1994-09-12
Filing date: 1995-09-12
Publication date: 2000-07-17
Anticipated expiration: 2015-09-12
Also published as: AU3400395A; WO1996008813A1

Description

Detailed Description of the Invention

Technical Field

The present invention relates to converting the characteristics of sounds such as speech, musical tones, and natural sounds, and in particular to making the conversion operation easier. It also relates to associating sounds with labels in a way that suits this characteristic conversion.

Background Art

The characteristics of sounds such as speech are commonly converted to obtain desired characteristics. The conversion is generally performed by modifying the time-domain waveform or the frequency spectrum of the sound. For example, an analog speech signal is captured and converted into digital data, a waveform modification corresponding to the desired characteristic conversion is applied to the digital data, and the result is converted back into an analog signal. In this way the characteristics of the speech signal can be converted to the desired ones.

However, conventional characteristic conversion of this kind has the following problems. The conversion is carried out by displaying the time-domain waveform of the sound, its frequency spectrum, its linear predictive analysis (LPC) parameters, and the like on a display and manipulating them. To obtain the desired characteristics through such manipulation, the operator must have specialized knowledge of time-domain waveforms, frequency spectra, LPC parameters, and so on. Moreover, even an operator with that knowledge needs considerable training before being able to perform a desired characteristic conversion.

Disclosure of the Invention

An object of the present invention is to solve the above problems and to provide a sound quality conversion device and method whose conversion operation is easy, together with a sound/label association device and method suited to them.

A sound characteristic conversion device of the present invention comprises: sound/label data holding means for holding sound data divided into segments according to a predetermined segmentation, together with label data associated with each segment of the sound data; display control means which, when decoration data is given for label data, visually decorates the label based on that label data in accordance with the decoration data and displays it on display means; and conversion means for applying, to the sound data associated with that label data, the characteristic conversion corresponding to the decoration data given for the label data.

A sound characteristic conversion device of the present invention comprises: sound data segmenting means for segmenting input sound data at sound breaks; label data segmenting means for segmenting label data, which is input with delimiter codes corresponding to those sound breaks, according to the delimiter codes; and association forming means for associating the segmented sound data and the segmented label data with each other.

In a sound characteristic conversion device of the present invention, the visual decoration applied to a label is a character decoration (text styling) of the label.

In a sound characteristic conversion device of the present invention, the visual decoration applied to a label is the order of the labels.

In a sound characteristic conversion method of the present invention, label data is associated with sound data and sound characteristic conversion contents are associated with decoration operations in advance. The label represented by the label data is visually decorated according to a given decoration operation and displayed, and the sound data associated with that label data is subjected to the characteristic conversion corresponding to the decoration operation given for the label data.

In a sound characteristic conversion method of the present invention, input sound data is segmented at sound breaks, label data is segmented in correspondence with those sound breaks, and the segmented label data is associated with the segmented sound data.

A sound characteristic conversion device of the present invention comprises: sound/label data holding means for holding sound data divided into segments according to a predetermined segmentation, together with label data associated with each segment of the sound data; and conversion means for applying, to the sound data associated with label data, the characteristic conversion corresponding to the decoration data given for that label data.

In a sound characteristic conversion method of the present invention, label data is associated with sound data and characteristic conversion contents are associated with decoration operations in advance, and the sound data associated with label data is subjected to the characteristic conversion corresponding to the decoration operation applied to that label data.

A system of the present invention has a transmitting device and a receiving device that can communicate over a communication path, and transmits sound data from the transmitting device to the receiving device. The transmitting device comprises data input means for inputting label data and decoration data, and communication means for transmitting the label data and decoration data to the receiving device over the communication path. The receiving device comprises communication means for receiving the label data and decoration data from the transmitting device, standard sound data generating means for generating standard sound data from the label data, and conversion means for converting the sound characteristics of the standard sound data according to the decoration data to generate characteristic-converted sound data.

A transmission method of the present invention transmits sound data from a transmitting side to a receiving side over a communication path. On the transmitting side, label data and decoration data are input and transmitted to the receiving side over the communication path. The receiving side receives the label data and decoration data from the transmitting side, generates standard sound data from the label data, and converts the sound characteristics of the standard sound data according to the decoration data to generate characteristic-converted sound data.

A sound/label association device of the present invention comprises: sound data input means for inputting sound data; sound data segmenting means for segmenting the sound data based on the loudness of the sound represented by the sound data; label data input means for inputting label data in which delimiter codes are placed at positions corresponding to the segments of the sound data; label data segmenting means for segmenting the label data according to the delimiter codes; and association forming means for associating the segmented sound data and the segmented label data with each other.

A sound/label association device of the present invention comprises: sound data input means for inputting sound data; label data input means for inputting label data corresponding to the sound data; and detailed association forming means for segmenting the sound data in correspondence with each label, based on the average duration of each label represented by the label data and the duration of the sound data.

A sound/label association device of the present invention comprises detailed association forming means which, for label data and sound data that have been associated by the association forming means, segments the sound data in correspondence with each label, based on the average duration of each label represented by the label data and the duration of the sound data.

A sound/label association device of the present invention comprises a sound display section for visually displaying properties of the sound represented by sound data, and a label display section for displaying the labels represented by label data. In the sound display section, break marks indicating sound breaks are displayed based on the loudness of the sound.

In a sound/label association method of the present invention, sound data is segmented based on the loudness of the sound it represents; label data in which delimiter codes are placed at positions corresponding to the segments of the sound data is received and segmented according to the delimiter codes; and the segmented sound data and the segmented label data are associated with each other.

A sound/label association method of the present invention associates sound data with labels. An average duration is prepared in advance for each label, and the sound data is segmented in correspondence with each label based on the average duration of each label represented by the label data and the duration of the sound data.

In a sound/label association method of the present invention, an average duration is prepared in advance for each label, and, for label data and sound data that have already been associated, the sound data is segmented in correspondence with each label based on the average duration of each label represented by the label data and the duration of the sound data.

A display method for sound/label association of the present invention uses a sound display section for visually displaying properties of the sound represented by sound data and a label display section for displaying the labels represented by label data. In the sound display section, break marks indicating sound breaks are displayed based on the loudness of the sound.

In the present invention, label data means a string of characters, figures, symbols, pictures, or the like, or a combination of these, that can be associated with sounds such as speech or natural sounds. It includes, for example, text data and icon data corresponding to characters.

Sound data means data that represents the waveform of a sound directly or indirectly. It includes, for example, data obtained by digitizing the analog waveform of a sound and data that represents a sound by LPC parameters.

Decoration is a concept covering whatever is done to a label in order to convert the characteristics of the sound obtained from the sound data, including emphasis, underlining, the addition of symbols, and reordering. The decoration data may describe the characteristic conversion itself, but when the label data is visually decorated according to the decoration data, it can be preferable for the decoration data to describe the visual decoration.

Sound characteristic conversion means changing some property of a sound. The concept covers not only sound quality conversions such as changing the pitch, changing the intensity, adding vibrato, changing the frequency spectrum, changing the duration, changing the sampling interval, making the voice sound more feminine or more masculine, making it clearer or less clear, and combinations of these, but also reordering the sounds and deleting part of a sound.

In the sound characteristic conversion device of claim 1 and the sound characteristic conversion method of claim 5, label data is associated with sound data and sound characteristic conversion contents are associated with decoration operations in advance. The label represented by the label data is visually decorated according to a given decoration operation and displayed, and the sound data associated with that label data is subjected to the characteristic conversion corresponding to the decoration operation given for the label data. A characteristic conversion can therefore be applied to a sound simply by visually decorating the corresponding label.
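The idea of driving a conversion from a label decoration can be sketched as a lookup table followed by a dispatch loop. This is only an illustration: the decoration names (`bold`, `strikethrough`), the two toy conversions, and the entry layout are assumptions made here and are not taken from the patent, whose actual correspondence is given in FIG. 9.

```python
def louder(samples):
    """Hypothetical conversion: double the amplitude of every sample."""
    return [2 * s for s in samples]


def delete_sound(samples):
    """Hypothetical conversion: drop the segment entirely (partial deletion)."""
    return []


# Assumed correspondence table between visual decorations and conversions.
DECORATION_TO_CONVERSION = {
    "bold": louder,
    "strikethrough": delete_sound,
}


def convert(entries):
    """Apply to each segment the conversions named by its label's decorations."""
    output = []
    for entry in entries:
        samples = entry["samples"]
        for decoration in entry.get("decorations", []):
            samples = DECORATION_TO_CONVERSION[decoration](samples)
        output.extend(samples)
    return output
```

Because the table is plain data, changing which decoration triggers which conversion only requires editing `DECORATION_TO_CONVERSION`, which mirrors the role the description gives to the correspondence table.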

In the sound characteristic conversion device of claim 2 and the sound characteristic conversion method of claim 6, input sound data is segmented at sound breaks, label data is segmented in correspondence with those sound breaks, and the segmented label data is associated with the segmented sound data. The two can therefore be associated simply by inputting the sound data and the label data.

In the sound characteristic conversion device of claim 7 and the sound characteristic conversion method of claim 8, label data is associated with sound data and characteristic conversion contents are associated with decoration operations in advance, and the sound data associated with label data is subjected to the characteristic conversion corresponding to the decoration operation applied to that label data. A characteristic conversion can therefore be applied to a sound simply by decorating the label data, whose syllable boundaries are clearer than those of the sound data.

In the sound transmission system of claim 9 and the sound transmission method of claim 10, label data and decoration data are input on the transmitting side, and the receiving side generates standard sound data from the label data and converts its sound characteristics according to the decoration data to generate sound-quality-converted data. A sound with the desired characteristics can therefore be delivered by sending only the label data and the decoration data.
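A minimal sketch of this exchange, under stated assumptions, is shown here. The JSON message layout, the field names `label` and `decorations`, and the `synthesize` and `conversions` arguments are all hypothetical; the patent does not prescribe a wire format or a synthesis method in this passage.

```python
import json


def encode_message(segments):
    """Transmitting side: pack label and decoration data into bytes."""
    return json.dumps(segments, ensure_ascii=False).encode("utf-8")


def decode_and_render(payload, synthesize, conversions):
    """Receiving side: build standard sound data per label, then convert it.

    synthesize:  assumed callable mapping a label string to a list of samples.
    conversions: assumed dict mapping a decoration name to a callable on samples.
    """
    audio = []
    for segment in json.loads(payload.decode("utf-8")):
        samples = synthesize(segment["label"])
        for decoration in segment["decorations"]:
            samples = conversions[decoration](samples)
        audio.extend(samples)
    return audio
```

Only text and a few decoration names cross the channel in this sketch; the waveform itself is produced on the receiving side.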

In the sound/label association device of claim 11 and the sound/label association method of claim 15, two thresholds on loudness are set, and the sound data is segmented so that a segment begins when the loudness stays above one threshold continuously for at least a predetermined time and ends when the loudness stays below the other threshold continuously for at least a predetermined time. Label data in which delimiter codes are placed at positions corresponding to these segments is received and segmented according to the delimiter codes, and the segmented sound data and the segmented label data are associated with each other. The segmented sound data and the segmented label data can therefore be associated easily.

The sound/label association device of claim 12 and the sound/label association method of claim 16 associate sound data with labels. The sound data is segmented in correspondence with each label, based on the duration of the sound data and on the average duration of each label, which is obtained from the average duration of each syllable in the label data. Sound data can therefore easily be associated with each individual label.
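One simple way to realize this kind of allocation is to scale each label's average duration so that the scaled durations sum to the measured duration of the segment. The sketch below does exactly that; the proportional scaling rule, the function name, and the example durations are assumptions for illustration, since this passage only says that the split is based on the average durations and the duration of the sound data.

```python
def split_by_average_duration(labels, average_ms, start_ms, end_ms):
    """Divide [start_ms, end_ms] among labels in proportion to average_ms.

    labels:     label strings in spoken order.
    average_ms: dict mapping each label to its assumed average duration.
    Returns a list of (label, segment_start_ms, segment_end_ms) tuples.
    """
    expected = [average_ms[label] for label in labels]
    scale = (end_ms - start_ms) / sum(expected)
    boundaries = []
    cursor = start_ms
    for label, duration in zip(labels, expected):
        segment_end = cursor + duration * scale
        boundaries.append((label, cursor, segment_end))
        cursor = segment_end
    return boundaries
```

For instance, a segment from 630 ms to 1890 ms carrying the labels "my", "name", "is", "John" with assumed averages of 100, 200, 100, and 200 ms would be stretched by a factor of 2.1 and divided accordingly.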

The sound/label association device of claim 14 and the display method for sound/label association of claim 18 use a sound display section for visually displaying properties of the sound represented by sound data and a label display section for displaying the labels represented by label data. Two thresholds on loudness are set, and the sound is segmented so that a segment begins when the loudness stays above one threshold continuously for at least a predetermined time and ends when the loudness stays below the other threshold continuously for at least a predetermined time. Break marks indicating the resulting sound breaks are displayed in the sound display section. Label data can therefore be input and displayed while the break positions in the sound data are checked.

Brief Description of the Drawings

FIG. 1 is a diagram showing the display screen of a sound characteristic conversion device according to one embodiment of the present invention.

FIG. 2 is a diagram showing the overall configuration of a sound quality conversion device according to one embodiment of the present invention.

FIG. 3 is a diagram showing the hardware configuration when a CPU is used to implement the functions of FIG. 2.

FIG. 4 is a flowchart showing the operation of the sound characteristic conversion device.

FIG. 5 is a flowchart showing the operation of the sound characteristic conversion device.

FIG. 6 is a diagram showing label data stored in association with speech data.

FIG. 7 is a diagram showing how the speech data is stored.

FIG. 8 is a diagram showing the labels displayed on the CRT 16.

FIG. 9 is a diagram showing the correspondence between visual decorations and sound quality conversion contents.

FIG. 10 is a diagram showing label data to which decoration data has been added.

FIG. 11 is a diagram showing labels to which visual decoration has been applied.

FIG. 12 is a diagram for explaining the segmentation of speech data.

FIG. 13 is a diagram showing the pitch conversion process.

FIG. 14A is a diagram showing the source waveform before pitch conversion.

FIG. 14B is a diagram showing the source waveform after pitch conversion.

FIG. 15A is a diagram showing speech data before a power change and its short-time average power.

FIG. 15B is a diagram showing speech data after a power change and its short-time average power.

FIG. 16A is a diagram showing the original speech data.

FIG. 16B is a diagram showing speech data whose sound duration has been changed.

FIG. 16C is a diagram showing speech data to which vibrato has been applied.

FIG. 17 is a diagram showing examples of symbols used to decorate icons.

FIG. 18 is a diagram showing the reordering of sounds.

FIG. 19 is a diagram showing an example of delimiting for Japanese.

FIG. 20 is a diagram showing an example of segmentation for each label.

FIG. 21 is a diagram showing one embodiment of a speech transmission device.

FIG. 22 is a diagram showing an example of the data transmitted in the embodiment of FIG. 21.

FIG. 23 is a diagram showing a table in which decoration data is encoded.

Best Mode for Carrying Out the Invention

FIG. 2 shows the overall configuration of a sound quality conversion device according to one embodiment of the present invention. The sound data segmenting means 2 segments the input sound data at sound breaks. Label data in which delimiter codes corresponding to the sound breaks have been placed is input to the label data segmenting means 4, which segments the label data according to the delimiter codes. The segmented sound data and the segmented label data are input to the association forming means 6 and associated with each other segment by segment. The associated sound data and label data are held in the sound/label holding means 8.

The display control means 10 receives decoration data for each segment, decorates the corresponding label data, and displays the decorated labels on the display means 14. This makes it easy to check which decoration has been applied to which segment. The conversion means 12 receives the decoration data for each segment, converts the corresponding sound data accordingly, and outputs the converted sound data.

FIG. 3 shows the hardware configuration when the configuration of FIG. 2 is implemented using a CPU. Connected to the bus line 40 are the CRT 16, which serves as the display means, the CPU 18, the memory 20, which serves as the sound/label data holding means, the input interface 22, the hard disk 24, the output interface 26, and the floppy disk drive (FDD) 15. A microphone 30 is connected to the input interface 22 through an A/D converter 28. A keyboard 32 and a mouse 34 are also connected to the input interface 22. A speaker 38 is connected to the output interface 26 through a D/A converter 36. The hard disk 24 stores the program whose flowcharts are shown in FIGS. 4 and 5. This program was installed on the hard disk 24 from a floppy disk (recording medium) by the FDD 15. It may of course be installed from another recording medium such as a CD-ROM. The memory 20 serves as the sound/label data holding means and is also used as the work area for executing the program.

The processing performed by the CPU 18 is described with reference to FIGS. 4 and 5. First, in step S1, a speech signal (analog speech data) is input through the microphone 30. When the speech signal is input, the CPU 18 reads in the digital data (digital speech data) produced by the A/D converter 28. The CPU 18 also displays the waveform of this speech data in the sound display section 80 of the CRT 16. FIG. 1 shows this display.

Next, the digital speech data is segmented at sound breaks (step S2). The segmentation is performed as follows. Suppose, for example, that the utterance "Hi my name is John Nice to meet you" is input, and that the resulting digital speech data is as shown in the upper part of FIG. 12, which is a waveform display of the digital speech data. The CPU 18 calculates the short-time average power from this digital speech data. The calculated short-time average power is shown in the lower part of FIG. 12.
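The short-time average power can be computed frame by frame as the mean of the squared samples, expressed in decibels. The following is a minimal sketch under assumptions of my own: non-overlapping 10 ms frames, a decibel reference of amplitude 1.0, and a small floor to avoid taking the logarithm of zero. The description does not state the frame length or the reference level.

```python
import math


def short_time_power_db(samples, sample_rate, frame_ms=10.0):
    """Return one average-power value in dB per non-overlapping frame."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000.0))
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        mean_square = sum(x * x for x in frame) / frame_len
        # The floor keeps log10 defined for frames of digital silence.
        powers.append(10.0 * math.log10(max(mean_square, 1e-12)))
    return powers
```

Trailing samples that do not fill a whole frame are ignored in this sketch.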

Next, the CPU 18 performs the segmentation using two thresholds, a data level and a skip level. After a segment has ended, a new segment begins when the short-time average power stays above the data level continuously for 100 ms or more. After a segment has begun, it ends when the short-time average power stays below the skip level continuously for 80 ms or more. Segmentation proceeds in this way. In this embodiment the data level is 50 dB and the skip level is 40 dB.
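The two-threshold rule can be sketched as a small state machine over the per-frame power values. The 50 dB and 40 dB levels and the 100 ms and 80 ms hold times come from the description; the 10 ms frame length and the choice to place each boundary at the first frame of the qualifying run are assumptions made for this illustration.

```python
import math


def segment_by_power(power_db, frame_ms=10.0, data_level=50.0,
                     skip_level=40.0, min_on_ms=100.0, min_off_ms=80.0):
    """Return (start_ms, end_ms) pairs found with two loudness thresholds."""
    on_frames = math.ceil(min_on_ms / frame_ms)
    off_frames = math.ceil(min_off_ms / frame_ms)
    segments = []
    inside = False   # True while a segment is open
    run = 0          # length of the current qualifying run, in frames
    start = 0
    for i, p in enumerate(power_db):
        if not inside:
            run = run + 1 if p > data_level else 0
            if run >= on_frames:
                start = i - run + 1          # first loud frame of the run
                inside, run = True, 0
        else:
            run = run + 1 if p < skip_level else 0
            if run >= off_frames:
                end = i - run + 1            # first quiet frame of the run
                segments.append((start * frame_ms, end * frame_ms))
                inside, run = False, 0
    if inside:
        # Close a segment that is still open when the data runs out.
        segments.append((start * frame_ms, len(power_db) * frame_ms))
    return segments
```

Using a higher level to open a segment than to close it gives hysteresis, so brief dips or bursts shorter than the hold times do not create spurious boundaries.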

With this segmentation, as shown in FIG. 12, 220 ms to 560 ms is determined to be the first segment, 630 ms to 1890 ms the second segment, and 2060 ms to 2390 ms the third segment. Based on the determined segments, the CPU 18 displays lines 84a, 84b, 84c, and 84d indicating the segment positions on the waveform in the sound display section 80 of the CRT 16 (see FIG. 1).

The CPU 18 also stores the segmented digital speech data in the memory 20 (step S3). FIG. 7 shows the speech data stored in the memory 20. The first segment is stored from address ADRS1, the second segment from address ADRS2, and the third segment from address ADRS3.

Next, label data corresponding to the above voice data is entered from the keyboard 32 into the label display section 82 of the CRT 16 shown in FIG. 1 (step S4). Punctuation marks are entered as delimiter codes at the same positions as the breaks in the voice. For the above voice data, for example, "Hi,my name is John.Nice to meet you." is entered. On receiving this input, the CPU 18 divides the label data at the punctuation marks into three segments: "Hi", "my name is John", and "Nice to meet you".
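As a rough illustration of this step, the label text can be split at its delimiter codes with a few lines of Python. The function name `split_labels` and the choice of comma and period as the only delimiters are assumptions for this sketch; the text only says that punctuation marks serve as delimiter codes.

```python
import re


def split_labels(text, delimiters=",."):
    """Split label text at delimiter codes and drop empty pieces."""
    # Build a character class such as [,\.] from the delimiter characters.
    pattern = "[" + re.escape(delimiters) + "]"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]


labels = split_labels("Hi,my name is John.Nice to meet you.")
```

With the example sentence from the text, `labels` holds the three label segments in their original order.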

In this embodiment, as shown in FIG. 1, lines 84a, 84b, 84c, and 84d indicating the segment positions of the voice data are displayed. This makes it easy to enter the delimiter codes at matching positions when entering the label data.

The CPU 18 stores the segmented label data in order, in association with the segmented voice data (step S5). That is, as shown in FIG. 6, each piece of label data is stored together with the start address of the corresponding voice data.
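The association in step S5 amounts to pairing each label segment, in order, with the start address of the matching voice segment. The sketch below assumes a simple list-of-tuples layout and a hypothetical function name `associate`; the actual layout of FIG. 6 is not reproduced here. A mismatch in segment counts is only reported, since how to repair it is a separate decision.

```python
def associate(labels, start_addresses):
    """Pair each label segment with the start address of its voice segment."""
    if len(labels) != len(start_addresses):
        # The segment counts must agree before labels and sounds can be paired.
        raise ValueError(
            f"{len(labels)} label segments but {len(start_addresses)} voice segments"
        )
    return list(zip(labels, start_addresses))


table = associate(
    ["Hi", "my name is John", "Nice to meet you"],
    ["ADRS1", "ADRS2", "ADRS3"],
)
```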

If the number of voice data segments does not match the number of label data segments, it is preferable to correct the number of voice data segments based on the number of label data segments. That is, the segmentation thresholds (data level and skip level) may be changed and the voice data segmented again so that the numbers match. Alternatively, segment positions may be added to or removed from the voice data, estimated from the number of characters in the label data, so that the numbers match. The operator may also correct the segmentation using the mouse 30 or the keyboard 32.

Next, the CPU 18 displays labels based on the entered label data in the label display section 82 of the CRT 16 (see FIG. 1) (step S6). FIG. 8 shows the displayed labels. The operator then applies to the displayed labels a visual modification that is predetermined for each sound characteristic conversion. FIG. 9 shows an example of the correspondence between visual modifications and sound characteristic conversions. If this is stored as a correspondence table, the correspondence can be changed by editing the table. The contents of FIG. 8 are displayed as icons on the CRT 16, as shown in FIG. 1, and so serve as guidance that makes operation easy.

To raise the power of only the "my name is John" part, the following operation is performed. First, the "my name is John" part in the label display section 82 of FIG. 1 is selected using the keyboard 32 or the mouse 34. Next, with that part selected, the icon 90 for emphasized characters is clicked with the mouse 34. As a result, as shown in FIG. 10, the modification data "＼強調" (emphasis) is added to "my name is John" in the memory 20. Here, "＼" is a code indicating that the character string that follows it is a control code (modification data).

In step S7, the CPU 18 displays the label, modified according to this modification data, in the label display section 82 of the CRT 16 (see FIG. 11). As FIG. 11 makes clear, the location where a characteristic conversion is applied, and its content, can be checked easily.

Next, the CPU 18 reads the first segment of the label data shown in FIG. 10 and, based on the start address ADRS1, reads the corresponding voice data (step S8). The digital voice data for the "Hi" part shown in FIG. 12 is thereby read. The CPU 18 then determines whether modification data has been added to this label data (step S9). Here no modification data has been added, so the process proceeds to step S11.

In step S11, the CPU 18 determines whether all segments have been processed. If not, it repeats step S8 onward for the next segment (step S12). The modification data "＼強調" (emphasis) has been added to the next segment, "my name is John". The process therefore proceeds from step S9 to step S10.

In step S10, the characteristic conversion predetermined for "＼強調" (emphasis) is performed on the digital voice data of "my name is John". Here, in accordance with the table of FIG. 9, the power of the voice data is increased. The power is increased by enlarging the amplitude of the waveform represented by the digital voice data. The converted voice data is stored again from address ADRS2 in FIG. 7 (it may instead be stored at another address so that the original voice data is retained).
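The loop over steps S8 to S12 and the amplitude scaling of step S10 can be sketched as below. Everything specific here is an assumption made for illustration: the dictionary layout of the label entries, the names `emphasize`, `CONVERSIONS`, and `convert_all`, the gain of 2.0, and the clipping to the signed 16-bit sample range. The source states only that power is raised by enlarging the waveform amplitude.

```python
def emphasize(samples, gain=2.0, limit=32767):
    """Raise power by scaling amplitude, clipping to the signed 16-bit range."""
    return [max(-limit - 1, min(limit, int(round(s * gain)))) for s in samples]


# Hypothetical stand-in for the FIG. 9 correspondence table.
CONVERSIONS = {"emphasis": emphasize}


def convert_all(entries, memory):
    """Process every label segment in order and return the output samples."""
    output = []
    for entry in entries:                        # S11/S12: visit each segment
        samples = memory[entry["address"]]       # S8: read the voice data
        for name in entry.get("modifiers", []):  # S9: modification data present?
            samples = CONVERSIONS[name](samples)  # S10: apply the conversion
        memory[entry["address"]] = samples       # store back at the same address
        output.extend(samples)
    return output


memory = {"ADRS1": [100, -100], "ADRS2": [1000, -20000]}
entries = [
    {"label": "Hi", "address": "ADRS1"},
    {"label": "my name is John", "address": "ADRS2", "modifiers": ["emphasis"]},
]
result = convert_all(entries, memory)
```

Keeping the conversions in a lookup table mirrors the idea that the correspondence between modifications and conversions can be changed by editing a table rather than the processing loop.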

When all segments have been processed, the CPU 18 outputs the converted digital voice data from the output interface 26 (step S13). FIG. 15A shows the voice data before the characteristic conversion and FIG. 15B the voice data after it. The power of the "my name is John" part can be seen to have been increased. The converted digital voice data is converted into an analog voice signal by the D/A converter 36 and output from the speaker 378 as voice with converted characteristics. That is, the "my name is John" part is output louder.

As described above, sound quality can be converted merely by applying a visual modification to a label, so operation is extremely easy. In addition, it is easy to check which sound quality conversion has been applied to which segment.

The pitch can be raised in the same way. In this case, the part whose pitch is to be raised is selected and then the icon 92 is selected (see FIG. 1).

FIG. 13 shows the processing procedure for raising the pitch. The CPU 18 first performs linear predictive analysis (LPC) on the target digital voice data and separates it into sound source data and vocal tract transfer characteristic data. It then changes the pitch of the separated sound source data. After that, the sound source data is recombined with the vocal tract transfer characteristic data to obtain digital voice data with a raised pitch. Linear predictive analysis is described in detail in "Linear Prediction of Speech" (J. D. Markel and A. H. Gray, Jr., translated by 鈴木久喜, Corona Publishing). FIG. 14 shows part of the digital voice data before the pitch is raised and part of it afterwards.

FIGS. 16A, 16B, and 16C show other examples of sound characteristic conversion. FIG. 16A shows the voice data before conversion, and FIG. 16B shows the voice data after the duration of the sound has been changed for "my name is John". The processing makes the size of the label correspond to the duration.

FIG. 16C shows the voice data after vibrato has been applied to "my name is John". The label is underlined. The type of vibrato may be varied according to the type of underline.

Not every sound characteristic conversion is described here, since doing so would be impractical, but the present invention covers sound characteristic conversions in general. When a conversion is to be performed in the frequency domain, a frequency spectrum may be obtained by FFT or the like and then processed.

Sound characteristic conversion includes, besides conversions that mainly change sound quality as described above, processing that changes the order of sounds, deletes part of a sound, or repeats a sound. For example, as shown in FIG. 18, the output order of the sounds may be changed by rearranging the labels associated with them. In this example, "Hi my name is John Nice to meet you" is changed to "Hi Nice to meet you my name is John". In the same way, a sound can be deleted by deleting its label, or repeated by duplicating its label.
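Because each label is tied to a stretch of sound data, reordering, deleting, or duplicating labels translates directly into the same operation on the output. A minimal sketch, assuming a hypothetical mapping `sound_by_label` from each label to its samples and a function name `render` chosen for this example:

```python
def render(label_order, sound_by_label):
    """Concatenate the sound data of the labels in the given order."""
    output = []
    for label in label_order:
        output.extend(sound_by_label[label])
    return output


# Toy sample values standing in for the three voice segments.
sound_by_label = {
    "Hi": [1, 2],
    "my name is John": [3, 4, 5],
    "Nice to meet you": [6, 7],
}

reordered = render(["Hi", "Nice to meet you", "my name is John"], sound_by_label)
deleted = render(["Hi", "Nice to meet you"], sound_by_label)
repeated = render(["Hi", "Hi", "my name is John"], sound_by_label)
```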

In the above embodiment, a character string was used as an example of a label, but a symbol or code may be used, as may an icon or the like. As for the method of modification, the male mark of FIG. 17A may be superimposed on the target icon to make the voice more masculine, and the female mark of FIG. 17B to make it more feminine.

Alternatively, the content of a sound quality conversion may be defined in association with a face photograph displayed on the screen, and chosen by selecting the face photograph with a mouse or the like.

Although the above embodiment was described for voice, the invention is applicable to all sounds, including musical sounds and natural sounds such as wind and waves.

In the above embodiment the sound is input from the microphone 30, but the sound may instead be synthesized from the label data. In that case, a basic sound is synthesized from the label data, and the synthesized basic sound is converted according to the modification applied to the label data and then output. Alternatively, the sound may be supplied as data described by LPC parameters or the like.

The above example dealt with English speech, but the present invention is applicable to any language. FIG. 19 shows the state of the segmentation processing for the Japanese voice input 「はい、わかりましたありがとうございました」 ("Yes, I understand. Thank you very much.").

In each of the above embodiments, the modification is applied, and the sound quality converted, segment by segment. However, if the voice data is divided into syllables based on the number of label elements in each segment, the sound quality can be converted syllable by syllable. This finer segmentation is described below using the Japanese voice input 「たっする」 (tassuru) as an example (it is of course applicable to other languages as well).

First, the average duration of each label element when uttered is measured in advance using several subjects. The results are stored in advance on the hard disk 24 as the table shown in Table 1.

The total duration T of the input voice data 「たっする」 is then measured. Suppose, for example, that the measured total duration T is 802 ms. Next, each element of the label data 「たっする」 associated with the voice data is assigned to a category according to Table 1. That is, the label is broken down into 「た」, 「っす」, and 「る」, which are judged to belong to categories CV, DV, and CV respectively. Based on Table 1, the CPU 18 sums the average durations t1, t2, and t3 of the elements. Here, 204.0 ms + 381.0 ms + 204.0 ms = 789.0 ms is obtained. Then, from this total duration t and the average durations t1, t2, and t3, the duration ratios r1, r2, and r3 of the elements are calculated. For example, the duration ratio r1 of the element 「た」 is 204.0/789.0. Similarly, the duration ratios r2 and r3 of the elements 「っす」 and 「る」 are 381.0/789.0 and 204.0/789.0 respectively.

Based on the duration ratios r1, r2, and r3 calculated in this way, the measured total duration T is allocated to the elements. For example, the actual time T1 allocated to the element 「た」 is calculated as

T1 = T · r1

The actual times T2 and T3 allocated to the elements 「っす」 and 「る」 are likewise calculated as

T2 = T · r2
T3 = T · r3

The voice data is then segmented as shown in FIG. 20 based on the actual times T1, T2, and T3. This processing segments the voice data in finer detail (per label element) and associates it with the label (formation of a detailed correspondence). Sound characteristic conversion can therefore be applied per label element. For example, by underlining only 「る」, vibrato can be applied to 「る」 alone.
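The proportional allocation can be checked with a short Python sketch using the figures given in the text (T = 802 ms and average durations of 204.0, 381.0, and 204.0 ms). The function names `allocate_durations` and `boundaries` are chosen for this example and do not come from the source.

```python
from itertools import accumulate


def allocate_durations(total_ms, average_ms):
    """Split the measured total duration in proportion to the average durations."""
    total_average = sum(average_ms)  # t = t1 + t2 + t3
    return [total_ms * t / total_average for t in average_ms]  # Ti = T * ri


def boundaries(durations):
    """Return the cumulative segment boundaries, starting at 0."""
    return [0.0] + list(accumulate(durations))


durations = allocate_durations(802.0, [204.0, 381.0, 204.0])
cuts = boundaries(durations)
```

With these inputs the three elements receive roughly 207.4 ms, 387.3 ms, and 207.4 ms, which together account for the full measured 802 ms.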

In this way, the voice data can be divided into syllables by a simple method. Each syllable may also be estimated more accurately using a speech recognition technique.

In the above embodiment, the display control means 10 and the display means 14 are provided so that modifications to the label data can be made while checking them. However, even without these means 10 and 14, modification data can be entered as long as the structure of the modification data is known, as in FIG. 10. In this case the modification cannot be checked on the display means 14, but the following effect is obtained.

Modification data could also be attached to the sound data itself, but because syllable boundaries are not clear in sound data, it is difficult to apply a sound quality conversion over a given range of syllables. In label data, by contrast, the boundaries between characters (which correspond to the syllable boundaries) are clear, so it is easy to apply a sound quality conversion to the syllables in a desired range.

In the above embodiment a CPU is used to realize the functions of the blocks in FIG. 2, but some or all of them may be realized by hardware logic.

FIG. 21 shows an embodiment of a voice transmission system. A transmitting device 52 and a receiving device 60 are connected via a communication path 50, which may be wired or wireless. The transmitting device 52 includes data input means 54, such as a keyboard, and communication means 56. The receiving device 60 includes standard voice data generation means 62, communication means 64, conversion means 66, and voice output means 68.

The following describes, as an example, the transmission of voice from the transmitting device 52 to the receiving device 60. First, label data and modification data such as those in FIG. 22 are entered from the data input means 54. The parts "＼女性" (female) and "＼男性" (male) are modification data, which determine the sound quality conversion applied to the label data in the { } that follows. In this embodiment, "＼女性" means conversion to a feminine voice and "＼男性" means conversion to a masculine voice.

This data is then transmitted by the communication means 56 to the receiving device 60 via the communication path 50. The communication means 64 of the receiving device 60 receives the data and holds it temporarily. The standard voice data generation means 62 obtains the held data and extracts only the label data from it. Here, 「おはようございます」 ("Good morning") and 「ごきげんいかがですか」 ("How are you?") are extracted. Based on this label data, the standard voice data generation means 62 generates the corresponding standard voice data by speech synthesis or a similar technique.

Meanwhile, the conversion means 66 extracts only the modification data from the data held by the communication means 64. Here, "＼女性" and "＼男性" are extracted. The conversion means 66 converts the sound quality of the corresponding parts of the standard voice data according to this modification data. The relationship between modification data and the content of the sound quality conversion is determined in advance. Here, 「おはようございます」 is converted into feminized voice data and 「ごきげんいかがですか」 into masculinized voice data. The conversion means 66 outputs the sound quality conversion data so obtained.
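On the receiving side, the label data and the modification data both have to be pulled out of the same received string. The sketch below shows one way to do that, assuming the message is a plain sequence of "＼modifier{label}" units. The exact syntax of FIG. 22 is not reproduced in the text, so the pattern, the acceptance of both half-width and full-width braces, and the names `SEGMENT` and `parse_message` are assumptions.

```python
import re

# One unit: the control mark, a modifier name, then label text in braces.
SEGMENT = re.compile(r"＼([^＼{｛]+)[{｛]([^}｝]*)[}｝]")


def parse_message(message):
    """Return (modifier, label) pairs in the order they appear."""
    return [(m.group(1).strip(), m.group(2)) for m in SEGMENT.finditer(message)]


received = "＼女性{おはようございます}＼男性{ごきげんいかがですか}"
pairs = parse_message(received)
labels = [label for _, label in pairs]           # input to speech synthesis
modifiers = [modifier for modifier, _ in pairs]  # input to the conversion step
```

Keeping the pairs together preserves which modification applies to which label segment, which the conversion step needs in order to convert the matching part of the standard voice data.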

The voice output means 68 converts the sound quality conversion data into an analog signal and outputs it from a speaker.

In the manner described, voice is transmitted from the transmitting device 52 to the receiving device 60. According to this embodiment, voice can be sent merely by sending label data and modification data, which have a small data volume. Moreover, not just a standard voice but a voice of the desired sound quality can be sent, based on the modification data.

Conventional devices sent voice data, which has a large data volume, so transmission was slow. This embodiment can improve the transmission speed dramatically.

If the modification data is complex, a code may be assigned to it and stored in the receiving device 60, so that only the code need be sent. For example, as shown in FIG. 23, it is convenient to store the modification data "＼強調＼斜体＼25ポイント" (emphasis, italic, 25 points) under the code "A".
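The code-table idea can be illustrated with a small lookup on the receiving side. The table layout, the name `CODE_TABLE`, and the function `expand_modifiers` are assumptions for this sketch; the source gives only the single example of the code "A".

```python
# Hypothetical table held by the receiving device: short code -> modifier list.
CODE_TABLE = {"A": ["強調", "斜体", "25ポイント"]}


def expand_modifiers(name, table=CODE_TABLE):
    """Expand a registered code into its modifiers; pass other names through."""
    return list(table.get(name, [name]))


expanded = expand_modifiers("A")
unchanged = expand_modifiers("女性")
```

Only the one-character code then travels over the communication path, while the receiving device recovers the full set of modifications from its stored table.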

In the transmitting device, the label may be modified according to the modification data and displayed, as in the embodiment of FIG. 2, so that the user can check what modification data is attached to the label data.

The various modifications, applications, and extensions described for the embodiment of FIG. 2 also apply to this embodiment. For example, although voice is the object of transmission in this embodiment, the invention can be applied to sounds in general.

Continuation of the front page
(56) References: JP-A-60-245000 (JP, A); JP-A-61-6730 (JP, A); JP-A-61-93484 (JP, A); JP-A-61-100799 (JP, A); JP-A-1-219635 (JP, A); JP-A-5-143278 (JP, A); JP-A-6-337876 (JP, A)
(58) Fields searched (Int. Cl.7, DB name): G10L 13/00 - 13/06; G06F 3/16, 15/20

Claims

(57) [Claims]

1. A sound data segmented according to a predetermined segment and a label which is segmented by a delimiter code and indicates a transmission content of a sound indicated by the sound data as a character, a picture or a symbol for each segment of the sound data. Sound / label data holding means for holding the label data in the category in which the conversion of the sound characteristic is desired is given, when the decoration data is given, based on the decoration data,
Display control means for visually modifying a label based on the label data of the category and displaying the label on the display means, and giving sound data associated with the label data of the category in accordance with the label data of the category. A conversion means for performing a corresponding characteristic conversion based on the obtained modification data.

2. Sound data classification means for classifying input sound data based on sound divisions, and label data indicating, as characters or symbols, transmission contents of sounds indicated by the sound data. A label data dividing means for dividing the label data inputted with the corresponding delimiter code based on the delimiter code, a correspondence forming means for associating the divided sound data and the delimited label data with each other, and a label data. When the modification data is given to the sound data associated with the label data, display control means for visually modifying the label based on the label data and displaying the label on the display means based on the modification data. ,
A conversion unit for performing a corresponding characteristic conversion based on the modification data given in correspondence with the label data.

3. The sound characteristic conversion device according to claim 1, wherein the visual modification of the label is a character decoration on the label.

4. The sound characteristic conversion device according to claim 1, wherein the visual modification to the label is an order of the label.

5. The sound data is associated with a label indicating a transmission content of the sound indicated by the sound data as a character, a picture or a symbol, and the sound characteristic conversion content is associated with a modification process. The label represented by the data is visually modified based on the given modification processing and displayed, and for the sound data associated with the label data,
Performing a characteristic conversion corresponding to a modification process given to the label data.

6. The input sound data is classified based on a sound delimiter, and label data indicating, as a character or a symbol, the transmission content of the sound indicated by the sound data, the delimiter code corresponding to the sound delimiter. Is divided based on the delimiter code, and the divided sound data and the delimited label data are associated with each other. Based on the data, the label based on the label data is visually modified and displayed on the display means, and the sound data associated with the label data is
A sound characteristic conversion method characterized by performing a corresponding characteristic conversion based on decoration data given corresponding to label data.

7. A sound data segmented according to a predetermined segment and a label indicating the transmission content of the sound indicated by the sound data as a character or a symbol are associated with each segment of the sound data and segmented by a delimiter code. Sound to hold
A sound characteristic conversion device comprising: a label data holding unit; and a conversion unit that performs a corresponding characteristic conversion on sound data associated with the label data based on modification data given in correspondence with the label data.

8. For sound data classified according to a predetermined division, a label indicating a transmission content of a sound indicated by the sound data as a character or a symbol is associated with each division of the sound data, and separated by a delimiter code. A sound characteristic conversion method characterized in that sound data associated with label data is classified and the corresponding characteristic conversion is performed based on modification data given corresponding to the label data.

9. A system for transmitting sound data from a transmitting device to a receiving device, comprising a transmitting device and a receiving device communicable via a communication path, wherein the transmitting device comprises: Data input means for inputting label data divided into a plurality of divisions by a delimiter and modification data associated with the divisions; and communication means for transmitting the label data and the modification data to the receiving apparatus via a communication path. The communication means receives label data and modification data from the transmission-side device; the standard sound data generation means generates standard sound data based on the label data; and Conversion means for performing sound characteristic conversion on the standard sound data according to the classification of the label data associated with the decoration data, and generating sound characteristic conversion data. A sound transmission system comprising:

10. A method for transmitting sound data from a transmitting side to a receiving side via a communication path, wherein the transmitting side associates the label data divided into a plurality of sections by a delimiter code and the sections. Input the qualified data provided, and transmit the label data and the qualified data to the receiving side via the communication channel.The receiving side receives the label data and the qualified data from the transmitting device, and performs standardization based on the label data. Generating sound data, converting sound characteristics of the standard sound data based on the classification of the label data associated with the decoration data, based on the decoration data, and generating sound characteristic conversion data. Sound transmission method.

11. A sound/label associating device comprising: sound data input means for inputting sound data; sound data segmenting means for segmenting the sound data using two thresholds relating to the loudness of the sound represented by the sound data, such that a segment is started when the loudness continuously exceeds one threshold for a predetermined time or more and is ended when the loudness continuously falls below the other threshold for a predetermined time or more; label data input means for inputting label data that indicates, as characters or symbols, the content conveyed by the sound represented by the sound data, with delimiter codes attached at positions corresponding to the segment boundaries of the sound data; label data segmenting means for segmenting the label data based on the delimiter codes; and correspondence forming means for associating the segmented sound data and the segmented label data with each other.

12. A sound data input means for inputting sound data, a label data input means for inputting as a label data a label indicating a transmission content of a sound indicated by the sound data as a character or a symbol, and inputting the label as label data. Detailed correspondence forming means for classifying sound data in association with each label based on the average duration of the label obtained based on the average duration of each syllable and the actual duration of the input sound data A sound / label associating device comprising:

13. The sound / label associating apparatus according to claim 11, wherein the label data and the sound data associated by the association forming means are associated with the label data obtained based on the average duration of each syllable of the label data. Detailed correspondence forming means for classifying sound data in association with each label based on the average duration and the duration of the sound data.

14. A sound/label associating device comprising a sound display unit for visually displaying the nature of the sound represented by sound data and a label display unit for displaying a label represented by label data, wherein two thresholds relating to loudness are set for the sound represented by the sound data, the sound data is segmented such that a segment is started when the loudness continuously exceeds one threshold for a predetermined time or more and is ended when the loudness continuously falls below the other threshold for a predetermined time or more, and delimiter marks indicating the sound boundaries found by the segmentation are displayed.

15. A sound / label associating method comprising: providing two threshold values for the loudness represented by sound data; dividing the sound data such that a segment starts when the loudness continuously exceeds one threshold value for a predetermined time or more and ends when the loudness continuously falls below the other threshold value for a predetermined time or more; attaching a delimiter sign, at each position corresponding to a division of the sound data, to label data that represents as characters or symbols the content conveyed by the sound indicated by the sound data; classifying the label data on the basis of the delimiter signs; and associating the classified sound data and the classified label data with each other.

16. A sound / label associating method for associating sound data with label data that represents, as characters or symbols, the content conveyed by the sound indicated by the sound data, wherein an average duration of each syllable is prepared in advance, the average duration of each label is determined from the average durations of the syllables of that label, and the sound data is classified in association with each label based on the determined average duration of each label and the actual duration of the sound data.

17. The sound / label associating method according to claim 15, wherein an average duration of each syllable is prepared in advance, and, for the associated label data and sound data, the sound data is classified in association with each label based on the average duration of the label data, obtained from the average duration of each syllable, and on the duration of the sound data.

18. A display method for associating a sound with a label, using a sound display unit for visually displaying the nature of the sound represented by sound data and a label display unit for displaying a label represented by label data, wherein two thresholds are set for the loudness of the sound, and a delimiter mark indicating a division of the sound is displayed such that a segment starts when the loudness of the sound continuously exceeds one threshold for a predetermined time or more and ends when the loudness falls below the other threshold.

19. A storage medium storing a program for realizing, using a computer, the device or method according to claim 1.
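
For illustration only, the two-threshold segmentation recited in claims 11 and 15 and the duration-based division recited in claims 12 and 16 can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the implementation of the embodiment: the function names, the frame-based loudness sequence, the rule that a segment boundary is placed at the start of the qualifying run, and the proportional scaling of average label durations to the actual duration are all hypothetical choices made for this example.

```python
def segment_by_loudness(loudness, start_threshold, end_threshold, min_frames):
    """Divide a sequence of per-frame loudness values into segments.

    A segment starts once the loudness has exceeded start_threshold for
    min_frames consecutive frames, and ends once it has stayed below
    end_threshold for min_frames consecutive frames.
    Returns a list of (start, end) frame indices, end exclusive.
    Assumption: each boundary is placed where the qualifying run began.
    """
    segments = []
    in_segment = False
    run = 0        # length of the current run of qualifying frames
    seg_start = 0
    for i, level in enumerate(loudness):
        if not in_segment:
            run = run + 1 if level > start_threshold else 0
            if run >= min_frames:
                seg_start = i - run + 1
                in_segment = True
                run = 0
        else:
            run = run + 1 if level < end_threshold else 0
            if run >= min_frames:
                segments.append((seg_start, i - run + 1))
                in_segment = False
                run = 0
    if in_segment:
        # The data ended while a segment was still open.
        segments.append((seg_start, len(loudness)))
    return segments


def split_by_average_duration(labels, avg_syllable_duration, start, end):
    """Assign each label a time span inside the interval [start, end].

    labels: list of labels, each given as a list of syllables.
    avg_syllable_duration: dict mapping a syllable to its average duration.
    Assumption: the average label durations are scaled proportionally so
    that together they fill the actual duration of the sound data.
    """
    expected = [sum(avg_syllable_duration[s] for s in syllables)
                for syllables in labels]
    scale = (end - start) / sum(expected)
    spans = []
    t = start
    for duration in expected:
        spans.append((t, t + duration * scale))
        t += duration * scale
    return spans


# Hypothetical loudness values and syllable durations (milliseconds).
frames = [0, 0, 5, 5, 5, 5, 1, 5, 5, 0, 0, 0, 6, 6, 6, 0]
print(segment_by_loudness(frames, start_threshold=4, end_threshold=2,
                          min_frames=2))

avg_ms = {"ha": 100, "i": 100, "so": 100, "u": 100, "de": 100, "su": 100}
print(split_by_average_duration([["ha", "i"], ["so", "u", "de", "su"]],
                                avg_ms, start=0, end=1200))
```

In this sketch the single-frame dip in the middle of the first loud run does not end the segment, because the loudness does not stay below the lower threshold for the required number of frames. Using a separate, lower threshold for the end condition likewise keeps brief fluctuations around one level from splitting a segment.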