JPH07210184A - Voice editor/synthesizer

Info

Publication number: JPH07210184A
Application number: JP6005586A
Authority: JP
Inventors: Hiroko Yoshida (吉田博子)
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1994-01-24
Filing date: 1994-01-24
Publication date: 1995-08-11

Abstract

PURPOSE: To synthesize natural, smooth speech even for synthesis units that were not recorded, provided a certain amount of recorded speech is available, and to synthesize natural speech without discontinuities even when the synthesis units are small. CONSTITUTION: The device comprises a recorded voice search unit 2, a connection unit determination unit 3, and a voice signal connection unit 5. When a sentence giving the content of the speech to be synthesized is entered at a synthetic sentence signal input unit 1, the search unit 2 searches the content of the recorded voices and selects several recorded units that contain the required speech. From these, the determination unit 3 selects the voices that can be connected with the least connection distortion, the connection unit 5 connects the selected voices, and a synthesized signal output unit 6 outputs the required synthesized speech.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a voice editing/synthesizing device that synthesizes speech by editing digitally recorded voice, for use in station announcements, operating instructions for electrical appliances, and the like.

[0002]

[Prior Art] Conventionally, a voice editing/synthesizing device of this kind records speech uttered by a person in advance in units such as words, phrases, and sentences, reads the units out and edits them as needed, and outputs them synthesized into spoken sentences and the like.

[0003] That is, as in the example of Sentence 1 below, to output the utterance "Mamonaku 3-bansen ni kyuko Tokyo-yuki ga mairimasu" ("The express bound for Tokyo will soon arrive at platform 3"), the voices are selected and output in the order "mamonaku → A3 → B2 → C1 → mairimasu", which synthesizes the desired sentence.

[0004] Sentence 1: Mamonaku [A] [B] [C] mairimasu. ("The [train type] bound for [destination] will soon arrive at platform [number].")

A (platform): A1 "1-bansen ni" (at platform 1); A2 "2-bansen ni" (at platform 2); A3 "3-bansen ni" (at platform 3)
B (train type): B1 "kakueki teisha" (local); B2 "kyuko" (express); B3 "kaisoku" (rapid); B4 "tsukin kaisoku" (commuter rapid)
C (destination): C1 "Tokyo-yuki ga" (bound for Tokyo); C2 "Yokohama-yuki ga" (bound for Yokohama); C3 "Shinagawa-yuki ga" (bound for Shinagawa); C4 "Kawasaki-yuki ga" (bound for Kawasaki)

[0005] In this way, the conventional method can also synthesize the desired speech by combining and outputting synthesis units.
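The conventional scheme of Sentence 1 amounts to a table lookup followed by concatenation. A minimal Python sketch, assuming romanized text strings stand in for the recorded waveforms (the `UNITS` table and the `synthesize` function are illustrative names, not part of the patent):

```python
# Illustrative only: romanized strings stand in for recorded waveforms.
UNITS = {
    "A1": "1-bansen ni", "A2": "2-bansen ni", "A3": "3-bansen ni",
    "B1": "kakueki teisha", "B2": "kyuko", "B3": "kaisoku", "B4": "tsukin kaisoku",
    "C1": "Tokyo-yuki ga", "C2": "Yokohama-yuki ga",
    "C3": "Shinagawa-yuki ga", "C4": "Kawasaki-yuki ga",
}

def synthesize(platform, train_type, destination):
    """Output the fixed frame of Sentence 1 with the three selected units in order."""
    sequence = ["mamonaku", UNITS[platform], UNITS[train_type],
                UNITS[destination], "mairimasu"]
    return " ".join(sequence)

print(synthesize("A3", "B2", "C1"))
```

A unit that was never recorded has no key in the table, so the lookup simply fails; only combinations of the recorded units can be produced.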

[0006]

[Problems to be Solved by the Invention] However, in the conventional voice editing/synthesizing device described above, the synthesis units are sentences or phrases and speech is output unit by unit, so the amount of recording is large and only sentences of a more or less fixed form can be synthesized. There is also the problem that speech not recorded as a synthesis unit cannot be output. That is, when the sentence pattern or the train destinations change, as in the example sentence below, the recordings for Sentence 1 contain "Tokyo-yuki ga" and "Yokohama-yuki ga" but not "Tokyo-yuki no" or "Yokohama homen" ("in the Yokohama direction"), so these must be newly recorded.

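[0007] Sentence 2: Mamonaku [A] [D] [E] densha ga tochaku shimasu. ("A train bound for [destination], in the [direction] direction, will soon arrive at platform [number].")

A (platform): A1 "1-bansen ni" (at platform 1); A2 "2-bansen ni" (at platform 2); A3 "3-bansen ni" (at platform 3)
D (direction): B1 "Odawara homen" (Odawara direction); B2 "Yokohama homen" (Yokohama direction); B3 "Shinagawa homen" (Shinagawa direction); B4 "Kawasaki homen" (Kawasaki direction)
E (destination): C1 "Tokyo-yuki no" (bound for Tokyo); C2 "Shizuoka-yuki no" (bound for Shizuoka); C3 "Nagoya-yuki no" (bound for Nagoya); C4 "Aomori-yuki no" (bound for Aomori)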

[0008] To avoid such new recordings, the synthesis units can be made smaller than sentences or phrases: words, connectives, particles, or syllables, such as "Tokyo", "Yokohama", "yuki no", and "yuki ga". This increases the vocabulary and the sentences that can be synthesized without increasing the amount of recording. However, when "Yokohama homen" is played back, the synthesis units "Yokohama" and "homen" are simply output one after the other, so the result is a discontinuous utterance like "Yokohama, homen", and voice quality deteriorates.
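The recording trade-off can be made concrete with a small count. The sketch below assumes four place names and three endings taken from the examples above; the specific counts are illustrative and are not figures from the patent.

```python
places = ["Tokyo", "Yokohama", "Shinagawa", "Kawasaki"]
endings = ["yuki ga", "yuki no", "homen"]

# Phrase-level units: one recording per (place, ending) combination.
phrase_level_recordings = len(places) * len(endings)

# Word/particle-level units: each place and each ending is recorded once.
small_unit_recordings = len(places) + len(endings)

print(phrase_level_recordings, small_unit_recordings)  # prints "12 7"
```

The gap widens as more places and endings are added, which is why small units are attractive despite the discontinuity problem described above.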

[0009] The present invention solves the above conventional problems. Its object is to provide a voice editing/synthesizing device that, given a certain amount of recorded speech, can synthesize natural, smooth speech from that speech even for synthesis units that were not recorded, and that can synthesize natural speech without discontinuities even when the synthesis units are made small.

[0010]

[Means for Solving the Problems] To achieve the above object, the present invention comprises means for connecting voices partway through continuing speech: in the silent interval of an unvoiced plosive, in the frication interval of an unvoiced fricative, or in an interval where the same voiced sound has a similar spectrum.

[0011] Further, in the present invention, when voices are connected, the power of the voice signal on the connected-to side is attenuated linearly over a fixed interval from the connection point, the power of the voice signal on the connecting side is increased linearly over the same interval, and the signals in that interval are added together.

[0012] Further, in the present invention, when the connection point lies where the voice has pitch, the correlation between the two voice signals at the connection point is computed and the portion with high correlation is set as the connection point. This avoids connection distortion caused by the phase difference of the pitch and reduces connection distortion due to the phase of the voice.

[0013]

[Operation] With the above configuration, the present invention operates as follows. By connecting voices partway through continuing speech at places where the connection causes no distortion, such as the silent interval of an unvoiced plosive, the frication interval of an unvoiced fricative, or a place where the same voiced sound has a similar spectrum, natural speech without discontinuities can be synthesized without increasing the amount of recording.

[0014] In addition, to improve the continuity of the speech, the voice on the connected-to side is given a weight that goes linearly from 1 to 0 within a fixed interval, the voice on the connecting side is given a weight that goes linearly from 0 to 1 within the same fixed interval from the connection point, and the two voices are added together over that interval. This improves the continuity of the speech at the connection point, so that smooth speech can be reproduced.

[0015] Furthermore, when connecting at a place where pitch exists, the correlation between the two voices near the connection point is computed and the place with the largest correlation is set as the connection point. This determines a connection point that yields natural speech with little connection distortion due to phase.

[0016]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of an embodiment of the voice editing/synthesizing device of the present invention. In FIG. 1, 1 is a synthetic sentence signal input unit, 2 is a recorded voice search unit, 3 is a connection unit determination unit, 4 is a voice signal reading unit, 5 is a voice signal connection unit, 6 is a synthesized signal output unit, and 7 is the recorded voice.

[0017] Speech generally has characteristic features that depend on the type of phoneme. Unvoiced plosives such as p, t, and k and the unvoiced affricate t∫ have a silent closure at the start of the phoneme, while the unvoiced fricatives s, f, and h and the unvoiced affricate ts (as in "tsu") begin with a sustained, sharp frication noise. In the vowel intervals of speech, the spectrum changes little. Because the character of the sound is steady at such places, distortion caused by a connection is not noticeable even when voices are joined partway through an utterance. Accordingly, when a sentence giving the content of the speech to be synthesized is entered at the synthetic sentence signal input unit 1 in FIG. 1, the recorded voice search unit 2 searches the content of the recorded voices and selects several synthesis units that contain the required speech. The connection unit determination unit 3 then selects, from among those voices, the ones that can be connected with the least noticeable connection distortion; the voice signal reading unit 4 reads the signals corresponding to the selected voices from the recorded voice 7; the voice signal connection unit 5 connects the voices; and the synthesized signal output unit 6 outputs the connected voice signal.
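The phoneme classes named above can be written down as a small lookup that tells the connection unit determination unit what kind of low-distortion join a phoneme offers. This is a minimal sketch: the romanized symbols (with "ch" standing for t∫), the five-vowel set, and the `join_kind` function are assumptions made for illustration, not definitions from the patent.

```python
# Phoneme classes from the text; the romanized symbols are an assumption
# ("ch" stands for the affricate written t∫ in the source).
CLOSURE_ONSET = {"p", "t", "k", "ch"}      # begin with a silent closure
FRICATION_ONSET = {"s", "f", "h", "ts"}    # begin with sustained frication
VOWELS = {"a", "i", "u", "e", "o"}         # spectrum changes little

def join_kind(phoneme):
    """Return the kind of low-distortion join point a phoneme offers, or None."""
    if phoneme in CLOSURE_ONSET:
        return "silent closure"
    if phoneme in FRICATION_ONSET:
        return "frication"
    if phoneme in VOWELS:
        return "steady vowel"
    return None

for p in ["k", "ts", "a", "m"]:
    print(p, join_kind(p))
```

Phonemes outside the three classes return `None`, meaning this sketch treats them as unsuitable join points.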

[0018] In the example shown in FIGS. 2 and 3, to synthesize the phrase "Shimoda homen" ("in the Shimoda direction"), the recorded voice search unit 2 first selects "Shimoda-yuki ga", which contains the "Shimoda" of "Shimoda homen", and "Tokyo homen", "Kozu homen", "Hiratsuka homen", and "Atami homen", which contain the "homen". Next, the connection unit determination unit 3 chooses from the selected voices "Shimoda-yuki ga" and "Hiratsuka homen", which yield the desired speech when joined at the vowel a. The voice signal connection unit 5 joins these voices at the vowel a to synthesize "Shimoda homen", and the synthesized signal output unit 6 outputs the synthesized speech.
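The selection in this example can be sketched as a search for a recording in which the sound just before the needed part matches the last sound of the preceding part. The sketch assumes romanized letters stand in for phonemes and uses a hypothetical helper, `suffix_sources`; the patent does not specify how the search is implemented.

```python
def suffix_sources(prefix, suffix, recordings):
    """Recordings that contain `suffix` preceded by the last sound of `prefix`."""
    join_sound = prefix[-1]                # e.g. the vowel "a" of "shimoda"
    matches = []
    for rec in recordings:
        idx = rec.find(suffix)
        # The sound immediately before the suffix must equal the join sound.
        if idx > 0 and rec[idx - 1] == join_sound:
            matches.append(rec)
    return matches

# Romanized stand-ins for the recordings that contain "homen".
recordings = ["tokyohomen", "kozuhomen", "hiratsukahomen", "atamihomen"]
print(suffix_sources("shimoda", "homen", recordings))
```

Only "hiratsukahomen" has the vowel a immediately before "homen", which matches the choice of "Hiratsuka homen" in the text.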

[0019] When the voice signal connection unit 5 connects voices, as shown in FIG. 4, the continuity between the two voices is improved as follows. Over a fixed interval of the voice data, such as an interval corresponding to the pitch period at the connection point, voice A on the connected-to side is weighted from 1 toward 0 and voice B on the connecting side is weighted from 0 toward 1, and the weighted intervals are added to each other. This improves the continuity of the speech and reduces the distortion caused by the connection.
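The weighting of FIG. 4 is a linear cross-fade. The following is a minimal sketch under two assumptions: the weights are applied directly to sample amplitudes, and the overlap length is passed in by the caller (the text suggests something like one pitch period). The function name `crossfade_join` is illustrative.

```python
def crossfade_join(voice_a, voice_b, overlap):
    """Join voice_a to voice_b, cross-fading linearly over `overlap` samples."""
    if overlap < 2 or overlap > min(len(voice_a), len(voice_b)):
        raise ValueError("overlap must be between 2 and the shorter signal length")
    head = list(voice_a[:-overlap])        # part of A before the overlap
    tail = list(voice_b[overlap:])         # part of B after the overlap
    start_a = len(voice_a) - overlap
    mixed = []
    for i in range(overlap):
        w_b = i / (overlap - 1)            # weight on B rises from 0 to 1
        w_a = 1.0 - w_b                    # weight on A falls from 1 to 0
        mixed.append(w_a * voice_a[start_a + i] + w_b * voice_b[i])
    return head + mixed + tail

print(crossfade_join([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], overlap=3))
# prints [1.0, 1.0, 0.5, 0.0, 0.0]
```

Because the two weights always sum to 1, a signal that is identical in both voices passes through the overlap unchanged.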

[0020] When connecting at a place where pitch exists, a phase offset between the pitch of the two voices can cause distortion. Therefore, once the portions to be connected have been decided, the correlation between the two voices is computed from the following equation, as shown in FIG. 5, and the connection point is moved to the location where the correlation is highest. Correcting the phase offset (the amount denoted shift) in this way aligns the phases of the voices and avoids connection distortion caused by a difference in phase.

[0021]

[Equation 1]
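The body of Equation 1 is not reproduced in this text, so its exact form is unknown. As an illustration only, the sketch below assumes a normalized cross-correlation between a reference segment of one voice and a sliding window of the other, and returns the shift that maximizes it. The names `best_shift`, `ref`, and `cand` are hypothetical.

```python
import math

def best_shift(ref, cand, max_shift):
    """Return the shift in 0..max_shift at which `cand` best matches `ref`.

    Assumed measure: normalized cross-correlation,
    sum(ref[i] * cand[i + s]) / sqrt(sum(ref^2) * sum(cand_window^2)).
    """
    n = len(ref)
    if len(cand) < n + max_shift:
        raise ValueError("cand must hold at least len(ref) + max_shift samples")
    ref_energy = sum(x * x for x in ref)
    best, best_score = 0, float("-inf")
    for shift in range(max_shift + 1):
        window = cand[shift:shift + n]
        dot = sum(x * y for x, y in zip(ref, window))
        norm = math.sqrt(ref_energy * sum(y * y for y in window))
        score = dot / norm if norm > 0 else 0.0
        if score > best_score:
            best, best_score = shift, score
    return best

# Two sinusoids with a 20-sample period; `cand` lags `ref` by 5 samples.
period = 20
ref = [math.sin(2 * math.pi * i / period) for i in range(40)]
cand = [math.sin(2 * math.pi * (j - 5) / period) for j in range(60)]
print(best_shift(ref, cand, max_shift=19))
```

Limiting `max_shift` to less than one pitch period keeps the search from locking onto the same phase one period later.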

[0022] As described above, according to this embodiment, voices are connected at places where they join easily, even partway through an utterance, so that natural, smooth speech can be synthesized from a certain amount of recorded speech even for utterances that were not recorded.

[0023] In addition, at recording time the voices can be recorded in units that begin at places that are easy to join when combined, such as voices that include the preceding vowel, voices delimited at the silent part of an unvoiced plosive, or voices delimited at the frication part of an unvoiced fricative. By connecting at those places during playback, speech of a quality comparable to sentence-level or phrase-level editing synthesis can be obtained with an amount of recording comparable to editing synthesis at the level of words, connectives, and particles.

[0024]

[Effects of the Invention] As is apparent from the above embodiment, the present invention connects voices partway through continuing speech at places where the connection causes no distortion, such as the silent interval of an unvoiced plosive, the frication interval of an unvoiced fricative, or a place where the same voiced sound has a similar spectrum, and can thereby synthesize natural speech without discontinuities without increasing the amount of recording.

[0025] In addition, to improve the continuity of the speech, the voice on the connected-to side is given a weight that goes linearly from 1 to 0 within a fixed interval, the voice on the connecting side is given a weight that goes linearly from 0 to 1 within the same fixed interval from the connection point, and the two voices are added together over that interval. This improves the continuity of the speech at the connection point, so that smooth speech can be reproduced.

[0026] Furthermore, when connecting at a place where pitch exists, the correlation between the two voices near the connection point is computed and the place with the largest correlation is set as the connection point. This determines a connection point that yields natural speech with little connection distortion due to phase.

[Brief description of drawings]

FIG. 1 is a schematic block diagram showing the configuration of a voice editing device according to an embodiment of the present invention.

FIG. 2 is a schematic diagram for explaining the operation of the device.

FIG. 3 is a waveform diagram for explaining the operation of the device.

FIG. 4 is a waveform diagram for explaining the operation of the device.

FIG. 5 is a waveform diagram for explaining the operation of the device.

[Explanation of symbols]

1 Synthetic sentence signal input unit
2 Recorded voice search unit
3 Connection unit determination unit
4 Voice signal reading unit
5 Voice signal connection unit
6 Synthesized signal output unit
7 Recorded voice

Claims

[Claims]

1. A voice editing/synthesizing device that outputs combinations of voices recorded in units of sentences or phrases, comprising means for connecting voices, partway through continuing speech, in the silent interval of an unvoiced plosive, in the frication interval of an unvoiced fricative, or in an interval where the same voiced sound has a similar spectrum.

2. The voice editing/synthesizing device according to claim 1, wherein, when voices are connected, the power of the voice signal on the connected-to side is attenuated linearly over a fixed interval from the connection point, the power of the voice signal on the connecting side is increased linearly over the same interval, and the signals in that interval are added together.

3. The voice editing/synthesizing device according to claim 1, wherein, when voices are connected at a place where the voice has pitch, the correlation between the two voice signals at the connection point is computed and a portion with high correlation is set as the connection point, in order to avoid connection distortion caused by the phase difference of the pitch and thereby reduce connection distortion due to the phase of the voice.