JPH07199991A

JPH07199991A - Data generation device for speech synthesis

Info

Publication number: JPH07199991A
Application number: JP6000541A
Authority: JP
Inventors: Masami Terajima; 正己寺嶋
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1994-01-07
Filing date: 1994-01-07
Publication date: 1995-08-04
Anticipated expiration: 2018-05-26
Also published as: JP3409292B2

Abstract

PURPOSE: To automatically generate speech synthesis data for guidance announcements on a bus. CONSTITUTION: A tape on which the announcements are recorded is played back and the playback signal is converted into a digital signal, which is separated into basic speech patterns Bp at the word or phrase level by using the pause periods in the speech. Each pattern Bp is checked against a database 16; if it is not yet registered, it is registered in the database 16 together with an assigned identification code Bpi. A sequence of identification codes Bpi corresponding to the sequence of basic speech patterns in the input speech is generated and stored in a reproduction table 21 for each input speech.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a device that creates speech synthesis data for a speech-synthesis broadcasting device used, for example, for in-vehicle announcements on a bus.

[0002]

[Prior Art] Japanese Patent No. 1504694 proposes a device that, for in-vehicle guidance announcements on a bus, reads out pre-registered basic speech patterns such as words and phrases as needed and combines them to reproduce the desired announcement. However, no method or device for generating the basic speech patterns used by that device has been proposed. Conventionally, basic speech patterns are separated and extracted by hand from a series of recorded announcements, and each pattern is judged, case by case, to be either a variable pattern or a common pattern. For example, for the announcement "Next is Tokyo Station" (次は東京駅です), "Next is" (次は) and "Station" (駅です) are extracted as common patterns and "Tokyo" (東京) as a variable pattern. The basic speech patterns obtained in this way are stored without duplication, and the stored patterns are edited together to synthesize the announcement.

[0003]

[Problem to Be Solved by the Invention] Creating speech synthesis data by hand poses little difficulty when the volume of guidance announcements is small. For in-vehicle bus announcements, however, the amount of speech becomes very large when there are many routes and stops, and it is impractical to separate and extract the basic speech patterns by hand and then store them without duplication.

[0004]

[Means for Solving the Problem] According to the present invention, continuous speech is input by input means, and pattern extraction means separates and extracts basic speech patterns at the word or phrase level from the input speech. Determination means determines whether each extracted basic speech pattern is appearing for the first time; if it is, registration means attaches an identification code to the pattern and registers the two together. In addition, for each input speech, conversion means uses the extracted basic speech patterns and the registered identification codes to generate the identification code sequence of its basic speech patterns.

[0005]

[Embodiment] An embodiment of the invention is described as applied to creating speech synthesis data for in-vehicle guidance on a bus. The continuous speech to be broadcast is input as an electrical signal by the input means. For example, the announcements are recorded on magnetic tape in advance; the magnetic tape 11 is played back on a tape recorder 12, and the playback output is converted into a digital signal by an AD converter 13 and stored temporarily in a memory 14 such as a magnetic disk. The script of the guidance announcements for the bus route shown in FIG. 2 looks, in part, like FIG. 3. This example is for route number 9851; an announcer reads the script aloud and the speech is recorded on magnetic tape. The script also contains text that is not read aloud, such as "route number", "route name", "service announcement: inside the vehicle", "outside the vehicle", and the stop name at the beginning of a sentence (which indicates that the sentence is broadcast at that stop).

[0006] When the script is read aloud, pause periods (silent periods) of different lengths are inserted after each word or phrase, between sentences, between one series of announcement blocks and the next, and between routes. For example, at punctuation marks within the text, a pause of length 1 is inserted at a comma and a pause of length 2 at a period. A space in the script gets a pause of length 1, a pause of length 5 is inserted between the series of announcement blocks for one stop and another block, and a pause of length 8 marks the boundary between routes.
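The pause convention of paragraph [0006] can be written down as a small lookup table. The following is a minimal sketch, not part of the patent: the names `PAUSE_UNITS` and `annotate` and the mark labels are hypothetical, and the lengths are the relative values quoted above (the source does not state their absolute duration).

```python
# Pause lengths in the relative units quoted in paragraph [0006].
# The dictionary keys are hypothetical labels chosen for this sketch.
PAUSE_UNITS = {
    "space": 1,   # space between words or phrases in the script
    "comma": 1,   # comma within a sentence
    "period": 2,  # end of a sentence
    "block": 5,   # between the announcement blocks of different stops
    "route": 8,   # boundary between routes
}


def annotate(script):
    """Turn (text, mark) pairs into (text, pause_units) pairs for the reader."""
    return [(text, PAUSE_UNITS[mark]) for text, mark in script]


if __name__ == "__main__":
    script = [("Next is", "space"), ("Tokyo", "space"), ("Station", "period")]
    print(annotate(script))
```

Note that a comma and a space share the same length, so both act as word-level or phrase-level boundaries.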

[0007] Once all of the in-vehicle announcement speech has been stored in the memory 14 in this way, the basic speech pattern extraction means 15 reads the contents of the memory 14 in sequence and separates and extracts basic speech patterns Bp at the word or phrase level, using the pause periods (silent intervals) as cues. The determination means 17 then determines, by referring to the basic speech pattern database 16, whether each extracted pattern Bp is new (not yet registered). That is, it checks whether a basic speech pattern identical to the extracted pattern Bp already exists in the database 16 by comparing the two patterns with a pattern matching technique (for example, DP (dynamic programming) matching).
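As an illustration of extraction by pause periods, the sketch below splits a list of digitized samples wherever a sufficiently long quiet run occurs. It is only an assumption about how such a splitter might look: the function name `split_on_silence`, the amplitude `threshold`, and the `min_pause` length (counted in samples) are hypothetical and do not come from the patent.

```python
def split_on_silence(samples, threshold=0, min_pause=3):
    """Split samples into (pattern, following_pause_length) pairs.

    A run of at least `min_pause` samples with |amplitude| <= threshold
    is treated as a pause period; shorter quiet runs stay inside the pattern.
    """
    segments = []
    current = []  # samples of the pattern being collected
    quiet = []    # low-amplitude samples seen since the last loud sample
    for s in samples:
        if abs(s) <= threshold:
            quiet.append(s)
            continue
        if current and len(quiet) >= min_pause:
            # A long enough pause closes the current pattern.
            segments.append((current, len(quiet)))
            current = []
        elif current:
            # A short dip is kept as part of the pattern.
            current.extend(quiet)
        quiet = []
        current.append(s)
    if current:
        segments.append((current, len(quiet)))
    return segments


if __name__ == "__main__":
    signal = [0, 0, 5, 6, 0, 7, 0, 0, 0, 3, 4, 0, 0, 0, 0, 0]
    for pattern, pause in split_on_silence(signal):
        print(pattern, pause)
```

Returning the pause length with each pattern keeps the information that is needed later to tell word, sentence, block, and route boundaries apart.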

[0008] If the determination shows that the pattern is not yet registered, the registration means 18 assigns an identification code Bpi to the basic speech pattern and registers it in the basic speech pattern database 16. If it is already registered, the identification code Bpg of that basic speech pattern is read from the database 16. In the example of FIG. 3, as shown in FIG. 4, the route number "9851" is registered under identification code Bp0, then "Showa Shako-mae" under Bp1, and then "departure", "bound for Heisei Danchi", "this announcement is", "Special Route 98", ... under Bp2, Bp3, Bp4, Bp5, ... respectively. In the input speech from the script of FIG. 3, the basic speech pattern "is bound for Heisei Danchi", extracted from "This announcement is Special Route 98, bound for Heisei Danchi.", is registered under identification code Bp6 as shown in FIG. 4. Therefore, when the same pattern "is bound for Heisei Danchi" is extracted from the next input speech, "Thank you for waiting; this bus is bound for Heisei Danchi. Dangerous articles may not be brought on board.", it is treated as already registered and is not given a new identification code.
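The look-up-or-register step of paragraphs [0007] and [0008] might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `dp_distance` is a basic dynamic-time-warping style cost on plain number sequences (real systems would compare acoustic feature vectors), and the class `PatternDatabase`, its `tolerance` threshold, and the use of list indices as identification codes are all hypothetical.

```python
def dp_distance(a, b):
    """Dynamic-programming alignment cost between two number sequences."""
    inf = float("inf")
    # cost[i][j] is the best cost of aligning a[:i] with b[:j].
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


class PatternDatabase:
    """Stores each distinct pattern once; the list index serves as its code."""

    def __init__(self, tolerance=0.0):
        self.patterns = []
        self.tolerance = tolerance  # hypothetical match threshold

    def code_for(self, pattern):
        """Return (code, is_new): reuse a matching entry or register a new one."""
        for code, known in enumerate(self.patterns):
            if dp_distance(pattern, known) <= self.tolerance:
                return code, False
        self.patterns.append(list(pattern))
        return len(self.patterns) - 1, True


if __name__ == "__main__":
    db = PatternDatabase(tolerance=1.0)
    print(db.code_for([1, 2, 3]))     # first appearance: registered
    print(db.code_for([1, 2, 2, 3]))  # time-stretched repeat: existing code reused
```

Because the alignment may stretch one sequence against the other, a repeated phrase spoken slightly slower can still map to the code that is already registered, which is the property the no-duplication requirement relies on.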

[0009] Next, the conversion means 19 converts the input speech of each broadcast sentence into a sequence of identification codes, in the order in which its basic speech patterns occur, and stores the sequence in the reproduction table 21 for each broadcast sentence. In doing so, the lengths of the pause periods used to separate and extract the basic speech patterns are analyzed to determine whether each pause is a word-level break, a sentence break, an announcement-block break, or a route break, and the identification code sequence for each broadcast sentence is created accordingly.
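The grouping described in paragraph [0009] can be sketched as below. The thresholds reuse the nominal pause lengths 2, 5, and 8 from paragraph [0006]; the function names, the dictionary layout of a table entry, and the rule that any pause longer than a word-level pause closes a broadcast sentence are assumptions made for this illustration.

```python
def classify_pause(units):
    """Map a pause length (relative units) to a boundary kind."""
    if units >= 8:
        return "route"
    if units >= 5:
        return "block"
    if units >= 2:
        return "sentence"
    return "word"


def build_reproduction_table(coded_segments):
    """Group (code, following_pause) pairs into one code sequence per sentence."""
    table = []
    current = []
    for code, pause in coded_segments:
        current.append(code)
        boundary = classify_pause(pause)
        if boundary != "word":
            # Anything longer than a word-level pause ends the broadcast sentence.
            table.append({"codes": current, "ends": boundary})
            current = []
    if current:
        table.append({"codes": current, "ends": "word"})
    return table


if __name__ == "__main__":
    # Codes 0..3 stand for Bp0..Bp3; the pause values are made up for the demo.
    stream = [(0, 2), (1, 1), (2, 1), (3, 2)]
    for entry in build_reproduction_table(stream):
        print(entry)
```

With measured rather than nominal durations, the comparisons would need some tolerance around each nominal length.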

[0010] In the example of FIGS. 3 and 4, as shown in FIG. 5, the speech "9851" is stored as broadcast 1 with only the identification code Bp0, and the speech "From Showa Shako-mae, bound for Heisei Danchi" is stored as broadcast 2 with the code sequence Bp1, Bp2, Bp3. The remaining broadcast sentences are built in the same way. To synthesize speech from the speech synthesis data obtained in this way, the reproduction table 21 is read in sequence, and the basic speech pattern corresponding to each identification code read out is fetched from the database 16 and synthesized into speech.
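Playback as described in paragraph [0010] amounts to a table lookup followed by concatenation. The sketch below assumes the database is a mapping from code to a list of samples and that each table entry is a list of codes; the function name `synthesize` and these data layouts are hypothetical, and the sketch does not reinsert pauses between patterns because the source does not say how that is done.

```python
def synthesize(reproduction_table, database):
    """Concatenate stored patterns in the order given by each code sequence."""
    outputs = []
    for codes in reproduction_table:
        audio = []
        for code in codes:
            # Fetch the basic speech pattern registered under this code.
            audio.extend(database[code])
        outputs.append(audio)
    return outputs


if __name__ == "__main__":
    # Toy sample data standing in for the patterns Bp0..Bp3.
    database = {0: [9, 8], 1: [1], 2: [2, 2], 3: [3]}
    table = [[0], [1, 2, 3]]  # broadcast 1 and broadcast 2 of FIG. 5
    print(synthesize(table, database))
```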

[0011] The basic speech patterns can also be compressed before storage, for example by differential coding or adaptive differential coding, to reduce the amount of storage required. The invention is not limited to bus guidance announcements; it can also be applied to creating the speech synthesis data needed to produce other synthesized speech.
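Differential coding, mentioned in paragraph [0011], stores the change between successive samples instead of the samples themselves. The following minimal sketch shows only that transform and its inverse; the function names are hypothetical, and the actual storage saving would come from packing the typically small differences into fewer bits, which is not shown here.

```python
def delta_encode(samples):
    """Keep the first sample, then store each successive difference."""
    if not samples:
        return []
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]


def delta_decode(deltas):
    """Rebuild the samples by accumulating the differences."""
    samples = []
    total = 0
    for d in deltas:
        total += d
        samples.append(total)
    return samples


if __name__ == "__main__":
    pattern = [100, 102, 101, 105]
    encoded = delta_encode(pattern)
    print(encoded)
    print(delta_decode(encoded) == pattern)
```

Adaptive differential coding additionally adjusts the quantization step to the signal; that refinement is outside this sketch.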

[0012]

[Effects of the Invention] As described above, according to the present invention, input speech is automatically separated into basic speech patterns, and for each series of input speech a sequence of identification codes is generated that lists its constituent basic speech patterns in order. Compared with creating such synthesized speech data by hand, the data can be created efficiently, quickly, and accurately, even for very long speech, and no duplication arises in which the same basic speech pattern is given different identification codes.

[Brief description of drawings]

FIG. 1 is a block diagram functionally showing an embodiment of the present invention.

FIG. 2 is a diagram showing an example of a bus route.

FIG. 3 is a diagram showing an example of a bus guidance announcement script.

FIG. 4 is a diagram showing an example of the contents of the database 16, which holds the relationship between basic speech patterns and their identification codes.

FIG. 5 is a diagram showing an example of the contents of the reproduction table 21.

Claims

[Claims]

1. A speech synthesis data creation device comprising: means for inputting continuous speech; means for separating and extracting basic speech patterns at the word or phrase level from the input speech; means for determining whether each extracted basic speech pattern is appearing for the first time; means for registering the basic speech pattern with an identification code attached when the determination is a first appearance; and means for generating, using the registered identification codes and the extracted basic speech patterns, an identification code sequence of the basic speech patterns corresponding to the input speech.

2. The speech synthesis data creation device according to claim 1, wherein the input speech is provided with a pause period for each word or phrase, between sentences, and between series of input speech, and the basic speech patterns are separated using the pause periods as cues.

3. The speech synthesis data creation device according to claim 2, wherein the pause period for each word or phrase, the pause period between sentences, and the pause period between series of input speech differ in length from one another, the device further comprising means for attaching, based on the length of the pause period, a code that distinguishes each series of input speech and for storing the identification code sequence in a reproduction table for each series of input speech.

4. The speech synthesis data creation device described in [claim reference missing in source], wherein the means for determining the first appearance is means for determining, by pattern matching, whether the separated and extracted basic speech pattern matches a basic speech pattern registered in the database.