JP2705062B2 - Split labeling device

Info

Publication number: JP2705062B2
Application number: JP62064953A
Authority: JP
Inventors: 芳春阿部
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1987-03-19
Filing date: 1987-03-19
Publication date: 1998-01-26
Anticipated expiration: 2013-01-26
Also published as: JPS63231400A

Description

[Detailed Description of the Invention]

[Field of Industrial Application]
The present invention relates to a split labeling device for time series of feature vectors representing speech data.

[Prior Art]
FIG. 6 is a functional block diagram of a conventional split labeling device. The split labeling unit (3) splits and labels a plurality of training speech data stored in file device (1) according to the label strings stored in file device (2), one of which is provided for each item of training speech data. It stores the splitting results in file device (4) and the per-label statistics in file device (5). Because the per-label statistics obtained in this way are used for template matching in speech recognition devices and the like, they need to be as unbiased as possible. This in turn requires split labeling of a large amount of speech data covering many utterance environments.

[Problems to Be Solved by the Invention]
To use a conventional device configured as above, a label string to be split and attached must be prepared individually for each item of training speech data. Split labeling of a large amount of speech data therefore requires preparing label strings in great number and variety, so the conventional device is ill-suited to that task.

Furthermore, when a new label type is to be added or the structure of the label strings is to be changed, for example to reflect the influence of utterance context in the label strings, it is not easy to apply the same correction uniformly and without error to the large number of label strings already prepared.

To eliminate these drawbacks, the present invention adds to a conventional device of this kind a label string generation unit that generates, by rule, the label string paired with each phoneme symbol string representing the utterance content of the speech data. This makes it easy to generate label strings in great number and variety, and allows label strings to be changed easily and without error by changing the rules. The invention is described in detail below with reference to the drawings.

[Embodiment of the Invention]
FIG. 1 is a functional block diagram of one embodiment of the present invention. (6) is a file device for storing phoneme symbol strings that represent the utterance content of the training speech data, and (7) is a label string generation unit that generates label strings from the phoneme symbol strings in file device (6) and stores them in file device (2).

FIG. 2 shows part of the phoneme symbol strings stored in file device (6). The number at the left of each string associates it with the training speech data stored in file device (1) and with the label string stored in file device (2); items bearing the same number correspond to one another.

The phoneme symbols making up these strings include those shown in FIG. 5. The utterance content of any item of training speech data is expressed as a combination of these phoneme symbols.

The label string generation unit (7) generates a label string from each phoneme symbol string stored in file device (6) and stores it in file device (2).

A phoneme symbol string of this kind is an abstract representation of the utterance content, and its correspondence with the label string attached to physical speech patterns is complex. The conversion from phoneme symbol string to label string must therefore be carried out by a group of rewrite rules applied in multiple stages.

FIG. 3 shows one example configuration of such a label string generation unit. The phoneme symbol string (701) to be converted is turned into a label string (702) by the rule application unit (703), which refers to the group of rewrite rules (704), made up of rewrite rules (7041) to (7046), and repeatedly replaces any substring matching the left-hand side of a rule with the symbol string written on that rule's right-hand side.

FIG. 4 shows, with the corresponding numbers at the left, the label strings obtained by passing the phoneme symbol strings partly illustrated in FIG. 2 through the label string generation unit described above. As the figure shows, label strings are in general considerably more complex than phoneme symbol strings. Compared with the conventional device, for which a label string had to be prepared and supplied for each item of speech data, the device according to the present invention is better suited to split labeling of large amounts of speech data.

As for adding label types or changing the label structure, only the group of rewrite rules needs to be changed. The number of rules in the group of rewrite rules (704) used by the label string generation unit (7) is normally far smaller than the number of speech data items to be split and labeled, so the device according to the present invention is superior to the conventional device, in which the label strings themselves must be changed.

[Effects of the Invention]
As described above, the split labeling device according to the present invention is provided with a label string generation unit that generates label strings from the phoneme symbol strings representing the utterance content of the speech data by means of a modifiable group of rewrite rules. This removes the need to prepare and supply a complex label string for each item of speech data, and allows label types to be added or the label string structure to be changed by modifying only a small number of rewrite rules. As a result, split labeling in great number and variety becomes easy.
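The behavior of the rule application unit (703) described in the embodiment can be illustrated with a minimal Python sketch. The contents of rewrite rules (7041) to (7046) and the symbols of FIG. 5 are not reproduced in this text, so every phoneme symbol, label name and rule below is hypothetical. The policy of trying rules in listed order and rewriting the leftmost match, as well as the `max_steps` guard against non-terminating rule sets, are also assumptions made for the sketch rather than details taken from the patent.

```python
from typing import List, Sequence, Tuple

# A rewrite rule: (left-hand side, right-hand side), both symbol tuples.
Rule = Tuple[Tuple[str, ...], Tuple[str, ...]]


def apply_rule_once(symbols: List[str], lhs: Tuple[str, ...],
                    rhs: Tuple[str, ...]) -> Tuple[List[str], bool]:
    """Replace the leftmost occurrence of lhs with rhs, if there is one."""
    n = len(lhs)
    for i in range(len(symbols) - n + 1):
        if tuple(symbols[i:i + n]) == lhs:
            return symbols[:i] + list(rhs) + symbols[i + n:], True
    return symbols, False


def generate_label_string(phonemes: Sequence[str], rules: Sequence[Rule],
                          max_steps: int = 1000) -> List[str]:
    """Rewrite a phoneme symbol string until no rule matches any more."""
    symbols = list(phonemes)
    for _ in range(max_steps):
        for lhs, rhs in rules:
            symbols, changed = apply_rule_once(symbols, lhs, rhs)
            if changed:
                break  # restart from the first rule after every rewrite
        else:
            return symbols  # no rule matched: this is the label string
    raise RuntimeError("rewrite rules did not terminate")


# Hypothetical rules: lower-case phoneme symbols become upper-case labels.
RULES: List[Rule] = [
    (("a", "i"), ("A", "AI_TRANS", "I")),  # context rule: vowel transition
    (("#",), ("SIL",)),                    # utterance boundary -> silence
    (("k",), ("K_CL", "K_BURST")),         # plosive -> closure + burst
    (("a",), ("A",)),
    (("i",), ("I",)),
]

labels = generate_label_string(["#", "a", "k", "a", "i", "#"], RULES)
print(labels)
```

Because the label strings are derived from the rules, adding a label type in this sketch means editing `RULES` once and regenerating every label string, which is the maintenance advantage the description attributes to the invention.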

[Brief Description of the Drawings]
FIG. 1 is a functional block diagram of one embodiment of the present invention; FIG. 2 shows an example of phoneme symbol string representation; FIG. 3 shows an example configuration of the label string generation unit in one embodiment of the present invention; FIG. 4 shows examples of generated label strings; FIG. 5 shows examples of the phoneme symbols making up the phoneme symbol strings; and FIG. 6 is a functional block diagram of a conventional split labeling device.

In the figures, (1) is a file device, (2) is a file device, (3) is the split labeling unit, (4) is a file device, (5) is a file device, (6) is a file device, (7) is the label string generation unit, (701) is a phoneme symbol string, (702) is a label string, (703) is the rule application unit, and (704) is the group of rewrite rules.

In the figures, identical or corresponding parts are denoted by the same reference numerals.

Claims

(57) [Claims]
A split labeling device comprising:
speech data storage means (1) for storing a time series of feature vectors representing training speech data;
phoneme symbol string storage means (6) for storing, in correspondence with the training speech data, phoneme symbol strings representing the utterance content of the training speech data;
label string generating means (7, 2) for generating, from a phoneme symbol string stored in the phoneme symbol string storage means (6) and on the basis of a modifiable group of rewrite rules, a label string to be attached to the physical speech patterns in the time series of feature vectors of the training speech data stored in the speech data storage means (1); and
split labeling means (3) for dividing the time series of feature vectors of the training speech data stored in the speech data storage means (1) in accordance with the label string generated by the label string generating means (7, 2), and for sequentially assigning each label of the label string to each divided section.