JP2705062B2 - Split labeling device - Google Patents

Split labeling device

Info

Publication number
JP2705062B2
Authority
JP
Japan
Prior art keywords
label
string
storage means
voice data
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP62064953A
Other languages
Japanese (ja)
Other versions
JPS63231400A (en)
Inventor
芳春 阿部
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mitsubishi Electric Corp
Original Assignee
Mitsubishi Electric Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1987-03-19
Filing date
1987-03-19
Publication date
1998-01-26
Application filed by Mitsubishi Electric Corp
Priority to JP62064953A
Publication of JPS63231400A
Application granted
Publication of JP2705062B2
Anticipated expiration
Expired - Lifetime

Description

[Detailed Description of the Invention]

[Field of Industrial Application]
This invention relates to a split labeling device for time series of feature vectors representing speech data.

[Prior Art]
FIG. 6 is a functional block diagram of a conventional split labeling device. The split labeling unit (3) splits and labels a plurality of training speech data items stored in file device (1) according to the label strings stored in file device (2), one of which is supplied for each training speech data item. It stores the split results in file device (4) and the per-label statistics in file device (5). Because the per-label statistics obtained in this way are used for template matching in speech recognition devices and the like, they need to be as unbiased as possible. This in turn requires split labeling of a large amount of speech data covering many utterance environments.

[Problems to Be Solved by the Invention]
To use a conventional device configured as above, a label string to be split and attached must be created one by one for each training speech data item. Split labeling a large amount of speech data therefore requires preparing label strings in great variety and number, so the device is not suited to split labeling large amounts of speech data.

In addition, when a new label type is to be added or the composition of the label strings is to be changed, for example to reflect the influence of utterance context in the label strings, it is not easy to correct the many label strings already created in a uniform way and without introducing errors.

To eliminate these drawbacks, the present invention adds to a conventional device of this kind a label string generation unit that generates, by rule, the label string paired with each phoneme symbol string representing the utterance content of the speech data. This makes it easy to generate label strings in great variety and number, and allows label strings to be changed easily and without error by changing the rules. The invention is described in detail below with reference to the drawings.

[Embodiment of the Invention]
FIG. 1 is a functional block diagram of one embodiment of the present invention. (6) is a file device for storing phoneme symbol strings representing the utterance content of the training speech data, and (7) is a label string generation unit that generates label strings from the phoneme symbol strings in file device (6) and stores them in file device (2).

FIG. 2 shows part of the phoneme symbol strings stored in file device (6). The number at the left of each string links it to the training speech data stored in file device (1) and to the label string stored in file device (2); items with the same number correspond to one another.

The phoneme symbols that make up these strings include those shown in FIG. 5. The utterance content of any training speech data item is expressed as a combination of these phoneme symbols.

The label string generation unit (7) generates a label string from each such phoneme symbol string stored in file device (6) and stores it in file device (2).

A phoneme symbol string of this kind is an abstract representation of the utterance content, and its correspondence to the label string attached to the physical speech pattern is complex. The conversion from phoneme symbol string to label string therefore has to be carried out by a group of multi-stage rewrite rules.

FIG. 3 shows an example configuration of such a label string generation unit. The phoneme symbol string (701) to be converted is turned into a label string (702) by a rule application unit (703). The rule application unit refers to the rewrite rule group (704), which consists of rewrite rules (7041) to (7046), and repeatedly replaces any substring that matches the symbol string on the left-hand side of a rule with the symbol string written on that rule's right-hand side.

FIG. 4 shows the label strings obtained by passing the phoneme symbol strings partly illustrated in FIG. 2 through the label string generation unit described above, with the corresponding numbers at the left. As the figure shows, label strings are generally considerably more complex than phoneme symbol strings. Compared with the conventional device, which required a label string to be created and supplied for every speech data item, the device according to the present invention is therefore better suited to split labeling large amounts of speech data.

As for adding label types or changing the label composition, only the rewrite rule group needs to be changed. The number of rules in the rewrite rule group (704) used by the label string generation unit (7) is normally far smaller than the number of speech data items to be split and labeled, so the device according to the present invention is also superior to the conventional device, in which the label strings themselves must be changed.

[Effects of the Invention]
As described above, the split labeling device according to the present invention is provided with a label string generation unit that generates label strings, by means of a changeable group of rewrite rules, from phoneme symbol strings representing the utterance content of the speech data. This removes the need to create and supply a complex label string for each speech data item, and additions of label types or changes to the label string composition can be made by changing a small number of rewrite rules, so that split labeling in great variety and volume becomes easy.
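The repeated left-hand-side to right-hand-side replacement performed by the rule application unit (703) can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the rule contents in `EXAMPLE_RULES` are hypothetical (the actual rules (7041) to (7046) appear only in FIG. 3, which is not reproduced in this text), and the names `find_match` and `apply_rules`, the token-list representation, and the strategy of trying rules in order and rewriting the leftmost match until nothing matches are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

# A rule is (left-hand side, right-hand side), each a tuple of symbols.
Rule = Tuple[Tuple[str, ...], Tuple[str, ...]]

# Hypothetical rules for illustration only; they are NOT the rules of FIG. 3.
EXAMPLE_RULES: List[Rule] = [
    (("k",), ("k_closure", "k_burst")),  # expand a stop into two acoustic labels
    (("a", "a"), ("a_long",)),           # merge a doubled vowel into one label
    (("#",), ("silence",)),              # utterance boundary becomes silence
]


def find_match(tokens: Sequence[str], lhs: Sequence[str]) -> int:
    """Return the index of the leftmost occurrence of lhs in tokens, or -1."""
    n = len(lhs)
    if n == 0:
        return -1  # an empty left-hand side never matches
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == tuple(lhs):
            return i
    return -1


def apply_rules(phonemes: Sequence[str], rules: Sequence[Rule],
                max_steps: int = 10_000) -> List[str]:
    """Rewrite the phoneme symbol string until no rule's left side matches."""
    tokens = list(phonemes)
    for _ in range(max_steps):
        for lhs, rhs in rules:
            pos = find_match(tokens, lhs)
            if pos >= 0:
                # Replace the matched substring with the rule's right side.
                tokens[pos:pos + len(lhs)] = rhs
                break
        else:
            return tokens  # no rule matched: this is the label string
    # Guard against rule sets that keep rewriting forever.
    raise RuntimeError("rewrite rules did not terminate")
```

With these hypothetical rules, `apply_rules(["#", "k", "a", "a", "#"], EXAMPLE_RULES)` returns `["silence", "k_closure", "k_burst", "a_long", "silence"]`. Changing the label inventory then only requires editing the rule list, which is the advantage the description claims over hand-written label strings.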

[Brief Description of the Drawings]
FIG. 1 is a functional block diagram of one embodiment of the present invention. FIG. 2 shows an example of phoneme symbol string representation. FIG. 3 shows an example configuration of the label string generation unit in one embodiment of the present invention. FIG. 4 shows an example of generated label strings. FIG. 5 shows examples of the phoneme symbols that make up a phoneme symbol string. FIG. 6 is a functional block diagram of a conventional split labeling device.

In the figures, (1) is a file device, (2) is a file device, (3) is a split labeling unit, (4) is a file device, (5) is a file device, (6) is a file device, (7) is a label string generation unit, (701) is a phoneme symbol string, (702) is a label string, (703) is a rule application unit, and (704) is a rewrite rule group.

In the figures, identical or corresponding parts are denoted by the same reference numerals.

Claims (1)

(57) [Claims]
1. A split labeling device comprising:
speech data storage means (1) in which a time series of feature vectors representing training speech data is stored;
phoneme symbol string storage means (6) in which a phoneme symbol string that corresponds to the training speech data and represents its utterance content is stored;
label string generation means (7, 2) that generates, from the phoneme symbol string stored in the phoneme symbol string storage means (6) and on the basis of a changeable group of rewrite rules, a label string to be attached to the physical speech patterns in the time series of feature vectors of the training speech data stored in the speech data storage means (1); and
split labeling means (3) that splits the time series of feature vectors of the training speech data stored in the speech data storage means (1) in correspondence with the labels of the label string generated by the label string generation means (7, 2), and attaches the labels of the label string in order to the resulting segments.
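The splitting and in-order labeling recited in the claim can be illustrated with a deliberately simple Python sketch. The patent text does not say how segment boundaries are chosen, so the uniform split used here is only a stand-in assumption, and the names `split_and_label` and `per_label_mean` are hypothetical. The second function shows one possible per-label statistic, the mean feature vector; the description only says that per-label statistics are stored, not which ones.

```python
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]             # one feature vector
Segment = Tuple[str, int, int]      # (label, start frame, end frame exclusive)


def split_and_label(frames: Sequence[Frame],
                    labels: Sequence[str]) -> List[Segment]:
    """Split frames into len(labels) contiguous segments, labeled in order.

    ASSUMPTION: boundaries are spaced uniformly. A real device would place
    them using the acoustic data; the patent text does not specify how.
    """
    n, k = len(frames), len(labels)
    if k == 0 or n < k:
        raise ValueError("need at least one label and one frame per label")
    segments: List[Segment] = []
    for i, label in enumerate(labels):
        start = i * n // k
        end = (i + 1) * n // k
        segments.append((label, start, end))
    return segments


def per_label_mean(frames: Sequence[Frame],
                   segments: Sequence[Segment]) -> Dict[str, List[float]]:
    """Pool all frames carrying the same label and return their mean vector."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for label, start, end in segments:
        for frame in frames[start:end]:
            if label not in sums:
                sums[label] = [0.0] * len(frame)
                counts[label] = 0
            for d, value in enumerate(frame):
                sums[label][d] += value
            counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}
```

For example, five one-dimensional frames `[[0.0], [2.0], [4.0], [6.0], [8.0]]` with the label string `["a", "b"]` are split into `("a", 0, 2)` and `("b", 2, 5)`, giving per-label means of `[1.0]` for `a` and `[6.0]` for `b`.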
JP62064953A 1987-03-19 1987-03-19 Split labeling device Expired - Lifetime JP2705062B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP62064953A JP2705062B2 (en) 1987-03-19 1987-03-19 Split labeling device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP62064953A JP2705062B2 (en) 1987-03-19 1987-03-19 Split labeling device

Publications (2)

Publication Number Publication Date
JPS63231400A (en) 1988-09-27
JP2705062B2 (en) 1998-01-26

Family

ID=13272908

Family Applications (1)

Application Number Title Priority Date Filing Date
JP62064953A Expired - Lifetime JP2705062B2 (en) 1987-03-19 1987-03-19 Split labeling device

Country Status (1)

Country Link
JP (1) JP2705062B2 (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0693221B2 (en) * 1985-06-12 1994-11-16 株式会社日立製作所 Voice input device

Also Published As

Publication number Publication date
JPS63231400A (en) 1988-09-27

Legal Events

Date Code Title Description
EXPY Cancellation because of completion of term