JPS6083093A

JPS6083093A - Word voice recognition equipment

Info

Publication number: JPS6083093A
Application number: JP58191285A
Authority: JP
Inventors: 真哉高橋
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1983-10-13
Filing date: 1983-10-13
Publication date: 1985-05-11

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] This invention relates to a word speech recognition device that performs a primary preliminary selection based on the features of the word-beginning portion of the word speech, before the utterance has ended, and thereby shortens the time taken by the preliminary selection process as a whole.

Global features of the entire word utterance are commonly used as the features for preliminary selection of word speech, because they are simple to process, offer high preliminary selection ability (the amount by which preliminary selection reduces the recognition processing load), and have little effect on the final recognition rate. FIG. 1 shows an example of a conventional word speech recognition device that performs preliminary selection using global features of the entire word utterance.

The speech waveform (2) input from the microphone (1) undergoes, for example, frequency spectrum analysis for each analysis frame in the speech analysis circuit (3) and is converted into a time series of feature vectors A = a_1 a_2 ... a_t ... a_L. Below, this time series of feature vectors is called the input feature pattern (all frames) (4). The global feature extraction circuit (5) calculates Fourier expansion coefficients F_ij from the input feature pattern (4), for example by the following equation, and outputs them as the global feature pattern (6).

$$F_{ij} = \frac{1}{L} \sum_{t=1}^{L} a_{ti} \cos\frac{\pi (t-1) j}{L} \qquad (1)$$

Here a_ti is the i-th feature parameter at time t in the input feature pattern (4), L is the number of frames in the speech interval, and F_ij is the j-th order Fourier expansion coefficient of the i-th feature parameter. The global feature registration memory (8) stores the global feature patterns G_ij of the registered words W1, W2, ... registered in advance by the user.

The preliminary selection circuit (7) calculates the inter-pattern distance d(F, G) between the global feature pattern F_ij (6) of the input word speech and the global feature pattern G_ij of each registered word W1, W2, ..., for example by the following equation. [The equation is missing from the source text.]
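As an illustration of equation (1), the following is a minimal Python sketch of the global feature extraction. It assumes the cosine argument is π(t−1)j/L, which is how the damaged equation reads, and that the orders j run from 0 to a chosen maximum. The function name `global_features` and the parameter `max_order` are hypothetical and do not come from the patent.

```python
import math

def global_features(frames, max_order):
    """Cosine expansion coefficients of each feature parameter over time.

    frames: list of L feature vectors, one per analysis frame.
    Returns F where F[i][j] = (1/L) * sum over t of a[t][i] * cos(pi*(t-1)*j/L).
    The range of j (0..max_order) is an assumption made for this sketch.
    """
    L = len(frames)
    n_params = len(frames[0])
    F = []
    for i in range(n_params):
        row = []
        for j in range(max_order + 1):
            # t below is 0-based, so it plays the role of (t - 1) in equation (1).
            total = sum(frames[t][i] * math.cos(math.pi * t * j / L)
                        for t in range(L))
            row.append(total / L)
        F.append(row)
    return F

# Four frames with two feature parameters each (made-up values).
frames = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
F = global_features(frames, max_order=2)
```

The sum runs over all L frames, so the coefficients cannot be finalized until the last frame of the utterance has been analyzed.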

The word numbers of the several registered words with the smallest d(F, G) are output as the preliminary selection result (9). The recognition word dictionary registration memory (11) stores the time series of feature vectors S = s_1 s_2 ... s_M (hereinafter called the registered feature pattern) of each registered word W1, W2, ... registered by the user. The recognition processing circuit (10) performs pattern matching between the input feature pattern (4) and the registered feature patterns of the registered words listed in the preliminary selection result (9), calculates the inter-pattern distances, and outputs the word category of the registered word with the smallest inter-pattern distance as the recognition result (12).
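The conventional flow of FIG. 1 (preliminary selection by global features, then pattern matching on the survivors) might be sketched as follows. The patent's distance equation is not available in the source text and the pattern matching method is not named, so this sketch substitutes a squared Euclidean distance and a basic dynamic time warping (DTW) match. Both are assumptions, and all function names are hypothetical.

```python
def global_distance(F, G):
    """Squared Euclidean distance between two global feature patterns
    (an assumed metric; the patent's own equation is not available)."""
    return sum((f - g) ** 2
               for f_row, g_row in zip(F, G)
               for f, g in zip(f_row, g_row))

def preselect(F, registered_global, n_candidates):
    """Word numbers of the n_candidates registered words closest to F."""
    ranked = sorted(registered_global,
                    key=lambda w: global_distance(F, registered_global[w]))
    return ranked[:n_candidates]

def dtw_distance(A, S):
    """Basic DTW between two feature-vector time series (assumed matcher)."""
    inf = float("inf")
    n, m = len(A), len(S)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for t in range(1, n + 1):
        for u in range(1, m + 1):
            cost = sum((x - y) ** 2 for x, y in zip(A[t - 1], S[u - 1]))
            D[t][u] = cost + min(D[t - 1][u], D[t][u - 1], D[t - 1][u - 1])
    return D[n][m]

def recognize(A, candidates, dictionary):
    """Word number whose registered feature pattern best matches A."""
    return min(candidates, key=lambda w: dtw_distance(A, dictionary[w]))

# Made-up data: three registered words, one-dimensional features.
registered_global = {1: [[0.0]], 2: [[1.0]], 3: [[5.0]]}
dictionary = {1: [[0.0], [0.0]], 2: [[1.0], [1.0], [1.0]], 3: [[5.0]]}
candidates = preselect([[0.9]], registered_global, n_candidates=2)
result = recognize([[1.0], [1.1]], candidates, dictionary)
```

Only the preselected candidates reach the more expensive matching step, which is the purpose of preliminary selection.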

The calculation of the global feature pattern (6) of the input word speech, which preliminary selection requires, cannot start until the utterance of the input word has ended and all frames have been analyzed. A conventional device such as the one shown in FIG. 1 therefore has the drawback that the preliminary selection calculation accounts for a large share of the total recognition time.

To eliminate this drawback, the present invention performs a primary preliminary selection based on the word-beginning features of the input word speech before the utterance ends, and thereby reduces the amount of calculation in the subsequent preliminary selection based on global features. It is described in detail below with reference to the drawings.

FIG. 2 shows an embodiment of the present invention. The elements from the microphone (1) to the recognition result (12) in FIG. 2 are the same as those in FIG. 1, so their description is omitted.

The input feature pattern A_h = a_1 a_2 ... a_K (13) for the first K frames of the word beginning, calculated by the speech analysis circuit (3), is immediately processed, for example by frame averaging, in the word-beginning preliminary selection circuit (14) to obtain the word-beginning feature pattern h (15).
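The frame averaging mentioned above could look like the following minimal sketch, which reads only the first K frames from a stream of analysis frames and averages them element by element. The function name, the iterator-based interface, and the error handling are assumptions for illustration.

```python
from itertools import islice

def word_beginning_feature(frame_stream, K):
    """Element-wise average of the first K feature vectors of an utterance.

    frame_stream: any iterable yielding one feature vector per analysis frame.
    Only K frames are consumed, so the result is available while the rest
    of the utterance is still arriving.
    """
    first = list(islice(frame_stream, K))
    if len(first) < K:
        raise ValueError("utterance shorter than K frames")
    dim = len(first[0])
    return [sum(frame[i] for frame in first) / K for i in range(dim)]

# Made-up frames; only the first two contribute to the average.
stream = iter([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]])
h = word_beginning_feature(stream, K=2)   # [2.0, 3.0]
```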

The word-beginning feature registration memory (17) records, for the registered words divided into a plurality of clusters on the basis of their word-beginning feature patterns, the word-beginning feature pattern of each cluster center word W_c (W5, ..., etc.) and the numbers of the words belonging to each cluster (3, 5, 8, ..., etc.). The word-beginning preliminary selection circuit (16) calculates the inter-pattern distance between the word-beginning feature pattern h (15) and the word-beginning feature pattern of each cluster center word W_c in the word-beginning feature registration memory (17), and selects the several cluster center words with the smallest inter-pattern distances. It then reads from the word-beginning feature registration memory (17) all the word numbers in the clusters to which the selected cluster center words belong, and outputs them as the primary preliminary selection result (18).
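The cluster lookup described above can be sketched as follows. The memory is modeled as a list of (center feature, member word numbers) pairs. The squared Euclidean distance and the name `primary_preselect` are assumptions, since the text does not give the distance measure for word-beginning features. How the clusters are built is not covered here.

```python
def primary_preselect(h, clusters, n_clusters):
    """Word numbers in the n_clusters clusters whose center is closest to h.

    h: word-beginning feature pattern of the input word.
    clusters: list of (center_feature, member_word_numbers) pairs, standing
              in for the word-beginning feature registration memory.
    """
    def distance_to_center(cluster):
        center, _members = cluster
        return sum((x - y) ** 2 for x, y in zip(h, center))

    nearest = sorted(clusters, key=distance_to_center)[:n_clusters]
    selected = []
    for _center, members in nearest:
        selected.extend(members)
    return selected

# Made-up memory contents: three clusters with their member word numbers.
clusters = [
    ([0.0, 0.0], [3, 5, 8]),
    ([10.0, 10.0], [1, 2]),
    ([1.0, 1.0], [4, 7]),
]
primary = primary_preselect([0.9, 0.9], clusters, n_clusters=2)
# primary == [4, 7, 3, 5, 8]
```

The distance is evaluated once per cluster rather than once per registered word, which keeps this stage short.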

The preliminary selection by word beginning described above is completed before the utterance of the word ends, for two reasons: (a) it uses the word beginning of the input word as its feature, and (b) the distance between word-beginning feature patterns is calculated only between the input word speech and the few cluster center words among the registered words, so the calculation time is short.

In the preliminary selection circuit (7), the distance between global feature patterns is calculated only for the registered words in the primary preliminary selection result (18), which is output before the utterance ends. The amount of calculation performed in the preliminary selection circuit (7) is therefore reduced substantially, depending on the number of words selected by the primary preliminary selection.
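To make the saving concrete, here is a minimal sketch of the second stage restricted to the primary candidates. It also returns the number of distance evaluations performed. The 100-word vocabulary, the flattened one-dimensional global features, and the squared Euclidean distance are all hypothetical choices for illustration.

```python
def restricted_preselect(F, registered_global, primary_result, n_candidates):
    """Global-feature preselection over the primary candidates only.

    Returns (selected word numbers, number of distance evaluations).
    """
    def distance(word):
        G = registered_global[word]
        return sum((f - g) ** 2 for f, g in zip(F, G))

    scored = [(distance(w), w) for w in primary_result]
    scored.sort()
    selected = [w for _d, w in scored[:n_candidates]]
    return selected, len(scored)

# Hypothetical vocabulary of 100 words with one global feature value each.
registered_global = {w: [float(w)] for w in range(1, 101)}
primary_result = [3, 5, 8, 4, 7]          # output of the first stage
selected, evaluations = restricted_preselect(
    [4.2], registered_global, primary_result, n_candidates=2)
# selected == [4, 5]; evaluations == 5 instead of 100
```

In this made-up case the global distance is computed for 5 words rather than all 100. The actual saving depends on how many words the primary selection passes on.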

The description above covers speaker-dependent word speech recognition, in which the user registers the words to be recognized in advance, but the present invention may also be used for speaker-independent speech recognition, which does not restrict the speaker. In that case, the word-beginning feature patterns, global feature patterns, and registered feature patterns are created in advance by statistical methods from word speech data from multiple speakers.

As described above, according to the present invention, a primary preliminary selection is performed before the utterance ends on the basis of the word-beginning feature pattern of the word speech, so the amount of calculation in the subsequent preliminary selection based on the global feature pattern is reduced, which has the effect of shortening the preliminary selection time in word speech recognition.

[Brief explanation of drawings]

FIG. 1 shows an example of a conventional word speech recognition device, and FIG. 2 shows an embodiment of the word speech recognition device according to the present invention. In the figures, (1) is the microphone, (2) is the input speech waveform, (3) is the speech analysis circuit, (4) is the input feature pattern (all frames), (5) is the global feature extraction circuit, (6) is the global feature pattern, (7) is the preliminary selection circuit, (8) is the global feature registration memory, (9) is the preliminary selection result, (10) is the recognition processing circuit, (11) is the recognition word dictionary registration memory, (12) is the recognition result, (13) is the input feature pattern (word beginning), (14) is the word-beginning preliminary selection circuit, (15) is the word-beginning feature pattern, (16) is the word-beginning preliminary selection circuit, (17) is the word-beginning feature registration memory, and (18) is the primary preliminary selection result. The same reference numerals denote the same or corresponding parts in the figures.

Agent: 大岩増雄

Claims

[Claims]

A word speech recognition device that performs preliminary selection of registered words for an input word speech and performs recognition processing on the selected registered words, comprising: a word-beginning feature registration memory in which, for the registered words divided into a plurality of clusters according to their word-beginning features, the word-beginning feature of the center word of each cluster and the numbers of the words belonging to each cluster are recorded; and a word-beginning preliminary selection circuit that calculates the inter-pattern distance between the word-beginning feature of the input word speech and the word-beginning feature of each cluster center word in the word-beginning feature registration memory, extracts several clusters in ascending order of the inter-pattern distance of their cluster center words, and outputs the word numbers belonging to those clusters as a primary preliminary selection result.