JPS6083093A - Word voice recognition equipment - Google Patents
- Publication number
- JPS6083093A (application JP58191285A)
- Authority
- JP
- Japan
- Prior art keywords
- word
- feature
- cluster
- registered
- beginning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.
Description
[Detailed Description of the Invention] The present invention relates to a word speech recognition device that performs a primary preliminary selection, based on the features of the beginning of the word utterance, before the utterance has ended, thereby shortening the overall preliminary selection processing time.
As the features for the preliminary selection of word utterances, it is common to use global features of the entire word utterance, because they are simple to process, have high preliminary selection power (the amount by which preliminary selection reduces the recognition workload), and have little effect on the final recognition rate. FIG. 1 shows an example of a conventional word speech recognition device that performs preliminary selection using global features of the entire word utterance.
The speech waveform (2) input from the microphone (1) undergoes, for example, frequency spectrum analysis in each analysis frame in the speech analysis circuit (3), and is converted into a time series of feature vectors $A = a_1 a_2 \cdots a_t \cdots a_L$. This time series of feature vectors is hereafter called the input feature pattern (all frames) (4). The global feature extraction circuit (5) computes from the input feature pattern (4) Fourier expansion coefficients $F_{ij}$, for example as in the following equation, and outputs them as the global feature pattern (6).
$$F_{ij} = \frac{1}{L}\sum_{t=1}^{L} a_{ti}\cos\frac{\pi (t-1) j}{L} \qquad (1)$$
Here $a_{ti}$ is the $i$-th feature parameter at time $t$ in the input feature pattern (4), $L$ is the number of frames in the speech section, and $F_{ij}$ is the $j$-th order Fourier expansion coefficient of the $i$-th feature parameter. The global feature registration memory (8) stores the global feature patterns $C_{ij}$ of the registered word utterances $W_1, W_2, \ldots$ that the user has registered in advance.
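As an illustration only, equation (1) can be evaluated directly as below. The function name `global_features` and the parameter `num_coeffs` (how many orders $j = 0, 1, \ldots$ are kept) are hypothetical; the publication does not state how many coefficients are used.

```python
import math
from typing import List, Sequence


def global_features(frames: Sequence[Sequence[float]], num_coeffs: int) -> List[List[float]]:
    """Evaluate equation (1): F[i][j] = (1/L) * sum_{t=1..L} a[t][i] * cos(pi*(t-1)*j / L).

    frames: the input feature pattern A = a_1 ... a_L, one feature vector per frame.
    num_coeffs: number of orders j = 0 .. num_coeffs-1 to keep (assumed parameter).
    """
    L = len(frames)
    if L == 0:
        raise ValueError("empty utterance")
    dim = len(frames[0])
    F = [[0.0] * num_coeffs for _ in range(dim)]
    for i in range(dim):
        for j in range(num_coeffs):
            total = 0.0
            for t in range(1, L + 1):  # 1-based frame index, as in equation (1)
                total += frames[t - 1][i] * math.cos(math.pi * (t - 1) * j / L)
            F[i][j] = total / L
    return F
```

The $j = 0$ coefficient is simply the mean of each feature parameter over the utterance. Because every coefficient sums over all $L$ frames, none of them can be finalized before the utterance ends.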
The preliminary selection circuit (7) computes the inter-pattern distance $d(F, C)$ between the global feature pattern $F_{ij}$ (6) of the input word utterance and the global feature pattern $C_{ij}$ of each registered word utterance $W_1, W_2, \ldots$, for example with a distance equation given in the publication (the equation is not reproduced in this text). It then outputs the word numbers of the several registered word utterances with the smallest values of $d(F, C)$ as the preliminary selection result (9).

The recognition word dictionary registration memory (11) stores, for each registered word utterance $W_1, W_2, \ldots$ registered by the user, its time series of feature vectors $S = s_1 s_2 \cdots s_M$ (hereafter called the registered feature pattern).
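The preliminary selection in circuit (7) can be sketched as follows. Because the distance equation is not reproduced here, the sketch assumes a sum of squared differences over all $(i, j)$; the names `pattern_distance`, `preselect`, and `n_best` are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence

Pattern = Sequence[Sequence[float]]  # a global feature pattern F[i][j] or C[i][j]


def pattern_distance(F: Pattern, C: Pattern) -> float:
    # Assumed form of d(F, C): sum of squared differences over all (i, j).
    return sum((f - c) ** 2 for frow, crow in zip(F, C) for f, c in zip(frow, crow))


def preselect(F: Pattern, registered: Dict[int, Pattern], n_best: int) -> List[int]:
    """Return the word numbers of the n_best registered words with the smallest d(F, C)."""
    scored = ((pattern_distance(F, C), word) for word, C in registered.items())
    return [word for _, word in heapq.nsmallest(n_best, scored)]
```

In this conventional scheme the distance is computed against every registered word, so the cost grows with the vocabulary size.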
The recognition processing circuit (10) performs pattern matching between the input feature pattern (4) and the registered feature patterns of the registered word utterances listed in the preliminary selection result (9), computes the inter-pattern distances, and outputs the word category of the registered word utterance with the smallest inter-pattern distance as the recognition result (12).
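The publication says only that circuit (10) performs "pattern matching". The sketch below assumes dynamic time warping (DTW) with unit step weights, a city-block frame distance, and normalization by $I + J$; these choices and the names `dtw_distance` and `recognize` are illustrative assumptions. What it does show faithfully is that matching runs only over the candidate word numbers from the preliminary selection result (9).

```python
import math
from typing import Dict, Sequence

Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    # City-block distance between two feature vectors (assumed local distance).
    return sum(abs(x - y) for x, y in zip(a, b))


def dtw_distance(A: Sequence[Frame], S: Sequence[Frame]) -> float:
    """Basic DTW with steps (i-1, j), (i, j-1), (i-1, j-1), normalized by I + J."""
    I, J = len(A), len(S)
    g = [[math.inf] * (J + 1) for _ in range(I + 1)]
    g[0][0] = 0.0
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            d = frame_distance(A[i - 1], S[j - 1])
            g[i][j] = d + min(g[i - 1][j], g[i][j - 1], g[i - 1][j - 1])
    return g[I][J] / (I + J)


def recognize(A: Sequence[Frame],
              dictionary: Dict[int, Sequence[Frame]],
              candidates: Sequence[int]) -> int:
    """Match the input pattern only against the preselected candidate words."""
    return min(candidates, key=lambda w: dtw_distance(A, dictionary[w]))
```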
The computation of the global feature pattern (6) of the input word utterance, which preliminary selection requires, cannot begin until the input word utterance has ended and all frames have been analyzed. A conventional device such as the one shown in FIG. 1 therefore has the drawback that the preliminary selection computation takes up a large share of the total recognition time.
To eliminate this drawback, the present invention performs a primary preliminary selection based on the word-beginning features of the input word utterance before the utterance ends, thereby reducing the computation of the subsequent preliminary selection based on global features. The invention is described in detail below with reference to the drawings.
FIG. 2 shows an embodiment of the present invention. The components in FIG. 2 from the microphone (1) to the recognition result (12) are the same as those in FIG. 1, so their description is omitted.
The input feature pattern $A_h = a_1 a_2 \cdots a_K$ (13) covering the first $K$ frames of the word beginning, as computed by the speech analysis circuit (3), is immediately processed in the word-beginning preliminary selection circuit (14), for example by frame averaging, to obtain the word-beginning feature pattern $h$ (15).
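Reading "frame averaging" as $h = \frac{1}{K}\sum_{t=1}^{K} a_t$, one possible sketch is below; the function name `word_beginning_feature` is hypothetical.

```python
from typing import List, Sequence


def word_beginning_feature(frames: Sequence[Sequence[float]], K: int) -> List[float]:
    """Average the first K frames: h = (1/K) * sum_{t=1..K} a_t."""
    if len(frames) < K:
        raise ValueError("fewer than K frames have been analyzed so far")
    dim = len(frames[0])
    return [sum(frames[t][i] for t in range(K)) / K for i in range(dim)]
```

Because only the first $K$ frames are needed, $h$ can be computed while the speaker is still talking; later frames (such as the third one in a call with `K=2`) are ignored.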
The word-beginning feature registration memory (17) holds the result of dividing the registered word utterances into a plurality of clusters based on their word-beginning feature patterns. For each cluster, it records the word-beginning feature pattern of the cluster center word $W_c$ ($W_5$, etc.) and the numbers of the words belonging to that cluster (3, 5, 8, etc.).
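The source does not say how the clusters are formed. As one possible sketch, the code below assumes that the center words have already been chosen offline and assigns every registered word to the nearest center by squared Euclidean distance between word-beginning patterns. The class `WordBeginningCluster` and the function `build_memory` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class WordBeginningCluster:
    center_word: int                  # number of the cluster center word W_c
    center_pattern: List[float]       # word-beginning feature pattern e of W_c
    members: List[int] = field(default_factory=list)  # word numbers in this cluster


def sq_dist(x: Sequence[float], y: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def build_memory(patterns: Dict[int, List[float]],
                 center_words: List[int]) -> List[WordBeginningCluster]:
    """Assign each registered word to its nearest center word (assumed rule)."""
    clusters = {c: WordBeginningCluster(c, list(patterns[c])) for c in center_words}
    for word, h in patterns.items():
        nearest = min(center_words, key=lambda c: sq_dist(h, patterns[c]))
        clusters[nearest].members.append(word)
    return [clusters[c] for c in center_words]
```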
h<t’3と語頭部特徴登録メモリ(lη中のクラスタ
中心単語W。の語頭部特徴パターンeとの)くターン間
距離を計算し、パターン間距離が小さい数個のクラスタ
中心単語を選出する。そして1選出された各クラスタ中
心単語が所属するクラスタ内の総ての単語番号を語頭部
特徴登録メモ’J (17)から読み出し、−次子備選
択結果0砂として出力する。etc.) and the word numbers (3, 5, 8, etc.) belonging to each cluster are described. The word-initial preliminary selection circuit αe is a word-initial feature detector.
h<t'3 and the word-head feature registration memory (between the word-head feature pattern e of the cluster center word W in lη) is calculated, and several cluster center words with small inter-pattern distances are calculated. Select. Then, all the word numbers in the cluster to which each selected cluster center word belongs are read out from the word-head feature registration memo 'J (17), and outputted as the -next child selection result 0 sand.
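A minimal sketch of this cluster-based primary selection follows, assuming squared Euclidean distance between word-beginning patterns. The names `Cluster`, `primary_preselect`, and `n_clusters` (how many clusters are kept) are hypothetical.

```python
from typing import List, NamedTuple, Sequence


class Cluster(NamedTuple):
    center_word: int                 # cluster center word W_c
    center_pattern: Sequence[float]  # its word-beginning feature pattern e
    members: Sequence[int]           # word numbers belonging to the cluster


def sq_dist(x: Sequence[float], y: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def primary_preselect(h: Sequence[float], memory: Sequence[Cluster],
                      n_clusters: int) -> List[int]:
    """Rank clusters by the distance from h to their center pattern and pool the
    members of the n_clusters nearest ones (primary preliminary selection result)."""
    ranked = sorted(memory, key=lambda c: sq_dist(h, c.center_pattern))
    result: List[int] = []
    for cluster in ranked[:n_clusters]:
        result.extend(cluster.members)
    return result
```

Only one distance per cluster is computed, which is why this stage can finish before the utterance ends.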
The word-beginning preliminary selection described above is completed before the word utterance ends, because (a) it uses only the beginning of the input word as its feature, and (b) the distance between word-beginning feature patterns is computed only between the input word utterance and the few cluster center words among the registered word utterances, which keeps the computation time short.
In the preliminary selection circuit (7), the distance between global feature patterns is computed only for the registered word utterances in the primary preliminary selection result (18), which has already been output before the word utterance ends. The amount of computation in the preliminary selection circuit (7) is therefore greatly reduced, to an extent that depends on the number of words passed by the primary preliminary selection.
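The saving can be sketched by restricting the global-feature distance to the primary candidates and counting the distance computations. For brevity this assumes each global pattern is flattened into one vector and compared by squared Euclidean distance; `restricted_preselect` and its parameters are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def sq_dist(x: Sequence[float], y: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def restricted_preselect(F: Sequence[float],
                         registered: Dict[int, Sequence[float]],
                         primary: Sequence[int],
                         n_best: int) -> Tuple[List[int], int]:
    """Second-stage preselection over the primary candidates only.

    Returns the n_best word numbers with the smallest distance, plus the number
    of global-pattern distance computations actually performed.
    """
    scored = [(sq_dist(F, registered[w]), w) for w in primary]
    best = [w for _, w in sorted(scored)[:n_best]]
    return best, len(scored)
```

Without the primary selection the count would equal `len(registered)`; with it, the count equals the size of the primary result.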
The above description concerns speaker-dependent word speech recognition, in which the user registers the words to be recognized in advance, but the present invention may also be used for speaker-independent speech recognition, in which the speakers are not restricted. In that case, the word-beginning feature patterns, global feature patterns, and registered feature patterns are created in advance by statistical methods from word speech data of multiple speakers.
As described above, according to the present invention, a primary preliminary selection is performed on the basis of the word-beginning feature pattern of the word utterance before the utterance ends. This reduces the computation of the subsequent preliminary selection based on global feature patterns and shortens the preliminary selection processing time in word speech recognition.
FIG. 1 shows an example of a conventional word speech recognition device, and FIG. 2 shows an embodiment of a word speech recognition device according to the present invention.

In the figures, (1) is the microphone, (2) the input speech waveform, (3) the speech analysis circuit, (4) the input feature pattern (all frames), (5) the global feature extraction circuit, (6) the global feature pattern, (7) the preliminary selection circuit, (8) the global feature registration memory, (9) the preliminary selection result, (10) the recognition processing circuit, (11) the recognition word dictionary registration memory, (12) the recognition result, (13) the input feature pattern (word beginning), (14) the word-beginning preliminary selection circuit, (15) the word-beginning feature pattern, (16) the word-beginning preliminary selection circuit, (17) the word-beginning feature registration memory, and (18) the primary preliminary selection result. The same reference numerals denote the same or corresponding parts throughout the figures.

Agent: Masuo Oiwa
Claims (1)
A word speech recognition device that performs a preliminary selection of registered word utterances for an input word utterance and performs recognition processing on the preliminarily selected registered word utterances, the device being characterized by comprising: a word-beginning feature registration memory that describes, for registered word utterances divided into a plurality of clusters according to the features of their word beginnings, the word-beginning features of the center word of each cluster and the numbers of the words belonging to each cluster; and a word-beginning preliminary selection circuit that, during speech recognition and before the word utterance ends, computes the inter-pattern distance between the word-beginning features of the input word utterance and the word-beginning features of the cluster center words in the word-beginning feature registration memory, extracts several clusters in order starting from the cluster whose center word has the smallest inter-pattern distance, and outputs the numbers of the words belonging to those clusters as a primary preliminary selection result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP58191285A JPS6083093A (en) | 1983-10-13 | 1983-10-13 | Word voice recognition equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP58191285A JPS6083093A (en) | 1983-10-13 | 1983-10-13 | Word voice recognition equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
JPS6083093A true JPS6083093A (en) | 1985-05-11 |
Family
ID=16272017
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP58191285A Pending JPS6083093A (en) | 1983-10-13 | 1983-10-13 | Word voice recognition equipment |
Country Status (1)
Country | Link |
---|---|
JP (1) | JPS6083093A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018190180A1 (en) * | 2017-04-13 | 2018-10-18 | 日東電工株式会社 | Polarizer, image display device and method for producing said image display device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5278942A (en) | Speech coding apparatus having speaker dependent prototypes generated from nonuser reference data | |
US6195634B1 (en) | Selection of decoys for non-vocabulary utterances rejection | |
JPH0876788A (en) | Detection method of easy-to-confuse word in voice recognition | |
JP2000267692A (en) | Training method for voice recognizer | |
US5129001A (en) | Method and apparatus for modeling words with multi-arc markov models | |
US7346497B2 (en) | High-order entropy error functions for neural classifiers | |
JPS6024597A (en) | Voice registration system | |
JP2021033260A (en) | Training method, speaker identification method, and recording medium | |
EP1187096A1 (en) | Speaker adaptation with speech model pruning | |
JPS6083093A (en) | Word voice recognition equipment | |
Shokri et al. | A robust keyword spotting system for Persian conversational telephone speech using feature and score normalization and ARMA filter | |
JP3437492B2 (en) | Voice recognition method and apparatus | |
JP2001255887A (en) | Speech recognition device, speech recognition method and medium recorded with the method | |
Fukuda et al. | Noise-robust ASR by using distinctive phonetic features approximated with logarithmic normal distribution of HMM. | |
JP3100180B2 (en) | Voice recognition method | |
Ming et al. | Improving speech recognition performance by using multi-model approaches | |
JP3036509B2 (en) | Method and apparatus for determining threshold in speaker verification | |
JP2886474B2 (en) | Rule speech synthesizer | |
JP3866171B2 (en) | Phoneme determination method, apparatus and program thereof | |
Sinha et al. | Exploring the role of pitch-adaptive cepstral features in context of children's mismatched ASR | |
JP3256979B2 (en) | A method for finding the likelihood of an acoustic model for input speech | |
JP3285047B2 (en) | Speech recognition device for unspecified speakers | |
JPH11338492A (en) | Speaker recognition unit | |
EP0190489A1 (en) | Speaker-independent speech recognition method and system | |
JP3285048B2 (en) | Speech recognition device for unspecified speakers |