JP2853731B2 - Voice recognition device - Google Patents

Voice recognition device

Info

Publication number
JP2853731B2
Authority
JP
Japan
Prior art keywords
likelihood
cumulative
threshold
cumulative likelihood
unit
Prior art date
Legal status
Expired - Lifetime
Application number
JP7136725A
Other languages
Japanese (ja)
Other versions
JPH08328583A (en)
Inventor
Shinsuke Sakai (坂井 信輔)
Current Assignee
NEC Corp
Original Assignee
Nippon Electric Co Ltd
Priority date
1995-06-02
Filing date
1995-06-02
Publication date
1999-02-03
Application filed by Nippon Electric Co Ltd
Priority to JP7136725A
Publication of JPH08328583A
Application granted
Publication of JP2853731B2
Anticipated expiration
Expired - Lifetime (current legal status)


Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a speech recognition device.

[0002]

[Description of the Related Art] Because a speech recognition device requires a very large amount of computation, beam search has long been used to reduce it. Two well-known ways of setting the beam width for pruning candidates are: at each pruning step, keep a fixed number of candidates in descending order of likelihood; or keep the candidates whose likelihood lies within a fixed range of the maximum likelihood.
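The two classical pruning rules can be sketched as follows. This is a minimal illustration, assuming each candidate is identified by a hashable key and scored by a log-likelihood (higher is better); the function names are hypothetical and not taken from the patent.

```python
def prune_top_m(scores, m):
    """Fixed-count beam: keep the m candidates with the highest likelihood."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:m])


def prune_fixed_width(scores, width):
    """Fixed-width beam: keep candidates within `width` of the best likelihood."""
    best = max(scores.values())
    return {c: g for c, g in scores.items() if g >= best - width}


# Hypothetical log-likelihoods of four recognition path candidates.
scores = {"a": -1.0, "b": -2.5, "c": -3.0, "d": -7.0}
print(prune_top_m(scores, 2))          # {'a': -1.0, 'b': -2.5}
print(prune_fixed_width(scores, 2.0))  # {'a': -1.0, 'b': -2.5, 'c': -3.0}
```

The fixed-count rule always returns the same number of survivors; the fixed-width rule returns however many candidates happen to fall inside the band, which is why its survivor count can drift with noise conditions.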

[0003] In the paper "Beam Search in Continuous Speech Recognition" by Ito et al., published in the Proceedings of the Meeting of the Acoustical Society of Japan, October 1993, pp. 73-74, it is reported that keeping a fixed number of candidates gives search efficiency that is more stable against changes in the beam-width setting. Moreover, the number of candidates whose likelihood lies within a fixed range of the maximum likelihood is thought to fluctuate with the ambient noise environment at the time of utterance, whereas the fixed-count method has no such fluctuation in the number of candidates.

[0004]

[Problem to Be Solved by the Invention] However, the fixed-count method described above (keeping, say, M candidates) requires a sorting step at every pruning to find the M-th candidate, and therefore has the drawback of a large processing load.
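To make the cost argument concrete, the sketch below counts comparisons for one frame: a full sort (one way to read off the M-th value) versus the single pass needed to find only the maximum, which is all a threshold of the form "maximum minus a fixed gap" requires. The patent does not specify a sorting algorithm; Python's built-in sort and the random scores are assumptions for illustration only. Partial selection (for example `heapq.nlargest`) is cheaper than a full sort but still costs more than one pass.

```python
import random
from functools import cmp_to_key


def count_sort_comparisons(values):
    """Comparisons performed by a full descending sort of one frame's scores."""
    count = 0

    def cmp(a, b):
        nonlocal count
        count += 1
        return (b > a) - (b < a)  # negative when a > b, i.e. descending order

    sorted(values, key=cmp_to_key(cmp))
    return count


def count_max_comparisons(values):
    """Comparisons needed to find only the maximum in a single pass."""
    best, count = values[0], 0
    for v in values[1:]:
        count += 1
        if v > best:
            best = v
    return count


random.seed(0)
frame_scores = [random.uniform(-100.0, 0.0) for _ in range(1000)]
print(count_sort_comparisons(frame_scores), count_max_comparisons(frame_scores))
```

The single pass always needs n - 1 comparisons (999 here), while the sort needs on the order of n log n for unordered input.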

[0005]

[Means for Solving the Problem] According to the invention of claim 1, there is provided a speech recognition device characterized in that, over the first several frames of the input speech, the difference between the maximum likelihood and the likelihood of the candidate at a fixed rank is obtained, and thereafter this difference is used to set the threshold for pruning candidates in the beam search.

[0006] According to the invention of claim 2, there is provided a speech recognition device comprising: a feature extraction unit that analyzes a speech signal and outputs a time series of feature vectors; a standard pattern storage unit that stores standard patterns created in advance; a cumulative likelihood storage unit that holds cumulative likelihoods; a recurrence formula calculation unit that obtains new cumulative likelihoods from the cumulative likelihoods stored in the cumulative likelihood storage unit, the time series of feature vectors, and the standard patterns; a cumulative likelihood output unit that, for a certain partial sequence of the feature-vector time series, outputs a fixed number of the cumulative likelihoods obtained by the recurrence formula calculation unit while storing the difference between the maximum cumulative likelihood and the likelihood at that fixed rank, and, for subsequent partial sequences, determines the cumulative likelihoods to be output by means of a threshold obtained from the stored likelihood difference; and a result output unit that obtains a recognition result for the speech signal from the cumulative likelihoods output by the cumulative likelihood output unit.

[0007] According to the invention of claim 3, there is provided a threshold setting method for a speech recognition device, characterized in that, for each of an arbitrary number of partial sequences of the input speech, the average of the difference between the cumulative likelihood of the M-th candidate and the maximum cumulative likelihood is obtained, and in the interval up to the next partial sequence, the candidate-pruning threshold is set using the average difference obtained in the preceding partial sequence.

[0008]

[Embodiment] Next, the present invention will be described with reference to the drawings.

[0009] FIG. 1 is a block diagram showing an embodiment of the present invention. Referring to FIG. 1, the embodiment comprises a feature extraction unit 101, a standard pattern storage unit 102, a cumulative likelihood storage unit 103, a recurrence formula calculation unit 104, a cumulative likelihood output unit 105, and a result output unit 106.

[0010] The feature extraction unit 101 converts the speech input into a time series of feature vectors and outputs it to the recurrence formula calculation unit 104. The standard pattern storage unit 102 stores the standard patterns. The cumulative likelihood storage unit 103 stores the cumulative likelihoods output by the cumulative likelihood output unit 105; before processing starts, it holds the initial cumulative likelihood value 1.0 for every recognition path candidate. The recurrence formula calculation unit 104 obtains the cumulative likelihood up to the i-th frame from the feature vector of the i-th frame, the standard patterns, and the cumulative likelihood up to the (i-1)-th frame. The cumulative likelihood output unit 105 selects, from the input set of cumulative likelihoods, those to be used in calculating the cumulative likelihoods of the next frame, and outputs them to the cumulative likelihood storage unit 103. The result output unit 106 outputs a recognition result based on the cumulative likelihoods up to the last frame.

[0011] Next, the operation of this embodiment will be described with reference to FIGS. 1 and 2.

[0012] In the feature extraction unit 101, the input speech is converted, at fixed time intervals, into feature vectors representing its frequency spectrum, which are output to the recurrence formula calculation unit 104. This fixed time interval is hereinafter called a frame. At the i-th frame, the recurrence formula calculation unit 104 uses the standard patterns $\mathrm{REF} = \{R_1, \ldots, R_N\}$ held in the standard pattern storage unit 102, where $R_w = \{r_w(1), \ldots, r_w(J_w)\}$, to obtain the local likelihood $l_w(i, j)$ ($w = 1, \ldots, N$; $j = 1, \ldots, J_w$) of the current frame's feature vector with respect to each standard pattern. Here $N$ is the number of standard patterns and $J_w$ is the frame length of the $w$-th standard pattern. Next, from these local likelihoods and the cumulative likelihood set of the (i-1)-th frame held in the cumulative likelihood storage unit 103,

$$G = \{g_1(i-1, 1), \ldots, g_1(i-1, J_1), \ldots, g_N(i-1, 1), \ldots, g_N(i-1, J_N)\},$$

the recognition path candidates of the current frame and their cumulative likelihoods are obtained by a maximization based on dynamic programming, as given by Equation 1 below (step 1 in FIG. 2).
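Equation 1 itself is not reproduced in this text, so the sketch below assumes a common form of the DP recurrence rather than the patent's exact one: log-domain scores, a left-to-right pattern in which frame $j$ is reached either by staying in $j$ or by advancing from $j-1$, i.e. $g_w(i, j) = \max(g_w(i-1, j), g_w(i-1, j-1)) + l_w(i, j)$. The local score (negative squared Euclidean distance), the `entry` argument, and reading the initial value 1.0 as log 1.0 = 0.0 are all hypothetical choices for illustration.

```python
NEG_INF = float("-inf")  # marks unreachable or pruned path candidates


def local_log_likelihood(x, r):
    """Hypothetical local score l_w(i, j): negative squared Euclidean distance."""
    return -sum((a - b) ** 2 for a, b in zip(x, r))


def dp_step(prev, x, patterns, entry=NEG_INF):
    """Advance cumulative log-likelihoods g_w(i-1, j) to g_w(i, j).

    prev[w][j] holds g_w(i-1, j) with 0-based j. `entry` is the score for
    entering frame 0 of every pattern from outside (0.0 at the first frame).
    Assumed transitions: stay in frame j, or advance from frame j-1.
    """
    new = {}
    for w, ref in patterns.items():
        row = []
        for j, r in enumerate(ref):
            advance = prev[w][j - 1] if j > 0 else entry
            best_prev = max(prev[w][j], advance)
            if best_prev == NEG_INF:
                row.append(NEG_INF)  # no surviving predecessor
            else:
                row.append(best_prev + local_log_likelihood(x, r))
        new[w] = row
    return new


# Two hypothetical one-dimensional standard patterns, two frames each.
patterns = {"yes": [[0.0], [1.0]], "no": [[5.0], [6.0]]}
init = {w: [NEG_INF] * len(ref) for w, ref in patterns.items()}
g1 = dp_step(init, [0.0], patterns, entry=0.0)  # first frame enters at log 1.0 = 0.0
g2 = dp_step(g1, [1.0], patterns)
print(g2)  # {'yes': [-1.0, 0.0], 'no': [-41.0, -50.0]}
```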

[0013]

(Equation 1)

The cumulative likelihood output unit 105 compares i with a predetermined K. If $i \le K$, it finds the M-th largest cumulative likelihood, uses it as the candidate-pruning threshold TH, and computes the difference d between the maximum likelihood and TH. To obtain an average later, it updates the accumulated value $S_d$ of d as $S_d \leftarrow S_d + d$ (steps 2, 6, and 7).

[0014] Note that $S_d$ is initialized to 0 before the first frame.

[0015] When $i = K$, the average difference between the maximum likelihood and the candidate-pruning threshold over the K frames, $D = S_d / K$, is obtained (step 3).
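The learning phase of steps 2, 3, 6, and 7 can be sketched as below. This is a minimal illustration, assuming each frame's cumulative likelihoods are given as a plain list of numbers and that at least K frames are available; `learn_gap` and `mth_largest` are hypothetical names. During these frames pruning itself is still fixed-count (TH is the M-th value); the sketch only shows how $D$ is accumulated.

```python
import heapq


def mth_largest(scores, m):
    """Return the m-th largest score: TH for a fixed-count (top-m) beam."""
    return heapq.nlargest(m, scores)[-1]


def learn_gap(frames, k, m):
    """Average gap D between the best score and the m-th best over the first k frames."""
    s_d = 0.0                        # S_d, initialised to 0 before the first frame
    for scores in frames[:k]:
        g_max = max(scores)
        th = mth_largest(scores, m)  # TH for this frame
        s_d += g_max - th            # d = g_max - TH;  S_d <- S_d + d
    return s_d / k                   # D = S_d / K, computed when i = K


# Two hypothetical frames of cumulative log-likelihoods, K = 2, M = 3.
frames = [[0.0, -1.0, -3.0, -4.0], [-2.0, -2.5, -6.0, -9.0]]
print(learn_gap(frames, k=2, m=3))  # 3.5  (gaps 3.0 and 4.0)
```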

[0016] When $i > K$, the candidate-pruning threshold is set to $TH = g_{max} - D$, where $g_{max}$ is the maximum cumulative likelihood at the i-th frame (step 4).

[0017] In each frame, the cumulative likelihood output unit 105 outputs to the cumulative likelihood storage unit 103 only those recognition path candidates whose cumulative likelihood is larger than the threshold TH (step 5).
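Steps 4 and 5 for frames after the learning phase reduce to one pass for $g_{max}$ and one filtering pass, with no sorting. A minimal sketch, assuming candidates are keyed in a dict of log-likelihoods and $D$ was learned earlier; the name `prune_with_learned_gap` is hypothetical. One deliberate deviation: the text keeps candidates "larger than" TH, but this sketch uses `>=` so that the best candidate always survives even if $D$ happens to be 0.

```python
def prune_with_learned_gap(scores, gap):
    """Keep candidates whose cumulative likelihood reaches TH = g_max - D.

    scores: dict mapping a candidate id to its cumulative log-likelihood.
    gap:    D, the average gap learned over the first K frames.
    """
    g_max = max(scores.values())  # single pass, no sorting
    th = g_max - gap              # TH = g_max - D (step 4)
    return {c: g for c, g in scores.items() if g >= th}  # step 5


# Hypothetical frame after the learning phase, with D = 3.5.
scores = {"a": -10.0, "b": -12.0, "c": -13.5, "d": -20.0}
print(prune_with_learned_gap(scores, 3.5))  # {'a': -10.0, 'b': -12.0, 'c': -13.5}
```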

[0018] If the current frame is the last frame, the cumulative likelihood output unit 105 outputs to the result output unit 106 all recognition path candidates that have reached the end point of a standard pattern. The result output unit 106 finds the recognition path candidate with the largest cumulative likelihood and outputs the recognition result (steps 8 and 9).
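The final decision of steps 8 and 9 can be sketched as an argmax over the candidates that sit at the last frame of their standard pattern. This assumes the same hypothetical layout as a per-pattern list of cumulative log-likelihoods, with pruned or unreachable entries set to negative infinity; `recognize` is an illustrative name.

```python
NEG_INF = float("-inf")


def recognize(final_scores):
    """Return the standard pattern whose end-point cumulative likelihood is largest.

    final_scores[w] is the list of g_w(I, j) after the last frame I; only the
    entry for j = J_w (the end point of pattern w) is considered.
    """
    end_points = {w: row[-1] for w, row in final_scores.items() if row[-1] > NEG_INF}
    if not end_points:
        return None  # no candidate reached an end point (for example, all pruned)
    return max(end_points, key=end_points.get)


print(recognize({"yes": [-1.0, 0.0], "no": [-41.0, -50.0]}))  # yes
```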

[0019] The embodiment above was described using the example of obtaining, over the first K frames of the input speech, the average difference between the cumulative likelihood of the M-th candidate and the maximum cumulative likelihood. More generally, this average can be obtained for each of an arbitrary number $L_{max}$ of partial sequences $li_1, li_2, \ldots, li_{L_{max}}$ ($L_{max} \ge 1$) of the input speech (these partial sequences are provisionally called learning intervals), and between a learning interval $li_k$ and the next learning interval $li_{k+1}$, the candidate-pruning threshold can be set using the average difference obtained in $li_k$.
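The generalization to several learning intervals can be sketched as a per-frame threshold schedule. This is a minimal illustration under stated assumptions: intervals are given as sorted, non-overlapping, 1-based inclusive frame ranges; the first interval starts at frame 1; and inside every interval pruning is fixed-count (TH is the M-th value) while the gap is being averaged, as in the single-interval embodiment. The function name and data layout are hypothetical.

```python
import heapq


def thresholds(frames, intervals, m):
    """Per-frame pruning thresholds with several learning intervals li_1..li_Lmax.

    frames:    list of per-frame score lists (frame i is frames[i - 1]).
    intervals: sorted, non-overlapping (start, end) frame ranges, 1-based and
               inclusive; the first is assumed to start at frame 1.
    """
    in_interval = {}
    for idx, (start, end) in enumerate(intervals):
        for i in range(start, end + 1):
            in_interval[i] = idx

    ths, s_d, n, d_avg = [], 0.0, 0, None
    for i, scores in enumerate(frames, start=1):
        g_max = max(scores)
        if i in in_interval:
            th = heapq.nlargest(m, scores)[-1]  # fixed-count TH while learning
            s_d += g_max - th
            n += 1
            if i == intervals[in_interval[i]][1]:  # interval li_k ends
                d_avg, s_d, n = s_d / n, 0.0, 0   # D from li_k, used until li_{k+1}
        else:
            th = g_max - d_avg                    # TH = g_max - D
        ths.append(th)
    return ths


frames = [[0.0, -1.0, -3.0], [-2.0, -2.5, -6.0], [-5.0, -7.0, -9.0],
          [-8.0, -8.5, -20.0], [-10.0, -11.0, -12.0]]
print(thresholds(frames, intervals=[(1, 2), (4, 4)], m=3))
# [-3.0, -6.0, -8.5, -20.0, -22.0]
```

Frame 3 uses the average gap 3.5 learned over frames 1-2; frame 5 uses the gap 12.0 learned in the second interval.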

[0020]

[Effect of the Invention] As described above, even when the difference between the cumulative likelihood of the M-th candidate and the maximum cumulative likelihood fluctuates greatly because of changes in the ambient noise environment or in the setting of the beam width M, the speech recognition device according to the present invention obtains the average of this difference from part of the input and uses it to determine the pruning threshold. Candidate pruning therefore approximates the method of keeping the candidates up to the M-th cumulative likelihood over the entire input, so stable search efficiency is retained without increasing the processing needed to determine the pruning threshold.

[Brief Description of the Drawings]

[FIG. 1] A block diagram showing the configuration of an embodiment of the speech recognition device of the present invention.

[FIG. 2] A flowchart showing the processing flow of the embodiment of the speech recognition device shown in FIG. 1.

[Explanation of Reference Numerals]

101 Feature extraction unit
102 Standard pattern storage unit
103 Cumulative likelihood storage unit
104 Recurrence formula calculation unit
105 Cumulative likelihood output unit
106 Result output unit

Continuation of the front page: (58) Fields searched (Int. Cl.6, DB name): G10L 3/00 561; G10L 5/06; JICST file (JOIS)

Claims (3)

(57) [Claims]

[Claim 1] A speech recognition device characterized in that, over the first several frames of input speech, the difference between the maximum likelihood and the likelihood of the candidate at a fixed rank is obtained, and thereafter the difference is used to set a threshold for pruning candidates in a beam search.
[Claim 2] A speech recognition device comprising: a feature extraction unit that analyzes a speech signal and outputs a time series of feature vectors; a standard pattern storage unit that stores standard patterns created in advance; a cumulative likelihood storage unit that holds cumulative likelihoods; a recurrence formula calculation unit that obtains new cumulative likelihoods from the cumulative likelihoods stored in the cumulative likelihood storage unit, the time series of feature vectors, and the standard patterns; a cumulative likelihood output unit that, for a certain partial sequence of the feature-vector time series, outputs a fixed number of the cumulative likelihoods obtained by the recurrence formula calculation unit while storing the difference between the maximum cumulative likelihood and the likelihood at that fixed rank, and, for subsequent partial sequences, determines the cumulative likelihoods to be output by means of a threshold obtained from the stored likelihood difference; and a result output unit that obtains a recognition result for the speech signal from the cumulative likelihoods output by the cumulative likelihood output unit.
[Claim 3] A threshold setting method in a speech recognition device, characterized in that, for each of an arbitrary number of threshold learning intervals of input speech, the average of the difference between the maximum likelihood and the likelihood of the candidate at a fixed rank is obtained, and between a threshold learning interval and the next threshold learning interval, a threshold for candidate pruning is set using the average likelihood difference obtained in that threshold learning interval.

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP7136725A JP2853731B2 (en) 1995-06-02 1995-06-02 Voice recognition device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP7136725A JP2853731B2 (en) 1995-06-02 1995-06-02 Voice recognition device

Publications (2)

Publication Number Publication Date
JPH08328583A (en) 1996-12-13
JP2853731B2 (en) 1999-02-03

Family

ID=15182045

Family Applications (1)

Application Number Title Priority Date Filing Date
JP7136725A Expired - Lifetime JP2853731B2 (en) 1995-06-02 1995-06-02 Voice recognition device

Country Status (1)

Country Link
JP (1) JP2853731B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10572812B2 (en) 2015-03-19 2020-02-25 Kabushiki Kaisha Toshiba Detection apparatus, detection method, and computer program product

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1436736B1 (en) 2001-09-28 2017-06-28 Level 3 CDN International, Inc. Configurable adaptive global traffic control and management
US9167036B2 (en) 2002-02-14 2015-10-20 Level 3 Communications, Llc Managed object replication and delivery
US8930538B2 (en) 2008-04-04 2015-01-06 Level 3 Communications, Llc Handling long-tail content in a content delivery network (CDN)
US10924573B2 (en) 2008-04-04 2021-02-16 Level 3 Communications, Llc Handling long-tail content in a content delivery network (CDN)
US9762692B2 (en) 2008-04-04 2017-09-12 Level 3 Communications, Llc Handling long-tail content in a content delivery network (CDN)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Transactions of the Institute of Electronics, Information and Communication Engineers (IEICE), Vol. J75-D-II, No. 1, pp. 1-10 (January 1992)


Also Published As

Publication number Publication date
JPH08328583A (en) 1996-12-13

Similar Documents

Publication Publication Date Title
US7039588B2 (en) Synthesis unit selection apparatus and method, and storage medium
US6980955B2 (en) Synthesis unit selection apparatus and method, and storage medium
US8010362B2 (en) Voice conversion using interpolated speech unit start and end-time conversion rule matrices and spectral compensation on its spectral parameter vector
US4882759A (en) Synthesizing word baseforms used in speech recognition
JP4531166B2 (en) Speech recognition method using reliability measure evaluation
US7437288B2 (en) Speech recognition apparatus
US6278972B1 (en) System and method for segmentation and recognition of speech signals
US7010483B2 (en) Speech processing system
US5309547A (en) Method of speech recognition
JP2853731B2 (en) Voice recognition device
JPH10105187A (en) Signal segmentalization method basing cluster constitution
JPH08211889A (en) Pattern adaptive system using tree structure
JP4659541B2 (en) Speech recognition apparatus and speech recognition program
JPH0247760B2 (en)
JP3039623B2 (en) Voice recognition device
JP3428058B2 (en) Voice recognition device
Shinozaki et al. Hidden mode HMM using bayesian network for modeling speaking rate fluctuation
JPH0792989A (en) Speech recognizing method
JP3353334B2 (en) Voice recognition device
JPH06266386A (en) Word spotting method
US7912715B2 (en) Determining distortion measures in a pattern recognition process
KR100293465B1 (en) Speech recognition method
JP2001083978A (en) Speech recognition device
JPH0247758B2 (en)
JPH09305195A (en) Speech recognition device and speech recognition method

Legal Events

Date Code Title Description
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 19981021