JPH10198394A - Voice recognition method

Voice recognition method

Publication number: JPH10198394A
Authority: JP (Japan)
Application number: JP9002647A
Filing date: 1997-01-10
Publication date: 1998-07-31
Legal status: Pending (as listed; not a legal conclusion)
Inventor: Hiroo Ikura (啓雄 居倉)
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp (as listed; may be inaccurate)
Prior art keywords: noise, HMM, recognition, word, voice
Other languages: Japanese (ja)
Family member: US08/874,331 (US5860062A)


Abstract

PROBLEM TO BE SOLVED: To provide a voice recognition method that recognizes noise-superimposed speech with high probability even when the type of noise and the S/N ratio change simultaneously during an utterance. SOLUTION: A voice recognition device comprises a voice signal input part 1, a characteristic value extracting part 2, a data storage part 3 and a recognition result determining part 4. When the word HMMs (hidden Markov models) used for recognition are created, noise HMMs created in advance from several kinds of noise are used, and S/N ratios at a plurality of levels are taken into account, so that speech can be recognized with high probability even when the kind of superimposed noise and the S/N ratio change partway through the utterance.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

TECHNICAL FIELD The present invention relates to a speech recognition method using a hidden Markov model (HMM).

[0002]

DESCRIPTION OF THE RELATED ART HMMs are widely used for automatic speech recognition by computer. An HMM is generally a nondeterministic probabilistic finite-state automaton with a plurality of states, each of which is a stochastic stationary signal source. In other words, an HMM can be described as a nonstationary signal source that emits a signal while switching nondeterministically among stationary signal sources.
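The "switching among stationary sources" view can be made concrete with a small sampler. This is a minimal sketch, assuming a discrete-output HMM (each state emits one of a few symbols); the model sizes and probabilities below are hypothetical values chosen only for illustration, not parameters from this document.

```python
import random


def sample_hmm(init, trans, emit, length, rng):
    """Draw a state path and an observation sequence from a discrete HMM.

    init[i]: probability of starting in state i
    trans[i][j]: probability of moving from state i to state j
    emit[i][k]: probability that state i emits symbol k
    Each state's emit[i] never changes (a stationary source); the output is
    nonstationary only because the model switches randomly between states.
    """
    states, symbols = [], []
    state = rng.choices(range(len(init)), weights=init)[0]
    for _ in range(length):
        states.append(state)
        symbols.append(rng.choices(range(len(emit[state])), weights=emit[state])[0])
        state = rng.choices(range(len(trans[state])), weights=trans[state])[0]
    return states, symbols


# Hypothetical left-to-right model with 2 states and 3 output symbols.
INIT = [1.0, 0.0]
TRANS = [[0.8, 0.2], [0.0, 1.0]]
EMIT = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]

states, symbols = sample_hmm(INIT, TRANS, EMIT, 10, random.Random(0))
```

Because the hypothetical model is left-to-right, the sampled state path never moves backward, while the emitted symbols shift in distribution once the path enters state 1.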

[0003] One technique for speech recognition with HMMs is the maximum likelihood estimation method. To use HMMs for word recognition, an HMM is first prepared for each word to be recognized, and the internal parameters defining each HMM are adjusted so that the HMM readily outputs the feature parameter sequences extracted from speech samples of its own word. In the maximum likelihood estimation method, when unknown speech is input, the likelihood that each HMM outputs the feature parameter sequence extracted from that speech is computed, and the word corresponding to the HMM that gives the maximum likelihood is taken as the recognition result.
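The maximum likelihood decision in [0003] can be sketched with the forward algorithm: compute each word HMM's likelihood for the observed sequence and pick the largest. This is a minimal sketch, assuming discrete observation symbols rather than the continuous feature vectors a real recognizer would use; the two word models are hypothetical.

```python
import math


def safe_log(p):
    return math.log(p) if p > 0 else -math.inf


def log_sum_exp(values):
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(model, observations):
    """Forward algorithm: log P(observations | model) for a discrete HMM."""
    init, trans, emit = model
    n = len(init)
    alpha = [safe_log(init[i]) + safe_log(emit[i][observations[0]]) for i in range(n)]
    for o in observations[1:]:
        alpha = [
            log_sum_exp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][o])
            for j in range(n)
        ]
    return log_sum_exp(alpha)


def recognize(word_models, observations):
    """Return the word whose HMM gives the maximum likelihood."""
    return max(word_models, key=lambda w: log_likelihood(word_models[w], observations))


# Hypothetical word HMMs: (initial probs, transition matrix, emission matrix).
WORD_MODELS = {
    "word_a": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]),
    "word_b": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]),
}
```

Working in the log domain avoids numerical underflow when likelihoods of long utterances are multiplied frame by frame.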

[0004] One method for recognizing noise-superimposed word speech with HMMs that use words as recognition units is the NOVO-HMM method proposed by Franc Martin in "Recognition of Noisy Speech by Composition of Hidden Markov Models" (IEICE Technical Report SP92-96). In this method, the internal parameters of a noise HMM and a word HMM are combined by a technique that the paper calls the NOVO transform, and the resulting NOVO-HMM is used to recognize noise-superimposed word speech with high accuracy.
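The text does not give the NOVO transform's equations; [0016] only says that it adds the speech and noise spectra in the linear domain with a coefficient per S/N level. The sketch below shows that idea as a simplified log-add rule on log-spectral mean vectors. It is an assumption-laden illustration, not the method of the cited paper: variances are ignored, features are assumed to be log spectra (not cepstra), and both models are assumed normalized to the same reference power so the noise gain is 10^(-SNR/10).

```python
import math


def snr_gain(snr_db):
    """Linear power gain for the noise at a target S/N ratio in dB.

    Assumption: speech and noise models share the same reference power.
    """
    return 10.0 ** (-snr_db / 10.0)


def compose_log_spectrum(speech_log, noise_log, snr_db):
    """Add speech and scaled noise in the linear spectral domain, return log spectrum."""
    g = snr_gain(snr_db)
    return [math.log(math.exp(s) + g * math.exp(n)) for s, n in zip(speech_log, noise_log)]
```

At a high S/N ratio the composed bins approach the speech bins; at 0 dB, equal-level speech and noise bins rise by log 2.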

[0005] FIG. 6 is a conceptual diagram of the conventional NOVO transform. In the conventional method, a recognition target word HMM is generated by training on training sample data for that word, and a noise HMM is generated by training on training sample data for one type of noise. The word HMM and the noise HMM are then combined by the NOVO transform to obtain a NOVO-HMM for each recognition target word.
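A common way to picture a composed model, and consistent with the "rows of word HMMs" described for FIG. 4, is a product state space in which each composite state pairs one word state with one noise state. The sketch below builds the composite transition matrix as a Kronecker product, assuming the word and noise state sequences evolve independently; the output distribution of each composite state (the spectral combination) is not shown, and the example matrices are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two square matrices given as lists of lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)] for i in range(n * m)]


def compose_transitions(word_trans, noise_trans):
    """Transition matrix of a composite HMM.

    Composite state w * K + k pairs word state w with noise state k
    (K = number of noise states). Assumes independent state evolution.
    """
    return kron(word_trans, noise_trans)


# Hypothetical 3-state left-to-right word HMM and 2-state noise HMM.
WORD_TRANS = [[0.6, 0.4, 0.0], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]]
NOISE_TRANS = [[0.9, 0.1], [0.2, 0.8]]
COMPOSITE = compose_transitions(WORD_TRANS, NOISE_TRANS)
```

Because each factor's rows sum to 1, every row of the composite matrix also sums to 1.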

[0006]

PROBLEM TO BE SOLVED BY THE INVENTION To obtain a high recognition rate with the conventional NOVO-HMM recognition method, the noise used to create the NOVO-HMM, that is, the noise assumed at recognition time, must not change significantly during the utterance. If the type of noise or the S/N ratio changes significantly partway through the utterance, the recognition rate drops sharply.

[0007] Accordingly, an object of the present invention is to provide a speech recognition method that can recognize noise-superimposed speech with high probability even when the type of noise and the S/N ratio change simultaneously during an utterance.

[0008]

MEANS FOR SOLVING THE PROBLEM The invention according to claim 1 is a speech recognition method for recognizing noise-superimposed speech using HMMs with words as recognition units. When the word HMMs used for recognition are generated, a noise HMM generated in advance from a plurality of types of noise is used, and S/N ratios at a plurality of levels are also taken into account, so that speech is recognized with high probability even when the type of superimposed noise and the S/N ratio change partway through the utterance.

[0009] The invention according to claim 2 is a speech recognition method for recognizing noise-superimposed speech using HMMs with words as recognition units. When the likelihood of each word HMM generated in consideration of a plurality of noises and a plurality of S/N ratio levels is calculated, the type of noise on the path that gives the maximum likelihood is recorded. If the transitions of the noise type are roughly the same for the first few word HMMs, the noise transition sequence is fixed to those transitions and the likelihood of each word HMM is calculated, which reduces the amount of computation.

[0010]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS When generating NOVO-HMMs for recognizing noise-superimposed word speech, the present invention takes a plurality of noise types and a plurality of S/N ratio levels into account. It thereby generates NOVO-HMMs whose recognition accuracy does not drop significantly even if the type of noise, or the S/N ratio between speech and noise, changes during the utterance.

[0011] A speech recognition method according to an embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram of the configuration of a speech recognition device according to an embodiment of the present invention. Reference numeral 1 denotes a voice signal input unit that converts a voice signal (training sample data or data to be recognized) into digital values; 2, a feature extraction unit that computes features for each frame of the input signal; 3, a data storage unit that stores the training sample data, noise data, recognition target word HMMs, noise HMMs, and NOVO-HMMs; and 4, a recognition result determination unit that computes the output probability of the input word and determines the recognition result.

[0012] FIG. 2 is a circuit block diagram of a speech recognition device according to an embodiment of the present invention. Reference numeral 11 denotes a microphone; 12, a central processing unit (CPU); 13, a read-only memory (ROM); 14, a writable memory (RAM); and 15, an output device.

[0013] The voice signal input unit 1 in FIG. 1 consists of the microphone 11 and the CPU 12. The feature extraction unit 2, the data storage unit 3, and the recognition result determination unit 4 are implemented by the CPU 12 executing a program written in the ROM 13 and accessing the RAM 14.

[0014] FIG. 3 is a flowchart of a speech recognition method according to an embodiment of the present invention. First, speech is input from the microphone 11, and the features obtained through the feature extraction unit 2 are written to the RAM 14 (step 2). Next, for the NOVO-HMM created for each recognition target word and stored in the ROM 13, the likelihood of the features in the RAM 14 is calculated (step 3). Then the name of the word corresponding to the NOVO-HMM that gives the maximum likelihood is output to the output device 15 as the recognition result (step 4).

[0015] FIG. 4 is a conceptual diagram of the NOVO transform used in the speech recognition method according to an embodiment of the present invention, showing how a NOVO-HMM is created. In this figure, the number of states of the noise HMM subjected to the NOVO transform is 2, two types of noise, A and B, are considered at recognition time, and the S/N ratio is considered at two levels, x [dB] and y [dB].

[0016] Compared with the conventional method of FIG. 6, the structure of the noise HMM generated by training on the training sample data differs. In the conventional method, a two-state noise HMM was generated directly by training on one type of noise, and the NOVO transform was then applied to that noise HMM and the word HMM. In the present invention, a one-state noise HMM is first generated by training on each of the two types of noise, and the state of each noise HMM is then duplicated as many times as there are S/N ratio levels to be considered. The states of all the noise HMMs obtained by training and duplication are then joined by assigning state transition probabilities between them by hand (here, a self-transition probability of about 0.7 and a probability of about 0.1 for the transition to each other state), producing a four-state noise HMM. When the conventional NOVO transform is applied to this four-state noise HMM and the recognition target word HMM, the NOVO-HMM takes the form of four word HMMs arranged in rows, as shown at the bottom of FIG. 4. At the step of the NOVO transform that adds the linear speech spectrum and the linear noise spectrum, the states in the top row use the coefficient for noise A at x [dB], the states in the second row use the coefficient for noise A at y [dB], the states in the third row use the coefficient for noise B at x [dB], and the states in the fourth row use the coefficient for noise B at y [dB].
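The noise-HMM construction in [0016] can be sketched as follows: one state per (noise type, S/N level), fully connected with a hand-set self-transition probability of 0.7, which for four states leaves 0.1 for each other state, matching the figures given above. The composite-state indexing and the grouping into "rows" assume the product-state layout (word state w, noise state k) with index w * K + k; that layout, the helper names, and the dB values 5 and 15 standing in for x and y are illustrative assumptions.

```python
def build_noise_hmm(noise_types, snr_levels_db, self_prob=0.7):
    """One state per (noise type, S/N level); fully connected, hand-set transitions.

    Each trained one-state noise model is duplicated once per S/N level.
    """
    states = [(noise, snr) for noise in noise_types for snr in snr_levels_db]
    n = len(states)
    if n == 1:
        return states, [[1.0]]
    other = (1.0 - self_prob) / (n - 1)
    trans = [[self_prob if i == j else other for j in range(n)] for i in range(n)]
    return states, trans


def composite_state_labels(num_word_states, noise_states):
    """Label composite state w * K + k as (word state w, noise type, S/N level)."""
    k_count = len(noise_states)
    return [(w,) + noise_states[k] for w in range(num_word_states) for k in range(k_count)]


def rows_by_condition(labels):
    """Group composite state indices by noise condition: one row of word states each."""
    rows = {}
    for index, (_, noise, snr) in enumerate(labels):
        rows.setdefault((noise, snr), []).append(index)
    return rows


# Hypothetical values standing in for x = 5 dB and y = 15 dB.
NOISE_STATES, NOISE_TRANS = build_noise_hmm(["A", "B"], [5, 15])
ROWS = rows_by_condition(composite_state_labels(3, NOISE_STATES))
```

With a three-state word HMM this yields four rows of three composite states each, one row per (noise, S/N) condition.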

[0017] FIG. 5 is a conceptual diagram of the reduction in computation in the speech recognition method according to an embodiment of the present invention. A NOVO-HMM created by the method described above has many more states than a NOVO-HMM that considers only a single noise type or only a single S/N ratio level, so computing the likelihood of each NOVO-HMM requires an enormous amount of computation. Therefore, while the likelihood of each NOVO-HMM is computed, the noise type and S/N ratio along the path that gives the maximum likelihood are recorded. When the noise and S/N ratio transitions in the NOVO-HMMs of the first few words turn out to be roughly the same, that sequence of noise and S/N ratio transitions is fixed and the likelihoods of all NOVO-HMMs are recomputed. This greatly reduces the amount of computation.
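The path-fixing shortcut in [0017] can be sketched with Viterbi scoring: run a full search on the first few words, read the noise state off each best path, and if those paths roughly agree, rescore every word with the noise state pinned frame by frame, so only word states are searched. The agreement test (fraction of matching frames), the choice of the best-scoring probe's path, the number of probe words, and all function names are hypothetical choices; the text says only "roughly the same" and "the first few words". The product-state indexing w * K + k is also an assumption.

```python
import math


def viterbi_noise_path(log_init, log_trans, log_emit, num_noise, num_frames):
    """Full Viterbi over a composite HMM (state s = word s // K, noise s % K).

    log_emit(s, t) is the log output probability of frame t in state s.
    Returns (best log score, noise-state index at each frame of the best path).
    """
    n = len(log_init)
    delta = [log_init[s] + log_emit(s, 0) for s in range(n)]
    backptrs = []
    for t in range(1, num_frames):
        ptr, new_delta = [], []
        for j in range(n):
            i_best = max(range(n), key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(i_best)
            new_delta.append(delta[i_best] + log_trans[i_best][j] + log_emit(j, t))
        backptrs.append(ptr)
        delta = new_delta
    state = max(range(n), key=lambda s: delta[s])
    best = delta[state]
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return best, [s % num_noise for s in path]


def constrained_score(log_init, log_trans, log_emit, num_noise, noise_path):
    """Viterbi with the noise state fixed to noise_path[t]; searches word states only."""
    num_word = len(log_init) // num_noise

    def idx(w, t):
        return w * num_noise + noise_path[t]

    delta = [log_init[idx(w, 0)] + log_emit(idx(w, 0), 0) for w in range(num_word)]
    for t in range(1, len(noise_path)):
        delta = [
            max(delta[v] + log_trans[idx(v, t - 1)][idx(w, t)] for v in range(num_word))
            + log_emit(idx(w, t), t)
            for w in range(num_word)
        ]
    return max(delta)


def paths_agree(paths, min_fraction=0.9):
    """Hypothetical 'roughly the same' test: share of frames where all paths match."""
    num_frames = len(paths[0])
    same = sum(1 for t in range(num_frames) if len({p[t] for p in paths}) == 1)
    return same / num_frames >= min_fraction


def recognize_with_path_fixing(models, num_frames, num_noise, probe_count=3):
    """models maps word -> (log_init, log_trans, log_emit) of its composite HMM."""
    words = list(models)
    probes = words[:probe_count]
    results = {w: viterbi_noise_path(*models[w], num_noise, num_frames) for w in probes}
    if paths_agree([results[w][1] for w in probes]):
        # Fix the noise path of the best-scoring probe word and rescore every word.
        fixed = max(probes, key=lambda w: results[w][0])
        noise_path = results[fixed][1]
        scores = {w: constrained_score(*models[w], num_noise, noise_path) for w in words}
    else:
        # Paths disagree: fall back to the full search for every word.
        scores = {w: viterbi_noise_path(*models[w], num_noise, num_frames)[0] for w in words}
    return max(scores, key=scores.get)
```

Each constrained frame costs W * W operations instead of (W * K)^2 for W word states and K noise states, which is where the saving comes from. The constrained score never exceeds the full Viterbi score, and equals it when the pinned path is the best path's own noise sequence.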

[0018]

EFFECTS OF THE INVENTION The speech recognition method of the present invention can recognize noise-superimposed speech with high probability even when the type of noise and the S/N ratio change simultaneously during an utterance.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of the configuration of a speech recognition device according to an embodiment of the present invention.

FIG. 2 is a circuit block diagram of a speech recognition device according to an embodiment of the present invention.

FIG. 3 is a flowchart of a speech recognition method according to an embodiment of the present invention.

FIG. 4 is a conceptual diagram of the NOVO transform used in a speech recognition method according to an embodiment of the present invention.

FIG. 5 is a conceptual diagram of the reduction in computation in the speech recognition method according to an embodiment of the present invention.

FIG. 6 is a conceptual diagram of the conventional NOVO transform.

DESCRIPTION OF REFERENCE SIGNS

1 voice signal input unit
2 feature extraction unit
3 data storage unit
4 recognition result determination unit
11 microphone
12 CPU
13 ROM
14 RAM
15 output device

Claims (2)

1. A speech recognition method for recognizing noise-superimposed speech using HMMs with words as recognition units, wherein, when the word HMMs used for recognition are generated, a noise HMM generated in advance from a plurality of types of noise is used and S/N ratios at a plurality of levels are taken into account, so that speech is recognized with high probability even when the type of superimposed noise and the S/N ratio change partway through the utterance.

2. A speech recognition method for recognizing noise-superimposed speech using HMMs with words as recognition units, wherein, when the likelihood of each word HMM generated in consideration of a plurality of noises and a plurality of S/N ratio levels is calculated, the type of noise on the path that gives the maximum likelihood is recorded, and if the transitions of the noise type are substantially the same for the first few word HMMs, the noise transition sequence is fixed to those transitions and the likelihood of each word HMM is calculated, thereby reducing the amount of computation.

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP9002647A JPH10198394A (en) 1997-01-10 1997-01-10 Voice recognition method
US08/874,331 US5860062A (en) 1996-06-21 1997-06-13 Speech recognition apparatus and speech recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP9002647A JPH10198394A (en) 1997-01-10 1997-01-10 Voice recognition method

Publications (1)

Publication Number Publication Date
JPH10198394A true JPH10198394A (en) 1998-07-31

Family

ID=11535156

Family Applications (1)

Application Number Title Priority Date Filing Date
JP9002647A Pending JPH10198394A (en) 1996-06-21 1997-01-10 Voice recognition method

Country Status (1)

Country Link
JP (1) JPH10198394A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7403896B2 (en) 2002-03-15 2008-07-22 International Business Machines Corporation Speech recognition system and program thereof
US7660717B2 (en) 2002-03-15 2010-02-09 Nuance Communications, Inc. Speech recognition system and program thereof
CN106033669A (en) * 2015-03-18 2016-10-19 展讯通信(上海)有限公司 Voice identification method and apparatus thereof

Similar Documents

Publication Publication Date Title
US10902845B2 (en) System and methods for adapting neural network acoustic models
US5581655A (en) Method for recognizing speech using linguistically-motivated hidden Markov models
US20020173955A1 (en) Method of speech recognition by presenting N-best word candidates
JP3459712B2 (en) Speech recognition method and device and computer control device
US20070094007A1 (en) Conversation controller
US8532990B2 (en) Speech recognition of a list entry
US20110218805A1 (en) Spoken term detection apparatus, method, program, and storage medium
JPH08110791A (en) Speech recognizing method
CN111489737B (en) Voice command recognition method and device, storage medium and computer equipment
JPH0782348B2 (en) Subword model generation method for speech recognition
JP5060006B2 (en) Automatic relearning of speech recognition systems
US20020002457A1 (en) Method and configuration for determining a representative sound, method for synthesizing speech, and method for speech processing
WO2019107170A1 (en) Urgency estimation device, urgency estimation method, and program
JP7044856B2 (en) Speech recognition model learning methods and systems with enhanced consistency normalization
CN111640423B (en) Word boundary estimation method and device and electronic equipment
JP2003208195A5 (en)
JP2003208195A (en) Device, method and program for recognizing consecutive speech, and program recording medium
JPH10198394A (en) Voice recognition method
JP4442211B2 (en) Acoustic model creation method
JPH1011085A (en) Voice recognizing method
US7818172B2 (en) Voice recognition method and system based on the contexual modeling of voice units
JP4104831B2 (en) Speech recognition apparatus, speech recognition method, and speech recognition program
JP6903613B2 (en) Speech recognition device, speech recognition method and program
JP3316352B2 (en) Voice recognition method
JP2683976B2 (en) Probabilistic model for speech recognition