JPH09237097A

JPH09237097A - Voice recognition device

Info

Publication number: JPH09237097A
Application number: JP8042461A
Authority: JP
Inventors: Akio Amano; 明雄天野
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1996-02-29
Filing date: 1996-02-29
Publication date: 1997-09-09

Abstract

PROBLEM TO BE SOLVED: To avoid poor sharing of training data by providing standard patterns that represent the center part of a word and standard patterns that represent the connection part between words, and connecting them alternately. SOLUTION: An input voice is converted into an electric signal by a voice input means 1. The voice converted into the electric signal is analyzed by a voice analyzing means 2, which outputs a time series of feature vectors. A collating means 3 collates this time series with standard patterns stored in advance in a standard pattern storage means 5 and calculates a score for each word to be recognized. The standard patterns comprise patterns representing the center part of a word and patterns representing the connection part between words, and the standard pattern of a word string is formed by connecting them alternately. Collation is performed under the control of a finite state automaton 6. A decision means 4 outputs the recognition result based on the scores of the respective words.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a continuous digit speech recognition device that recognizes spoken digits.

[0002]

[Description of the Related Art] Conventional continuous digit speech recognition devices generally use word-level standard patterns. In a speech recognition device that targets arbitrary Japanese sentences or large-vocabulary word recognition (for example, all Japanese personal names), preparing a standard pattern for each word is impractical because the number of standard patterns becomes too large. In a small-vocabulary digit recognition device that recognizes only the ten digits, however, word-level (single-digit) standard patterns number only about ten.

[0003] When a standard pattern is prepared for each word, coarticulation within the word is built into the standard pattern and can therefore be handled. Coarticulation that arises between words, however, cannot be handled.

[0004] One method for handling coarticulation between words is described in the Acoustical Society of Japan, Speech Research Group materials, document No. S84-64, "Continuous digit speech recognition using half-word-pair standard patterns". In this example, the span from the center of the preceding digit to the center of the following digit is treated as one unit and used as a standard pattern. Because the standard pattern covers the region between two digits, it captures the coarticulation that arises between words. Since a standard pattern is prepared for each word pair, the number of standard patterns equals the number of two-digit combinations, that is, 100.

[0005] Another conventional method for handling coarticulation between words is described in the Institute of Electronics, Information and Communication Engineers Technical Report SP95-23, "A study of acoustic model training methods for continuous digit speech recognition". In this example, each word is divided into three parts: head, body, and tail. A different head model is prepared for each preceding word, and likewise a different tail model is prepared for each following word. The head captures the coarticulation with the preceding word and the tail captures the coarticulation with the following word, so coarticulation between words can be handled. For digits, there are 10 body parts and, for each body, 10 head parts and 10 tail parts, so the total number of standard patterns is 210.

[0006]

[Problems to Be Solved by the Invention] Both of the above conventional techniques provide standard patterns that account for coarticulation between words and improve recognition accuracy for continuous digits. In the first conventional example, however, each standard pattern can be trained only on its own two-digit combination, so training data is shared poorly. In the second conventional example, training data is shared well for the body part, but the head and tail parts share training data as poorly as in the first example. Furthermore, the second conventional example requires training data covering every combination of head, body, and tail parts, which amounts to as many as 1000 kinds of training data.
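The counts quoted for the two conventional examples follow from simple arithmetic over the ten-digit vocabulary. The short sketch below reproduces them; the variable names are illustrative and do not come from the cited reports.

```python
DIGITS = 10  # vocabulary size: the ten digits

# First conventional example: one pattern per ordered digit pair.
half_word_pair_patterns = DIGITS * DIGITS

# Second conventional example: one body per digit, one head per
# (preceding digit, digit) pair, one tail per (digit, following digit) pair.
bodies = DIGITS
heads = DIGITS * DIGITS
tails = DIGITS * DIGITS
head_body_tail_patterns = bodies + heads + tails

# Covering every head/body/tail combination means covering every
# (preceding digit, digit, following digit) triple.
training_combinations = DIGITS ** 3

print(half_word_pair_patterns)   # 100
print(head_body_tail_patterns)   # 210
print(training_combinations)     # 1000
```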

[0007] An object of the present invention is to provide a means that avoids the poor sharing of training data found in the conventional techniques while retaining their ability to handle coarticulation between words.

[0008]

[Means for Solving the Problems] The above object is achieved by providing, as standard patterns, standard patterns representing the center part of a word and standard patterns representing the connection part between words, and by forming the standard pattern of a word string by connecting them alternately.

[0009] For digits, there are only 10 standard patterns representing the center part of a word, so training data can be shared well. There are 100 standard patterns representing the connection part between words; training data is shared somewhat less well for these, but they handle coarticulation between words. As a result, standard patterns that handle coarticulation between words can be created while training data remains well shared, and highly accurate speech recognition can be realized.

[0010]

[Embodiments of the Invention] An embodiment of the present invention is described below with reference to the drawings.

[0011] FIG. 1 is a block diagram showing the configuration of an embodiment of the continuous digit speech recognition device of the present invention. The input voice is converted into an electric signal by the voice input means 1. The voice converted into the electric signal is then analyzed by the voice analysis means 2, which outputs a time series of feature vectors. The collating means 3 collates the standard patterns stored in advance in the standard pattern storage means 5 with the feature vector time series of the input voice and obtains a score for each word to be recognized. Collation is performed under the control of the finite state automaton 6. The decision means 4 outputs a recognition result based on the score of each word.

[0012] The standard patterns used in the present invention are described next. The present invention adopts a probabilistic model as the standard pattern. FIG. 2 shows the probabilistic model (Hidden Markov Model, hereinafter abbreviated as HMM) used in the present invention. Each circle in the figure represents a state, and each arrow represents a transition between states. The symbol a_ij attached to an arrow represents the probability of a transition from state i to state j, and the symbol b_ij(k) represents the probability that a feature vector belonging to the k-th class is output when the transition from state i to state j occurs. Given the feature vector time series of the input voice, the probability that this time series was output by the probabilistic model (HMM) can be calculated from the state transition probabilities and output probabilities. The collating means 3 in FIG. 1 performs this probability calculation. For details of the calculation, the known method described in "Automatic Speech Recognition", Kluwer Academic Publishers, Norwell, MA, 1989, pp. 95-97 may be used.
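As an illustration of the probability calculation, the following is a minimal sketch of a forward recursion for a discrete HMM whose output probabilities are attached to transitions, matching the a_ij and b_ij(k) notation above. It is not taken from the cited reference; the function name, the choice of state 0 as the start state and the last state as the final state, and the toy model are all assumptions made for the example.

```python
def forward_probability(a, b, observations, start=0, final=None):
    """Probability that the HMM emits `observations` and ends in `final`.

    a[i][j]    : probability of a transition from state i to state j
    b[i][j][k] : probability of emitting class k on the transition i -> j
    Each observation is consumed by exactly one transition.
    """
    n = len(a)
    if final is None:
        final = n - 1
    alpha = [0.0] * n
    alpha[start] = 1.0
    for k in observations:
        nxt = [0.0] * n
        for i in range(n):
            if alpha[i] == 0.0:
                continue
            for j in range(n):
                nxt[j] += alpha[i] * a[i][j] * b[i][j][k]
        alpha = nxt
    return alpha[final]


# Toy 3-state left-to-right model with two output classes (illustrative values).
a = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0],
]
b = [[[0.5, 0.5] for _ in range(3)] for _ in range(3)]

p = forward_probability(a, b, [0, 1])
print(p)  # 0.0625: the only path is 0 -> 1 -> 2
```

In practice such products are usually accumulated in the log domain or with scaling to avoid underflow on long utterances; the sketch omits this for brevity.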

[0013] The standard patterns for continuous digit speech recognition, which are the main feature of the present invention, are described next with reference to FIG. 3. FIG. 3 shows the two-digit number "12" formed by alternately connecting word-center standard patterns and word-connection standard patterns. In the figure, * denotes the silence at the beginning and end of the utterance. In the example of FIG. 3, a 5-state HMM is assigned to each word-center standard pattern and a 3-state HMM to each word-connection standard pattern. In the figure, 31 is the standard pattern of the connection between the initial silence and the digit "1", 32 is the standard pattern of the center part of the digit "1", 33 is the standard pattern of the connection between the digit "1" and the digit "2", 34 is the standard pattern of the center part of the digit "2", and 35 is the standard pattern of the connection between the digit "2" and the final silence. There are 10 word-center standard patterns in total and 120 word-connection standard patterns (more than 100 because of the connections with the initial and final silence). Needless to say, these patterns are connected according to the following rule: to form the standard pattern of the consecutive digits XY, the word-connection standard pattern XY is inserted between the word-center standard pattern of X and the word-center standard pattern of Y.
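The connection rule can be sketched in a few lines. The code below builds the alternating sequence of unit names for a digit string and counts the pattern inventory; the names `conn(...)` and `center(...)` and both function names are hypothetical labels chosen for this example, not identifiers from the patent.

```python
SILENCE = "*"            # utterance-initial and utterance-final silence, as in FIG. 3
DIGITS = "0123456789"


def pattern_sequence(digit_string):
    """Return the alternating connection/center unit names for a digit string."""
    if not digit_string or any(d not in DIGITS for d in digit_string):
        raise ValueError("expected a non-empty string of digits")
    padded = SILENCE + digit_string + SILENCE
    units = []
    for prev, cur in zip(padded, padded[1:]):
        units.append(f"conn({prev},{cur})")      # word-connection pattern prev -> cur
        if cur != SILENCE:
            units.append(f"center({cur})")       # word-center pattern of cur
    return units


def inventory_size():
    """Return (number of center patterns, number of connection patterns)."""
    centers = len(DIGITS)
    digit_to_digit = len(DIGITS) ** 2
    silence_to_digit = len(DIGITS)
    digit_to_silence = len(DIGITS)
    return centers, digit_to_digit + silence_to_digit + digit_to_silence


print(pattern_sequence("12"))
# ['conn(*,1)', 'center(1)', 'conn(1,2)', 'center(2)', 'conn(2,*)']
print(inventory_size())  # (10, 120)
```

The five units printed for "12" correspond to patterns 31 through 35 of FIG. 3. Because `center(1)` appears in every utterance that contains the digit 1, regardless of its neighbors, its training data is shared across all contexts.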

[0014] The finite state automaton used to control the collating means is described next. FIG. 4 shows a finite state automaton that represents any digit string of 1 to 10 digits. Circles in the figure denote states, and states are connected by arrows (arcs). When recognition starts, the automaton first enters state 1. Each arc between states represents a single digit, so one digit is recognized each time a state transition occurs. The finite state automaton of FIG. 4 consists of 11 states in total. Starting from state 1, it passes through other states as needed and finally reaches state 0, where it ends. For a one-digit number, it transitions from state 1 to state 0 and recognition ends. For a two-digit number, it transitions from state 1 to state 2 and then to state 0. For an N-digit number, it transitions from state 1 through state 2, state 3, ..., state N to state 0. In practice, the number of digits in the input is not known in advance, so all of these possibilities are evaluated and the one that gives the highest probability is taken as the recognition result. Each arc in FIG. 4 corresponds to a single digit; when the standard patterns of the present invention are used, a word-center standard pattern is assigned to the arc, and within each state a word-connection standard pattern connects the arc (word) entering the state to the arc (word) leaving it. Needless to say, this connection follows the rule described above. In FIG. 4 each arc represents a single digit, but by building the finite state automaton with arcs that represent a sign (+, -) or a decimal point (.), real numbers with a fractional part and negative numbers can also be represented easily.
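The topology described for FIG. 4 can be sketched as a small nondeterministic automaton. The sketch below only checks whether a digit string is admitted by that topology; in the recognizer the automaton constrains the search over standard patterns rather than testing strings. The function names and the dictionary representation are assumptions made for illustration.

```python
DIGITS = set("0123456789")
MAX_DIGITS = 10
START, FINAL = 1, 0


def build_arcs(max_digits=MAX_DIGITS):
    """Map each state to the states reachable by consuming one digit."""
    arcs = {FINAL: []}
    for n in range(1, max_digits + 1):
        targets = [FINAL]              # the digit may be the last one
        if n < max_digits:
            targets.append(n + 1)      # or more digits may follow
        arcs[n] = targets
    return arcs


def accepts(digit_string, arcs=None):
    """Return True if the automaton can reach the final state on the string."""
    arcs = build_arcs() if arcs is None else arcs
    active = {START}
    for ch in digit_string:
        if ch not in DIGITS:
            return False
        active = {t for state in active for t in arcs[state]}
        if not active:
            return False
    return FINAL in active


print(len(build_arcs()))          # 11 states
print(accepts("7"))               # True
print(accepts("1234567890"))      # True
print(accepts("12345678901"))     # False (11 digits)
```

Adding arcs labeled with a sign or a decimal point, as the paragraph above suggests, would amount to extending the arc labels beyond `DIGITS` and adding the corresponding states.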

[0015] Next, the usual training method for the HMMs that serve as standard patterns in the speech recognition device of the present invention is described. An HMM is trained by estimating its parameters from a large number of training speech samples. FIG. 5 is a flowchart outlining the training flow. First, an initial HMM is created by some method (101); then parameter re-estimation using the training speech samples (102) is repeated until a convergence condition is satisfied (103). This training method is inherently an iterative estimation algorithm, and the accuracy of the model improves as the number of iterations increases. The initial model therefore does not need to be highly accurate. There are several ways to create the initial model; assigning random numbers, for example, is sufficient. The parameter re-estimation method is described later. There are also several ways to judge convergence; for example, fixing the number of iterations and stopping after a set number (for example, five) causes no practical problem.

[0016] When the convergence condition is satisfied, the iteration ends and the parameters of each HMM obtained by the parameter estimation are stored (104).
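The training flow of FIG. 5 can be sketched as follows, under the assumptions that the initial model is filled with normalized random numbers and that convergence is judged by a fixed iteration count of five, both of which the text offers as examples. The model layout (a dictionary of per-state transition and emission distributions for a left-to-right HMM) and the `reestimate` callback are hypothetical; the actual re-estimation step is not implemented here.

```python
import random


def random_distribution(size, rng):
    """Return `size` positive random weights normalized to sum to 1."""
    weights = [rng.random() + 1e-3 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def make_initial_model(n_states, n_symbols, seed=0):
    """Step 101: random initial model; each non-final state may loop or advance."""
    rng = random.Random(seed)
    transitions = [random_distribution(2, rng) for _ in range(n_states - 1)]
    emissions = [
        [random_distribution(n_symbols, rng) for _ in range(2)]
        for _ in range(n_states - 1)
    ]
    return {"transitions": transitions, "emissions": emissions}


def train(model, samples, reestimate, iterations=5):
    """Steps 102-103: repeat re-estimation a fixed number of times.

    `reestimate(model, samples)` is supplied by the caller and must return
    the updated model. The caller stores the returned model (step 104).
    """
    for _ in range(iterations):
        model = reestimate(model, samples)
    return model
```

A caller would create one initial model per standard pattern, for example `make_initial_model(5, 64)` for a word-center pattern, where the codebook size 64 is an arbitrary illustrative value.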

[0017] The HMM parameter re-estimation process is described next. As shown in the flowchart of FIG. 5, the re-estimation process is performed repeatedly within the training flow. One pass of this process is described here with reference to the flowchart of FIG. 6. HMM parameter re-estimation uses the training speech samples. If there are N training speech samples, a similar parameter estimation computation is performed N times, and after all N are finished the parameters of each HMM are updated to new values. In the parameter estimation for each speech sample, the HMMs of the basic recognition units are first concatenated according to the content of the utterance (203), and parameters are estimated for the concatenated HMM using the method known as the Forward-Backward algorithm (204). Decomposing the concatenated HMM back into the original basic recognition units yields parameter estimates for the HMM of each basic recognition unit (205). At this point, however, the parameters of the unit HMMs are not yet updated; after parameter estimates have been obtained for all speech samples, all of the estimates obtained so far are combined to update the parameters of the HMM of each basic recognition unit (207). For the specific computational procedure of the parameter estimation (Forward-Backward algorithm), the known method described in "Automatic Speech Recognition", Kluwer Academic Publishers, Norwell, MA, 1989, pp. 95-97 may be used.
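The key structural point of this pass is that per-sample estimates are accumulated per basic recognition unit and the update happens only once, after all samples have been processed. The sketch below shows that accumulate-then-update pattern only. The Forward-Backward computation itself is left to a caller-supplied function, `estimate_sample`, which is a hypothetical placeholder assumed to return, for each unit in the utterance, a numerator and denominator statistic for a single scalar parameter.

```python
from collections import defaultdict


def reestimation_pass(samples, estimate_sample):
    """One pass of FIG. 6 with deferred updating.

    `estimate_sample(sample)` stands in for steps 203-205: it should
    concatenate the unit HMMs for the utterance, run Forward-Backward,
    split the result per unit, and yield (unit, numerator, denominator).
    """
    numerators = defaultdict(float)
    denominators = defaultdict(float)
    for sample in samples:
        for unit, num, den in estimate_sample(sample):
            numerators[unit] += num       # accumulate only; no update yet
            denominators[unit] += den
    # Step 207: update every unit once, from the pooled statistics.
    return {
        unit: numerators[unit] / denominators[unit]
        for unit in numerators
        if denominators[unit] > 0.0
    }
```

Because statistics are keyed by unit, a word-center unit receives contributions from every utterance containing that digit, which is where the sharing of training data described earlier takes effect.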

[0018]

[Effects of the Invention] As described above, according to the present invention, standard patterns that handle coarticulation between words can be created while training data remains well shared, and highly accurate continuous digit speech recognition can be realized.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the continuous digit speech recognition device of the present invention.

FIG. 2 is a diagram illustrating the hidden Markov model used in the continuous digit speech recognition device of the present invention.

FIG. 3 is a diagram illustrating how the word-center standard patterns and word-connection standard patterns used in the continuous digit speech recognition device of the present invention are connected.

FIG. 4 is a diagram of a finite state automaton that represents any digit string of 1 to 10 digits.

FIG. 5 is a flowchart illustrating the standard pattern training method of the present invention.

FIG. 6 is a flowchart illustrating the parameter estimation process in the standard pattern training method of the present invention.

[Explanation of symbols]

1: voice input means, 2: voice analysis means, 3: collating means, 4: decision means, 5: standard pattern storage means, 6: finite state automaton, 101: initial model creation processing, 102: parameter re-estimation processing, 204: Forward-Backward algorithm.

Claims

[Claims]

1. A voice recognition device comprising: a voice input means for inputting continuous numeral voice; a voice analysis means that analyzes the input voice and outputs a time series of feature vectors; a standard pattern storage means that stores standard patterns serving as references for recognizing continuous numeral voice; and a collating means for collating the time series of feature vectors with the standard patterns, the device performing recognition based on the collation result of the collating means, wherein the standard pattern storage means stores standard patterns corresponding to the central portion of each numeral and standard patterns corresponding to the connecting portion between two consecutive numerals, and the collating means outputs a recognition result by collating a standard pattern, formed by alternately connecting the standard patterns of the central portions of numerals and the standard patterns of the connecting portions between numerals, with the time series of feature vectors of the input voice.

2. The continuous numeral voice recognition apparatus according to claim 1, wherein the standard pattern is constituted by a probabilistic model trained using training voice samples.

3. The continuous digit speech recognition apparatus according to claim 2, wherein the probabilistic model is a hidden Markov model.

4. The continuous numeral voice recognition apparatus according to claim 1, wherein the matching means is controlled by a finite state automaton.

5. The continuous numeral voice recognition apparatus according to claim 4, wherein the finite state automaton is configured to limit the number of digits of an input numeral.

6. The continuous numeral voice recognition device according to claim 4, wherein the finite state automaton is configured to limit a range of numerical values of inputted numerals.

7. The continuous numeral voice recognition apparatus according to claim 4, wherein the standard patterns include standard patterns for the signs "+ (plus)" and "- (minus)" and for the decimal point ". (ten)", and the finite state automaton is configured to restrict the input numeral to a real number including a part after the decimal point.