JPS59211098A

JPS59211098A - Voice recognition equipment

Info

Publication number: JPS59211098A
Application number: JP58085340A
Authority: JP
Inventors: 渡部　敏恵
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1983-05-16
Filing date: 1983-05-16
Publication date: 1984-11-29
Anticipated expiration: 2009-12-12
Also published as: JPH06100918B2

Abstract

(57) [Abstract] This publication contains application data filed before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]
(a) Technical Field of the Invention
The present invention relates to a speech recognition device that recognizes a speaker's input speech by comparing its input feature parameter time series with a plurality of registered feature parameter time series registered in advance.

(b) Technical Background
In recent years, speech recognition devices have come to include two main types. A speaker recognition device converts the input speech of a limited number of registered speakers into suitable input feature parameter time series, registers them in a registered speech dictionary, and determines, by matching against that dictionary, whether input speech comes from an unknown speaker or from a particular registered speaker. A speaker-dependent speech recognition device converts words spoken by a specific speaker into suitable input feature parameter time series, registers them in a registered speech dictionary, matches input speech uttered by that speaker against the dictionary, and displays the recognition result as text. With the progress of speech recognition technology, applications have expanded into the field of spoken dialogue with machines.

In a typical speech recognition method, the unknown input speech is frequency-analyzed, the analog signal resulting from the analysis is converted into a digital signal, and the digital signal is arranged as a time series. A speech segment is then determined using a threshold for segment detection and extracted as an input feature parameter time series representing the features of each phoneme. The extracted input feature parameter time series is matched against a plurality of registered feature parameter time series registered in advance, and the match with the smallest distance is selected as the recognition result for the unknown input speech. Because the form of the feature parameter time series, the registration method, and the matching and selection method all affect the difficulty and the recognition rate of speech recognition, various schemes have been studied.
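The matching step described above can be illustrated with a minimal sketch. The source does not specify the distance measure, so the sketch assumes a simple linear time normalization to a fixed number of frames followed by a summed squared difference; the names `resample`, `distance`, and `recognize`, and the toy data, are hypothetical.

```python
def resample(series, length):
    """Pick `length` frames from `series` at evenly spaced positions
    (an assumed, very simple form of time normalization)."""
    n = len(series)
    return [series[min(n - 1, (k * n) // length)] for k in range(length)]

def distance(a, b, length=8):
    """Summed squared difference between two time-normalized series.
    Each series is a list of frames; each frame is a list of channel values."""
    ra, rb = resample(a, length), resample(b, length)
    return sum((x - y) ** 2 for fa, fb in zip(ra, rb) for x, y in zip(fa, fb))

def recognize(input_series, dictionary):
    """Return the label of the registered series closest to the input."""
    return min(dictionary, key=lambda label: distance(input_series, dictionary[label]))

# Toy registered dictionary with two words and two channels per frame.
dictionary = {
    "aomori": [[1, 0], [2, 1], [3, 3], [1, 2]],
    "akita": [[0, 3], [0, 2], [1, 0], [3, 0]],
}
# An utterance of the first word, spoken slightly more slowly.
unknown = [[1, 0], [2, 1], [2, 1], [3, 3], [1, 2]]
result = recognize(unknown, dictionary)
```

Any other distance, such as a dynamic time warping distance, could be substituted for `distance` without changing the structure of `recognize`.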

(c) Prior Art and Problems
A conventional speech recognition system has a registered speech dictionary in which a plurality of input feature parameter time series of speech uttered by a specific speaker are registered. It compares the input feature parameter time series of a speech segment, whose word beginning and word end have been determined, with the plurality of registered feature parameter time series in the dictionary, and outputs a recognition result. Conventionally, after one or more speech segments are cut out from a single input utterance, a distance is calculated against each registered feature parameter time series using the entire input feature parameter time series of every speech segment.

The configuration of this type of speech recognition system is described below.

FIG. 1 is a block diagram of the circuit configuration of a conventional speech recognition device. To register a speaker's voice in advance, one input utterance spoken by the speaker is input through microphone 1. Band-pass filter group 2 divides the speech band of about 200 Hz to 5 kHz into a group of 10 to 20 channel filters, and the output of each channel filter is taken at a period of 5 to 30 ms. Feature parameter time series extraction unit 3 converts these outputs into an input feature parameter time series of digital information and stores it in input feature parameter time series buffer 4. Speech segment cutting circuit 5 cuts a speech segment out of the series stored in buffer 4, using a threshold that determines the word beginning and word end, and the result is registered in registered feature parameter time series dictionary unit 6 as a registered feature parameter time series representing the phoneme features of the input speech. This procedure is repeated for each utterance to be registered, so that a plurality of registered feature parameter time series are registered in dictionary unit 6.
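To give a feel for the size of the feature data implied by these figures, the following sketch computes band edges and a frame count. It assumes logarithmic channel spacing, 16 channels, and a 10 ms frame period; the source gives only the ranges (10 to 20 channels, 5 to 30 ms), so these choices and the function names are illustrative assumptions.

```python
def channel_edges(f_low=200.0, f_high=5000.0, channels=16):
    """Band edges in Hz for `channels` filters spanning f_low to f_high.
    Logarithmic spacing is an assumption; the source states only the band
    and the range of channel counts."""
    ratio = (f_high / f_low) ** (1.0 / channels)
    return [f_low * ratio ** k for k in range(channels + 1)]

def frame_count(duration_ms, period_ms=10):
    """Number of feature frames taken from an utterance at a fixed period."""
    return duration_ms // period_ms

edges = channel_edges()
frames = frame_count(600)             # hypothetical 600 ms utterance
values = frames * (len(edges) - 1)    # one value per channel per frame
```

Under these assumptions a 600 ms utterance yields 60 frames of 16 channel values each, which is the series stored in buffer 4 before segment cutting.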

Next, to recognize a speaker's voice, the input speech uttered by the speaker is input through microphone 1 and stored in input feature parameter time series buffer 4 by the same procedure as above. Speech segment cutting circuit 5 divides the series stored in buffer 4 into a plurality of speech segments, using thresholds that determine the speech segments. Circuit 5 may apply a single algorithm with only the threshold varied, or may combine circuits that use different algorithms.

FIG. 4 shows an example in which speech segment cutting circuit 5 determines a plurality of speech segments using thresholds. Suppose three threshold levels TL, TM, and TH are set and a single utterance, for example "Aomori", is input. Three speech segments are then determined: "a-o-mo-ri" at threshold level TL, "o-mo-ri" at threshold level TM, and "o-mo" at threshold level TH. Accordingly, when n thresholds are set in FIG. 1, n input feature parameter time series are output, one per speech segment. Matching selection circuit 7 matches these n input series against the m registered feature parameter time series in dictionary unit 6 one pair at a time, calculating the matching distance n×m times, and outputs the closest of the n×m matches to recognition terminal 8 as the recognition result.
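The multi-threshold cutting in the "Aomori" example can be sketched as follows. The sketch assumes that a segment runs from the first to the last frame whose energy reaches the threshold; the source does not describe the cutting algorithm in this detail, and the per-frame energies and threshold values are made up for illustration.

```python
def cut_segment(energy, threshold):
    """Return (start, end) covering the first through last frame whose energy
    is at or above `threshold`, with `end` exclusive, or None if none qualifies."""
    above = [t for t, e in enumerate(energy) if e >= threshold]
    if not above:
        return None
    return above[0], above[-1] + 1

def cut_segments(energy, thresholds):
    """Apply every named threshold to the same utterance."""
    return {name: cut_segment(energy, level) for name, level in thresholds.items()}

# Hypothetical frame energies: silence, weak "a", strong "o" and "mo",
# medium "ri", silence.
energy = [0, 2, 2, 8, 8, 9, 9, 5, 5, 0]
thresholds = {"TL": 1, "TM": 4, "TH": 7}
segments = cut_segments(energy, thresholds)
# TL keeps frames 1..8 (a-o-mo-ri), TM keeps 3..8 (o-mo-ri), TH keeps 3..6 (o-mo)
```

Each of the n resulting segments would then be matched against all m registered series in the conventional scheme, which is where the n×m distance calculations come from.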

The above is the conventional speech recognition procedure. In this system, matching the input feature parameter time series of n speech segments against m registered feature parameter time series requires n×m distance calculations. As the number of recognition categories grows, the amount of calculation increases and the matching process takes longer. The system also has the drawback of a high misrecognition rate.

(d) Object of the Invention
The object of the present invention is to overcome the above drawbacks of the prior art.

(e) Configuration of the Invention
The above object is achieved by a speech recognition device that recognizes unknown input speech by matching an input feature parameter time series extracted from the unknown input speech against a plurality of registered feature parameter time series registered in advance, the device comprising: a speech segment cutting circuit that cuts a plurality of speech segments out of the input feature parameter time series; a first selection circuit that selects among the speech segments cut out by the speech segment cutting circuit; and a second selection circuit that selects a recognition result using the input feature parameter time series of the speech segment selected by the first selection circuit. For the unknown input speech to be recognized, the input feature parameter time series of the plurality of speech segments cut out by the speech segment cutting circuit are converted into coarsely sampled coarse input feature parameter time series. The first selection circuit matches these against the plurality of registered feature parameter time series, in a form corresponding to the coarse input feature parameter time series, and selects the closest speech segment. The second selection circuit then matches the input feature parameter time series of the selected speech segment against the plurality of registered feature parameter time series and selects the recognition result.

By providing a speech segment cutting circuit that cuts out a plurality of speech segments and two-stage selection circuits according to the present invention, the amount of matching distance calculation can be reduced compared with the conventional system. At the same time, the input feature parameter time series is in effect matched against the registered feature parameter time series twice, so recognition copes with variations in the input speech such as loudness, duration, and accent. In particular, a high recognition rate can be obtained even when the speaker's voice is hoarse because of his or her physical condition that day.

(f) Embodiments of the Invention
An embodiment of the present invention is described below. FIG. 2 is a block diagram of the circuit configuration of a speech recognition device according to the present invention. Throughout the drawings, the same elements are denoted by the same reference numerals as in FIG. 1. Reference numeral 9 denotes a first selection circuit, 10 a registered feature parameter time series dictionary unit, 11 an AND gate circuit, and 12 a second selection circuit.

In this circuit configuration, the speech registration procedure is the same as the conventional method shown in FIG. 1, so its description is omitted.

In the speech recognition procedure according to the present invention, the steps up to inputting the speaker's speech through microphone 1 and storing the input feature parameter time series in input feature parameter time series buffer 4 are the same as in FIG. 1.

Speech segment cutting circuit 5 sets n thresholds and cuts this input feature parameter time series into the input feature parameter time series of n speech segments. First selection circuit 9 converts the n input feature parameter time series into n coarse input feature parameter time series that use j of the i channels of band-pass filter group 2 (where j < i). One of the m registered feature parameter time series registered in advance in registered feature parameter time series dictionary unit 10 is sent to first selection circuit 9 and matched against the n coarse input feature parameter time series, and the one speech segment with the smallest matching distance is selected from the n.

Only this selected speech segment is input to the a side of AND gate circuit 11. At the same time, the input feature parameter time series with all i channels, for the speech segment cut out with the same threshold, is input to the b side of the same AND gate circuit 11. Only the line for which both the a and b sides receive input passes its input feature parameter time series to the c side (output) of AND gate circuit 11. Next, another of the m registered feature parameter time series is matched against the coarse input feature parameter time series of the n speech segments, a speech segment is selected in the same way, and the input feature parameter time series of the selected segment is output on the c side of AND gate circuit 11.

The coarse input feature parameter time series may instead be obtained by thinning out the input feature parameter time series in the time direction, by compressing the parameters to a fixed number of bytes, or by similar means.

In this way, the coarse input feature parameter time series of the n speech segments are matched against each of the m registered feature parameter time series in turn, m times in total, so that m speech segments are selected. Only the input feature parameter time series of the selected segments are output in sequence on the c side of AND gate circuit 11 and input to second selection circuit 12. Second selection circuit 12 calculates the matching distance m times, between the input feature parameter time series of the m selected speech segments and the m registered feature parameter time series in dictionary unit 10, selects the smallest matching distance, and outputs the corresponding registered feature parameter time series to recognition terminal 8 as the recognition result.

In the system of the present invention, the amount of matching distance calculation performed by the first and second selection circuits is (n·j/i + 1)·m, with each coarse match counted as j/i of a full match. This is smaller than the conventional n×m whenever (n·j/i + 1)·m < n×m, that is, whenever j/i < 1 − 1/n. Setting i and j (with j < i) to satisfy this condition therefore reduces the amount of calculation.
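The two-stage selection of FIG. 2 can be sketched in software as follows. This is an illustration under assumptions, not the circuit itself: the distance measure (linear time normalization plus summed squared difference) is not given in the source, the coarse series is formed by simply reading a subset of channel indices, and all names and data are hypothetical.

```python
def resample(series, length):
    """Pick `length` frames at evenly spaced positions (assumed time normalization)."""
    n = len(series)
    return [series[min(n - 1, (k * n) // length)] for k in range(length)]

def distance(a, b, channels, length=8):
    """Summed squared difference over the given channel indices only."""
    ra, rb = resample(a, length), resample(b, length)
    return sum((fa[c] - fb[c]) ** 2 for fa, fb in zip(ra, rb) for c in channels)

def recognize_two_stage(segments, dictionary, coarse_channels):
    """For each registered series, pick the closest segment using only the
    coarse channels (first selection), then score that segment with all
    channels and keep the best label (second selection)."""
    all_channels = range(len(segments[0][0]))
    best_label, best_dist = None, float("inf")
    for label, template in dictionary.items():
        chosen = min(segments, key=lambda s: distance(s, template, coarse_channels))
        d = distance(chosen, template, all_channels)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

def matching_cost(n, m, i, j):
    """Return (conventional, two_stage) counts in units of one full match."""
    return n * m, (n * j / i + 1) * m

dictionary = {
    "aomori": [[1, 0, 2, 0], [3, 1, 3, 1], [2, 2, 1, 2]],
    "akita": [[0, 3, 0, 3], [1, 2, 0, 2], [3, 0, 0, 1]],
}
segments = [
    [[0, 0, 0, 0], [1, 0, 2, 0], [3, 1, 3, 1], [2, 2, 1, 2]],  # cut at a low threshold
    [[1, 0, 2, 0], [3, 1, 3, 1], [2, 2, 1, 2]],                # cut at a middle threshold
    [[3, 1, 3, 1]],                                            # cut at a high threshold
]
label = recognize_two_stage(segments, dictionary, coarse_channels=(0, 2))
costs = matching_cost(n=3, m=100, i=16, j=4)
```

With n = 3, i = 16, and j = 4, the condition j/i < 1 − 1/n holds (0.25 < 0.667), and for m = 100 registered series the count drops from 300 full matches to the equivalent of 175.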

FIG. 3 shows another embodiment of the present invention, in which 13 is a first selection circuit, 14 is a selected input feature parameter time series buffer, and 15 is a second selection circuit. First selection circuit 13 matches the n coarse input feature parameter time series described for FIG. 2 against the m registered feature parameter time series, selects the r speech segments with the smallest matching distances (where r ≤ m), and stores them in selected input feature parameter time series buffer 14. Second selection circuit 15 calculates the matching distance r times between the r stored input feature parameter time series and the m registered feature parameter time series, selects the smallest matching distance, and outputs it to recognition terminal 8 as the recognition result. The amount of matching distance calculation performed by the first and second selection circuits is (j/i)×n×m + r. This is smaller than n×m when (j/i)×n×m + r < n×m, which holds for j/i < 1 − 1/n with r ≤ m and j < i, so setting r, i, and j accordingly reduces the amount of calculation.
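The FIG. 3 variant can be sketched in the same style. The sketch assumes that the first stage keeps the r best (segment, registered series) pairs and that the second stage rescoring is done once per kept pair; the source states only that r segments are stored and that r distances are calculated, so the pairing, the distance measure, and all names are assumptions.

```python
def resample(series, length):
    """Pick `length` frames at evenly spaced positions (assumed time normalization)."""
    n = len(series)
    return [series[min(n - 1, (k * n) // length)] for k in range(length)]

def distance(a, b, channels, length=8):
    """Summed squared difference over the given channel indices only."""
    ra, rb = resample(a, length), resample(b, length)
    return sum((fa[c] - fb[c]) ** 2 for fa, fb in zip(ra, rb) for c in channels)

def recognize_top_r(segments, dictionary, coarse_channels, r):
    """Score all n*m pairs coarsely, keep the r closest pairs, then rescore
    only those r pairs with every channel and return the best label."""
    all_channels = range(len(segments[0][0]))
    coarse = sorted(
        (distance(seg, template, coarse_channels), idx, label)
        for idx, seg in enumerate(segments)
        for label, template in dictionary.items()
    )
    shortlist = coarse[:r]  # plays the role of buffer 14
    best = min(
        shortlist,
        key=lambda c: distance(segments[c[1]], dictionary[c[2]], all_channels),
    )
    return best[2]

def matching_cost(n, m, i, j, r):
    """Return (conventional, top_r) counts in units of one full match."""
    return n * m, j / i * n * m + r

dictionary = {
    "aomori": [[1, 0, 2, 0], [3, 1, 3, 1], [2, 2, 1, 2]],
    "akita": [[0, 3, 0, 3], [1, 2, 0, 2], [3, 0, 0, 1]],
}
segments = [
    [[0, 0, 0, 0], [1, 0, 2, 0], [3, 1, 3, 1], [2, 2, 1, 2]],
    [[1, 0, 2, 0], [3, 1, 3, 1], [2, 2, 1, 2]],
    [[3, 1, 3, 1]],
]
label = recognize_top_r(segments, dictionary, coarse_channels=(0, 2), r=2)
costs = matching_cost(n=3, m=100, i=16, j=4, r=10)
```

Compared with the FIG. 2 sketch, the second stage here runs r times rather than m times, so a small r trades some robustness for less calculation.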

The embodiments of the present invention have been described above using frequency analysis by band-pass filters, but the invention can also be applied to speech recognition devices that use LPC (linear predictive coding) analysis or similar methods.

(g) Effects of the Invention
As described in detail above, by providing a speech segment cutting circuit that cuts out a plurality of speech segments and two-stage selection circuits, the present invention allows speech recognition of a speaker's input speech to cope with variations such as loudness, duration, and accent. This improves the recognition rate and also reduces the amount of matching distance calculation.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of the circuit configuration of a conventional speech recognition device. FIGS. 2 and 3 are block diagrams of the circuit configuration of speech recognition devices according to the present invention. FIG. 4 is a diagram showing the relationship used to determine the speech segments of input speech. In the drawings, 1 is a microphone, 2 is a band-pass filter group, 3 is a feature parameter time series extraction unit, 4 is an input feature parameter time series buffer, 5 is a speech segment cutting circuit, 7 is a matching selection circuit, 8 is a recognition terminal, 9 is a first selection circuit, 10 is a registered feature parameter time series dictionary unit, 12 is a second selection circuit, 13 is a first selection circuit, 14 is a selected input feature parameter time series buffer, and 15 is a second selection circuit.

Claims

[Claims]

A speech recognition device that recognizes unknown input speech by matching an input feature parameter time series extracted from the unknown input speech against a plurality of registered feature parameter time series registered in advance, characterized by comprising: a speech segment cutting circuit that cuts a plurality of speech segments out of the input feature parameter time series; a first selection circuit that selects among the speech segments cut out by the speech segment cutting circuit; and a second selection circuit that selects a recognition result using the input feature parameter time series of the speech segment selected by the first selection circuit; wherein, for the unknown input speech to be recognized, the input feature parameter time series of the plurality of speech segments cut out by the speech segment cutting circuit are converted into coarsely sampled coarse input feature parameter time series; the first selection circuit matches these against the plurality of registered feature parameter time series, in a form corresponding to the coarse input feature parameter time series, and selects the closest speech segment; and the second selection circuit matches the input feature parameter time series of the selected speech segment against the plurality of registered feature parameter time series and selects the recognition result.