JP2000322080A

JP2000322080A - Speech recognition processor and speech recognition processing method

Info

Publication number: JP2000322080A
Application number: JP11128246A
Authority: JP
Inventors: Masahiko Ikeda; 雅彦池田
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1999-05-10
Filing date: 1999-05-10
Publication date: 2000-11-24

Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition processor and a speech recognition processing method capable of recognizing speech at high speed with a high recognition rate. SOLUTION: This speech recognition processor 100 is equipped with matching processors 3 and 4, an initial matching buffer 7#1, a matching buffer 6, and a beam search processor 8. The matching processor 4 performs matching while judging whether each frame of the initial segment is the starting end of the matching path, and stores the result in the initial matching buffer 7#1. The matching processor 3 performs matching on frames located outside the initial segment and stores the result in the matching buffer 6. The beam search processor 8 performs a beam search on the basis of the information in the matching buffer 6.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a speech recognition processing device and a speech recognition processing method, and more particularly to a speech recognition processing device and method that perform matching against standard patterns.

[0002]

[Description of the Related Art] FIG. 8 is a block diagram showing an outline of the configuration of a conventional speech recognition processing device 800. The arrows in FIG. 8 indicate the flow of processing for one frame. Referring to FIG. 8, speech recognition processing device 800 includes a microphone 40, an A/D converter 41, an acoustic analyzer 42, a speech section detector 43, a matching processor 44, a matching buffer 45, a beam search processor 46, and a standard pattern storage 47.

[0003] In speech recognition processing device 800, speech input through the microphone 40 is sampled and converted to digital form by the A/D converter 41. The A/D converter 41 divides the digital data into the units in which speech recognition is performed (frame division) and transmits the data to the acoustic analyzer 42 frame by frame. The acoustic analyzer 42 converts the data structure of each frame received from the A/D converter 41 into the format in which the matching processor 44 performs matching. The speech section detector 43 detects speech sections from the output of the acoustic analyzer 42. Data judged by the speech section detector 43 to belong to a speech section is transmitted to the matching processor 44, which performs matching between that data and the standard patterns stored in advance in the standard pattern storage 47.
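The frame division performed by the A/D converter 41 can be illustrated with a minimal Python sketch. The frame length and hop size are hypothetical parameters that the patent does not give; a hop of half the frame length is chosen only because the frames drawn in FIG. 2 overlap in that way.

```python
from typing import List, Sequence

def split_into_frames(samples: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Cut a sequence of digitized samples into frames of frame_len samples,
    starting a new frame every hop samples. Trailing samples that do not fill
    a whole frame are dropped."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [list(samples[s:s + frame_len])
            for s in range(0, len(samples) - frame_len + 1, hop)]

print(split_into_frames(list(range(10)), frame_len=4, hop=2))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```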

[0004] The result of the matching process is stored in the matching buffer 45. The matching buffer 45 stores, for example, tuples of (input-pattern frame number, matched standard-pattern frame number, similarity) as accumulated information.
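A matching-buffer entry of the kind described above could be modeled as below. This is an illustrative sketch only: the class and method names are hypothetical, and keying the buffer by standard-pattern number assumes that each of the N standard patterns keeps its own history. The score field is treated as a distance (lower means more similar), which is how paragraph [0007] uses the stored "similarity".

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class MatchEntry:
    input_frame: int     # frame number of the input pattern
    standard_frame: int  # frame number of the matched standard pattern
    score: float         # stored "similarity"; treated here as a distance (lower = closer)

class MatchingBuffer:
    """Hypothetical matching buffer holding one entry history per standard pattern."""

    def __init__(self) -> None:
        self._entries: Dict[int, List[MatchEntry]] = {}

    def store(self, pattern_id: int, entry: MatchEntry) -> None:
        self._entries.setdefault(pattern_id, []).append(entry)

    def latest(self, pattern_id: int) -> MatchEntry:
        return self._entries[pattern_id][-1]

buf = MatchingBuffer()
buf.store(1, MatchEntry(input_frame=5, standard_frame=4, score=30.0))
print(buf.latest(1))  # MatchEntry(input_frame=5, standard_frame=4, score=30.0)
```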

[0005] FIG. 9 is a conceptual diagram for explaining the operation of the speech recognition processing device 800 shown in FIG. 8. In FIG. 9, the vertical axis represents the frames of the standard patterns and the horizontal axis represents the frames of the input speech (referred to as the input pattern), each in time order. There are N standard patterns (standard patterns 1 to N). In the figure, symbol 25 denotes a matching path.

[0006] In speech recognition processing device 800, the start frame on the input pattern side (the starting end of the matching process) is fixed, while the start frame on the standard pattern side is allowed to move (frames 1 to Fh). Hereinafter, a section consisting of a plurality of frames that can serve as the start frame is referred to as an end-point-free section.
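The end-point-free matching of device 800 (input start fixed, standard-pattern start free within frames 1 to Fh) can be sketched as a dynamic-time-warping recursion. This is a sketch under stated assumptions, not the patent's own algorithm: indices are 0-based, each input frame advances 0, 1, or 2 standard frames, and `dist[i][j]` is a hypothetical precomputed local distance between input frame i and standard frame j.

```python
import math
from typing import Sequence

INF = math.inf

def dtw_free_standard_start(dist: Sequence[Sequence[float]], fh: int) -> float:
    """Accumulated distance of the best path, with the input start fixed at
    frame 0 and the standard-pattern start free within frames 0..fh-1.

    Allowed transitions: (i-1, j), (i-1, j-1), (i-1, j-2) -> (i, j).
    """
    n_std = len(dist[0])
    # Only the first input frame may open a path, and only at standard frames < fh.
    prev = [dist[0][j] if j < fh else INF for j in range(n_std)]
    for row in dist[1:]:
        prev = [row[j] + min(prev[j - k] for k in (0, 1, 2) if j - k >= 0)
                for j in range(n_std)]
    return prev[-1]  # the path must end at the last standard frame

d = [[5, 0, 9],
     [9, 9, 0]]
print(dtw_free_standard_start(d, fh=2))  # 0
print(dtw_free_standard_start(d, fh=1))  # 5
```

Widening the free section (fh=2) lets the path start at standard frame 1, which here lowers the accumulated distance from 5 to 0.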

[0007] Referring to FIG. 8, the beam search processor 46 performs a beam search using the contents of the matching buffer 45. In the beam search, whether a standard pattern used in the matching process remains a recognition target is decided from the similarity stored in the matching buffer 45. For example, a standard pattern whose stored similarity value exceeds a predetermined threshold (that is, whose dissimilarity is high) is excluded from recognition. Interrupting the matching process for such patterns increases the calculation speed.
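A minimal sketch of the threshold test described above, assuming (as the text does) that a larger stored value means a less similar pattern. The function name and the threshold value are hypothetical.

```python
from typing import Dict, Set

def beam_prune(scores: Dict[int, float], threshold: float) -> Set[int]:
    """Return the ids of standard patterns that remain recognition targets.

    scores maps a standard-pattern id to its stored value; a larger value
    means less similar, so patterns above the threshold are dropped.
    """
    return {pid for pid, s in scores.items() if s <= threshold}

print(sorted(beam_prune({1: 12.0, 2: 48.5, 3: 20.0}, threshold=30.0)))  # [1, 3]
```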

[0008] FIG. 10 is a block diagram showing an outline of the configuration of a conventional speech recognition processing device 900. Speech recognition processing device 900 shown in FIG. 10 differs from speech recognition processing device 800 shown in FIG. 8 in that it includes a matching processor 54 instead of the matching processor 44 and does not include the beam search processor 46.

[0009] Speech recognition processing device 900 determines speech sections by the same procedure as speech recognition processing device 800. The matching processor 54 matches each frame judged to be in a speech section against the standard patterns. At this time, as shown in FIG. 11, frames 1 to Fi on the input pattern side and frames 1 to Fh on the standard pattern side are each set as end-point-free sections.
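Device 900's approach, with end-point-free sections on both sides, can be sketched by also allowing a path to begin at any input frame before Fi. The recursion and 0-based indexing are assumptions of this sketch (steps of 0, 1, or 2 standard frames per input frame), and the example shows how a noisy first input frame can be skipped. The sketch does not normalize by path length, so a later start accumulates fewer terms; a practical system would need to account for that.

```python
import math
from typing import Sequence

INF = math.inf

def dtw_free_both_starts(dist: Sequence[Sequence[float]], fi: int, fh: int) -> float:
    """Best accumulated distance when a path may start at any cell (i, j)
    with i < fi (input side) and j < fh (standard side), 0-based.

    Allowed transitions: (i-1, j), (i-1, j-1), (i-1, j-2) -> (i, j).
    """
    n_std = len(dist[0])
    prev = [INF] * n_std
    for i, row in enumerate(dist):
        cur = []
        for j in range(n_std):
            best_prev = min(prev[j - k] for k in (0, 1, 2) if j - k >= 0)
            # A new path may also begin here if (i, j) lies in both free sections.
            start = 0.0 if (i < fi and j < fh) else INF
            cur.append(row[j] + min(best_prev, start))
        prev = cur
    return prev[-1]

# Input frame 0 is noise; input frames 1-3 match standard frames 0-2 exactly.
d = [[9, 9, 9],
     [0, 9, 9],
     [9, 0, 9],
     [9, 9, 0]]
print(dtw_free_both_starts(d, fi=2, fh=1))  # 0.0 (path starts at input frame 1)
print(dtw_free_both_starts(d, fi=1, fh=1))  # 9.0 (input start fixed at frame 0)
```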

[0010]

[Problem to Be Solved by the Invention] As described above, in the conventional speech recognition processing device 800, the start frame on the input pattern side is fixed, and only on the standard pattern side are the several frames of the end-point-free section treated as start-frame candidates. This is because moving the start frame on both the input pattern side and the standard pattern side complicates the processing.

[0011] However, when noise enters during speech input, the speech section detector 43 may detect the speech section incorrectly. In that case, because the start frame on the input pattern side is fixed, the detection error affects the matching in the initial stage and lowers the recognition rate.

[0012] In the conventional speech recognition processing device 900, on the other hand, end-point-free sections are provided on both the standard pattern side (frames 1 to Fh) and the input pattern side (frames 1 to Fi), and the recognition rate is improved by moving the start frame.

[0013] However, with this arrangement it is difficult to provide a beam search processor 46 for sequential processing. As a result, the amount of calculation becomes enormous and the processing speed drops.

[0014] The present invention has been made to solve these problems, and its object is to provide a speech recognition processing device and a speech recognition processing method that achieve a high recognition rate while speeding up speech recognition.

[0015]

[Means for Solving the Problem] The speech recognition processing device according to claim 1 performs speech recognition by determining whether an input speech pattern is similar to standard patterns stored in advance. It includes: a storage that stores the standard patterns; a speech input unit that divides the input speech pattern into frames and outputs them frame by frame; a detector that detects whether each frame output by the speech input unit belongs to a speech section; a section determiner that determines whether each frame output by the detector belongs to the initial section of the speech section; a first matching processor that, for each frame determined by the section determiner to belong to the initial section, performs matching against the corresponding standard-pattern frames while determining whether that frame corresponds to the start frame of the matching path; and a second matching processor that, for each frame determined by the section determiner to belong to a section other than the initial section, performs matching against the corresponding standard-pattern frames.

[0016] The speech recognition processing device according to claim 2 is the device according to claim 1, in which the first matching processor includes: a first processor that performs matching against the corresponding standard-pattern frames according to the matching result for the frame immediately preceding the input frame; a second processor that performs matching between the input frame and the start frame of the corresponding standard pattern; and a comparator that compares the result of the first processor with the result of the second processor to determine the matching path.

[0017] The speech recognition processing device according to claim 3 is the device according to claim 2, further including an inspector that determines, from the result of the matching in the second matching processor, whether the corresponding standard pattern is to remain a target of speech recognition. The second matching processor stops matching against that standard pattern according to the inspector's determination.

[0018] The speech recognition processing device according to claim 4 is the device according to claim 2, further including an inspector that, after the matching in the initial section has finished, determines from the result of the matching in the initial section whether the corresponding standard pattern is to remain a target of speech recognition. The second matching processor stops matching against that standard pattern according to the inspector's determination.

[0019] The speech recognition processing method according to claim 5 performs speech recognition by determining whether an input speech pattern is similar to standard patterns stored in advance. It includes: a speech input step of dividing the input speech pattern into frames and outputting them frame by frame; a detection step of detecting whether each frame output in the speech input step belongs to a speech section; and a matching step of matching the frames determined in the detection step to be in a speech section. The matching step includes: a section determination step of determining whether each frame output in the detection step belongs to the initial section of the speech section; a first matching step of matching each frame determined in the section determination step to belong to the initial section against the corresponding standard-pattern frames while determining whether that frame corresponds to the start frame of the matching path; and a second matching step of matching each frame determined in the section determination step to belong to a section other than the initial section against the corresponding standard-pattern frames.

[0020] The speech recognition processing method according to claim 6 is the method according to claim 5, in which the first matching step includes: a first processing step of performing matching against the corresponding standard-pattern frames according to the matching result for the frame immediately preceding the input frame; a second processing step of performing matching between the input frame and the start frame of the corresponding standard pattern; and a comparison step of comparing the result of the first processing step with the result of the second processing step to determine the matching path.

[0021] The speech recognition processing method according to claim 7 is the method according to claim 6, further including an inspection step of determining, from the result of the matching in the second matching step, whether the corresponding standard pattern is to remain a target of speech recognition. In the second matching step, matching against that standard pattern is stopped according to the determination in the inspection step.

[0022] The speech recognition processing method according to claim 8 is the method according to claim 6, further including a first inspection step of determining, from the result of the matching in the second matching step, whether the corresponding standard pattern is to remain a target of speech recognition. The first matching step further includes a step of determining whether matching of the last frame in the initial section has finished, and a second inspection step of determining, after matching of the last frame in the initial section has finished, whether the corresponding standard pattern is to remain a target of speech recognition from the result of the matching in the initial section. In the second matching step, matching against that standard pattern is stopped according to the determination of the first inspection step or of the second inspection step.

[0023]

[Embodiments of the Invention] [First Embodiment] FIG. 1 is a block diagram showing an outline of the configuration of a speech recognition processing device 100 according to the first embodiment of the present invention. Components with the same functions as in speech recognition processing devices 800 and 900 are given the same reference numerals and symbols, and their description is omitted.

[0024] Referring to FIG. 1, speech recognition processing device 100 includes the microphone 40, the A/D converter 41, the acoustic analyzer 42, a section position determiner 2, matching processors 3 and 4, a standard pattern storage 5, a matching buffer 6, initial matching buffers 7#1 and 7#2, and a beam search processor 8.

[0025] As described above, the input speech pattern is divided into frames via the microphone 40 and the A/D converter 41, and its data structure is converted by the acoustic analyzer 42. The speech section detector 43 then detects speech sections. The section position determiner 2 determines whether each frame output by the speech section detector 43 is a frame of the initial section. Frames of the initial section are processed by the matching processor 4, and all other frames by the matching processor 3.

[0026] FIG. 2 is a diagram for explaining the initial section. In FIG. 2, symbol 9 represents speech. As shown in FIG. 2, the initial section is a section containing the starting end of the speech (time t0 to t4). In the case of FIG. 2, frames F0 (time t0 to t2), F1 (time t1 to t3), F2 (time t2 to t4), and F3 (time t3 to t5) are shown as frames of the initial section.

[0027] Frames F0 to F3 are processed by the matching processor 4, and the other frames, for example F4 (time t4 to t6), F5 (time t5 to t7), F6 (time t6 to t8), and F7 (time t7 onward), are processed by the matching processor 3.
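The routing performed by the section position determiner 2 can be sketched as a simple index test. The assumptions are that frames are numbered from 0 starting at the detected beginning of speech and that the initial section is the first K frames (K = 4 reproduces the split of FIG. 2); the function name is hypothetical.

```python
from typing import Iterable, List, Tuple

def route_frames(frame_ids: Iterable[int], k: int) -> Tuple[List[int], List[int]]:
    """Split speech-section frame indices into (initial section, rest).

    The first k frames go to matching processor 4; all later frames go to
    matching processor 3.
    """
    initial: List[int] = []
    rest: List[int] = []
    for idx in frame_ids:
        (initial if idx < k else rest).append(idx)
    return initial, rest

print(route_frames(range(8), k=4))  # ([0, 1, 2, 3], [4, 5, 6, 7])
```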

[0028] Referring to FIG. 1, the matching results of the matching processor 4 are stored in the initial matching buffer 7#1, and the matching results of the matching processor 3 are stored in the matching buffer 6. The standard patterns 1 to N used in matching are stored in the standard pattern storage 5.

[0029] FIG. 3 is a diagram showing the matching path of the input pattern against a standard pattern. The horizontal axis represents the frames of the input pattern and the vertical axis the frames of the standard pattern, each in time order. Symbol 25 denotes the matching path, which shows the positional relationship between the input pattern and the standard pattern that yields the best matching result.

[0030] Frames 1 to Fi on the input pattern side and frames 1 to Fh on the standard pattern side are set as end-point-free sections. The initial section is assumed to contain K frames (K > i). The extent of the initial section is chosen as a trade-off between recognition rate and processing speed.

[0031] Next, the operation of speech recognition processing device 100 according to the first embodiment will be described with reference to FIG. 4, a flowchart of that operation. Speech is input through the microphone 40 and the A/D converter 41 (step S1). The acoustic analyzer 42 performs acoustic analysis (step S2). The speech section detector 43 detects speech sections (step S3). When the speech section detector 43 judges a frame to be in a speech section, the section position determiner 2 determines whether it is in the initial section (initial stage) (step S4).

[0032] Frames in the initial section are sent to the matching processor 4, which performs matching with end-point discrimination, that is, with the ability to judge whether a frame corresponds to the starting end (step S5). Frames in sections other than the initial section are sent to the matching processor 3, which performs normal matching (step S6): starting from the standard-pattern frame that matched the previous input frame, it searches a restricted set of paths for the frame that matches the current input frame.
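The restricted path search of normal matching (step S6) can be sketched as one column update of a frame-synchronous DTW. The allowed advance of 0, 1, or 2 standard frames per input frame is an assumption; the patent says only that the path is restricted. Scores are distances, so lower is better.

```python
import math
from typing import List, Sequence

def normal_matching_step(prev_col: Sequence[float], local: Sequence[float]) -> List[float]:
    """One input frame of normal matching.

    prev_col[j]: best accumulated distance ending at standard frame j after the
                 previous input frame (math.inf where no path reaches j).
    local[j]:    distance between the current input frame and standard frame j.
    A path may stay on, or advance by 1 or 2, standard frames per input frame.
    """
    return [local[j] + min(prev_col[j - k] for k in (0, 1, 2) if j - k >= 0)
            for j in range(len(local))]

inf = math.inf
print(normal_matching_step([inf, 3.0, inf, inf, inf], [5.0, 1.0, 2.0, 4.0, 1.0]))
# [inf, 4.0, 5.0, 7.0, inf]
```

Only standard frames 1 to 3 are reachable from the single surviving path at frame 1, which is what keeps the search restricted.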

[0033] The result of the matching with end-point discrimination (step S5) is stored in the initial matching buffer 7#1, and the result of the normal matching (step S6) in the matching buffer 6. The beam search processor 8 performs a beam search using the data stored in the matching buffer 6 (step S7). Steps S1 to S7 are repeated; when a speech section is no longer detected, recognition is decided using the contents of the matching buffer 6 and the initial matching buffer 7#1 (step S8). The recognition result is output on a display (not shown) (step S9).

[0034] FIG. 5 is a flowchart of the matching with end-point discrimination shown in FIG. 4 (matching processor 4). Referring to FIG. 5, normal matching is performed first (step S10): for example, the input-pattern frame is matched against the standard-pattern frame that matched last time and the frames near it. The result is stored in the initial matching buffer 7#1 (step S11).

[0035] Next, the matching target is set to frame 1 (the start frame) of the standard pattern (step S12), and matching is performed (step S13). The result is stored in the initial matching buffer 7#2.

[0036] The results in the initial matching buffers 7#1 and 7#2 are then compared (step S15). If the result stored in the initial matching buffer 7#1 is better than that stored in the initial matching buffer 7#2 (that is, the matching path of step S10 is better than that of step S13), matching for the current frame ends.

[0037] If instead the result stored in the initial matching buffer 7#2 is better than that stored in the initial matching buffer 7#1 (that is, the matching path of step S13 is better than that of step S10), the information in the initial matching buffer 7#2 is copied to the initial matching buffer 7#1.

[0038] Consider an example. Suppose that at the previous frame, input-pattern frame 5 matched standard-pattern frame 4, with matching path [1, 2, 2, 3, 4] and similarity 30. In step S10, input-pattern frame 6 is matched against standard-pattern frame 4 and its neighboring frames (frames 5 and 6). Suppose this shows that standard-pattern frame 6 matches better than the others (the similarity becomes 36).

[0039] Next, in step S13, standard-pattern frame 1 is matched against input-pattern frame 6. In step S14, the result (similarity) of step S13 is compared with the result (similarity) of step S10. For example, if step S13 yields a similarity of 10, standard-pattern frame 6 is taken as the best match. If step S13 yields a similarity of 2, standard-pattern frame 1 is taken as the best match, and the information in the initial matching buffer 7#1 is rewritten using the initial matching buffer 7#2.
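The comparison in steps S10 to S15 can be sketched for a single best path as follows. Two points are assumptions not stated in the patent: the neighborhood searched in step S10 is the previously matched standard frame and the next two frames, and the two candidates are compared by mean distance per input frame. With that normalization, the numbers of [0038] and [0039] come out as described: the continued path has a mean of 36 / 6 = 6, so a restart scoring 10 is rejected while one scoring 2 is accepted. Indices are 0-based, so standard frame 6 and input frame 6 appear as 5; the local distances in the example are made up to reproduce those figures.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class PathState:
    standard_frame: int  # standard frame matched by the latest input frame (0-based)
    score: float         # accumulated distance along the path (lower = more similar)
    length: int          # number of input frames on the path
    start_input: int     # input frame at which the path started

    def mean(self) -> float:
        return self.score / self.length

def match_with_start_discrimination(state: PathState, local: Sequence[float], t: int) -> PathState:
    """Process input frame t of the initial section.

    local[j] is the (hypothetical) distance between input frame t and standard frame j.
    """
    # Step S10: continue the path from the previous standard frame or its neighbors.
    j0 = state.standard_frame
    neighbors = [j for j in (j0, j0 + 1, j0 + 2) if j < len(local)]
    j_best = min(neighbors, key=lambda j: local[j])
    continued = PathState(j_best, state.score + local[j_best],
                          state.length + 1, state.start_input)  # plays "buffer 7#1"
    # Steps S12-S13: treat input frame t as a new start matched to standard frame 0.
    restarted = PathState(0, local[0], 1, t)  # plays "buffer 7#2"
    # Compare; returning `restarted` corresponds to copying 7#2 into 7#1.
    return restarted if restarted.mean() < continued.mean() else continued

prev = PathState(standard_frame=3, score=30.0, length=5, start_input=0)
local = [10.0, 50.0, 50.0, 9.0, 8.0, 6.0, 50.0]
print(match_with_start_discrimination(prev, local, t=5))
# PathState(standard_frame=5, score=36.0, length=6, start_input=0)
local[0] = 2.0
print(match_with_start_discrimination(prev, local, t=5))
# PathState(standard_frame=0, score=2.0, length=1, start_input=5)
```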

[0040] In this way, in the initial section, matching is performed while determining which frame becomes the start frame (starting end). The result is stored in the initial matching buffer 7#1.

[0041] Referring to FIG. 4, frames in sections other than the initial section undergo normal matching, and the result is stored in the matching buffer 6. The beam search processor 8 performs a beam search based on the similarities stored in the matching buffer 6. The matching results of the initial section are excluded from this beam search, because including them would make the amount of calculation enormous.

[0042] The beam search processor 8 judges standard patterns whose dissimilarity is clearly large to be excluded from recognition. This judgment is transferred to the matching buffer 6, which has a flag indicating whether matching is to be performed; the judgment of the beam search processor 8 is written into this flag. The matching processor 3 checks the flag while matching and stops matching for patterns excluded from recognition. Alternatively, the matching processor 3 may be controlled directly by the result of the beam search processor 8.
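The flag mechanism can be sketched as below, with one flag per standard pattern. The data layout, the names, and the fixed-threshold rule are hypothetical; the patent states only that the beam search result is written to a flag in the matching buffer 6 that the matching processor 3 checks.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class PatternSlot:
    score: float = 0.0   # accumulated distance for one standard pattern (lower = better)
    active: bool = True  # flag written by the beam search; False = excluded from recognition

def beam_search_update(slots: Dict[int, PatternSlot], threshold: float) -> None:
    """Clear the flag of every pattern whose distance exceeds the threshold."""
    for slot in slots.values():
        if slot.active and slot.score > threshold:
            slot.active = False

def matching_pass(slots: Dict[int, PatternSlot], local: Dict[int, float]) -> None:
    """Matching-processor side: check each flag and skip excluded patterns."""
    for pid, slot in slots.items():
        if slot.active:
            slot.score += local[pid]

slots = {1: PatternSlot(score=10.0), 2: PatternSlot(score=40.0)}
beam_search_update(slots, threshold=30.0)
matching_pass(slots, {1: 2.0, 2: 1.0})
print(slots[1], slots[2])
# PatternSlot(score=12.0, active=True) PatternSlot(score=40.0, active=False)
```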

[0043] Steps S1 to S7 are repeated until the input leaves the speech section. Once outside the speech section, taking into account both the initial-section matching results stored in the initial matching buffer 7#1 and the matching results stored in the matching buffer 6, the standard pattern with the best matching result is selected from among those kept as recognition targets by the beam search processor 8 and is output.
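The final decision of step S8 can be sketched as choosing the lowest combined distance among patterns that survived the beam search. The patent says only that both buffers are "taken into account"; combining them by simple addition is an assumption of this sketch, as are the names and example values.

```python
from typing import Dict, Set

def recognize(initial: Dict[int, float], main: Dict[int, float], active: Set[int]) -> int:
    """Return the id of the surviving standard pattern with the lowest combined distance."""
    candidates = [pid for pid in active if pid in initial and pid in main]
    if not candidates:
        raise ValueError("no standard pattern survived the beam search")
    # Assumption: the two buffers' scores are combined by simple addition.
    return min(candidates, key=lambda pid: initial[pid] + main[pid])

initial = {1: 5.0, 2: 1.0, 3: 2.0}  # from initial matching buffer 7#1
main = {1: 10.0, 2: 8.0, 3: 9.0}    # from matching buffer 6
print(recognize(initial, main, active={1, 3}))  # 3
```

Pattern 2 would have the lowest total (9.0) but is not chosen here because the beam search has already excluded it.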

[0044] Thus, the speech recognition processing device 100 of the first embodiment performs matching in the initial section while determining whether each input frame corresponds to the starting end (that is, an end-point-free section is provided on the input pattern side). This makes it possible to raise the recognition rate regardless of the accuracy of speech section detection.

[0045] Outside the initial section, the beam search narrows down the candidates, so high-speed processing is possible even with many standard patterns while a high recognition rate is still obtained.

[0046] [Second Embodiment] FIG. 6 is a block diagram showing an outline of the configuration of a speech recognition processing device 200 according to the second embodiment of the present invention. Components with the same functions as in speech recognition processing device 100 shown in FIG. 1 are given the same reference numerals and symbols, and their description is omitted. Speech recognition processing device 200 differs from speech recognition processing device 100 in that it includes a beam search controller 26.

[0047] The beam search controller 26 determines whether the result of the matching process for the last frame of the initial section has been stored in the initial matching buffer 7#1. Once it has, the controller directs the beam search processor 8 to perform the beam search process based on a value representing the matching result stored in the initial matching buffer 7#1. This value is, for example, the mean or the median of the similarity values over the stored frames of the input pattern.

[0048] Next, the operation of the speech recognition processing device 200 according to the second embodiment of the present invention will be described with reference to FIG. 7. FIG. 7 is a flowchart explaining the operation of the speech recognition processing device 200. Steps identical to those of the operation of the speech recognition processing device 100 shown in FIG. 4 are given the same reference symbols, and their description is omitted.

[0049] As described above, the section position determiner 2 determines whether the input speech belongs to the initial section (initial stage) (step S4). For the initial section, the matching process with endpoint determination is performed (step S5). The beam search controller 26 then determines whether the result of the matching process for the last frame of the initial section has been stored in the initial matching buffer 7#1 (step S20). When processing of the last frame has been completed (step S21), the beam search processor 8 performs the beam search process based on a value representing the matching result stored in the initial matching buffer 7#1.

[0050] When the value representing the matching result in the initial section is equal to or greater than a predetermined threshold, the beam search processor 8 excludes the corresponding standard pattern from the recognition targets. As described above, the result of the beam search process is stored in the matching buffer 6.
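Paragraphs [0047] to [0050] describe a screening step. It waits until the last initial-section frame is stored, reduces each pattern's stored values to one representative value (mean or median), and excludes patterns at or above a threshold. That step can be sketched as follows. One point needs interpretation. Because a value at or above the threshold leads to exclusion, this sketch assumes the stored "similarity" behaves like a distance (lower is better). `screen_initial_section` and the buffer layout (one list of per-frame values per pattern) are hypothetical.

```python
import statistics
from typing import Dict, List, Optional


def representative_value(scores: List[float], method: str = "mean") -> float:
    """Summarize the per-frame values stored for one pattern in the initial section."""
    if method == "mean":
        return statistics.mean(scores)
    if method == "median":
        return statistics.median(scores)
    raise ValueError(f"unknown method: {method!r}")


def screen_initial_section(buffer: Dict[str, List[float]], initial_frames: int,
                           threshold: float,
                           method: str = "mean") -> Optional[Dict[str, bool]]:
    """Sketch of beam search controller 26 together with processor 8.

    Returns None until every pattern has a value for the last initial-section
    frame (steps S20/S21); afterwards returns a flag per pattern, where False
    means the pattern is excluded because its value is >= threshold."""
    if any(len(scores) < initial_frames for scores in buffer.values()):
        return None  # last frame of the initial section not stored yet
    return {name: representative_value(scores, method) < threshold
            for name, scores in buffer.items()}


buffer = {"a": [1.0, 2.0, 3.0], "b": [1.0, 2.0, 30.0], "c": [5.0, 5.0, 5.0]}
by_mean = screen_initial_section(buffer, 3, threshold=5.0)
by_median = screen_initial_section(buffer, 3, threshold=5.0, method="median")
too_early = screen_initial_section(buffer, 4, threshold=5.0)
```

Consider pattern "b", which has one badly matching frame. It is excluded under the mean but kept under the median, which shows how the choice of representative value changes its sensitivity to outliers. Pattern "c" sits exactly at the threshold and is excluded, following the "equal to or greater than" rule of [0050].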

[0051] For frames of the input pattern in sections other than the initial section, the normal matching process is performed (step S6). At this point, a standard pattern is no longer subjected to the matching process if it has been excluded from recognition by the matching result in the initial section or by the matching result in a section other than the initial section. The beam search processor 8 then performs the beam search process on the result of the matching process in sections other than the initial section (step S7).

[0052] As described above, the speech recognition processing device 200 according to the second embodiment of the present invention can improve the recognition rate by subjecting the matching result of the initial section to the beam search process. In addition, standard patterns that already match poorly in the initial section are excluded from recognition, so recognition can be performed at high speed.

[0053] The embodiments disclosed herein are to be considered in all respects as illustrative and not restrictive. The scope of the present invention is defined by the claims rather than by the above description, and is intended to include all modifications within the meaning and scope equivalent to the claims.

[0054]

[Effect of the Invention] As described above, the speech recognition processing device according to claims 1 and 2 performs the matching process on the section containing the beginning of the input speech (the initial section) while determining whether each frame can be an endpoint (start frame). This makes it possible to raise the recognition rate regardless of the accuracy of voice-section detection, particularly in the initial stage.

[0055] The speech recognition processing device according to claim 3 is the speech recognition processing device according to claim 2. In sections other than the initial section, it additionally checks, based on the matching result, whether the corresponding standard pattern should remain a recognition target (beam search process). This narrows down the targets, enabling high-speed processing even when there are many standard patterns, while achieving a high recognition rate.

[0056] The speech recognition processing device according to claim 4 is the speech recognition processing device according to claim 2. In particular, it checks, based on the matching result in the initial section, whether the corresponding standard pattern should remain a recognition target (beam search process). Because standard patterns that already match poorly in the initial section are excluded from recognition, recognition can be performed at high speed.

[0057] Further, the speech recognition processing method according to claims 5 and 6 performs the matching process on the section containing the beginning of the input speech (the initial section) while determining whether each frame can be an endpoint (start frame). This makes it possible to raise the recognition rate regardless of the accuracy of voice-section detection, particularly in the initial stage.

[0058] The speech recognition processing method according to claim 7 is the speech recognition processing method according to claim 6. In sections other than the initial section, it additionally checks, based on the matching result, whether the corresponding standard pattern should remain a recognition target (beam search process). This narrows down the targets, enabling high-speed processing even when there are many standard patterns, while achieving a high recognition rate.

[0059] The speech recognition processing method according to claim 8 is the speech recognition processing method according to claim 6. In particular, it checks, based on the matching result in the initial section, whether the corresponding standard pattern should remain a recognition target (beam search process). Because standard patterns that already match poorly in the initial section are excluded from recognition, recognition can be performed at high speed.

[Brief description of the drawings]

FIG. 1 is a block diagram outlining the configuration of a speech recognition processing device 100 according to the first embodiment of the present invention.

FIG. 2 is a diagram explaining the initial section.

FIG. 3 is a diagram showing a matching path of an input pattern against a standard pattern.

FIG. 4 is a flowchart explaining the operation of the speech recognition processing device 100.

FIG. 5 is a flowchart explaining the operation of the matching process with endpoint determination (matching processor 4) shown in FIG. 4.

FIG. 6 is a block diagram outlining the configuration of a speech recognition processing device 200 according to the second embodiment of the present invention.

FIG. 7 is a flowchart explaining the operation of the speech recognition processing device 200.

FIG. 8 is a block diagram showing the configuration of a conventional speech recognition processing device 800.

FIG. 9 is a conceptual diagram explaining a matching path of an input pattern against a standard pattern in the speech recognition processing device 800.

FIG. 10 is a block diagram showing the configuration of a conventional speech recognition processing device 900.

FIG. 11 is a conceptual diagram explaining a matching path of an input pattern against a standard pattern in the speech recognition processing device 900.

[Explanation of symbols]

2 Section position determiner
3, 4 Matching processor
5 Standard pattern storage
6 Matching buffer
7#1 to 7#2 Initial matching buffer
8 Beam search processor
26 Beam search controller
40 Microphone
41 A/D converter
42 Acoustic analyzer
43 Voice section detector
100, 200 Speech recognition processing device

Claims


1. A speech recognition processing device that performs speech recognition by determining whether an input speech pattern is similar to a standard pattern stored in advance, comprising: a storage unit that stores the standard pattern; a speech input unit that divides the input speech pattern into frames and outputs them frame by frame; a detector that detects whether a frame output from the speech input unit belongs to a voice section; a section determiner that determines whether a frame output from the detector belongs to the initial section of the voice section; a first matching processor that, for a frame determined by the section determiner to belong to the initial section, performs the matching process with the frames of the corresponding standard pattern while determining whether the frame corresponds to the start frame of a matching path; and a second matching processor that, for a frame determined by the section determiner to belong to a section other than the initial section, performs the matching process with the frames of the corresponding standard pattern.

2. The speech recognition processing device according to claim 1, wherein the first matching processor includes: a first processor that performs the matching process with the frames of the corresponding standard pattern according to the result of the matching process for the frame immediately preceding the input frame; a second processor that performs the matching process between the input frame and the start frame of the corresponding standard pattern; and a comparator that compares the result of the first processor with the result of the second processor to determine the matching path.

3. The speech recognition processing device according to claim 2, further comprising an inspection unit that determines, based on the result of the matching process in the second matching processor, whether the corresponding standard pattern is to be subjected to the speech recognition, wherein the second matching processor stops the matching process with the corresponding standard pattern based on the determination result of the inspection unit.

4. The speech recognition processing device according to claim 2, further comprising an inspection unit that, after the matching process in the initial section is completed, determines based on the result of the matching process in the initial section whether the corresponding standard pattern is to be subjected to the speech recognition, wherein the second matching processor stops the matching process with the corresponding standard pattern based on the determination result of the inspection unit.

5. A speech recognition processing method that performs speech recognition by determining whether an input speech pattern is similar to a standard pattern stored in advance, comprising: a speech input step of dividing the input speech pattern into frames and outputting them frame by frame; a detection step of detecting whether a frame output in the speech input step belongs to a voice section; and a matching step of performing the matching process on a frame determined in the detection step to belong to the voice section, wherein the matching step includes: a section determining step of determining whether the frame output in the detection step belongs to the initial section of the voice section; a first matching step of performing, for a frame determined in the section determining step to belong to the initial section, the matching process with the frames of the corresponding standard pattern while determining whether the frame corresponds to the start frame of a matching path; and a second matching step of performing, for a frame determined in the section determining step to belong to a section other than the initial section, the matching process with the frames of the corresponding standard pattern.

6. The speech recognition processing method according to claim 5, wherein the first matching step includes: a first processing step of performing the matching process with the frames of the corresponding standard pattern according to the result of the matching process for the frame immediately preceding the input frame; a second processing step of performing the matching process between the input frame and the start frame of the corresponding standard pattern; and a comparing step of comparing the result of the first processing step with the result of the second processing step to determine the matching path.

7. The speech recognition processing method according to claim 6, further comprising an inspection step of determining, based on the result of the matching process in the second matching step, whether the corresponding standard pattern is to be subjected to the speech recognition, wherein in the second matching step the matching process with the corresponding standard pattern is stopped based on the determination result of the inspection step.

8. The speech recognition processing method according to claim 6, further comprising a first inspection step of determining, based on the result of the matching process in the second matching step, whether the corresponding standard pattern is to be subjected to the speech recognition, wherein the first matching step includes: a step of determining whether the matching process for the last frame of the initial section has been completed; and a second inspection step of determining, based on the result of the matching process in the initial section, whether the corresponding standard pattern is to be subjected to the speech recognition; and wherein in the second matching step the matching process with the corresponding standard pattern is stopped based on the determination results of the first inspection step and the second inspection step.