JPH0887292A - Word voice recognition device

Info

Publication number: JPH0887292A
Application number: JP22183694A
Authority: JP
Inventors: Yoshihiro Irie; 佳洋入江
Original assignee: Glory Ltd
Current assignee: Glory Ltd
Priority date: 1994-09-16
Filing date: 1994-09-16
Publication date: 1996-04-02

Abstract

PURPOSE: To provide a word voice recognition device that recognizes similar words spoken by an unspecified speaker quickly and with high precision.

CONSTITUTION: The device is provided with a voice input means 11 which detects an uttered voice and inputs it as a voice signal; an acoustic analysis means 12 which converts the input voice signal into a time series of feature parameters based on its waveform; a feature point extracting means 13 which detects, from the time series of feature parameters, the time points at which the norm of the amount of spectral change reaches a maximum or a minimum and extracts a feature point time series; a word standard pattern storage means 14 which stores word standard patterns generated beforehand from word voice learning samples; a word collating means 15 which calculates the degree of similarity between the extracted feature point time series and the word standard patterns; and a discriminating means 16 which outputs a recognition result from the word similarity.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a word voice recognition device, and in particular to a word voice recognition device that improves the ability to discriminate similar words when recognizing word speech from unspecified speakers, without degrading overall recognition ability.

[0002]

[Prior Art] In word speech recognition for unspecified speakers, a major problem is how to absorb the spectral variation and the duration variation caused by differences between individuals. The conventional approach has been to absorb spectral variation by using multiple standard patterns and to absorb duration variation by DP matching (DP: Dynamic Programming).
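As a rough illustration of how DP matching absorbs duration variation, the following minimal sketch aligns two feature vector sequences of different lengths. The function name `dp_match`, the city-block local distance, and the unconstrained step pattern are assumptions made for this example; the source does not specify them.

```python
def dp_match(a, b):
    """Accumulated DP-matching distance between two sequences of feature vectors.

    Assumptions (not from the source): city-block local distance and the
    basic three-way step pattern with no slope constraint or matching window.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the best accumulated distance aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = sum(abs(x - y) for x, y in zip(a[i - 1], b[j - 1]))
            cost[i][j] = local + min(
                cost[i - 1][j],      # input frame stretched
                cost[i][j - 1],      # reference frame stretched
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# The same sound sequence spoken with one frame held longer still matches exactly.
short = [[0.0], [1.0], [2.0]]
held = [[0.0], [1.0], [1.0], [2.0]]
print(dp_match(short, held))
```

A smaller accumulated distance means a closer match, so a similarity score can be derived from it, for example by negating it.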

[0003] FIG. 5 shows an outline of a conventional word voice recognition device. In this device, a spoken utterance is input through the voice input unit 1 and supplied to the acoustic analysis unit 2 as a voice signal. The acoustic analysis unit 2 computes the feature parameters of the speech and produces them as an input pattern.

[0004] Meanwhile, the word standard pattern storage unit 3 stores, for each word registered in advance, a plurality of word standard patterns obtained from learning sample data by a clustering technique. The word matching unit 4 computes the similarity by DP matching between the input pattern and the word standard patterns, and the determination unit 5 determines the word recognition result from the similarity.
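The conventional flow above, with several standard patterns per word, can be sketched as follows. This is only an illustration: `best_word`, the dictionary layout of `templates`, and the toy `distance` function are hypothetical, and a distance (smaller is better) stands in for the similarity mentioned in the text.

```python
def best_word(input_pattern, templates, distance):
    """Pick the word whose closest standard pattern is nearest to the input.

    templates: dict mapping each word to a list of standard patterns
               (hypothetical layout; the source only says several patterns
               per word are stored).
    distance:  any function returning a smaller value for closer patterns.
    """
    scores = {
        word: min(distance(input_pattern, ref) for ref in refs)
        for word, refs in templates.items()
    }
    return min(scores, key=scores.get), scores


def toy_distance(x, y):
    # Stand-in for DP matching: element-wise difference plus a length penalty.
    return sum(abs(p - q) for p, q in zip(x, y)) + abs(len(x) - len(y))


templates = {"a": [[1, 2, 3], [1, 2]], "b": [[9, 9]]}
word, scores = best_word([1, 2], templates, toy_distance)
print(word, scores)
```

Every stored pattern costs one matching pass, which is why the number of standard patterns drives the recognition time.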

[0005]

[Problems to Be Solved by the Invention] In this type of device, however, the number of word standard patterns grows as the recognition vocabulary is enlarged and as unspecified speakers are accommodated.

[0006] To reduce the amount of computation and provide a word speech recognition method suited to unspecified speakers, a method has been proposed (JP-A-59-77500) in which the positions of feature points in the input speech are detected by dynamic-programming matching between the time series of feature parameters of the input speech and a speech pattern for feature point detection, and the feature pattern of the input speech at each detected feature point is then matched against standard feature patterns to convert the input speech into a symbol sequence of feature patterns. That is, to reduce the number of DP matching operations at matching time, this method uses DP matching only to align the feature points; detailed matching is done not by DP matching but by identifying the feature points obtained through the alignment. Consequently, handling many speakers requires aligning the feature points against multiple standard patterns, and an increase in the DP matching workload cannot be avoided.

[0007] Thus, conventional devices have the problem that the amount of DP matching computation in the word matching unit increases and the recognition process takes a long time.

[0008] The present invention has been made in view of the above circumstances, and its object is to provide a word voice recognition device capable of discriminating similar words spoken by unspecified speakers at high speed and with high accuracy.

[0009]

[Means for Solving the Problems] The word voice recognition device of the present invention comprises: voice input means that detects a spoken utterance and inputs it as a voice signal; acoustic analysis means that converts the input voice signal into a time series of feature parameters based on its waveform; feature point extraction means that detects, from the time series of feature parameters, the characteristic time points at which the norm of the amount of spectral change reaches a local maximum or a local minimum, and extracts a feature point time series; word standard pattern storage means that stores word standard patterns created in advance from learning sample data of word speech; word matching means that calculates the similarity between the extracted feature point time series and the word standard patterns; and determination means that outputs a recognition result from the word similarity. (Here, the norm means the sum of absolute values.)

That is, the present invention extracts characteristic frames as feature points without losing phonological information, and performs matching on this thinned-out set of frames. For this extraction, the speech data is regarded as a trajectory in the feature space, and three quantities are used as feature parameters: the static information of stationary portions (position in the space, i.e. the spectral feature), the dynamic information of transient portions (movement from one region of the space to another, i.e. the spectral change feature), and the degree of stationarity or transience (persistence within a given region, i.e. the amount of spectral change: the sum of the absolute values (norm) of the spectral change feature). The local maxima and minima of this norm are detected and the corresponding time points are taken as feature points; only the parameters at the feature points are extracted, and the speech data is approximated by the time series of feature points made up of these feature parameters. Only these characteristic frames are then matched by DP matching.
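Written as a formula, the norm defined above (the sum of absolute values) and the resulting set of feature points might be expressed as follows. The symbols $\Delta c_i(t)$ (the $i$-th component of the spectral change feature at frame $t$), $p$ (its dimension), $D(t)$, and $T$ are notation introduced here for illustration and do not appear in the source.

```latex
D(t) = \sum_{i=1}^{p} \left| \Delta c_i(t) \right|,
\qquad
T = \{\, t : D(t) \text{ is a local maximum or a local minimum} \,\}
```

Only the frames $t \in T$ are kept, so the pattern passed on to DP matching has $|T|$ frames instead of the full utterance length.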

[0010] In the acoustic analysis means, the feature parameters used for conversion into the time series are preferably the following three: the LPC cepstrum coefficients, the time variation of the LPC cepstrum coefficients (Δ cepstrum), and the norm of the Δ cepstrum, i.e. the dynamic measure. The feature point extraction means then takes as feature points the time points at which the norm reaches a local maximum or minimum, and extracts the feature parameters (LPC cepstrum coefficients, Δ cepstrum, and norm of the Δ cepstrum) at those feature points.

[0011]

[Operation] This device detects and inputs a spoken utterance, analyzes it, and converts it into a time series of feature parameters; in doing so it detects the characteristic time points in the feature parameter time series, extracts a feature point time series, and compares this with the word standard patterns to calculate the similarity. This reduces the amount of processing and also the storage capacity required for the standard patterns. In addition, because the device does not need to segment the input word by word and continuous DP matching is possible, it is easy to apply to continuous speech. In this way the device of the present invention extracts feature points accurately without losing phonological information and enables fast, high-accuracy recognition.

[0012] Preferably, the LPC cepstrum coefficients, the time variation of the LPC cepstrum coefficients (Δ cepstrum), and the norm of the Δ cepstrum (dynamic measure) are used as the feature parameters. A local maximum of the dynamic measure corresponds to the dynamic information of a transient portion, and a local minimum corresponds to the static information of a stationary portion. Extracting the local maxima and minima of the dynamic measure as feature points therefore preserves the phonological information while retaining only the most characteristic part of the data, so that accurate output is obtained with little data processing and recognition accuracy is improved.

[0013] Besides the LPC cepstrum coefficients, their time variation (Δ cepstrum), and the norm of the Δ cepstrum (dynamic measure), any other set of three feature parameters with the same relationship may be used: a spectrum, the time variation of that spectrum, and the sum of the absolute values of that variation. For example, the spectrum may be obtained from band-pass filter outputs or from FFT analysis.

[0014] DP matching is then performed between the extracted feature parameter series and the word standard patterns to compute the similarity, the most similar standard pattern is determined, and the recognition result is output. This enables fast, high-accuracy speech recognition.

[0015]

[Embodiment] An embodiment of the present invention will now be described in detail with reference to the drawings.

[0016] FIG. 1 is a schematic block diagram showing an embodiment of the present invention. This device is characterized by a feature point extraction unit 13 that uses the LPC cepstrum coefficients, the time variation of the LPC cepstrum coefficients (Δ cepstrum), and the norm of the Δ cepstrum (dynamic measure) as feature parameters and extracts the local maxima and minima of the dynamic measure as feature points. The word standard pattern storage unit 14 stores, as standard patterns, the feature points extracted in the same way (local maxima and minima of the dynamic measure) from learning sample data of word speech uttered in advance by many speakers, prepared for every word. The other parts are configured in the same way as the conventional device shown in FIG. 5.

[0017] Specifically, the device comprises: a voice input unit 11, consisting of a microphone or the like, that detects speech and outputs it as a voice signal; an acoustic analysis unit 12 that extracts feature parameters from the voice signal; a feature point extraction unit 13 that finds the local maxima and minima of the dynamic measure in the extracted feature parameters and extracts the parameters at those time points; a word standard pattern storage unit 14; a word matching unit 15 that matches the parameter time series at those time points (the feature points) extracted by the feature point extraction unit 13 against the data stored in the word standard pattern storage unit 14 and obtains the similarity; and a determination unit 16 that identifies the input word speech from the word similarity obtained by the word matching unit 15 and outputs the recognition result.

[0018] The word standard pattern storage unit 14 stores standard patterns created by the feature point extraction technique described above from learning sample data of word speech uttered in advance.

[0019] Next, the specific operation of the voice recognition device of this embodiment will be described.

[0020] The voice input unit 11 detects a spoken utterance and supplies it to the acoustic analysis unit 12 as a voice signal. The acoustic analysis unit 12 extracts feature parameters from the supplied voice signal: it applies linear prediction analysis, cepstrum analysis, and the like to obtain the feature parameter time series of the input speech shown in FIG. 2. FIG. 2 is an example of the output of the acoustic analysis unit 12 for the spoken word "toritsugi" (取り次ぎ). The extracted parameters are the cepstrum coefficients from the linear prediction model (LPC cepstrum coefficients), the ΔLPC cepstrum coefficients, which are the regression coefficients of the LPC cepstrum coefficients in each dimension, and the dynamic measure, which corresponds to the norm of the ΔLPC cepstrum coefficients. In FIG. 2 the LPC cepstrum coefficients are shown converted to spectral intensity. These feature parameter time series are output to the feature point extraction unit 13.
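The two derived parameters can be sketched in a few lines, assuming the cepstrum frames are already available as lists of floats. The regression window of ±2 frames, the clamping of indices at the utterance edges, and the names `delta_coefficients` and `dynamic_measure` are assumptions for this example; the source states only that the Δ coefficients are per-dimension regression coefficients and that the dynamic measure is the sum of their absolute values.

```python
def delta_coefficients(frames, half_window=2):
    """Per-dimension regression slope of the cepstrum over +/- half_window frames.

    The window size and the edge handling (indices clamped to the first and
    last frame) are assumptions, not values taken from the source.
    """
    n = len(frames)
    dims = len(frames[0])
    denom = 2 * sum(k * k for k in range(1, half_window + 1))
    deltas = []
    for t in range(n):
        row = []
        for d in range(dims):
            num = 0.0
            for k in range(1, half_window + 1):
                ahead = frames[min(t + k, n - 1)][d]
                behind = frames[max(t - k, 0)][d]
                num += k * (ahead - behind)
            row.append(num / denom)
        deltas.append(row)
    return deltas


def dynamic_measure(deltas):
    """Norm of each delta vector, taken as the sum of absolute values."""
    return [sum(abs(v) for v in row) for row in deltas]


# A cepstrum component rising by 1 per frame has slope 1 away from the edges.
ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
print(dynamic_measure(delta_coefficients(ramp)))
```

A steady sound gives Δ values near zero and hence a small dynamic measure, while a rapid spectral transition gives a large one.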

[0021] FIG. 3 is an explanatory diagram of the processing in the feature point extraction unit 13. The feature parameter time series is represented there by the spectrogram converted from the LPC cepstrum coefficients. The feature point extraction unit 13 first looks at the dynamic measure among the feature parameters and detects the time points at which it reaches a local maximum or minimum. In FIG. 3, the local maxima fall on transient portions of the speech: consonants such as the plosive /t/ and transitions between phonemes such as /r/ and /i/. The local minima fall on stationary portions: vowels such as /i/ and consonants that persist in time, such as the fricative portion of /ts/. The time points of the local minima and maxima of the dynamic measure are therefore phonemes or phoneme transitions, positions that retain the phonological information. Taking these time points as feature points, the unit extracts only the parameters at the feature points from the feature parameter time series to form the feature point time series. The feature parameters extracted at each feature point are the LPC cepstrum coefficients, the ΔLPC cepstrum coefficients, and the norm of the ΔLPC cepstrum coefficients. The feature point time series is then supplied to the word matching unit 15.
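The extraction step described above can be sketched as a scan for local extrema of the dynamic measure. The function names, the strict-inequality test (plateaus and the first and last frames are ignored), and the tuple layout of the output are assumptions made for illustration; the source does not say how ties or noise are handled.

```python
def extract_feature_points(dynamic):
    """Indices where the dynamic measure is a strict local maximum or minimum."""
    points = []
    for t in range(1, len(dynamic) - 1):
        prev, cur, nxt = dynamic[t - 1], dynamic[t], dynamic[t + 1]
        if cur > prev and cur > nxt:
            points.append((t, "max"))  # transient portion
        elif cur < prev and cur < nxt:
            points.append((t, "min"))  # stationary portion
    return points


def feature_point_series(cepstra, deltas, dynamic):
    """Keep only the three feature parameters at the detected feature points."""
    return [
        (cepstra[t], deltas[t], dynamic[t])
        for t, _kind in extract_feature_points(dynamic)
    ]


# Toy dynamic measure for six frames (made-up values).
dynamic = [0.1, 0.9, 0.4, 0.2, 0.6, 0.3]
print(extract_feature_points(dynamic))  # [(1, 'max'), (3, 'min'), (4, 'max')]
```

In practice some smoothing or a minimum peak height would probably be needed to avoid picking up noise, but that is outside what the source describes.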

[0022] Meanwhile, the contents of the word standard pattern storage unit 14 are obtained in advance from learning sample data of word speech uttered by many speakers: for every word, the LPC cepstrum coefficients, the ΔLPC cepstrum coefficients, and the dynamic measure of the ΔLPC cepstrum coefficients are computed by the feature point extraction technique described above, the local maxima and minima of the dynamic measure are extracted, and the result is stored as the standard pattern.

[0023] The word matching unit 15 performs DP matching between the feature point time series pattern and the word standard patterns and outputs the word similarity. FIGS. 4(a) and 4(b) compare the amount of computation this DP matching requires. FIG. 4(a) shows the amount of DP computation in the conventional method and FIG. 4(b) shows that in the method of the present invention; in these figures the amount of DP computation corresponds to the area inside the matching window. The comparison shows that the present invention greatly reduces the amount of computation.
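To get a feel for the size of this reduction, the area inside the matching window can be counted as grid cells. The frame counts and window widths below are made-up numbers chosen for illustration, not figures from the source, and the simple band `|i - j| <= window` is an assumed window shape that fits best when both patterns have similar lengths.

```python
def dp_cells(n_input, n_ref, window):
    """Number of DP grid cells (i, j) inside the band |i - j| <= window."""
    return sum(
        1
        for i in range(n_input)
        for j in range(n_ref)
        if abs(i - j) <= window
    )


# Hypothetical sizes: 60 frames per pattern when every frame is matched,
# 12 feature points per pattern after thinning.
full = dp_cells(60, 60, 10)    # 1150 cells
thinned = dp_cells(12, 12, 2)  # 54 cells
print(full, thinned, full / thinned)
```

Because the cell count grows roughly with pattern length times window width, shortening both patterns and narrowing the window reduces the work multiplicatively.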

[0024] The word similarity computed in this way by the word matching unit 15 is output to the determination unit 16.

[0025] The determination unit 16 selects, from among all words, the word with the highest word similarity and judges the result against a threshold. In this way, fast and highly accurate speech recognition becomes possible.
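A minimal sketch of this decision step follows. Treating a best score below the threshold as a rejection (returning `None`) is an assumption, as are the function name `decide`, the example words, and the similarity values; the source says only that the result is judged from a threshold.

```python
def decide(similarities, threshold):
    """Return the most similar word, or None if its similarity is too low.

    similarities: dict mapping each vocabulary word to its word similarity
                  (higher means more similar).
    """
    if not similarities:
        return None
    best = max(similarities, key=similarities.get)
    return best if similarities[best] >= threshold else None


# Made-up similarity scores for two similar-sounding words.
scores = {"toritsugi": 0.82, "torikeshi": 0.64}
print(decide(scores, threshold=0.7))  # toritsugi
print(decide(scores, threshold=0.9))  # None
```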

[0026] The present invention can be modified as appropriate without departing from its spirit, and is applicable to various fields, for example online handwritten character recognition.

[0027]

[Effects of the Invention] As described above, according to the present invention, handling by DP matching only the parameter time series at the characteristic time points of the spoken utterance makes it possible to increase recognition speed in large-vocabulary or unspecified-speaker speech recognition without degrading recognition ability.

[Brief description of drawings]

FIG. 1 is a schematic block diagram of an embodiment of the present invention.

FIG. 2 is a diagram showing an output example of the acoustic analysis unit 12 in an embodiment of the present invention.

FIG. 3 is a diagram showing an output example of the feature point extraction unit 13 in an embodiment of the present invention.

FIG. 4 is a diagram comparing the amount of DP computation in the conventional method and in the present invention.

FIG. 5 is a schematic block diagram showing a conventional word voice recognition device.

[Explanation of symbols]

1 Voice input unit
2 Acoustic analysis unit
3 Word standard pattern storage unit
4 Word matching unit
5 Determination unit
11 Voice input unit
12 Acoustic analysis unit
13 Feature point extraction unit
14 Word standard pattern storage unit
15 Word matching unit
16 Determination unit

Claims

[Claims]

1. A word voice recognition device comprising: voice input means for detecting a spoken utterance and inputting it as a voice signal pattern; acoustic analysis means for converting the input voice signal pattern into a time series of feature parameters based on its waveform; feature point extraction means for detecting, from the time series of feature parameters, the time points at which the norm of the amount of spectral change reaches a maximum or a minimum as feature points, and extracting a feature point time series; word standard pattern storage means for storing word standard patterns created in advance from learning sample data of word speech; word matching means for calculating the similarity between the extracted feature point time series and the word standard patterns; and determination means for outputting a recognition result from the word similarity obtained by the word matching means.

2. The word voice recognition device according to claim 1, wherein the acoustic analysis means uses, as the feature parameters, the LPC cepstrum coefficients, the time variation of the LPC cepstrum coefficients (Δ cepstrum), and the norm of the Δ cepstrum.