JPH0115079B2

Info

Publication number: JPH0115079B2
Application number: JP57171571A
Authority: JP
Inventors: Yasuo Sato; Tadayasu Sugita
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1982-09-30
Filing date: 1982-09-30
Publication date: 1989-03-15
Also published as: JPS5960499A

Description

[Detailed Description of the Invention]
A. Technical Field of the Invention
The present invention relates to a word speech recognition method, and in particular to one in which, so that words of various lengths can be recognized efficiently, a plurality of input reduced parameter time series are generated from the input feature parameter time series of an unknown input word utterance by dividing it into sections in several different ways, and each registered reduced parameter time series, whose length depends on the registered word length, is successively matched against the input reduced parameter time series of the same length in order to select candidate words for recognition.

B. Prior Art and Problems
In a speech recognition system, feature parameters representing the characteristics of each phoneme are extracted from the result of frequency analysis of the speech signal, and the unknown input speech is recognized by matching the extracted feature parameters against feature parameters registered in advance for the registered words. For example, the first and second formant frequencies are sampled and used as the feature parameters. However, this matching involves a large amount of data processing, and the time it requires grows as the number of recognition categories increases.

For this reason, various methods have been proposed that raise the recognition rate while using fewer features for matching, as seen in Japanese Patent Application Nos. 52-43972, 53-53966, 53-53967, and 53-53965.

In particular, the present inventors proposed, in Japanese Patent Application No. 55-62059, a method that determines candidate words for recognition efficiently with a relatively simple algorithm. That method divides the input feature parameter time series of the input speech into a small number of sections, extracts a reduced feature parameter time series consisting of the average of the parameter values in each section, selects candidate words using this reduced feature parameter time series, and then performs matching against those candidates, which greatly improves processing speed. With this method, the computation for matching and determining candidates involves only a small, predetermined number of average-value parameters, for example five, so computation time is greatly reduced.

The words to be recognized generally include both long and short words. The conventional method described above, in which the number of parameters to be matched is fixed in advance regardless of word length, has the advantage of simple processing. However, if recognition accuracy is to be kept above a certain level for every word, recognition would be more efficient if the number of parameters to be matched, that is, the length of the registered parameter time series, could be varied according to word length.

C. Object and Structure of the Invention
In view of the above, the present invention improves and extends the conventional method, with the aim of increasing processing speed and reducing the memory required for the registered dictionary without lowering the recognition rate. To this end, the present invention assumes several word lengths for the input speech, obtains input reduced parameter time series of several lengths, and at matching time uses the parameter time series whose length corresponds to the registered word length. That is, the word speech recognition method of the present invention is one that analyzes the speech signal of an unknown input word utterance, generates from the input feature parameter time series extracted from that signal, on the basis of the parameter values in each section of a predetermined division, an input reduced parameter time series shorter than the input feature parameter time series, and recognizes the unknown input word utterance by matching the input reduced parameter time series against registered reduced parameter time series registered in advance. The method is characterized in that a plurality of division counts are predetermined for dividing the span from the start to the end of the unknown input word utterance into sections; a plurality of input reduced parameter time series of different lengths, one for each division count, are generated from the input feature parameter time series; and each registered reduced parameter time series, whose length is predetermined for each registered word, is matched against the input reduced parameter time series of the same length as that registered word, thereby determining candidate words for recognition and recognizing the unknown input word utterance. The invention is described below with reference to the drawings.

D. Embodiments of the Invention
Fig. 1 is an explanatory diagram of the relationship between registered words and parameter time series lengths; Fig. 2 is an explanatory diagram of an example of the process for generating a reduced parameter time series in the present invention; Fig. 3 is an explanatory diagram of another example of that process; Fig. 4 shows the configuration of an embodiment of the present invention; and Fig. 5 is a flowchart-style explanatory diagram of the section determination process for the example of Fig. 3.

For example, if the registered words are place names, then as shown in Fig. 1 they include relatively long words such as "Hokkaido" (北海道), short words such as "Tsu" (津), and words of average length such as "Tokyo" (東京). In general, a longer word is considered to contain more feature changes than a shorter one. The present invention presupposes that a shorter input reduced parameter time series is generated from the input feature parameter time series of the input speech and matched against registered reduced parameter time series that were extracted by the same technique and registered in advance. From the standpoint of balancing recognition accuracy against processing speed, it is desirable to match long words, which contain many feature changes, using a long parameter time series, and short words using a short one. Therefore, as shown in Fig. 1, a parameter time series length is determined in advance according to the registered word length, for example "5" for "Hokkaido" and "3" for "Tsu", and a registered reduced parameter time series of that length is registered.

For the input reduced parameter time series obtained from an unknown input word utterance, however, the appropriate length is not known in advance, and matching must be done between series of the same length. In the present invention, therefore, as described in detail later, input reduced parameter time series are prepared for every predetermined time series length.

When a parameter time series length is given, the reduced parameter time series is generated, for example, as follows.

As shown in Fig. 2, suppose sampled feature parameters P exist between times T₀ and T_E. If, for example, a parameter time series length Nf of "5" is required, the interval from T₀ to T_E is divided into five equal parts, fixing the times T_E/5, 2T_E/5, 3T_E/5, 4T_E/5, and T_E. The feature parameter values between T₀ and T_E/5 are averaged, those between T_E/5 and 2T_E/5 are averaged, and so on up to those between 4T_E/5 and T_E, yielding a reduced parameter time series of five average-value parameters.
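The equal-time division of Fig. 2 can be sketched as follows. This is only a minimal illustration under stated assumptions: the feature parameter is treated as a single scalar per sample, the samples are evenly spaced between T₀ and T_E, and the function name `reduce_equal_time` is hypothetical rather than taken from the patent.

```python
def reduce_equal_time(params, nf):
    """Average `params` over `nf` equal-duration sections between T0 and TE.

    Assumption: `params` is a list of scalar feature values sampled at a
    constant rate, so equal sample counts correspond to equal time spans.
    """
    n = len(params)
    if not 1 <= nf <= n:
        raise ValueError("nf must be between 1 and len(params)")
    reduced = []
    for k in range(nf):
        start = k * n // nf        # first sample of section k
        end = (k + 1) * n // nf    # one past the last sample of section k
        section = params[start:end]
        reduced.append(sum(section) / len(section))
    return reduced


# Ten samples reduced to a time series of length Nf = 5.
print(reduce_equal_time([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5))
```

Integer division keeps every section non-empty whenever Nf does not exceed the number of samples, which is why the sketch rejects larger values.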

Alternatively, instead of dividing time equally, a reduced parameter time series of the specified length may be generated by dividing the cumulative variation of the feature parameter equally, as shown in Fig. 3.

Whereas the example of Fig. 2 uses sections of equal length on the time axis, the example of Fig. 3 makes the sections shorter where the feature parameter changes relatively rapidly. Given the feature parameter P of Fig. 2, the accumulated variation of P, that is, the cumulative variation, is plotted against time on the horizontal axis as in Fig. 3. On this curve, the maximum cumulative variation TAV is divided into five equal parts, giving the values (1/5)TAV, (2/5)TAV, (3/5)TAV, (4/5)TAV, and TAV. The times T₁, T₂, …, T_E at which the cumulative variation reaches these values are then extracted. The feature parameter values of Fig. 2 between T₀ and T₁ are averaged, those between T₁ and T₂ are averaged, and so on up to those between T₄ and T_E, yielding a reduced parameter time series of five average-value parameters. To obtain a reduced parameter time series of length "3", for example, the span is of course divided into three sections and the series consists of the average-value parameter of each.
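A minimal sketch of the Fig. 3 approach is given below. It assumes a scalar feature parameter and takes the variation between consecutive samples to be the absolute difference, which the text does not spell out; the function name and the handling of a jump that crosses several thresholds at once are likewise illustrative choices.

```python
def reduce_equal_variation(params, nf):
    """Average `params` over `nf` sections of equal cumulative variation.

    Assumption: variation between samples is |P(i) - P(i-1)|.
    """
    n = len(params)
    if not 1 <= nf <= n:
        raise ValueError("nf must be between 1 and len(params)")
    # Cumulative variation at each sample index.
    cum = [0.0]
    for i in range(1, n):
        cum.append(cum[-1] + abs(params[i] - params[i - 1]))
    tav = cum[-1]  # maximum cumulative variation (TAV)

    # Index of the last sample of each section: the first sample at which
    # the cumulative variation reaches k/nf of TAV.
    bounds = []
    i = 0
    for k in range(1, nf):
        threshold = k * tav / nf
        while i < n - 1 and cum[i] < threshold:
            i += 1
        bounds.append(i)
    bounds.append(n - 1)  # the final section always ends at TE

    reduced = []
    start = 0
    for end in bounds:
        # If one jump crossed several thresholds the slice is empty;
        # fall back to the boundary sample itself.
        section = params[start:end + 1] or [params[end]]
        reduced.append(sum(section) / len(section))
        start = end + 1
    return reduced


# The parameter rises quickly at first and then stays flat, so the first
# sections are short and the last one is long.
print(reduce_equal_variation([0, 1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4], 4))
```

Compared with equal-time division, the rapidly changing part of the utterance receives most of the sections, which matches the stated intent of using shorter sections where the parameter changes quickly.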

The average-value parameters of the reduced parameter time series are computed, for example, as described in Japanese Patent Application No. 55-62059; since this is well known, a detailed description is omitted.

Instead of averaging the feature parameter values, a simplified approach may be used in which the reduced parameter time series consists of the values at the section boundaries.
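The simplified variant might look like the sketch below. The text does not say which boundary of each section is used, so taking the last sample of each equal-time section is an assumption, as is the function name `reduce_by_boundary`.

```python
def reduce_by_boundary(params, nf):
    """Take the sample at the end of each of `nf` equal-time sections.

    Assumption: the "boundary value" is the last sample of each section.
    """
    n = len(params)
    if not 1 <= nf <= n:
        raise ValueError("nf must be between 1 and len(params)")
    return [params[(k + 1) * n // nf - 1] for k in range(nf)]


print(reduce_by_boundary([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5))
```

This avoids the summation and division of the averaging step at the cost of discarding the samples inside each section.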

Fig. 4 shows the configuration of an embodiment of the present invention. In the figure, 1 is a bank of band filters, 2 is a parameter extraction circuit, 3 is a parameter averaging section determination circuit, 4 is a parameter averaging circuit, 5 is a switching circuit, 6 is a registered word reduced parameter time series registration unit, 7-1 to 7-n are input reduced parameter time series buffers, one for each of the parameter time series lengths N₁ to N_n, 8 is a reduced parameter time series matching unit, and 9 is a candidate word determination unit.

The input speech signal is fed to the band filter bank 1, and the parameter extraction circuit 2 extracts the input feature parameters corresponding to the signal. Suppose n parameter time series lengths, N₁ to N_n, have been defined. For each length Nf, the parameter averaging section determination circuit 3, in the case of the Fig. 2 example, first extracts the time T_E and then determines the times T_E/Nf, 2T_E/Nf, 3T_E/Nf, …, T_E that divide the interval from T₀ to T_E into Nf equal parts. The times T₁, T₂, … of Fig. 3 are discussed later with reference to Fig. 5. Once the sections have been determined from these times, the parameter averaging circuit 4 computes the average of the parameter values in each section from the input feature parameters.

The parameter averaging section determination circuit 3 and the parameter averaging circuit 4 may each be provided in n copies, one per parameter time series length N₁ to N_n, and operated in parallel, or a single circuit may be used repeatedly to process the lengths serially.

The switching circuit 5 switches between the registration mode and the input reduced parameter time series buffers 7-1 to 7-n used in the recognition mode. In the registration mode, the length of the word being registered is known, so a suitable parameter time series length Nf can be determined in advance for each word. The parameter averaging circuit 4 therefore extracts a reduced parameter time series of that length Nf, and it is registered in the registered word reduced parameter time series registration unit 6 together with the character code of the registered word.

In the recognition mode, the word length of the input speech is unknown. Therefore, as described above, the parameter averaging circuit 4 extracts reduced parameter time series for all of the parameter time series lengths N₁ to N_n, and the results are routed through the switching circuit 5 and stored in the corresponding input reduced parameter time series buffers 7-1 to 7-n.

In the recognition mode, the reduced parameter time series matching unit 8 reads the registered reduced parameter time series of the registered words one after another from the registered word reduced parameter time series registration unit 6, selects from the n input reduced parameter time series buffers 7-1 to 7-n the one whose time series length matches that of the registered series, and matches the registered series against the input reduced parameter time series stored there.
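The recognition-mode data flow (one buffer per predefined length, then selection by the length of the registered series) can be sketched in software as follows. The dictionary keyed by length stands in for buffers 7-1 to 7-n; this representation, the scalar feature values, and all function names are assumptions made for illustration only.

```python
def reduce_equal_time(params, nf):
    """Average scalar samples over `nf` equal-length sections."""
    n = len(params)
    reduced = []
    for k in range(nf):
        section = params[k * n // nf:(k + 1) * n // nf]
        reduced.append(sum(section) / len(section))
    return reduced


def build_input_buffers(params, lengths):
    """Prepare one input reduced series per predefined length."""
    return {nf: reduce_equal_time(params, nf) for nf in lengths}


def select_input_series(buffers, registered_series):
    """Pick the input series whose length equals the registered one."""
    return buffers[len(registered_series)]


# Hypothetical input utterance and one registered series of length 3.
buffers = build_input_buffers([1, 2, 3, 4, 5, 6], lengths=[2, 3])
registered = [1.0, 3.0, 6.0]
print(select_input_series(buffers, registered))
```

Because every predefined length is computed once per utterance, each registered word costs only a lookup plus a comparison over its own, possibly short, series.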

The distance $D(X, S^i)$ between the unknown input word $X$ and a registered word $S^i$ is given by

$$D(X, S^i) = \frac{1}{N}\, d(X^N, S^i)$$

or

$$D(X, S^i) = C(N) \cdot d(X^N, S^i).$$

Here $N$ is the parameter time series length of $S^i$, and $X^N$ is the parameter series of length $N$ among those extracted from the unknown input word utterance. In the above expressions, $d(X^N, S^i)$ is given by

$$d(X^N, S^i) = \sum_{n=1}^{N} \sum_{j=1}^{J} \left| x_j^N(n) - s_j^i(n) \right|,$$

where $x_j^N(n)$ and $s_j^i(n)$ are the elements of the input reduced parameter time series and the registered reduced parameter time series, respectively.
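The first form of the distance, normalized by 1/N, can be sketched as below. Each series is assumed to be a list of N frames holding J parameter values each; the function names are hypothetical, and the C(N) weighting is left out because the text does not define it.

```python
def city_block(x_n, s_i):
    """d(X^N, S^i): sum of |x_j(n) - s_j(n)| over all frames n and parameters j."""
    return sum(
        abs(x - s)
        for x_frame, s_frame in zip(x_n, s_i)
        for x, s in zip(x_frame, s_frame)
    )


def distance(x_n, s_i):
    """D(X, S^i) = d(X^N, S^i) / N, where N is the length of the registered series."""
    n = len(s_i)
    if len(x_n) != n:
        raise ValueError("input and registered series must have the same length")
    return city_block(x_n, s_i) / n


# N = 3 frames, J = 2 parameters per frame (hypothetical values).
x = [[1, 2], [3, 4], [5, 6]]
s = [[1, 1], [2, 4], [5, 9]]
print(city_block(x, s), distance(x, s))
```

Dividing by N keeps distances comparable between registered words with different series lengths, since a longer series would otherwise accumulate a larger sum.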

The candidate word determination unit 9 judges, on the basis of the distance D computed by the reduced parameter time series matching unit 8, whether each registered word is suitable as a candidate word for recognition. The names of the candidate words so determined are output from the candidate word determination unit 9.

When the times T₁, T₂, … of Fig. 3 are to be determined, the parameter averaging section determination circuit 3 of Fig. 4 can be regarded as performing the processing shown as a flowchart in Fig. 5. That is, to extract a series of parameter time series length Nf, it proceeds as follows.

(1) Based on the parameters extracted by the parameter extraction circuit 2, the cumulative variation TAV shown in Fig. 3 is extracted.

(2) The value DTAV, obtained by dividing the cumulative variation TAV into Nf equal parts, is determined.

(3) To find the first time T₁, J is set to 1, the value DTAV is set in register AVH, and the value T(I) is set in the timing start register TS(J).

(4) The feature parameter values are then accumulated successively until the accumulated value AV(I) is equal to or greater than the contents of register AVH.

(5) When the accumulated value AV(I) becomes equal to or greater than the contents of register AVH, the timing value T(I) at that moment is set in register TE(1) for time T₁, the value T(I+1) is set in register TS(J+1), the value (AVH + DTAV) is set in register AVH, and J is set to 2 in order to find the next time T₂.

(6) Thereafter, the feature parameter values are accumulated in the same way until the accumulated value AV(I) is equal to or greater than the contents of register AVH, and the times T₂, T₃, and T₄ are found in turn.

(7) When the accumulation count I reaches the value N, that is, when the accumulation reaches the feature parameter corresponding to time T_E in Fig. 3, time T_E is determined at that point.
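Steps (1) to (7) can be read as the loop sketched below. The names AVH, DTAV, TS, and TE follow the registers in the text, but the list-based inputs, the 0-based indexing, and the function name `find_sections` are assumptions; the cumulative variation AV(I) is taken as already computed for every sample.

```python
def find_sections(av, t, nf):
    """Return (start, end) times of `nf` sections of equal cumulative variation.

    av[i]: cumulative variation up to sample i (assumed precomputed)
    t[i]:  time of sample i
    """
    n = len(av)
    tav = av[-1]          # step (1): total cumulative variation TAV
    dtav = tav / nf       # step (2): TAV divided into Nf equal parts
    avh = dtav            # step (3): first threshold in register AVH
    ts = [t[0]]           #           start time of the first section, TS(1)
    te = []
    for i in range(n - 1):                     # steps (4) and (6)
        if len(te) < nf - 1 and av[i] >= avh:  # step (5): threshold reached
            te.append(t[i])                    # end of the current section
            ts.append(t[i + 1])                # start of the next section
            avh += dtav                        # next threshold AVH + DTAV
    te.append(t[-1])      # step (7): the last section ends at TE
    return list(zip(ts, te))


# Cumulative variation that rises by 1 per sample and then stays flat.
print(find_sections([0, 1, 2, 3, 4, 4, 4, 4], list(range(8)), 4))
```

If the utterance has too few samples for every threshold to be reached at a distinct sample, this sketch returns fewer than Nf sections; how the original circuit treats that case is not stated in the text.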

E. Effects of the Invention
As described above, the present invention enables efficient word speech recognition.

In general, raising the recognition rate sacrifices processing speed, and raising processing speed degrades the recognition rate. According to the present invention, however, the most suitable parameter time series length can be chosen for each registered word, so processing speed can be improved over the conventional method without lowering the recognition rate. The memory capacity of the registered dictionary can also be reduced.

[Brief Description of the Drawings]

Fig. 1 is an explanatory diagram of the relationship between registered words and parameter time series lengths; Fig. 2 is an explanatory diagram of an example of the process for generating a reduced parameter time series in the present invention; Fig. 3 is an explanatory diagram of another example of that process; Fig. 4 shows the configuration of an embodiment of the present invention; and Fig. 5 is a flowchart-style explanatory diagram of the section determination process for the example of Fig. 3. In the figures, 3 is the parameter averaging section determination circuit, 4 is the parameter averaging circuit, 6 is the registered word reduced parameter time series registration unit, 7-1 to 7-n are the input reduced parameter time series buffers, and 8 is the reduced parameter time series matching unit.

Claims

[Claims]
1. A word speech recognition method which analyzes the speech signal of an unknown input word utterance, generates, from the input feature parameter time series extracted from the speech signal and on the basis of the parameter values in each section of a predetermined division, an input reduced parameter time series shorter than the input feature parameter time series, and recognizes the unknown input word utterance by matching the input reduced parameter time series against registered reduced parameter time series registered in advance, characterized in that: a plurality of division counts are predetermined for dividing the span from the start to the end of the unknown input word utterance into sections; a plurality of input reduced parameter time series of different lengths, corresponding to the section divisions with the different division counts, are generated from the input feature parameter time series; and each registered reduced parameter time series, which has a parameter time series length predetermined for each registered word, is matched against that one of the plurality of input reduced parameter time series which has the same parameter time series length as the registered word being matched, whereby candidate words for recognition are determined and the unknown input word utterance is recognized.