JPH0293696A

JPH0293696A - Speech recognition device

Info

Publication number: JPH0293696A
Application number: JP63247845A
Authority: JP
Inventors: Hiroki Onishi; 宏樹大西; Kazuyoshi Okura; 計美大倉
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1988-09-30
Filing date: 1988-09-30
Publication date: 1990-04-04

Abstract

PURPOSE: To perform accurate speech recognition by making the free area set before a tentative start point (in time) longer than the free area set after it, when the start and end points of a partial pattern are determined.

CONSTITUTION: The device comprises a microphone 1, a speech analysis unit 2, an input speech pattern buffer 3, a speech section extraction unit 4, an asymmetric endpoint-free DP matching unit 5, and a standard speech pattern memory 6. The free area set before the head of the part extracted as a speech section candidate is longer than the free area set after that head. The free area set after the tail of that part is longer than the free area set before that tail. As a result, word sections in noise and in continuous speech are extracted accurately, and the recognition rate is improved.

Description

DETAILED DESCRIPTION OF THE INVENTION

(A) Field of Industrial Application

The present invention relates to a speech recognition device that performs accurate speech recognition by accurately extracting speech sections from input speech.

(B) Prior Art

Some practical speech recognition devices must extract speech sections from input speech, for example when recognizing speech in noise or recognizing phonemes in continuous speech. Such devices usually work as follows. First, a section in which the power of the input speech is at or above a threshold is extracted as a speech section candidate, and tentative start and end points are set. The extracted partial pattern is then compared with a standard speech pattern by nonlinear matching, in which the tentative start and end points on the input speech pattern side are left free. This comparison determines the start and end points of the partial pattern.
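The thresholding step can be sketched as follows. This is a minimal sketch that assumes per-frame power values are already available and that each maximal run of frames at or above TH forms one candidate. The function name `speech_section_candidates` is hypothetical. No minimum-duration or gap-bridging rule is applied, because the text specifies none.

```python
from typing import List, Optional, Sequence, Tuple


def speech_section_candidates(power: Sequence[float], th: float) -> List[Tuple[int, int]]:
    """Return (tentative_start, tentative_end) frame pairs, inclusive,
    for every maximal run of frames whose power is >= th."""
    sections: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, p in enumerate(power):
        if p >= th and start is None:
            start = i                          # power reaches TH: open a run
        elif p < th and start is not None:
            sections.append((start, i - 1))    # power drops below TH: close the run
            start = None
    if start is not None:                      # run still open at the last frame
        sections.append((start, len(power) - 1))
    return sections
```

For example, `speech_section_candidates([1, 5, 6, 2, 7, 7], th=4)` yields the two candidates `(1, 2)` and `(4, 5)`. Each pair gives the tentative start and end points for the later matching.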

FIG. 2 shows an example of such a conventional speech recognition device.

Speech input through the microphone [7] is analyzed by the speech analysis unit [8]. The unit converts it into a time series of spectrum or cepstrum parameters at a frame period of about 10 ms, and this time series is stored in the input speech pattern buffer [9]. The speech section extraction unit [10] extracts, as a speech section candidate, each section in which the power of the input speech is at or above a threshold (TH). It then sends the speech section candidate information, together with the parameter time series, to the endpoint-free DP matching unit [11].
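The frame-level power on which the threshold TH acts can be computed as shown below. This sketch makes several assumptions: non-overlapping frames of about 10 ms (the text gives only the frame period, not a window length), mean-square power expressed in dB, and a trailing partial frame that is discarded. The spectrum or cepstrum parameters themselves are not computed here, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def frame_log_power(samples: Sequence[float], sample_rate: int,
                    frame_ms: float = 10.0) -> List[float]:
    """Split samples into non-overlapping frames of frame_ms milliseconds
    and return each frame's mean-square power in dB."""
    frame_len = max(1, int(round(sample_rate * frame_ms / 1000.0)))
    powers: List[float] = []
    # Step through whole frames only; a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        mean_square = sum(x * x for x in frame) / frame_len
        powers.append(10.0 * math.log10(mean_square + 1e-12))  # floor avoids log10(0)
    return powers
```

At an 8 kHz sampling rate, each 10 ms frame holds 80 samples. A frame of silence maps to about -120 dB, because of the small floor added before taking the logarithm.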

The endpoint-free DP matching unit [11] operates as follows.

Based on the data sent from the speech section extraction unit [10], the unit first handles the tentative start point. As shown in FIG. 3(a), it sets the free area FBb, taken before the tentative start point in the time direction, and the free area FBa, taken after it, to the same length. It then sets the free area FAb, taken before the tentative end point, and the free area FAa, taken after it, to the same length.
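The four free-area lengths can be turned into the frame ranges in which matching may place the start and end of the input segment. This is a sketch under stated assumptions: free-area lengths are counted in frames, ranges are inclusive, and ranges are clipped to the input. The helper name `free_area_ranges` and the numeric values are hypothetical. The conventional setting of FIG. 3(a) corresponds to equal lengths on both sides of each tentative point.

```python
from typing import Tuple

Range = Tuple[int, int]  # inclusive (lo, hi) input-frame indices


def free_area_ranges(t_start: int, t_end: int,
                     fbb: int, fba: int, fab: int, faa: int,
                     n_frames: int) -> Tuple[Range, Range]:
    """Map free-area lengths (in frames) around the tentative start/end
    points to the ranges searched for the segment's start and end."""
    start_rng = (max(0, t_start - fbb), min(n_frames - 1, t_start + fba))
    end_rng = (max(0, t_end - fab), min(n_frames - 1, t_end + faa))
    return start_rng, end_rng


# Conventional, symmetric setting: FBb == FBa and FAb == FAa (values hypothetical).
start_rng, end_rng = free_area_ranges(t_start=20, t_end=60,
                                      fbb=5, fba=5, fab=5, faa=5, n_frames=100)
# start_rng == (15, 25), end_rng == (55, 65)
```

With symmetric lengths, the matcher may move each endpoint inward just as far as outward. The problem described below follows from this.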

Endpoint-free DP matching is then performed using the free areas obtained in this way. It matches the input speech pattern against the standard speech patterns in the standard speech pattern memory [12].
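Endpoint-free DP matching (a dynamic-time-warping style alignment) can be sketched as shown below. The text does not specify the local path constraints, step weights, distance measure, or normalization, so this sketch makes its own choices. It uses Euclidean frame distance and the symmetric step weighting (1 for horizontal or vertical steps, 2 for diagonal steps), and it normalizes by the total path weight. The start of the input segment may lie anywhere in `start_rng`, and its end anywhere in `end_rng`. Each cell keeps the path with the smallest unnormalized cost, which is a common simplification. The names `frame_dist` and `endpoint_free_dp` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]
Range = Tuple[int, int]  # inclusive (lo, hi) input-frame indices


def frame_dist(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def endpoint_free_dp(inp: Sequence[Frame], ref: Sequence[Frame],
                     start_rng: Range, end_rng: Range) -> float:
    """Normalized DP distance between ref and the best input segment whose
    first frame lies in start_rng and last frame lies in end_rng."""
    n, m = len(inp), len(ref)
    s_lo, s_hi = max(0, start_rng[0]), min(n - 1, start_rng[1])
    e_lo, e_hi = max(0, end_rng[0]), min(n - 1, end_rng[1])
    inf = math.inf
    D: List[List[float]] = [[inf] * m for _ in range(n)]  # cumulative cost
    S: List[List[int]] = [[-1] * m for _ in range(n)]     # start frame of that path
    for i in range(n):
        for j in range(m):
            d = frame_dist(inp[i], ref[j])
            best, start = inf, -1
            if j == 0 and s_lo <= i <= s_hi:
                best, start = 2 * d, i  # free start point inside the start range
            # Local paths: horizontal/vertical weight 1, diagonal weight 2.
            for pi, pj, w in ((i - 1, j, 1), (i - 1, j - 1, 2), (i, j - 1, 1)):
                if pi >= 0 and pj >= 0 and D[pi][pj] + w * d < best:
                    best, start = D[pi][pj] + w * d, S[pi][pj]
            D[i][j], S[i][j] = best, start
    result = inf
    for i in range(e_lo, e_hi + 1):  # free end point inside the end range
        if D[i][m - 1] < inf:
            total_weight = (i - S[i][m - 1] + 1) + m  # sum of step weights on the path
            result = min(result, D[i][m - 1] / total_weight)
    return result
```

Consider `ref = [[0.0], [1.0], [2.0]]` and `inp = [[5.0], [5.0], [0.0], [1.0], [2.0], [5.0], [5.0]]`. Calling the function with ranges `(2, 2)` and `(4, 4)` returns `0.0`, because input frames 2 to 4 equal the reference. If the ranges exclude those frames, the distance becomes positive.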

(C) Problems to Be Solved by the Invention

The conventional speech recognition device described above sets its free areas at the tentative start and end points so that FBa = FBb and FAa = FAb. With this setting, the problem illustrated in FIG. 4 occurs.

For example, assume that the standard speech pattern memory [12] stores the two words "aikagi" (あいかぎ) and "ika" (いか).

Now suppose the word "aikagi" is spoken into the microphone [7]. As shown in FIG. 4(a), the power at the beginning and end of the word is low. Extracting the speech candidate section by comparing power against the threshold (TH) therefore gives the result shown in FIG. 4(b).

When matching is performed on this candidate section using the free areas of FIG. 3(a), the input word "aikagi" can be matched to the word "ika" with its beginning and end cut off. As a result, the matching distance to "ika" in FIG. 4(c) becomes smaller than the matching distance to "aikagi" in FIG. 4(d), which leads to misrecognition.

(D) Means for Solving the Problems

The speech recognition device of the present invention extracts, as a speech section candidate, each section in which the speech power is at or above a threshold. It compares the extracted partial pattern with the standard speech pattern by nonlinear matching, leaving the tentative start and end points on the input speech pattern side free, and thereby determines the start and end points of the partial pattern. The free areas for this matching are set as shown in FIG. 3(b):
- The free area FBb, taken before the tentative start point in the time direction, is longer than the free area FBa, taken after the tentative start point.
- The free area FAa, taken after the tentative end point in the time direction, is longer than the free area FAb, taken before the tentative end point.
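The search performed under these conditions can be written compactly. The notation here is assumed and does not come from the original. The input parameter sequence is x, the tentative start and end frames are t_s and t_e, and a standard pattern for word w is y_w. The DP matching distance between the input segment from frame i_s to frame i_e and the pattern y_w is DP(x_{i_s:i_e}, y_w). The conventional device corresponds to F_{Bb} = F_{Ba} and F_{Aa} = F_{Ab}.

```latex
\hat{D}(y_w) = \min_{\substack{t_s - F_{Bb} \,\le\, i_s \,\le\, t_s + F_{Ba} \\
                               t_e - F_{Ab} \,\le\, i_e \,\le\, t_e + F_{Aa}}}
               \mathrm{DP}\bigl(x_{i_s:i_e},\, y_w\bigr),
\qquad F_{Bb} > F_{Ba}, \quad F_{Aa} > F_{Ab},
\qquad \hat{w} = \arg\min_{w} \hat{D}(y_w)
```

Under these inequalities, the search reaches further outward past each tentative endpoint than inward. Weak word onsets and offsets that fall below TH can therefore still be included, while a shorter pattern has less room to match only the interior of the candidate section.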

(E) Operation

In the speech recognition device of the present invention, the free areas are set so that FBb > FBa and FAa > FAb, as shown in FIG. 4(e). The input speech conditions remain the same as in FIG. 4(a) and (b). Under these conditions, the matching distance to "ika" in FIG. 4(f) increases by an amount corresponding to the hatched region in FIG. 4(e). The matching distance to "aikagi" in FIG. 4(g) is then the smaller one, so the input is correctly recognized as "aikagi".

(F) Embodiment

FIG. 1 shows an embodiment of the speech recognition device of the present invention. Speech input through the microphone [1] is analyzed by the speech analysis unit [2]. The unit converts it into a time series of spectrum or cepstrum parameters at a frame period of about 10 ms, and this time series is stored in the input speech pattern buffer [3]. The speech section extraction unit [4] extracts, as a speech section candidate, each section in which the power of the input speech is at or above a threshold (TH). It then sends the speech section candidate information, together with the parameter time series, to the asymmetric endpoint-free DP matching unit [5].

The asymmetric endpoint-free DP matching unit [5] is the most characteristic feature of the device of the present invention. Using the data sent from the speech section extraction unit [4], it sets the free areas as shown in FIG. 3(b):
- FBb (taken before the tentative start point in the time direction) and FBa (taken after it) satisfy FBb > FBa.
- FAb (taken before the tentative end point) and FAa (taken after it) satisfy FAa > FAb.

The unit then performs endpoint-free DP matching within the free areas set under these conditions. This matches the input speech pattern against the standard speech patterns in the standard speech pattern memory [6].
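The embodiment's matching loop can be sketched as follows. It applies asymmetric free areas, matches against every stored standard pattern, and keeps the word with the smallest distance. This is a sketch under several assumptions. The free-area lengths are in frames. The text gives no concrete values for them. The `matcher` argument is a hypothetical callable that stands in for any endpoint-free DP matcher taking the input, a reference pattern, and the inclusive start and end ranges. The names `recognize` and `Matcher` are also hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

Range = Tuple[int, int]  # inclusive (lo, hi) input-frame indices
# Hypothetical matcher signature: (input, reference, start_range, end_range) -> distance
Matcher = Callable[[Sequence, Sequence, Range, Range], float]


def recognize(inp: Sequence, templates: Dict[str, Sequence],
              t_start: int, t_end: int,
              fbb: int, fba: int, fab: int, faa: int,
              matcher: Matcher) -> Tuple[str, float]:
    """Match the input against every standard pattern using asymmetric
    free areas and return (best_label, best_distance)."""
    if not (fbb > fba and faa > fab):
        raise ValueError("expected FBb > FBa and FAa > FAb")
    n = len(inp)
    # Wider search outward (before the start, after the end) than inward.
    start_rng = (max(0, t_start - fbb), min(n - 1, t_start + fba))
    end_rng = (max(0, t_end - fab), min(n - 1, t_end + faa))
    best_label, best_dist = "", float("inf")
    for label, ref in templates.items():
        dist = matcher(inp, ref, start_rng, end_rng)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist
```

Passing the matcher in as an argument keeps the asymmetric free-area rule, which is the feature of matching unit [5], separate from the choice of DP local paths and distance measure. The text does not fix that choice.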

(G) Effects of the Invention

As is clear from the above description, the speech recognition device of the present invention prevents matching errors against local patterns in endpoint-free DP matching. Word sections in noise or in continuous speech are therefore extracted accurately, and the recognition rate is improved.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an embodiment of the speech recognition device of the present invention. FIG. 2 is a block diagram of a conventional speech recognition device. FIGS. 3(a), 3(b) and 4 are speech pattern diagrams.

[1] microphone
[2] speech analysis unit
[3] input speech pattern buffer
[4] speech section extraction unit
[5] asymmetric endpoint-free DP matching unit
[6] standard speech pattern memory

Claims


(1) A speech recognition device with the following features:
- Speech analysis means extracts an input speech pattern, and a standard speech pattern is extracted in advance by the same means.
- Each section of the input speech pattern in which the speech power is at or above a threshold is extracted as a speech section candidate.
- The extracted partial pattern is compared with the standard speech pattern by nonlinear matching in which the start and end points on the input speech pattern side are left free, and the true start and end points of the partial pattern are thereby determined.
- The device comprises asymmetric endpoint-free matching means. This means sets the free area taken before the beginning of the part extracted as the speech section candidate (in the time direction) longer than the free area taken after that beginning. It also sets the free area taken after the end of that part longer than the free area taken before that end.