JPH0293696A - Speech recognition device - Google Patents

Speech recognition device

Info

Publication number
JPH0293696A
JPH0293696A JP63247845A JP24784588A JPH0293696A JP H0293696 A JPH0293696 A JP H0293696A JP 63247845 A JP63247845 A JP 63247845A JP 24784588 A JP24784588 A JP 24784588A JP H0293696 A JPH0293696 A JP H0293696A
Authority
JP
Japan
Prior art keywords
speech
free area
pattern
free
cut out
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP63247845A
Other languages
Japanese (ja)
Inventor
Hiroki Onishi
宏樹 大西
Kazuyoshi Okura
計美 大倉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sanyo Electric Co Ltd
Original Assignee
Sanyo Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1988-09-30
Filing date
1988-09-30
Publication date
1990-04-04
Application filed by Sanyo Electric Co Ltd
Priority to JP63247845A
Publication of JPH0293696A

Abstract

PURPOSE: To perform accurate speech recognition by setting, when the start point and end point of a partial pattern are determined, the free area taken before a temporary start point in the time direction longer than the free area taken after the temporary start point. CONSTITUTION: The device comprises a microphone 1, a speech analysis unit 2, an input speech pattern buffer 3, a speech section segmentation unit 4, an asymmetric endpoint-free DP matching unit 5, and a standard speech pattern memory 6. The free area set before the head of the portion cut out as a speech section candidate is set longer than the free area set after that head, and the free area set after the tail of that portion is set longer than the free area set before that tail. Consequently, a word section in noise or in continuous speech is cut out accurately and the recognition rate can be improved.

Description

DETAILED DESCRIPTION OF THE INVENTION (A) Field of Industrial Application The present invention relates to a speech recognition device that performs accurate speech recognition by accurately cutting out speech sections from input speech.

(B) Prior Art In speech recognition, practical devices that cut out speech sections from input speech, for example for recognition in noise or for phoneme recognition in continuous speech, often work as follows. First, a section in which the power of the input speech is at or above a certain threshold is cut out as a speech section candidate, and a temporary start point and a temporary end point are fixed. The cut-out partial pattern is then compared with a standard speech pattern by nonlinear matching in which the temporary start and end points on the input speech pattern side are left free, and the start and end points of the partial pattern are determined from the result.

FIG. 2 shows an example of such a conventional speech recognition device.

Speech input from the microphone [7] is analyzed by the speech analysis unit [8] and converted, at a frame period of about 10 ms, into a time series of parameters such as spectrum or cepstrum. This parameter time series is stored in the input speech pattern buffer [9]. The speech section segmentation unit [10] cuts out, as a speech section candidate, each section in which the power of the input speech is at or above a certain threshold (TH), and sends the speech section candidate information together with the parameter time series to the endpoint-free DP matching unit [11].
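The threshold-based cut-out performed by the speech section segmentation unit can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name `cut_out_candidates`, the representation of the input as a list of per-frame power values, and the sample numbers are all hypothetical and do not come from the patent.

```python
def cut_out_candidates(frame_power, threshold):
    """Return (temporary_start, temporary_end) frame index pairs (inclusive)
    for every run of frames whose power is at or above the threshold."""
    candidates = []
    start = None
    for i, p in enumerate(frame_power):
        if p >= threshold and start is None:
            start = i                        # a candidate section begins here
        elif p < threshold and start is not None:
            candidates.append((start, i - 1))  # previous frame closed the run
            start = None
    if start is not None:                    # run reaches the end of the input
        candidates.append((start, len(frame_power) - 1))
    return candidates


# Hypothetical per-frame power values, one value per 10 ms frame.
power = [0.1, 0.2, 0.9, 1.4, 1.1, 0.3, 0.2, 0.8, 0.1]
print(cut_out_candidates(power, threshold=0.5))
```

Each returned pair gives the temporary start point and temporary end point of one candidate, which the matching unit then refines.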

The endpoint-free DP matching unit [11] operates as follows.

Based on the data sent from the speech section segmentation unit [10], the free area FBb taken before the temporary start point in the time direction and the free area FBa taken after the temporary start point are set to the same time length, as shown in FIG. 3(a). Likewise, the free area FAb taken before the temporary end point and the free area FAa taken after the temporary end point are set to the same time length.

Endpoint-free DP matching using the free areas obtained in this way is then performed between the standard speech patterns in the standard speech pattern memory [12] and the input speech pattern.
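Endpoint-free DP matching of this kind can be sketched as below. The patent does not give the recurrence, the local distance, or any normalization, so everything here is an assumption made for illustration: the function name `endpoint_free_dp`, the three-way step pattern, the absolute-difference local distance on scalar features, and the use of an unnormalized accumulated distance.

```python
import math


def endpoint_free_dp(inp, ref, start_range, end_range,
                     dist=lambda a, b: abs(a - b)):
    """Match ref against a stretch of inp whose first frame lies in
    start_range and whose last frame lies in end_range (inclusive pairs).
    Returns (accumulated_distance, start_frame, end_frame)."""
    n, m = len(inp), len(ref)
    s_lo, s_hi = start_range
    e_lo, e_hi = end_range
    inf = math.inf
    # cost[i][j]: best accumulated distance with inp[i] aligned to ref[j]
    cost = [[inf] * m for _ in range(n)]
    # origin[i][j]: input frame at which that best path started
    origin = [[-1] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            best, src = inf, -1
            if j == 0 and s_lo <= i <= s_hi:
                best, src = 0.0, i           # free start point inside the free area
            for pi, pj in ((i - 1, j), (i - 1, j - 1), (i, j - 1)):
                if pi >= 0 and pj >= 0 and cost[pi][pj] < best:
                    best, src = cost[pi][pj], origin[pi][pj]
            if best < inf:
                cost[i][j] = best + dist(inp[i], ref[j])
                origin[i][j] = src
    # free end point: take the best final cell inside the end free area
    finals = [(cost[i][m - 1], origin[i][m - 1], i)
              for i in range(max(e_lo, 0), min(e_hi, n - 1) + 1)
              if cost[i][m - 1] < inf]
    return min(finals) if finals else (inf, -1, -1)


# Hypothetical scalar features: the reference [5, 6, 7] sits at frames 2..4.
print(endpoint_free_dp([0, 0, 5, 6, 7, 0, 0], [5, 6, 7], (1, 3), (3, 5)))
```

The start and end ranges passed in correspond to the free areas around the temporary start and end points; the returned frame indices are the start and end points chosen by the matching.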

(C) Problems to Be Solved by the Invention In the conventional speech recognition device described above, when the free areas at the temporary start and end points satisfy FBa = FBb and FAa = FAb, the problem shown in FIG. 4 occurs.

For example, suppose that the two words "あいかぎ" (aikagi) and "いか" (ika) are stored in the standard speech pattern memory [12].

Now suppose that the word "aikagi" is input from the microphone [7] but, as shown in FIG. 4(a), the power at the beginning and end of the word is low, so that cutting out the speech section candidate by power at the threshold (TH) gives the result shown in FIG. 4(b).

If matching is performed by applying the free areas shown in FIG. 3(a) to this speech section candidate, the input word "aikagi", with its beginning and end dropped, matches the word "ika". As a result, the matching distance to "ika" in FIG. 4(c) becomes smaller than the matching distance to "aikagi" in FIG. 4(d), which leads to misrecognition.

(D) Means for Solving the Problems The speech recognition device of the present invention cuts out, as a speech section candidate, a section in which the speech power is at or above a certain threshold, compares the cut-out partial pattern with a standard speech pattern by nonlinear matching in which the temporary start and end points on the input speech pattern side are left free, and thereby determines the start and end points of the partial pattern. In doing so, as shown in FIG. 3(b), the free area FBb taken before the temporary start point in the time direction is set longer than the free area FBa taken after the temporary start point, and the free area FAa taken after the temporary end point is set longer than the free area FAb taken before the temporary end point.
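The relation between the four free areas and the frame ranges in which the start and end points may fall can be illustrated with a short sketch. The function names `free_area_ranges` and `is_asymmetric`, the expression of the free areas as frame counts, and the numeric values are hypothetical; the patent specifies only the inequalities FBb > FBa and FAa > FAb.

```python
def free_area_ranges(temp_start, temp_end, fbb, fba, fab, faa, n_frames):
    """Return ((start_lo, start_hi), (end_lo, end_hi)), inclusive frame ranges
    clipped to the input, for free areas given as frame counts."""
    last = n_frames - 1
    start_range = (max(0, temp_start - fbb), min(last, temp_start + fba))
    end_range = (max(0, temp_end - fab), min(last, temp_end + faa))
    return start_range, end_range


def is_asymmetric(fbb, fba, fab, faa):
    """True when the outward free areas are longer than the inward ones."""
    return fbb > fba and faa > fab


# Candidate cut out at frames 10..40 of a 60-frame input (made-up numbers).
print(free_area_ranges(10, 40, 3, 3, 3, 3, 60))   # symmetric, as in FIG. 3(a)
print(free_area_ranges(10, 40, 8, 2, 2, 8, 60))   # asymmetric, as in FIG. 3(b)
print(is_asymmetric(8, 2, 2, 8))
```

With the asymmetric setting the search reaches further outside the candidate and less far into it, which is the property the invention relies on.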

(E) Operation In the speech recognition device of the present invention, the free areas are set so that FBb > FBa and FAa > FAb, as shown in FIG. 4(e). Consequently, even under the same input speech conditions as in FIG. 4(a) and (b), the matching distance to "ika" in FIG. 4(f) is increased by the amount corresponding to the hatched region in FIG. 4(e). The matching distance to "aikagi" in FIG. 4(g) therefore becomes the smaller of the two, and the input can be recognized as "aikagi".
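The effect described above can be reproduced on a toy example. This is a sketch under strong assumptions and not the patent's experiment: frames are represented by single letters, the local distance is 0 for equal letters and 1 otherwise, the accumulated distance is not normalized, and the frame counts, the candidate position (frames 5 to 16), and the free area lengths are all invented for illustration.

```python
import math


def endpoint_free_distance(inp, ref, start_range, end_range):
    """Smallest accumulated mismatch count over warping paths whose start
    lies in start_range and whose end lies in end_range (inclusive)."""
    n, m = len(inp), len(ref)
    cost = [[math.inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            best = math.inf
            if j == 0 and start_range[0] <= i <= start_range[1]:
                best = 0.0                       # free start point
            if i > 0:
                best = min(best, cost[i - 1][j])
                if j > 0:
                    best = min(best, cost[i - 1][j - 1])
            if j > 0:
                best = min(best, cost[i][j - 1])
            cost[i][j] = best + (0.0 if inp[i] == ref[j] else 1.0)
    lo, hi = max(end_range[0], 0), min(end_range[1], n - 1)
    return min(cost[i][m - 1] for i in range(lo, hi + 1))


# "aikagi" spoken with a weak head and tail; "_" marks silence.
# Frames: _ 0-2, a 3-6, i 7-9, k 10-11, a 12-14, g 15-19, i 20-22, _ 23-25.
INPUT = "_" * 3 + "a" * 4 + "i" * 3 + "k" * 2 + "a" * 3 + "g" * 5 + "i" * 3 + "_" * 3
TEMP_START, TEMP_END = 5, 16    # candidate cut out by the power threshold
TEMPLATES = {"aikagi": "aikagi", "ika": "ika"}


def recognize(fbb, fba, fab, faa):
    """Return the matching distance to each template for given free areas."""
    start_range = (TEMP_START - fbb, TEMP_START + fba)
    end_range = (TEMP_END - fab, TEMP_END + faa)
    return {word: endpoint_free_distance(INPUT, ref, start_range, end_range)
            for word, ref in TEMPLATES.items()}


for label, areas in (("symmetric", (3, 3, 3, 3)), ("asymmetric", (5, 1, 1, 5))):
    scores = recognize(*areas)
    print(label, scores, "->", min(scores, key=scores.get))
```

In this toy setting the symmetric free areas let "ika" match with zero distance while "aikagi" cannot reach its final "i", so "ika" wins. With the asymmetric free areas "ika" is forced to absorb the leftover head and tail frames, and "aikagi" becomes the closer template.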

(F) Embodiment FIG. 1 shows an embodiment of the speech recognition device of the present invention. Speech input from the microphone [1] is analyzed by the speech analysis unit [2] and converted, at a frame period of about 10 ms, into a time series of parameters such as spectrum or cepstrum. This parameter time series is stored in the input speech pattern buffer [3]. The speech section segmentation unit [4] cuts out, as a speech section candidate, each section in which the power of the input speech is at or above a certain threshold (TH), and sends the speech section candidate information together with the parameter time series to the asymmetric endpoint-free DP matching unit [5].
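The patent does not state how the per-frame power used by the segmentation unit is computed. One common choice, shown here purely as an assumption, is the mean squared amplitude of each 10 ms frame; the function name `frame_powers`, the 8 kHz sample rate, and the non-overlapping framing are hypothetical.

```python
def frame_powers(samples, sample_rate_hz, frame_period_ms=10):
    """Split samples into non-overlapping frames of frame_period_ms and
    return the mean squared amplitude of each complete frame."""
    frame_len = sample_rate_hz * frame_period_ms // 1000
    if frame_len <= 0:
        raise ValueError("frame period is shorter than one sample")
    powers = []
    # a trailing partial frame is dropped
    for begin in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[begin:begin + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    return powers


# At 8 kHz one 10 ms frame is 80 samples; the last 40 samples are discarded.
signal = [1.0] * 80 + [0.0] * 80 + [2.0] * 40
print(frame_powers(signal, sample_rate_hz=8000))
```

The resulting list has one value per frame and can be compared against the threshold (TH) to cut out speech section candidates.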

The asymmetric endpoint-free DP matching unit [5], which is the most distinctive feature of the device of the present invention, uses the data sent from the speech section segmentation unit [4] to set the free areas as shown in FIG. 3(b). The free area FBb taken before the temporary start point in the time direction and the free area FBa taken after the temporary start point are set so that FBb > FBa, and the free area FAb taken before the temporary end point and the free area FAa taken after the temporary end point are set so that FAa > FAb. Endpoint-free DP matching is then performed over the free areas set under these conditions, matching the standard speech patterns in the standard speech pattern memory [6] against the input speech pattern.

(G) Effects of the Invention As is clear from the above description, the speech recognition device of the present invention prevents matching errors on local patterns in endpoint-free DP matching, allows word sections in noise or in continuous speech to be cut out accurately, and thereby improves the recognition rate.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an embodiment of the speech recognition device of the present invention, FIG. 2 is a block diagram of a conventional speech recognition device, and FIGS. 3(a), 3(b) and FIG. 4 are speech pattern diagrams.
[1] ... microphone
[2] ... speech analysis unit
[3] ... input speech pattern buffer
[4] ... speech section segmentation unit
[5] ... asymmetric endpoint-free DP matching unit
[6] ... standard speech pattern memory

Claims (1)

[Claims] (1) A speech recognition device that cuts out, as a speech section candidate, a section of an input speech pattern extracted by speech analysis means in which the speech power is at or above a certain threshold, compares the cut-out partial pattern with a standard speech pattern extracted in advance by the speech analysis means, using nonlinear matching in which the start point and end point on the input speech pattern side are left free, and determines the true start point and end point of the partial pattern, the device being characterized by comprising asymmetric endpoint-free matching means that sets the free area taken before, in the time direction, the head of the portion cut out as the speech section candidate to be longer than the free area taken after that head, and sets the free area taken after the tail of that portion to be longer than the free area taken before that tail.
JP63247845A 1988-09-30 1988-09-30 Speech recognition device Pending JPH0293696A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP63247845A JPH0293696A (en) 1988-09-30 1988-09-30 Speech recognition device

Publications (1)

Publication Number Publication Date
JPH0293696A (en) 1990-04-04

Family

ID=17169522

Family Applications (1)

Application Number Title Priority Date Filing Date
JP63247845A Pending JPH0293696A (en) 1988-09-30 1988-09-30 Speech recognition device

Country Status (1)

Country Link
JP (1) JPH0293696A (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS61260299A (en) * 1985-05-15 1986-11-18 株式会社日立製作所 Voice recognition equipment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001527202A (en) * 1997-12-22 2001-12-25 コーニング インコーポレイテッド Method for firing ceramic honeycomb body and tunnel kiln used for firing
JP4723085B2 (en) * 1997-12-22 2011-07-13 コーニング インコーポレイテッド Method for firing ceramic honeycomb body and tunnel kiln used for firing
WO2003107326A1 (en) * 2002-06-12 2003-12-24 三菱電機株式会社 Speech recognizing method and device thereof

Similar Documents

Publication Publication Date Title
CN112289323B (en) Voice data processing method and device, computer equipment and storage medium
JPH0293696A (en) Speech recognition device
JPS6138479B2 (en)
JPH06266386A (en) Word spotting method
JP3063855B2 (en) Finding the minimum value of matching distance value in speech recognition
JPS61292199A (en) Voice recognition equipment
JP2710045B2 (en) Voice recognition method
JPS61260299A (en) Voice recognition equipment
JPS59204099A (en) Voice recognition system
JPS6120879B2 (en)
JPH0160160B2 (en)
JP2768938B2 (en) Pattern comparison method
JP3063856B2 (en) Finding the minimum value of matching distance value in speech recognition
JP2996977B2 (en) Voice recognition device
JPH0293695A (en) Speech recognition device
JPH08146986A (en) Speech recognition device
JPS59170894A (en) Voice section starting system
JPH0458638B2 (en)
JPS60159798A (en) Voice recognition equipment
JPS6027000A (en) Pattern matching
JPS60149097A (en) Voice recognition
JPS60170900A (en) Syllabic voice standard pattern registration system
JPS6265099A (en) Voice recognition equipment
JPS60200294A (en) Phoneme lattice generator
JPS6177099A (en) Voice recognition equipment