JPS6147999A - Voice recognition system - Google Patents

Voice recognition system

Info

Publication number
JPS6147999A
JPS6147999A JP59169569A JP16956984A
Authority
JP
Japan
Prior art keywords
syllable
candidate
interval
speech
reliability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP59169569A
Other languages
Japanese (ja)
Other versions
JPH067346B2 (en)
Inventor
外川 文雄
伸 神谷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sharp Corp
Original Assignee
Sharp Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sharp Corp filed Critical Sharp Corp
Priority to JP59169569A priority Critical patent/JPH067346B2/en
Publication of JPS6147999A publication Critical patent/JPS6147999A/en
Publication of JPH067346B2 publication Critical patent/JPH067346B2/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Links

Abstract

(57) [Abstract] This publication contains application data filed before the electronic filing system, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

<Industrial Application Field>
The present invention relates to an improvement of the speech recognition method in a speech input device used in a voice-input document processing apparatus or the like.

<Prior Art>
Input speech is decomposed into finer syllable units (hereinafter, syllable segmentation), and, for the extracted sequence of syllable intervals, a time series of the syllable candidates identified for each syllable (hereinafter, a syllable lattice) is formed; candidate syllable strings obtained from the combinations with high identification accuracy are matched against a dictionary, and the resulting candidates are taken as the recognition result for the input speech. Conventionally, the syllable boundaries used in this syllable segmentation are determined uniquely. With such processing, however, as the input speech becomes more continuous and closer to conversational speech, the syllable boundaries become indistinct and hard to detect, and the recognition rate deteriorates.
Focusing on this point, the inventors have already proposed improving the recognition rate by not having the segmentation algorithm determine syllable boundaries uniquely, but instead estimating the speaking rate and exploiting that estimate to raise the accuracy of boundary detection. Even so, syllable segmentation errors still occur.

<Problems to Be Solved by the Invention>
The present invention was made to resolve the segmentation errors that remain even when the speaking rate is exploited, and to provide a speech recognition method with a good recognition rate.

<Means for Solving the Problems>
To achieve the above object, the present invention is characterized in that the identification accuracy of a candidate syllable string is determined using both the identification distance of each syllable and a reliability based on the speaking rate.

<Operation>
In the method of the present invention, syllable boundaries are not determined uniquely; two factors are used: the syllable identification distance and a reliability based on the speaking rate. Syllable boundaries are detected as multiple candidates, and segmentation is performed that permits multiple competing syllable intervals in the same time span. Then, using the competing syllable candidate lattice obtained by identifying a syllable for each interval candidate, candidate syllable strings are created with priority given to intervals whose detection reliability is high.

<Embodiment>
The present invention is described concretely below with reference to the embodiment shown in the drawings.
Fig. 1 shows the configuration of a speech input device embodying the present invention. Input speech is first acoustically analyzed in the speech analysis section 1 and sent to the silent-interval detection section, where silent intervals are detected; the phrase boundary detection section 5 then determines whether each silent interval is a phrase boundary. Meanwhile, the voiced-interval detection section 3 detects voiced portions, i.e. clusters of sound, and the syllable boundary candidate detection section 6 produces syllable boundary candidates from power changes, spectrum changes, phoneme changes, and the like, together with the speaking-rate information stored in the speaking-rate storage section 4. These results are integrated in the syllable interval candidate detection section 7 together with the unambiguous syllable boundary information obtained by the speech analysis section 1, yielding plausible syllable interval candidates in order of reliability.
These syllable interval candidates are identified in the syllable identification section 8, for example by pattern matching against the standard syllable patterns stored beforehand in the pattern memory 12, and, as described later, candidate syllables are output in ascending order of identification distance. The time series of syllable candidates carrying identification distances output in this way is the syllable candidate lattice (b). Meanwhile, the reliability calculation section 9 for syllable interval candidates uses the speaking-rate information stored in the speaking-rate storage section 4 to calculate the reliability of each syllable interval candidate according to equation (1) described later, and the optimal syllable interval sequence is estimated.
Based on this optimal syllable interval sequence, the candidate syllable string creation section 10 determines a weighting coefficient K for each syllable interval of the other interval sequences and, using the identification distances and the interval detection reliabilities according to equation (2) described later, creates candidate syllable strings in ascending order of the evaluation value S.
The candidate syllable strings are passed to the language processing section 11, where language processing including matching against the dictionary 13 is performed, and interpretable candidate strings are output as the recognition result (c). The syllable lengths of the syllable interval sequence making up the candidate string confirmed by the speaker (or operator) are stored in the speaking-rate storage section 4 as speaking-rate information, together with the previous average syllable length, and are used in subsequent processing.
Next, the process of determining syllable intervals is described.
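The data flowing between the syllable interval candidate detection section 7 and the syllable identification section 8 described above can be sketched as follows. This is an illustrative reconstruction for the modern reader, not part of the specification; every type and field name is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class IntervalCandidate:
    start: int          # start frame of the syllable interval
    end: int            # end frame of the syllable interval
    reliability: float  # D from equation (1); smaller is more plausible

    @property
    def length(self) -> int:
        # syllable length used for the deviation from the average
        return self.end - self.start

@dataclass
class LatticeEntry:
    """One interval of the syllable candidate lattice (b): the interval
    candidate plus its candidate syllables with identification distances."""
    interval: IntervalCandidate
    candidates: list = field(default_factory=list)  # [(syllable, distance), ...]

    def best(self):
        # candidate syllables are consumed in ascending order of distance
        return min(self.candidates, key=lambda c: c[1])
```

Because several `IntervalCandidate`s may overlap the same time span, a lattice built from such entries naturally represents the competing segmentations the patent describes.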

[1] Calculation of the reliability of syllable intervals

The reliability of a syllable interval candidate based on the speaking rate is obtained from the deviation of the syllable length from the average syllable length. For this purpose, when the speech input device is put into use, the user first speaks a standard sentence whose number of syllables is known, and the average syllable length is estimated. With L(i) denoting the duration of the i-th voiced interval of a sentence of n syllables, the average syllable length L̄ is

  L̄ = (1/n) Σ L(i)   (i = 1, …, n)

The reliabilities D1, D2, …, DX, … of the multiple syllable interval sequences occupying the same time span are then obtained from the following equation (1):

  DX = (1/nX) Σ d(X(i), L̄)   (i = 1, …, nX)   …(1)

where
  X(i): the i-th syllable length of syllable interval sequence X,
  nX: the number of syllable intervals in sequence X,
  L̄: the average syllable length,
  d(X(i), L̄): the deviation of X(i) from L̄, normally given by d(X(i), L̄) = |X(i) − L̄| / L̄,
  DX: the reliability of syllable interval sequence X.

The syllable interval sequence with the smallest reliability obtained from equation (1) is the optimal syllable interval sequence.

[2] Creation of candidate syllable strings

For a syllable candidate lattice in which multiple syllable interval candidates detected in the same time span are in competition, an evaluation value S is obtained using both the syllable identification distance and the interval reliability described above, and candidate syllable strings are created in ascending order of this value. The syllable identification distances are handled as follows, taking the optimal syllable interval sequence as the reference.
Suppose syllable interval candidates as shown in Fig. 2 have been detected, with the B-C sequence and the D sequence in competition. Let
  La, ai: the syllable length of interval candidate A and the identification distance of its i-th syllable candidate,
  Lb, bj: the syllable length of interval candidate B and the identification distance of its j-th syllable candidate,
  Lc, ck: the syllable length of interval candidate C and the identification distance of its k-th syllable candidate,
  Ld, dl: the syllable length of interval candidate D and the identification distance of its l-th syllable candidate,
  L̄: the average syllable length.
Then the reliabilities of the competing sequences are
  Dbc = {d(Lb, L̄) + d(Lc, L̄)} / 2   (B-C sequence)
  Dd = d(Ld, L̄)   (D sequence)
and the evaluation values of the competing portion are
  Sbc = (bj + ck) × Kbc + Dbc
  Sd = dl × Kd + Dd   …(2)
where Kbc and Kd are weighting coefficients applied to the identification distances with the optimal syllable interval sequence as the reference: when B-C is the optimal sequence (i.e. Dbc < Dd), Kbc = 1 and Kd = 2; when D is the optimal sequence (Dd < Dbc), Kd = 1 and Kbc = 2.
From the above, the overall evaluation values are
  Sabc = ai + (bj + ck) × Kbc + Dbc
  Sad = ai + dl × Kd + Dd
The smaller of Sabc and Sad is taken as the evaluation value S, and candidate syllable strings are created in the order of the combinations with smaller values.

The above procedure is now explained concretely with the input utterance 「家を」 ("ie-o") as an example.
Fig. 3 shows the state of the syllable boundaries when 「家を」 is input. There are four syllable interval candidates A, B, C, and D; in the competing portion the syllable lengths of candidates B, C, and D are Lb = 12, Lc = 27, and Ld = 39, and the average syllable length is L̄ = 23. The reliabilities of the syllable intervals are
  for B: d(Lb, L̄) = 0.48
  for C: d(Lc, L̄) = 0.17
  for D: d(Ld, L̄) = 0.70
Hence the reliabilities of the syllable interval sequences are
  B-C sequence: Dbc = {d(Lb, L̄) + d(Lc, L̄)} / 2 = 0.32
  D sequence: Dd = d(Ld, L̄) = 0.70
and the B-C sequence is estimated to be the optimal syllable interval sequence. Normalizing the reliabilities against this B-C sequence gives
  D'bc = 0
  D'd = 0.38
Fig. 4 is the syllable candidate lattice showing the identification distance of each candidate syllable. Using these identification distances and the reliabilities above (after normalization), the evaluation value of each candidate syllable string is obtained from equation (2) as
  Sabc = ai + (bj + ck) + D'bc
  Sad = ai + (dl × 2 + D'd)
and candidate strings are created one after another in the order of the combinations giving the smaller values.
The candidate syllable strings created in this way are subjected to language processing, and the recognition results are output as shown in the attached table. According to these results, the first candidate is 「いえお」 ("ie-o") from the A-B-C sequence, and it can be seen that the correct result is obtained.

<Effects of the Invention>
As is clear from the above description, according to the present invention syllable segmentation is performed using both the identification distance of each syllable and the speaking rate, so segmentation errors are reduced and the probability that remaining errors are correctly repaired by the subsequent language processing can be raised.
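The evaluation of competing sequences in [2] can be sketched as follows. This is an illustrative reconstruction; the weight rule (K = 1 for the optimal sequence, K = 2 for the others) is as read from the description above, and all names are assumptions.

```python
def evaluation_value(ident_distances, k, d_norm):
    """S of equation (2) for one competing sequence: the weighted sum of
    its syllable identification distances plus its normalized interval
    reliability D'."""
    return sum(ident_distances) * k + d_norm

def rank_sequences(sequences):
    """sequences: {name: (identification distances, reliability D)}.
    Normalizes D against the optimal (minimum-D) sequence, applies the
    weights, and returns the names in ascending order of S along with
    the scores themselves."""
    d_min = min(d for _, d in sequences.values())
    scores = {
        name: evaluation_value(dists, 1 if d == d_min else 2, d - d_min)
        for name, (dists, d) in sequences.items()
    }
    return sorted(scores, key=scores.get), scores
```

For the competing portion of the 「家を」 example, with hypothetical identification distances bj + ck = 0.5 for the B-C sequence and dl = 0.4 for D, this gives Sbc = 0.5 and Sd = 0.4 × 2 + 0.37 ≈ 1.17, so candidate strings through B-C are tried first.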

[Brief Description of the Drawings]

Fig. 1 is a block diagram showing the configuration of a speech input device embodying the present invention; Fig. 2 is a diagram showing a syllable candidate lattice of the same; Fig. 3 is a diagram showing an example of the syllable boundaries of input speech of the same; Fig. 4 is a diagram showing the syllable candidate lattice of the concrete example of the same.
1: speech analysis section, 4: speaking-rate storage section, 7: syllable interval candidate detection section, 8: syllable identification section, 9: reliability calculation section for syllable interval candidates, 10: candidate syllable string creation section, 11: language processing section.

Claims (1)

[Claims] A speech recognition method in a speech input device that decomposes input speech into syllable units, identifies each syllable, sequentially creates candidate syllable strings corresponding to words or phrases in descending order of identification accuracy from the time series of multiple syllable candidates, and thereby recognizes word speech, phrase speech, and the like, characterized in that the identification accuracy of a candidate syllable string is determined using both the identification distance of each syllable and a reliability based on the speaking rate.
JP59169569A 1984-08-14 1984-08-14 Voice recognizer Expired - Lifetime JPH067346B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP59169569A JPH067346B2 (en) 1984-08-14 1984-08-14 Voice recognizer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP59169569A JPH067346B2 (en) 1984-08-14 1984-08-14 Voice recognizer

Publications (2)

Publication Number Publication Date
JPS6147999A true JPS6147999A (en) 1986-03-08
JPH067346B2 JPH067346B2 (en) 1994-01-26

Family

ID=15888899

Family Applications (1)

Application Number Title Priority Date Filing Date
JP59169569A Expired - Lifetime JPH067346B2 (en) 1984-08-14 1984-08-14 Voice recognizer

Country Status (1)

Country Link
JP (1) JPH067346B2 (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS63161499A (en) * 1986-12-24 1988-07-05 松下電器産業株式会社 Voice recognition equipment
JP2001100790A (en) * 1999-08-30 2001-04-13 Koninkl Philips Electronics Nv Method and device for speech recognition
JP2008242082A (en) * 2007-03-27 2008-10-09 Konami Digital Entertainment:Kk Speech processing device, speech processing method, and program
JP4563418B2 (en) * 2007-03-27 2010-10-13 株式会社コナミデジタルエンタテインメント Audio processing apparatus, audio processing method, and program
JP2011053425A (en) * 2009-09-01 2011-03-17 Nippon Telegr & Teleph Corp <Ntt> Phoneme dividing device, method and program

Also Published As

Publication number Publication date
JPH067346B2 (en) 1994-01-26

Similar Documents

Publication Publication Date Title
JPH09127972A Vocalization discrimination and verification for recognition of linked numerals
JPS62217295A (en) Voice recognition system
Ravinder Comparison of hmm and dtw for isolated word recognition system of punjabi language
JPS61219099A (en) Voice recognition equipment
JPH07219579A (en) Speech recognition device
JPS6147999A (en) Voice recognition system
JP2002278579A (en) Voice data retrieving device
JP3104900B2 (en) Voice recognition method
JP2000099084A (en) Voice recognition method and device therefor
JPH08314490A (en) Word spotting type method and device for recognizing voice
JP3031081B2 (en) Voice recognition device
JPS6325366B2 (en)
JP3291073B2 (en) Voice recognition method
JPH049320B2 (en)
Mary et al. Keyword spotting techniques
JPH0554678B2 (en)
Kalinli et al. Continuous speech recognition using attention shift decoding with soft decision.
JP2655637B2 (en) Voice pattern matching method
JPH05303391A (en) Speech recognition device
JPH0455518B2 (en)
JPS62111295A (en) Voice recognition equipment
JPS6155680B2 (en)
JPS60147797A (en) Voice recognition equipment
JP2000137495A (en) Device and method for speech recognition
JPS6346499A (en) Big vocaburary word voice recognition system