JPH067346B2 - Voice recognizer - Google Patents

Voice recognizer

Info

Publication number
JPH067346B2
JPH067346B2 JP59169569A JP16956984A
Authority
JP
Japan
Prior art keywords
syllable
candidate
section
reliability
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP59169569A
Other languages
Japanese (ja)
Other versions
JPS6147999A (en)
Inventor
文雄 外川 (Fumio Togawa)
伸 神谷 (Shin Kamiya)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sharp Corp
Original Assignee
Sharp Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sharp Corp filed Critical Sharp Corp
Priority to JP59169569A priority Critical patent/JPH067346B2/en
Publication of JPS6147999A publication Critical patent/JPS6147999A/en
Publication of JPH067346B2 publication Critical patent/JPH067346B2/en

Description

【発明の詳細な説明】Detailed Description of the Invention

<Field of Industrial Application>
The present invention relates to an improvement of the speech recognition device in a voice input device used in voice-input document processing equipment and the like.

<Prior Art>
Consider decomposing input speech into finer syllable units (hereinafter, syllable segmentation), extracting a sequence of syllable intervals, and identifying several syllable candidates for each syllable to form a time series of syllable candidates (hereinafter, a syllable lattice). Candidate syllable strings are then obtained from the combinations of highest identification accuracy, matched against a dictionary, and the matching candidates are taken as the recognition result for the input speech. Conventionally, the syllable boundaries are determined uniquely at the syllable-segmentation stage. With such processing, however, as the input speech becomes more continuous and closer to conversational speech, the syllable boundaries become indistinct and hard to detect, and the recognition rate deteriorates.

Noting this point, the inventors have proposed improving boundary-detection accuracy by not fixing the syllable boundaries uniquely in the segmentation algorithm, but instead estimating the utterance speed and exploiting that estimate to raise the recognition rate. Even so, syllable segmentation errors still occur.

<Problems to Be Solved by the Invention>
The present invention aims to solve the problem that syllable segmentation errors occur even when utterance speed is exploited, and to provide a speech recognition device with a good recognition rate.

<Means for Solving the Problems>
To achieve the above object, the present invention is characterized in that the identification accuracy of a candidate syllable string is determined using both the identification distance of each syllable and a reliability based on the utterance speed.

<Operation>
The device of the present invention does not determine syllable boundaries uniquely. Using the two elements of syllable identification distance and utterance-speed-based reliability, it detects a plurality of candidate syllable boundaries and performs a syllable segmentation that allows several competing syllable intervals over the same time span. Then, using the syllable candidate lattice with competing entries obtained by identifying the syllable of each interval candidate, candidate syllable strings are created giving priority to the speech intervals whose syllable-interval detection is most reliable.

<Embodiment>
The present invention is described concretely below with reference to an embodiment shown in the drawings. Fig. 1 shows the configuration of a voice input device embodying the invention. Input speech is first acoustically analyzed in the speech analysis unit 1 and sent to the unvoiced-interval detection unit, where silent intervals are detected; the phrase-boundary detection unit 5 then decides whether each silent interval is a phrase boundary. Meanwhile, the voiced-interval detection unit 3 detects voiced parts, i.e. stretches of sound, and the syllable-boundary candidate detection unit 6 produces syllable-boundary candidates using power changes, spectral changes, phoneme changes, etc., together with the utterance-speed information stored in the utterance-speed storage unit 4. These results are merged with the unambiguous syllable-boundary information obtained in the speech analysis unit 1 by the syllable-interval candidate detection unit 7, which yields plausible syllable-interval candidates in decreasing order of reliability.

These syllable-interval candidates are identified in the syllable identification unit 8, for example by pattern matching against the syllable reference patterns stored in advance in the pattern memory 12, and, as described later, candidate syllables are output in increasing order of identification distance. The time series of syllable candidates so output, each carrying its identification distance, is the syllable candidate lattice (b). In parallel, the syllable-interval candidate reliability calculation unit 9 uses the utterance-speed information stored in the utterance-speed storage unit 4 to compute the reliability of each syllable-interval candidate according to Eq. (1) below, and the optimal syllable-interval sequence is estimated. Based on this optimal sequence, the candidate syllable string creation unit 10 determines the weighting factor K of each syllable interval of the other syllable-interval sequences and, following Eq. (2) below, creates candidate syllable strings in increasing order of the score S obtained from the identification distances and the interval-detection reliabilities. The language processing unit 11 applies language processing, including dictionary matching against the dictionary 13, to the candidate strings in turn, and the candidate strings that can be interpreted are output as the recognition result (c). Each syllable length of the syllable-interval sequence making up the candidate string confirmed by the speaker (or operator) is then stored in the utterance-speed storage unit 4, together with the previous average syllable length, as utterance-speed information for use in subsequent processing.

Next, the determination of syllable intervals is described.
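To make the front-end behavior concrete, here is a minimal, self-contained Python sketch of voiced-interval detection (unit 3) and syllable-boundary candidate detection (unit 6) over a toy frame-power contour. The threshold and local-minimum heuristics, and all names, are illustrative assumptions only; the patent's units also use spectral change, phoneme change, and the stored utterance speed.

```python
# Toy frame-power contour: two voiced stretches separated by silence.
power = [0, 0, 5, 8, 9, 3, 8, 9, 7, 0, 0, 0, 6, 9, 8, 0, 0]

def voiced_segments(power, thresh=1):
    """Unit 3 (sketch): contiguous runs of frames whose power exceeds a threshold."""
    segs, start = [], None
    for i, p in enumerate(power + [0]):        # trailing 0 flushes the final run
        if p > thresh and start is None:
            start = i
        elif p <= thresh and start is not None:
            segs.append((start, i))
            start = None
    return segs

def boundary_candidates(power, seg):
    """Unit 6 (sketch): local power minima inside a voiced segment become
    candidate syllable boundaries; several may compete over the same span."""
    s, e = seg
    return [i for i in range(s + 1, e - 1)
            if power[i] < power[i - 1] and power[i] <= power[i + 1]]

segs = voiced_segments(power)                  # [(2, 9), (12, 15)]
cands = boundary_candidates(power, segs[0])    # [5]: the power dip at frame 5
```

A dip yields only a candidate boundary, never a decision, matching the text's point that competing syllable intervals are kept and resolved later by the reliability of Eq. (1) and the score of Eq. (2).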

【1】 Computing the reliability of a syllable interval

The utterance-speed-based reliability of a syllable-interval candidate is obtained from the deviation of its syllable lengths from the average syllable length. To this end, when the voice input device is first used, the user speaks a standard text whose syllable count is known, and the average syllable length is estimated from it. If a text of n syllables consists of I voiced intervals and L(i) is the duration of the i-th voiced interval, the average syllable length is

    L̄ = (1/n) Σ_{i=1}^{I} L(i)

The reliabilities D1, D2, …, Dx, … of the syllable-interval sequences competing over the same time span are then given by

    Dx = (1/x) Σ_{i=1}^{x} d(X(i), L̄)    … (1)

where
    X(i): length of the i-th syllable interval of sequence X
    x: number of syllable intervals in sequence X
    L̄: average syllable length
    d(X(i), L̄): deviation of X(i) from L̄, normally d(X(i), L̄) = |X(i) − L̄| / L̄
    Dx: reliability of syllable-interval sequence X

The sequence with the smallest reliability value from Eq. (1) is taken as the optimal syllable-interval sequence.
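The quantities of Eq. (1) can be checked with a short sketch (an illustration under the notation above, not the patent's implementation), using the syllable lengths that appear in the worked example later in the text: Lb = 12, Lc = 27, Ld = 39 with average syllable length L̄ = 23.

```python
def deviation(length, avg):
    """d(X(i), L̄) = |X(i) - L̄| / L̄"""
    return abs(length - avg) / avg

def sequence_reliability(lengths, avg):
    """Eq. (1): Dx, the mean deviation of the x interval lengths of
    sequence X from the average syllable length; smaller is better."""
    return sum(deviation(l, avg) for l in lengths) / len(lengths)

L_avg = 23                                    # average syllable length
d_bc = sequence_reliability([12, 27], L_avg)  # B-C sequence: ~0.33
d_d = sequence_reliability([39], L_avg)       # D sequence:   ~0.70
# d_bc < d_d, so B-C is estimated to be the optimal syllable-interval
# sequence, as in the text (which rounds the per-interval deviations to
# 0.48, 0.17 and 0.70 before averaging, giving Dbc = 0.32).
```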

【2】 Creating candidate syllable strings

When several syllable-interval candidates are detected over the same time span, the competing entries of the syllable candidate lattice are evaluated with a score S computed from both the syllable identification distance and the interval reliability defined above, and candidate syllable strings are created in increasing order of this score. The identification distances are combined as follows, taking the optimal syllable-interval sequence as the reference.

Suppose the syllable-interval candidates shown in Fig. 2 have been detected and the B-C sequence competes with the D sequence. Let

    La, ai: syllable length of interval candidate A and identification distance of its i-th syllable candidate
    Lb, bj: syllable length of interval candidate B and identification distance of its j-th syllable candidate
    Lc, ck: syllable length of interval candidate C and identification distance of its k-th syllable candidate
    Ld, dl: syllable length of interval candidate D and identification distance of its l-th syllable candidate
    L̄: average syllable length

Then the reliabilities of the competing sequences are

    B-C sequence: Dbc = {d(Lb, L̄) + d(Lc, L̄)} / 2
    D sequence:   Dd = d(Ld, L̄)

and the scores of the competing part are

    Sbc = (bj + ck) × Kbc + Dbc
    Sd = dl × Kd + Dd    … (2)

where Kbc and Kd are weighting factors applied to the identification distances with the optimal syllable-interval sequence as reference: when B-C is the optimal sequence (i.e. Dbc < Dd), Kbc = 1 and Kd = 2; when D is the optimal sequence (Dd < Dbc), Kd = 1 and Kbc = 1/2. The overall scores are therefore

    Sabc = ai + Sbc
    Sad = ai + Sd

and the smaller of Sabc and Sad is taken as the score S; candidate syllable strings are created in increasing order of these combined values.

The procedure is now illustrated with the input utterance 「家を」 ("ie o", "the house" plus object particle). Fig. 3 shows the syllable-boundary situation when 「家」 is input: there are four syllable-interval candidates A, B, C and D, and in the competing part the syllable lengths of candidates B, C and D are Lb = 12, Lc = 27 and Ld = 39, with average syllable length L̄ = 23. The interval reliabilities are

    for B: d(Lb, L̄) = 0.48
    for C: d(Lc, L̄) = 0.17
    for D: d(Ld, L̄) = 0.70

so the sequence reliabilities become

    B-C sequence: Dbc = {d(Lb, L̄) + d(Lc, L̄)} / 2 = 0.32
    D sequence:   Dd = d(Ld, L̄) = 0.70

and the B-C sequence is estimated to be the optimal syllable-interval sequence. Normalizing the reliabilities to this B-C sequence gives D′bc = 0 and D′d = 0.38.

Fig. 4 is the syllable candidate lattice showing the identification distance of each candidate syllable. Using these identification distances and the normalized reliabilities, the score of each candidate syllable string follows from Eq. (2) as

    Sabc = ai + (bj + ck + D′bc)
    Sad = ai + (dl × 2 + D′d)

and candidate strings are created in the order of the combinations giving the smaller values. Language processing is applied to the candidate syllable strings so created, and the recognition results are output as shown in the attached table. According to these results the first candidate is 「いえお」 ("ie o") from the A-B-C sequence, showing that the correct result is obtained.

<Effects of the Invention>
As is clear from the above description, according to the present invention syllable segmentation is performed using both the identification distance of each syllable and the utterance speed, so segmentation errors are reduced and the probability that the subsequent language processing arrives at the correct result can be raised.
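The competition scoring of Eq. (2) can be sketched as follows. The sequence reliabilities Dbc = 0.32 and Dd = 0.70 are those of the example above; the identification distances ai, bj, ck, dl come from Fig. 4 and are not reproduced in the text, so the values used below are made-up placeholders, and only the weighting and normalization logic follows the description.

```python
def competition_scores(ai, bj, ck, dl, d_bc, d_d):
    """Eq. (2) for the competition of Fig. 2 (B-C sequence vs. D sequence)."""
    # Weighting factors relative to the optimal (smaller-D) sequence.
    if d_bc < d_d:
        k_bc, k_d = 1.0, 2.0
    else:
        k_bc, k_d = 0.5, 1.0
    # Normalize the reliabilities against the optimal sequence.
    base = min(d_bc, d_d)
    dn_bc, dn_d = d_bc - base, d_d - base    # D'bc, D'd
    s_abc = ai + (bj + ck) * k_bc + dn_bc    # score of the A-B-C string
    s_ad = ai + dl * k_d + dn_d              # score of the A-D string
    return s_abc, s_ad

# Placeholder distances (NOT from Fig. 4); reliabilities from the example.
s_abc, s_ad = competition_scores(ai=0.10, bj=0.20, ck=0.15, dl=0.25,
                                 d_bc=0.32, d_d=0.70)
best = "A-B-C" if s_abc < s_ad else "A-D"    # smaller score wins
```

With these placeholders, s_abc = 0.45 and s_ad = 0.98, so the A-B-C string would be generated first, mirroring the 「いえお」 outcome in the text.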

【図面の簡単な説明】[Brief description of drawings]

Fig. 1 is a block diagram showing the configuration of a voice input device embodying the present invention; Fig. 2 is a diagram showing the syllable candidate lattice; Fig. 3 is a diagram showing an example of the syllable boundaries of an input utterance; and Fig. 4 is a diagram showing the syllable candidate lattice of the concrete example.

1 … speech analysis unit, 4 … utterance-speed storage unit, 7 … syllable-interval candidate detection unit, 8 … syllable identification unit, 9 … syllable-interval candidate reliability calculation unit, 10 … candidate syllable string creation unit, 11 … language processing unit

Claims (1)

【特許請求の範囲】[Claims]

1. A speech recognition device for a voice input device that decomposes input speech into syllable units, identifies each syllable and, on the basis of the time series of a plurality of syllable candidates, sequentially creates candidate syllable strings corresponding to words or phrases in decreasing order of identification accuracy so as to recognize word syllable strings, phrase syllable strings and the like, the device comprising: a syllable identification unit 8 holding a syllable candidate lattice that, for each of the syllable-interval candidates into which the input speech has been divided in several ways, outputs candidate syllables in increasing order of the identification distance obtained by pattern matching against syllable reference patterns or the like; a syllable-interval candidate reliability calculation unit 9 that estimates the optimal syllable-interval sequence from the reliability of said syllable-interval candidates using utterance-speed information; and a candidate syllable string creation unit 10 that, on the basis of said optimal syllable-interval sequence, determines a weighting factor K for each syllable interval of the other syllable-interval sequences and sequentially creates candidate syllable strings in increasing order of a score S using said identification distances and the interval-detection reliabilities.
JP59169569A 1984-08-14 1984-08-14 Voice recognizer Expired - Lifetime JPH067346B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP59169569A JPH067346B2 (en) 1984-08-14 1984-08-14 Voice recognizer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP59169569A JPH067346B2 (en) 1984-08-14 1984-08-14 Voice recognizer

Publications (2)

Publication Number Publication Date
JPS6147999A JPS6147999A (en) 1986-03-08
JPH067346B2 true JPH067346B2 (en) 1994-01-26

Family

ID=15888899

Family Applications (1)

Application Number Title Priority Date Filing Date
JP59169569A Expired - Lifetime JPH067346B2 (en) 1984-08-14 1984-08-14 Voice recognizer

Country Status (1)

Country Link
JP (1) JPH067346B2 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS63161499A (en) * 1986-12-24 1988-07-05 松下電器産業株式会社 Voice recognition equipment
DE19941227A1 (en) * 1999-08-30 2001-03-08 Philips Corp Intellectual Pty Method and arrangement for speech recognition
JP4563418B2 (en) * 2007-03-27 2010-10-13 株式会社コナミデジタルエンタテインメント Audio processing apparatus, audio processing method, and program
JP4809918B2 (en) * 2009-09-01 2011-11-09 日本電信電話株式会社 Phoneme division apparatus, method, and program

Also Published As

Publication number Publication date
JPS6147999A (en) 1986-03-08

Similar Documents

Publication Publication Date Title
JP5282737B2 (en) Speech recognition apparatus and speech recognition method
JPS62217295A (en) Voice recognition system
JPH09127972 (en) Vocalization discrimination and verification for recognition of linked numeral
US20070203700A1 (en) Speech Recognition Apparatus And Speech Recognition Method
JPH067346B2 (en) Voice recognizer
Abdo et al. Semi-automatic segmentation system for syllables extraction from continuous Arabic audio signal
JP2853418B2 (en) Voice recognition method
KR100673834B1 (en) Text-prompted speaker independent verification system and method
JPH0777998A (en) Successive word speech recognition device
JPH08314490A (en) Word spotting type method and device for recognizing voice
JP3277522B2 (en) Voice recognition method
JPS6325366B2 (en)
JP3291073B2 (en) Voice recognition method
JPH05303391A (en) Speech recognition device
JPS6155680B2 (en)
JP3115016B2 (en) Voice recognition method and apparatus
JPH054678B2 (en)
JPH0554678B2 (en)
JPS632100A (en) Voice recognition equipment
JPS59173884A (en) Pattern comparator
JPS6180298A (en) Voice recognition equipment
JPH0646357B2 (en) Continuous speech recognizer
JPS63217399A (en) Voice section detecting system
JPH0632006B2 (en) Voice recognizer
JPS6118758B2 (en)