JPS6078495A - Big vocabulary word recognition equipment - Google Patents

Big vocabulary word recognition equipment

Info

Publication number
JPS6078495A
Authority
JP
Japan
Prior art keywords
word
result
center
matching
vowel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP58186481A
Other languages
Japanese (ja)
Inventor
三船 義照
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd
Priority to JP58186481A
Publication of JPS6078495A
Legal status: Pending

Abstract

(57) [Abstract] This publication contains application data filed before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

Industrial Field of Application

The present invention relates to a large-vocabulary word speech recognition apparatus. CV syllables or V1CV2 syllables, the minimum utterance units of Japanese, are registered in advance; a word uttered by a registered speaker is recognized in units of CV syllables; and the result is pattern-matched against a large-vocabulary word dictionary to recognize the word.

Structure of the Conventional Example and Its Problems

In its simplest form, a conventional word speech recognition apparatus registers a word dictionary in units of feature-vector sequences of the input speech and recognizes an uttered word by pattern matching against the feature-vector sequence of that word. With this configuration, however, the storage capacity required to recognize a large vocabulary becomes enormous and the processing time becomes long, which poses practical problems for large-vocabulary word speech recognition.

Another conventional approach to large-vocabulary recognition computes, for each frame of the feature vectors of the input speech, the distance to phoneme patterns registered in advance (for example, the five vowels /a/, /i/, /u/, /e/, /o/ and consonants such as /s/, /p/, /t/, /k/, /b/, /d/, /g/, /m/, /n/, /r/) and identifies a phoneme per frame. The resulting sequence is then merged (for example, a run of consecutive identical phonemes is represented by a single phoneme and discontinuous phonemes are discarded) to give a phoneme-sequence recognition result for the input speech, which is matched against a word dictionary to obtain the recognized word. This method is practical in terms of storage capacity and processing time, but the phoneme identification rate suffers, mainly because consonants are deformed by coarticulation when phonemes are joined and because segmentation is ambiguous (phoneme boundaries are unclear). As a result, the recognition rate drops significantly.
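The merge step of the conventional method can be illustrated with a minimal sketch. The function name `merge_phonemes`, the `min_run` parameter, and the rule that a run shorter than `min_run` frames counts as "discontinuous" are assumptions made for illustration; the source does not specify them.

```python
from itertools import groupby

def merge_phonemes(frame_labels, min_run=2):
    """Collapse per-frame phoneme labels into a phoneme sequence.

    A run of identical labels becomes one phoneme. Runs shorter than
    min_run frames are treated as discontinuous and discarded
    (min_run is a hypothetical parameter, not taken from the source).
    """
    merged = []
    for label, run in groupby(frame_labels):
        if len(list(run)) >= min_run:
            # If discarding a short run leaves two equal neighbours, keep one
            if not merged or merged[-1] != label:
                merged.append(label)
    return merged

# Hypothetical per-frame labels; the single "k" frame is dropped as noise
frames = ["a", "a", "a", "k", "m", "m", "a", "a", "g", "g", "g", "a", "a"]
print(merge_phonemes(frames))  # ['a', 'm', 'a', 'g', 'a']
```

The sketch also shows why the method is fragile: one misidentified run of frames changes the phoneme sequence, and nothing downstream can recover the lost alternative.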

Object of the Invention

An object of the present invention is to solve the above conventional problems and to provide a large-vocabulary word speech recognition apparatus that is capable of real-time processing and achieves a high recognition rate.

Constitution of the Invention

The present invention converts the input speech into a time series of feature vectors {Xti} and of power values {Pti}. From these and from the vowel identification result of each frame, it extracts the vowel stationary-part centers {iV1, iV2, ..., iVN} and the silent intervals {[iSS1, iSE1], [iSS2, iSE2], ..., [iSSM, iSEM]}.

Within the range from the word beginning to the final vowel stationary-part center iVN, each stretch from the word beginning or from the end of a silent interval (iSEl, 1 ≤ l ≤ M) to the next vowel stationary-part center is pattern-matched against CV standard patterns stored in advance, and a plurality of CV syllables is output in priority order with likelihoods attached. Each stretch between adjacent vowel stationary-part centers that contains no silent interval, from iVS to iVE (E = S + 1), is pattern-matched against V1CV2 standard patterns, and a plurality of CV2 syllables is output in priority order with likelihoods attached.

This processing is repeated for every vowel stationary-part center up to the final vowel iVN, and the sequences of CV syllables and CV2 syllables are joined to form the recognition result. For the combinations of the output CV syllables and CV2 syllables, the likelihoods are summed, and a plurality of words (each made up of a combination of CV syllables and CV2 syllables) is output in priority order together with the summed likelihood.

For the combined words of high priority, the category of the large-vocabulary word dictionary is first narrowed down, for example from the vowel sequence, and pattern matching against the word dictionary is then performed. The weighted sum of the dictionary matching result and the summed likelihood of the combined word is computed, and the dictionary word with the highest weighted sum is taken as the recognition result. This enables real-time processing with a simple apparatus configuration and aims to put into practical use a large-vocabulary word speech recognition apparatus with a higher recognition rate.
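The choice between CV matching and V1CV2 matching depends only on where the vowel centers and silent intervals fall. The following is a minimal sketch of that decision, assuming frame indices for all positions; `plan_segments` and its return format are hypothetical names introduced for illustration.

```python
def plan_segments(word_start, vowel_centers, silences):
    """Return (kind, start_frame, end_frame) for each vowel center.

    vowel_centers: sorted frame indices iV1..iVN
    silences:      list of (iSS, iSE) pairs, silent interval start and end
    """
    segments = []
    prev_center = None
    for center in vowel_centers:
        lower = word_start if prev_center is None else prev_center
        # Ends of silent intervals lying between the previous anchor and this center
        ends = [se for ss, se in silences if lower <= ss and se <= center]
        if prev_center is None or ends:
            # Match CV templates from the word start or from the last silence end
            start = max(ends) if ends else word_start
            segments.append(("CV", start, center))
        else:
            # No silence in between: match V1CV2 templates across adjacent centers
            segments.append(("V1CV2", prev_center, center))
        prev_center = center
    return segments

# Hypothetical frame positions: four vowel centers, one silent interval
print(plan_segments(0, [10, 25, 48, 60], [(30, 38)]))
# [('CV', 0, 10), ('V1CV2', 10, 25), ('CV', 38, 48), ('V1CV2', 48, 60)]
```

Anchoring every segment on a vowel center avoids the explicit consonant segmentation that the conventional phoneme-based method depends on.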

Description of the Embodiment

An embodiment of the present invention is described below. FIG. 1 shows one embodiment of the large-vocabulary word speech recognition apparatus according to the present invention.

In the figure, 100 is an acoustic processing unit. The acoustic processing unit 100 comprises: an A/D converter 1; power sequence conversion means 2; feature sequence conversion means 3; speech interval detection means 4, which detects long silences from the power sequence of the input speech and thereby detects the speech interval 4a; silent interval detection means 5, which detects short silences from the power sequence within the speech interval 4a and thereby detects silent intervals 5a; vowel center detection means 6, which consists of peak power detection means 6a for detecting the peak power of the input speech and vowel identification means 6b for identifying a vowel for each vector of the feature-vector sequence, and which detects a vowel stationary-part center 6c from the center of the run of identical vowel identification results in the frames before and after the peak power; a standard pattern storage unit 7, which stores input speech in the form of feature vectors in units of CV syllables 7a or V1CV2 syllables 7b; a feature sequence storage unit 8, which stores the feature vectors of the input speech for a fixed interval; and pattern matching means 9. For the feature-vector sequence of the input speech, the pattern matching means 9 performs pattern matching against the CV syllable standard patterns from the word beginning 4a or the end of a silent interval 5a to the immediately following vowel center 6c, performs pattern matching against the V1CV2 syllable standard patterns between adjacent vowel centers 6c that have no silent interval between them, and outputs a plurality of CV or CV2 syllables in priority order with likelihoods based on the matching distance.
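The division of labour between the speech interval detection means 4 and the silent interval detection means 5 can be sketched as follows. This is only an illustration under stated assumptions: a single fixed power `threshold`, a frame count `long_frames` that separates long from short silences, and one utterance per input. None of these values or function names come from the source.

```python
def low_power_runs(power, threshold):
    """Return (start, end) frame pairs, end exclusive, where power < threshold."""
    runs, start = [], None
    for i, p in enumerate(power):
        if p < threshold and start is None:
            start = i
        elif p >= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(power)))
    return runs

def detect_intervals(power, threshold, long_frames):
    """Return (speech_interval, silent_intervals) for one utterance."""
    runs = low_power_runs(power, threshold)
    voiced = [i for i, p in enumerate(power) if p >= threshold]
    if not voiced:
        return None, []
    start = voiced[0]
    end = len(power)
    for s, e in runs:
        # The first long silence after speech onset closes the speech interval
        if s > start and e - s >= long_frames:
            end = s
            break
    # Short low-power runs strictly inside the speech interval are silent intervals
    silences = [(s, e) for s, e in runs if start < s and e < end]
    return (start, end), silences

# Hypothetical power sequence with one short pause and a long trailing silence
power = [0, 0, 5, 6, 7, 1, 1, 6, 7, 8, 0, 0, 0, 0, 0, 0]
print(detect_intervals(power, threshold=3, long_frames=4))
# ((2, 10), [(5, 7)])
```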

The word matching unit 200 comprises: a likelihood-tagged CV syllable sequence storage unit 10, which sequentially stores the plurality of likelihood-tagged CV syllables output by the pattern matching means 9 for each vowel center 6c; a candidate word storage unit 11, which sums the likelihoods over the per-vowel combinations of the CV syllable sequences held in the storage unit 10, composes words in priority order, and stores a limited set of candidate words selected by summed likelihood; a large-vocabulary word dictionary 14, which stores the large-vocabulary words classified into categories by, for example, number of syllables and vowel sequence; word matching means 12, which detects the number of syllables and the vowel sequence of each candidate word in the candidate word storage unit 11, selects the corresponding category of the large-vocabulary word dictionary 14, and then performs matching; and weight addition means 13, which adds, with weighting, the score based on the matching result of the word matching means 12 and the likelihood of the candidate word in the candidate word storage unit 11, and takes the word with the highest total as the recognition result.
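The candidate word storage unit 11 and the categorized dictionary 14 can be sketched together. The code below is an assumption-laden illustration: likelihoods are small integers, syllables are lowercase romanized strings whose last letter is the vowel, and the names `candidate_words`, `category_key`, and `build_dictionary` are hypothetical.

```python
from itertools import product

def candidate_words(per_center, top_n=3):
    """Compose words from per-vowel-center candidates, ranked by summed likelihood.

    per_center: one list per vowel center of (syllable, likelihood) pairs
    """
    scored = []
    for combo in product(*per_center):
        syllables = tuple(s for s, _ in combo)
        total = sum(score for _, score in combo)
        scored.append((syllables, total))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]

def category_key(syllables):
    """Dictionary category: number of syllables plus the vowel sequence."""
    return len(syllables), tuple(s[-1] for s in syllables)

def build_dictionary(words):
    """Group dictionary words by category so only one category is searched."""
    index = {}
    for name, syllables in words.items():
        index.setdefault(category_key(syllables), []).append(name)
    return index

# Hypothetical candidates for three vowel centers
per_center = [[("a", 9)], [("ma", 8), ("na", 6)], [("ga", 7), ("za", 5)]]
print(candidate_words(per_center))

# Two five-syllable words that share the vowel sequence a-a-a-a-i
words = {
    "amagasaki": ("a", "ma", "ga", "sa", "ki"),
    "atatakai": ("a", "ta", "ta", "ka", "i"),
}
print(build_dictionary(words))
```

Because both dictionary words fall into the same category, the vowel sequence narrows the search but does not decide between them; that is left to the weighted matching step.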

15 is overall control means. It comprehensively controls the speech interval detection means 4, the silent interval detection means 5, the vowel center detection means 6, the feature sequence storage unit 8, the standard pattern storage unit 7, and the pattern matching means 9 of the acoustic processing unit 100 so that a plurality of likelihood-tagged CV syllables is output for each vowel center 6c of the input speech. It likewise comprehensively controls the candidate word storage unit 11, the word matching means 12, the weight addition means 13, and the large-vocabulary word dictionary 14 of the word matching unit so that candidate words are created from combinations of the CV syllables in the likelihood-tagged CV syllable sequence 10 output by the acoustic processing unit, the candidates are matched against the large-vocabulary word dictionary, and the word is identified from the weighted sum of the matching result and the likelihood of the candidate word.

The operating principle of this embodiment is described below.

[...] and the word beginning 4a and word end 4a of the speech interval are detected by the speech interval detection means 4.

The vowel center detection means 6 scans the input speech 16 sequentially from the word beginning 4a. It detects a vowel center 6c ({iV1, iV2, ..., iV5}) when the peak power detection means 6a finds that the power sequence stays at or above a fixed threshold for at least a fixed number of frames and, in addition, the vowels identified by the vowel identification means 6b are of the same type for at least a fixed number of frames. Each time a vowel center 6c is detected, the silent interval detection means 5 checks whether a silent interval exists between it and the preceding vowel center, and any such interval is detected as a silent interval 5a with start and end [iSS1, iSE1].
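The two conditions for a vowel center (sustained power and a stable vowel label) can be sketched as below. This is a simplification: it takes the midpoint of each qualifying run as the center rather than searching around the power peak, and the names `vowel_centers`, `threshold`, and `min_frames` are hypothetical.

```python
def vowel_centers(power, vowels, threshold, min_frames):
    """Return [(center_frame, vowel), ...].

    power:  per-frame power values
    vowels: per-frame vowel labels from a vowel classifier
    """
    centers = []
    n = len(power)
    start = 0
    while start < n:
        if power[start] < threshold:
            start += 1
            continue
        # Extend the run while power stays high and the vowel label is unchanged
        end = start
        while (end + 1 < n and power[end + 1] >= threshold
               and vowels[end + 1] == vowels[start]):
            end += 1
        if end - start + 1 >= min_frames:
            # Take the midpoint of the stationary run as the vowel center
            centers.append(((start + end) // 2, vowels[start]))
        start = end + 1
    return centers

# Hypothetical frames: a stable /a/, a dip in power, a stable /i/, one stray /a/
power = [1, 5, 6, 7, 6, 2, 6, 7, 8, 7, 1]
vowels = ["-", "a", "a", "a", "a", "-", "i", "i", "i", "a", "-"]
print(vowel_centers(power, vowels, threshold=4, min_frames=3))
# [(2, 'a'), (7, 'i')]
```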

For the interval from the word beginning or the end of a silent interval to the vowel center iV1, the pattern matching means 9 performs matching against the CV standard patterns 7a and outputs 9a a plurality of CV syllables in priority order with likelihoods attached. For each interval between adjacent vowel centers with no silent interval between them, the pattern matching means 9 performs matching against the V1CV2 standard patterns 7b and likewise outputs 9a a plurality of CV2 syllables in priority order with likelihoods attached. FIG. 3 shows an example of the processing result of the acoustic processing unit for the input speech /Amagasaki/.
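A minimal sketch of producing ranked, likelihood-tagged syllable candidates from matching distances follows. It makes strong simplifying assumptions that are not in the source: each segment and each standard pattern is reduced to one fixed-length feature vector, the distance is Euclidean, and the likelihood is `1 / (1 + distance)`. A real system would align variable-length feature sequences in time.

```python
import math

def rank_syllables(segment, templates, top_n=3):
    """Return the top_n (syllable, likelihood) pairs, best first.

    segment:   feature vector of the interval to recognize
    templates: {syllable: standard-pattern feature vector}
    """
    distances = {syl: math.dist(segment, ref) for syl, ref in templates.items()}
    ranked = sorted(distances.items(), key=lambda kv: kv[1])[:top_n]
    # Map distance to a score in (0, 1]; closer templates score higher
    return [(syl, 1.0 / (1.0 + d)) for syl, d in ranked]

# Hypothetical two-dimensional standard patterns
templates = {"ma": [1.0, 0.0], "na": [0.0, 1.0], "ga": [4.0, 3.0]}
for syllable, likelihood in rank_syllables([1.0, 0.0], templates, top_n=2):
    print(syllable, round(likelihood, 3))
```

Keeping several candidates per vowel center, instead of only the best one, is what lets the later word-level stage recover from a wrong first choice.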

Next, the operating principle of the word matching unit 200 is described with reference to FIGS. 4 and 5.

Suppose the processing result of the acoustic processing unit 100 is as shown in FIG. 3. The candidate words 11 are then determined, as shown in FIG. 4, from the combinations of CV syllables and the summed likelihood (Si) corresponding to each combination. Once the candidate words are determined, the dictionary category is selected from the conditions of each candidate word (for example, the number of syllables and the vowel sequence), and the word matching means 12 computes a score (SWDj) against each dictionary word in that category. The dictionary candidate that gives the maximum of the weighted sum (SWDj + k·Si) of this score and the summed likelihood is taken as the recognized word WD.

FIG. 5 shows that the candidate words obtained from the acoustic processing shown in FIG. 3 are W1 = /A/MA/GA/[illegible]/[illegible]/, W2 = /A/MA/ZA/SA/Hi/, W3 = /A/MA/GA/SA/Ki/, ..., with summed likelihoods S1, S2, S3, .... From these, the five-syllable words in the large-vocabulary word dictionary 14 whose vowel sequence is /A/A/A/A/i/, namely "Amagasaki" (/A/MA/GA/SA/Ki/) and "atatakai" (/A/TA/TA/KA/i/), are selected as dictionary candidates. The figure also shows that the weighted sum of the score SWDj against the candidate words and the summed likelihood (Si) is largest for "Amagasaki".
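The final decision, maximizing SWDj + k·Si over candidate words and dictionary words, can be sketched as follows. The score function `syllable_overlap`, the weight `k`, and all numeric values are hypothetical placeholders; the source does not give the actual scores shown in FIGS. 4 and 5.

```python
def syllable_overlap(candidate, reference):
    """Hypothetical match score: number of positions with the same syllable."""
    return sum(c == r for c, r in zip(candidate, reference))

def recognize(candidates, dictionary, match_score, k=1.0):
    """Pick the dictionary word maximizing match_score + k * summed likelihood."""
    best_word, best_total = None, float("-inf")
    for syllables, likelihood_sum in candidates:
        for word, reference in dictionary.items():
            total = match_score(syllables, reference) + k * likelihood_sum
            if total > best_total:
                best_word, best_total = word, total
    return best_word, best_total

# Illustrative candidates with made-up summed likelihoods
candidates = [
    (("a", "ma", "ga", "sa", "hi"), 4),
    (("a", "ma", "za", "sa", "hi"), 3),
    (("a", "ma", "ga", "sa", "ki"), 2),
]
dictionary = {
    "amagasaki": ("a", "ma", "ga", "sa", "ki"),
    "atatakai": ("a", "ta", "ta", "ka", "i"),
}
print(recognize(candidates, dictionary, syllable_overlap))
```

In this made-up example no single candidate matches "atatakai" in more than one position, so "amagasaki" wins even though the top-ranked candidate contains a misrecognized final syllable.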

Effects of the Invention

For each vowel stationary-part center of the input speech, the present invention performs matching against CV syllable or V1CV2 syllable standard patterns registered in advance and outputs the recognition result as a plurality of CV or CV2 syllables in priority order, with likelihoods based on the matching distance. When the final vowel stationary-part center has been detected, a plurality of candidate words formed by combining the recognition results for each vowel center is output in priority order, each with the sum of its likelihoods. The category of the large-vocabulary word dictionary is then narrowed down from the number of syllables and the vowel sequence of the candidate words, and the word that maximizes the weighted sum of the matching score against the candidate word and the likelihood of the candidate word is taken as the recognition result. This enables real-time processing with a simple apparatus configuration and makes it possible to put into practical use a large-vocabulary word speech recognition apparatus with a higher recognition rate.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a large-vocabulary word speech recognition apparatus according to one embodiment of the present invention. FIG. 2 is a diagram explaining the operation of the acoustic processing unit in that embodiment when the city name "Amagasaki" is uttered as the input speech. FIG. 3 shows the final output of the acoustic processing unit, namely the sequence of likelihood-tagged CV syllable recognition results for each vowel center. FIG. 4 shows, for the embodiment of FIG. 1, how the word matching unit creates candidate words, narrows down the category of large-vocabulary words, and recognizes the word from the weighted sum of the word matching result and the likelihood of the candidate word. FIG. 5 shows the word recognition result of the word matching unit of FIG. 4 when the candidate words are those shown in FIG. 3 and the candidate words from the large-vocabulary word dictionary are "Amagasaki" and "atatakai".

1: A/D converter, 2: power sequence conversion means, 3: feature sequence conversion means, 4: speech interval detection means, 5: silent interval detection means, 6: vowel center detection means, 7: standard pattern storage unit, 8: feature sequence storage unit, 9: pattern matching means, 10: likelihood-tagged CV syllable sequence storage unit, 11: candidate word storage unit, 12: word matching means, 13: weight addition means, 14: large-vocabulary word dictionary, 15: overall control means, 100: acoustic processing unit, 200: word matching unit.

Agent: Toshio Nakao, patent attorney, and one other.

Procedural Amendment (Formality), February 1984 (day illegible)

1. Indication of the case: Patent Application No. 186481 of 1983 (Showa 58)
2. Title of the invention: Large-vocabulary word speech recognition apparatus
3. Person making the amendment: the patent applicant. Address: 1006 Oaza Kadoma, Kadoma City. Name: (582) Matsushita Electric Industrial Co., Ltd. Representative: Toshihiko Yamashita
4. Agent: postal code 571. Address: c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma City

FIG. 3 of the drawings is amended as shown in the attached sheet.

Claims (1)

[Claims]

A large-vocabulary word speech recognition apparatus comprising: feature sequence conversion means for converting input speech into a sequence of feature vectors {Xti}; power value sequence conversion means for converting the input speech into a sequence of power values {Pti}; and vowel stationary-part center detection means for detecting vowel stationary-part centers from a level determination on the power value sequence {Pti} and from vowel identification results for the feature vectors {Xti}; wherein pattern matching against CV2 (C: consonant, V: vowel) syllable or V1CV2 syllable standard patterns is performed between adjacent vowel stationary-part centers detected by the vowel stationary-part center detection means; a plurality of CV2 syllable recognition results is output for each vowel stationary-part center in priority order, with likelihoods based on the distance obtained from the pattern matching; as the word recognition result, the likelihoods in the CV2 syllable recognition results for each vowel stationary-part center are summed over all combinations, and a plurality of words corresponding to the combinations is output in priority order based on the summed likelihood; and, when the plurality of words is matched against a word dictionary, the summed likelihood is added with weighting to the matching result to obtain the recognition result.
JP58186481A 1983-10-05 1983-10-05 Big vocabulary word recognition equipment Pending JPS6078495A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP58186481A JPS6078495A (en) 1983-10-05 1983-10-05 Big vocabulary word recognition equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP58186481A JPS6078495A (en) 1983-10-05 1983-10-05 Big vocabulary word recognition equipment

Publications (1)

Publication Number Publication Date
JPS6078495A true JPS6078495A (en) 1985-05-04

Family

ID=16189236

Family Applications (1)

Application Number Title Priority Date Filing Date
JP58186481A Pending JPS6078495A (en) 1983-10-05 1983-10-05 Big vocabulary word recognition equipment

Country Status (1)

Country Link
JP (1) JPS6078495A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS56111897A (en) * 1980-02-08 1981-09-03 Fujitsu Ltd Ward unit voice recognizing system utilizing monoosylable voice recogniton
JPS5855995A (en) * 1981-09-29 1983-04-02 富士通株式会社 Voice recognition system
JPS5872995A (en) * 1981-10-28 1983-05-02 電子計算機基本技術研究組合 Word voice recognition

