JPS6346499A

JPS6346499A - Big vocabulary word voice recognition system

Info

Publication number: JPS6346499A
Application number: JP61191398A
Authority: JP
Inventors: 沢井　秀文
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1986-04-18
Filing date: 1986-08-15
Publication date: 1988-02-27

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

Technical Field

The present invention relates to a large-vocabulary word speech recognition system, and more particularly to a method of pre-selecting words in a large-vocabulary word speech recognition device and a method of classifying large-vocabulary word speech.

Prior Art

Conventionally, a common way of recognizing large-vocabulary word speech is to first recognize the sequence of consonants and vowels in the input speech, then match that consonant and vowel label sequence against the label sequences of the large-vocabulary words, and select from the word dictionary the entry closest to the label sequence of the input speech as the recognition result. With this method, however, segmentation (cutting out) and recognition of the phonemes in the input speech are very difficult, and recognition errors occur easily.

Consequently, multiple candidates are prepared for each phoneme recognition result, and label matching is performed between the resulting sequence, the so-called "phoneme lattice," and the word dictionary. This has the drawback that it is difficult to identify a single recognition result.

Object

The present invention has been made in view of the circumstances described above. Its object is to provide, first, a method that performs a coarse classification in advance based on the features of the phonemes in the input speech and thereby limits the words to be recognized, so that large-vocabulary word speech can be recognized at high speed; and second, in connection with a word pre-selection method that limits the candidate words among the recognition targets in order to shorten recognition processing time and improve recognition accuracy when recognizing word speech from a large vocabulary with many recognition targets, a method that classifies the large-vocabulary words in advance so that the dictionary can be searched efficiently.

Configuration

To achieve the above object, the present invention comprises a microphone for inputting speech, a feature analysis unit for obtaining a characteristic time series from the speech, a pre-selection unit for selecting word candidates prior to recognition of large-vocabulary word speech, a pre-selection dictionary that is consulted during pre-selection, a recognition processing unit for recognizing the candidate words narrowed down by the pre-selection unit, a word standard pattern storage unit that is referred to during recognition processing, and a recognition result output terminal for outputting the recognition result.

The invention is characterized, first, in that pre-selection is performed by successively narrowing down the candidate words based on the classification or recognition results of the word-initial consonant and vowel, the mid-word consonants and vowels, and so on in the input speech. It is characterized, second, in that, with the same set of components, the word standard patterns are classified on the basis of a hierarchical network structure of grouped category names and fixed category names, and the unknown input speech is classified along these two kinds of category names so as to limit the candidate word names.

The invention is described below on the basis of embodiments.

FIG. 1 is an electrical block diagram for explaining an embodiment of the present invention. In the figure, 1 is a microphone for speech input, 2 is a speech feature extraction unit, 3 is a word pre-selection unit, 4 is a pre-selection dictionary storage unit, 5 is a recognition processing unit, 6 is a standard pattern (dictionary) storage unit, and 7 is a recognition result output unit.

Speech input from the microphone 1 is converted by the feature extraction unit 2 into a time series of feature parameters characteristic of speech. The pre-selection unit 3 uses this feature parameter sequence to perform matching against the pre-selection dictionary 4, which has been created in advance from the large-vocabulary word dictionary, and narrows down the candidate words. The recognition processing unit 5 matches the words narrowed down by the pre-selection unit 3 against the word standard patterns in the dictionary storage unit 6, and outputs the name of the pattern closest to the input pattern to the recognition result output unit 7 as the recognition result.
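The data flow of FIG. 1 can be sketched as a two-stage function: a cheap pre-selection step proposes candidate words, and a full pattern match is run only on those candidates. This is a minimal illustration, not the patented implementation. The names `recognize`, `preselect`, `templates`, and `city_block`, and the use of a city-block distance, are assumptions introduced here; the source does not specify the distance measure.

```python
from typing import Callable, Dict, List, Optional, Sequence

Features = Sequence[float]


def city_block(a: Features, b: Features) -> float:
    """Toy distance (assumption): sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def recognize(
    features: Features,
    preselect: Callable[[Features], List[str]],
    templates: Dict[str, Features],
    distance: Callable[[Features, Features], float] = city_block,
) -> Optional[str]:
    """Two-stage recognition: pre-select candidates, then full matching."""
    # Stage 1 (units 3 and 4): narrow the vocabulary to a few candidates.
    candidates = preselect(features)
    if not candidates:
        return None  # Behaviour for an empty candidate set is an assumption.
    # Stage 2 (units 5 and 6): compare only the candidates with their
    # standard patterns and return the closest one.
    return min(candidates, key=lambda word: distance(features, templates[word]))
```

Because the expensive matching in stage 2 runs only over the pre-selected words, its cost depends on the size of the candidate set rather than on the size of the full vocabulary.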

FIG. 2 is an electrical block diagram for explaining the pre-selection processing of word standard patterns. In the figure, 31 is a processing start terminal, 32 is a unit that detects silent intervals in the input speech pattern, 33 is a duration detection unit, 34 is a word-initial CV classification unit (where C denotes a consonant and V a vowel), and 35 is a mid-word VCV classification unit.

The silent interval detection unit 32 detects silent intervals in the input speech and excludes from the recognition targets those standard patterns, analyzed in advance, that differ greatly from the input, mainly in the number and position of silent intervals. Similarly, the duration detection unit 33 excludes standard patterns whose length differs greatly from that of the input speech (normally by ±30% or more). In parallel with the processing of the silent interval detection unit 32 and the duration detection unit 33, the word-initial CV classification unit 34 classifies the leading CV (or V) of the input speech, and the mid-word VCV classification unit 35 then classifies the VCVs within the word, so that the standard pattern candidates are narrowed down step by step. In the CV classification by the word-initial CV classification unit 34 and the VCV classification by the mid-word VCV classification unit 35, vowel recognition is performed before consonant recognition, and because the recognition rate for consonants is far lower than that for vowels, consonants are given only a coarse classification.
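The two coarse filters of units 32 and 33 can be illustrated as follows. This is a simplified sketch under stated assumptions: the source compares both the number and the position of silent intervals, whereas this example compares only the count and requires an exact match; the 30% tolerance is taken relative to the template length, which the source does not specify. `TemplateInfo`, `duration_ok`, and `prefilter` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TemplateInfo:
    word: str
    n_frames: int       # length of the standard pattern in frames
    silence_count: int  # number of silent intervals found in advance


def duration_ok(input_frames: int, template_frames: int,
                tolerance: float = 0.30) -> bool:
    """Keep a template only if the lengths differ by less than the tolerance.

    Measuring the tolerance against the template length is an assumption.
    """
    return abs(input_frames - template_frames) < tolerance * template_frames


def prefilter(input_frames: int, input_silences: int,
              templates: List[TemplateInfo],
              tolerance: float = 0.30) -> List[str]:
    """Drop templates whose silence count or duration is far from the input."""
    return [
        t.word
        for t in templates
        # Simplification (assumption): exact match on the silence count.
        if t.silence_count == input_silences
        and duration_ok(input_frames, t.n_frames, tolerance)
    ]
```

For example, with an input of 42 frames and one silent interval, a 40-frame template with one silent interval is kept, while a 70-frame template or a template with two silent intervals is removed.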

FIG. 3 shows the classification of the word-initial CV for the case where the vowel V is /i/. The classification is similar for the other vowels. The classification of the initial consonant C exploits the fact that distinguishing features appear in the low and high bands of the outputs of a bank of band-pass filters. If the continuing frame length Fc of the consonant in the low band is greater than a threshold FL, the consonant is regarded as one of /m, n, b, d, g, z, y, w, r/. In the high band, if Fc is smaller than another threshold FH, the consonant is classified as one of /s, p, t, k, h/, and if it is larger, as /s/. Classifying the initial consonant in this way limits the words to be recognized.
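The threshold rule for the initial consonant can be written as a small decision function. This is a sketch only. The group labels `G1a`, `G1b`, and `G1c` are borrowed from FIG. 6, and their assignment to the three cases in the order they appear in the text is an assumption, as are the order in which the two bands are tested and the handling of a value exactly equal to a threshold. The threshold values in the usage note are invented for illustration.

```python
def classify_initial_consonant(fc_low: int, fc_high: int,
                               fl: int, fh: int) -> str:
    """Coarse three-way classification of the word-initial consonant.

    fc_low  -- continuing frame length Fc observed in the low band
    fc_high -- continuing frame length Fc observed in the high band
    fl, fh  -- thresholds FL and FH (their values are not given in the source)
    """
    if fc_low > fl:
        # Long low-band energy: the first group listed in the text.
        return "G1a"
    if fc_high < fh:
        # Short high-band energy: the second group listed in the text.
        return "G1b"
    # Long high-band energy: the third case. Equality with FH falls here
    # by assumption; the source only says "smaller" and "larger".
    return "G1c"
```

With hypothetical thresholds `fl=5` and `fh=8`, a consonant with 9 low-band frames falls in `G1a`, one with 2 low-band and 3 high-band frames in `G1b`, and one with 2 low-band and 12 high-band frames in `G1c`.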

Next, FIG. 4 shows an example of classification by mid-word consonant. Here, a mid-word consonant is the consonant in a VCV syllable that follows the leading CV syllable of the input speech. As shown in the figure, this classification yields three groups: consonants preceded by a silent interval, /p, t, k, s, h/; consonants that produce a dip in power in the consonant portion, /b, d, g, z, r/; and the remaining consonants. The vowels on both sides of a VCV syllable are recognized before the consonant between them, and the word candidates already narrowed down by the leading CV classification are classified by VCV syllable to narrow them down further.
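A rough way to picture the three mid-word groups is to look at the per-frame power across one VCV segment. The sketch below is an assumption-heavy illustration: the source does not say how a silent interval or a power dip is detected, so the `silence_level` and `dip_ratio` parameters, the use of the first and last frames as the vowel reference, and the labels `G2a`, `G2b`, `G2c` (taken from FIG. 6 and assigned in the order the groups are listed) are all introduced here.

```python
from typing import Sequence


def classify_medial_consonant(power: Sequence[float],
                              silence_level: float,
                              dip_ratio: float = 0.5) -> str:
    """Classify the consonant of a VCV segment from its power contour.

    power -- per-frame power over the VCV segment; the first and last
             frames are assumed to lie inside the flanking vowels.
    """
    lowest = min(power)
    if lowest < silence_level:
        # Power drops to silence: a silent interval precedes the consonant.
        return "G2a"
    vowel_level = min(power[0], power[-1])
    if lowest < dip_ratio * vowel_level:
        # Power dips clearly below the vowel level without reaching silence.
        return "G2b"
    # No silence and no marked dip: the remaining consonants.
    return "G2c"
```

For instance, the contour `[10, 9, 0.1, 8, 10]` with `silence_level=0.5` yields `G2a`, `[10, 8, 4, 8, 10]` yields `G2b`, and `[10, 9, 8, 9, 10]` yields `G2c`.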

FIG. 5 shows how phoneme classification proceeds when the input speech is the utterance "Chiba" (千葉). In the figure, /i/ and /u/ are obtained as vowel candidates for the leading CV syllable, and /i/ is selected by matching against vowel standard patterns cut out of words in advance. The initial consonant is determined from the duration of high-band power to be one of /s, p, t, k, h/, and the vowels and consonants of the mid-word VCV syllables are determined in the same way. With this word pre-selection method based on phoneme classification, candidate words can be narrowed down successively in parallel with the speech input, so pre-selection can be performed at high speed. In the final stage, the recognition processing unit 5 refers to the standard patterns 6 for the narrowed-down candidate words, performs word-level pattern matching, and outputs the name of the word with the minimum distance as the recognition result.
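The idea of narrowing candidates in parallel with the speech input can be illustrated with a generator that filters the candidate set each time a new segment label becomes available. This is a sketch, not the method of the source: the lexicon entries other than the chain for "tiba" (which follows the path given for FIG. 6) are hypothetical, and `narrow_incrementally` is a name introduced here.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

# Hypothetical lexicon: word name -> chain of category labels.
LEXICON: Dict[str, Tuple[str, ...]] = {
    "tiba": ("G1b", "i", "G2b", "a"),
    "kiba": ("G1b", "i", "G2b", "a"),
    "tuba": ("G1b", "u", "G2b", "a"),
    "nida": ("G1a", "i", "G2b", "a"),
}


def narrow_incrementally(
    lexicon: Dict[str, Tuple[str, ...]],
    labels: Iterable[str],
) -> Iterator[Set[str]]:
    """Yield the surviving candidate set after each incoming label."""
    candidates = set(lexicon)
    for position, label in enumerate(labels):
        # Keep only the words whose chain agrees with the label just decided.
        candidates = {
            word for word in candidates
            if len(lexicon[word]) > position
            and lexicon[word][position] == label
        }
        yield set(candidates)
```

Iterating over `narrow_incrementally(LEXICON, ["G1b", "i", "G2b", "a"])` shrinks the set after the first two labels, so most of the vocabulary is already discarded before the utterance has ended.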

FIG. 6 shows how the large-vocabulary word set is classified in advance into a tree structure based on the consonant and vowel classifications of FIGS. 3 and 4. In the figure, G1a, G1b, and G1c are the consonant classification groups of FIG. 3, G2a, G2b, and G2c are the consonant classification groups of FIG. 4, and a, i, u, e, and o are the five vowels. When the input speech is "Chiba" as described for FIG. 5, the path G1b → i → G2b → a in FIG. 6 is followed, and the word group 40 containing "TIBA" is retrieved from the large vocabulary.

Classifying the word set into a tree structure in this way allows words to be looked up at high speed.
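The tree of FIG. 6 can be pictured as nested dictionaries keyed by category label, with the word group stored at the end of each path. The sketch below is illustrative only: `build_tree`, `lookup`, and the reserved key `"_words"` are names introduced here, and the word list apart from "tiba" is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def build_tree(entries: Dict[str, Tuple[str, ...]]) -> dict:
    """Build a nested-dict tree from word name -> category path."""
    root: dict = {}
    for word, path in entries.items():
        node = root
        for label in path:
            node = node.setdefault(label, {})
        # "_words" is a reserved key (assumption) holding the word group.
        node.setdefault("_words", []).append(word)
    return root


def lookup(tree: dict, path: Sequence[str]) -> List[str]:
    """Follow a category path and return the word group found there."""
    node = tree
    for label in path:
        node = node.get(label)
        if node is None:
            return []  # No word in the dictionary follows this path.
    return list(node.get("_words", []))


tree = build_tree({
    "tiba": ("G1b", "i", "G2b", "a"),
    "kiba": ("G1b", "i", "G2b", "a"),  # hypothetical entry
    "tuba": ("G1b", "u", "G2b", "a"),  # hypothetical entry
})
```

Here `lookup(tree, ("G1b", "i", "G2b", "a"))` returns the group containing "tiba" after four dictionary accesses, independent of how many words the vocabulary holds.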

FIG. 7 is a diagram for explaining the classification of the word dictionary in the standard pattern storage unit 6 shown in FIG. 1. In the figure, 61a to 61c denote three groups of a certain category in the first layer, 63a to 63e denote fixed category names in the second layer, 62a to 62c denote groups in the third layer that differ from the category of the first layer, and 64a to 64e denote fixed category names in the fourth layer.

In the layers below these as well, grouped layers and fixed-category layers alternate. 65_1 to 65_n are standard pattern names determined by following a route through the hierarchically structured categories. Accordingly, information on the chain of categories is recorded together with each of the standard pattern names (word names) W1 to Wn. For example, as shown in FIG. 8, when unknown speech is input, appropriate segmentation is performed from the beginning of the speech and four categories G1, F1, G2, and F2 are determined.

Suppose that group G1 belongs to G1b in the word-initial consonant classification of FIG. 9 and that fixed category F1 has been determined to be the vowel /i/. Suppose likewise that G2 belongs to G2b in the mid-word consonant classification of FIG. 10 and that F2 has been identified as the vowel /a/. Then word name 65_1 in FIG. 7 would contain, for example, the place name "tiba" (Chiba). In this example, G1 and G2 denote the word-initial and mid-word consonant groups, respectively, and F1 and F2 denote one of the five vowels.

As another classification method, the segments of the unknown input speech in FIG. 8 may be categorized according to the reliability of recognition of each segment (for example, the matching distance to the phoneme standard patterns). That is, a phoneme recognized with high reliability is treated as a fixed category, and a phoneme recognized with low reliability is treated as a group category. For this purpose, the dictionary classification shown in FIG. 7 must of course also be organized hierarchically on the basis of this reliability.
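The reliability-based alternative can be sketched as a function that emits either the phoneme itself (fixed category) or only its group (group category), depending on the matching distance. This is an illustration under assumptions: a smaller distance is taken to mean higher reliability, the threshold value is invented, and `GROUP_OF` and `categorize` are hypothetical names. The group assignments for /t/ and /b/ follow the "Chiba" path in the text; the vowel group label `V` is made up for the example.

```python
from typing import Dict, Tuple

# Hypothetical phoneme -> group table.
GROUP_OF: Dict[str, str] = {"t": "G1b", "b": "G2b", "i": "V", "a": "V"}


def categorize(phoneme: str, distance: float,
               group_of: Dict[str, str],
               threshold: float) -> Tuple[str, str]:
    """Return ("fixed", phoneme) or ("group", group name) for one segment.

    distance  -- matching distance to the best phoneme standard pattern;
                 smaller is assumed to mean more reliable.
    threshold -- reliability cut-off (value not given in the source).
    """
    if distance <= threshold:
        # Reliable segment: commit to the individual phoneme.
        return ("fixed", phoneme)
    # Unreliable segment: commit only to the coarse group.
    return ("group", group_of[phoneme])
```

With a threshold of 0.5, `categorize("i", 0.2, GROUP_OF, 0.5)` gives `("fixed", "i")`, while `categorize("t", 0.9, GROUP_OF, 0.5)` gives `("group", "G1b")`, so an uncertain consonant never forces an early, possibly wrong, decision on a single phoneme.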

By classifying categories in order from the beginning of the input speech in this way, candidate words can be narrowed down quickly from the large-vocabulary word dictionary. Moreover, because the portions of the input speech that can be recognized relatively reliably and the portions that are uncertain are classified as separate layers, rejection (elimination of the correct candidate) is less likely to occur as the candidates are narrowed down.

Effects

As is clear from the above description, according to the present invention, candidate words are narrowed down successively from the large vocabulary based on the phoneme classification or recognition results, working from the beginning of the input speech: word-initial consonant, vowel, mid-word consonant, vowel, and so on. Pre-selection can therefore proceed in parallel with the speech input, enabling high-speed recognition processing. In addition, by classifying categories in order from the beginning of the input speech, candidate words can be narrowed down quickly from the large-vocabulary word dictionary. Furthermore, because the portions of the input speech that can be recognized relatively reliably and the portions that are uncertain are classified as separate layers, rejection (elimination of the correct candidate) is less likely to occur as the candidates are narrowed down.

[Brief explanation of the drawing]

FIG. 1 is an electrical block diagram for explaining an embodiment of the present invention; FIG. 2 is an electrical block diagram for explaining the pre-selection processing of standard patterns; FIG. 3 shows the classification of the word-initial CV; FIG. 4 shows the classification by mid-word consonant; FIG. 5 shows an example of phoneme classification; FIG. 6 shows the tree structure of the large-vocabulary word set; FIG. 7 is a diagram for explaining the classification of the word dictionary in the standard pattern storage unit; FIG. 8 is a diagram for explaining how categories are determined; and FIGS. 9 and 10 show examples of grouping.

1: microphone for speech input; 2: speech feature extraction unit; 3: word pre-selection unit; 4: pre-selection dictionary storage unit; 5: recognition processing unit; 6: standard pattern (dictionary) storage unit; 7: recognition result output unit; 31: processing start terminal; 32: unit for detecting silent intervals in the input speech pattern; 33: duration detection unit; 34: word-initial CV classification unit; 35: mid-word VCV classification unit; 40: word group.

Patent applicant: Ricoh Co., Ltd.
Agent: 高　牙　明透

Claims

[Claims]

(1) A large vocabulary word speech recognition method comprising a microphone for inputting speech, a feature analysis section for obtaining a characteristic time series from the speech, a preliminary selection section for selecting word candidates prior to recognition of large vocabulary word speech, a preliminary selection dictionary that is consulted during preliminary selection, a recognition processing section for recognizing the candidate words narrowed down by the preliminary selection section, a word standard pattern storage section that is referred to during recognition processing, and a recognition result output terminal for outputting the recognition result, characterized in that the preliminary selection section successively narrows down candidate words based on the classification or recognition results of the word-initial consonant and vowel, the mid-word consonants and vowels, and so on in the input speech.

(2) The large vocabulary word speech recognition method according to claim (1), characterized in that the preliminary selection narrows down candidate words from the standard patterns by detecting silent sections in the input speech and by comparing the durations of the input speech pattern and the standard patterns.

(3) The large vocabulary word speech recognition method according to claim (1), characterized by having a dictionary in which the large vocabulary words are grouped in advance into a tree structure based on classifications such as the word-initial consonant classification, the word-initial vowel name, the mid-word consonant classification, and the vowel name.

(4) A large vocabulary word speech recognition method comprising a microphone for inputting speech, a feature analysis section for obtaining a characteristic time series from the speech, a preliminary selection section for selecting word candidates prior to recognition of large vocabulary word speech, a preliminary selection dictionary that is consulted during preliminary selection, a recognition processing section for recognizing the candidate words narrowed down by the preliminary selection section, a word standard pattern storage section that is referred to during recognition processing, and a recognition result output terminal for outputting the recognition result, characterized in that the word standard patterns are classified on the basis of a hierarchical network structure of grouped category names and fixed category names, and the unknown input speech is classified along these two kinds of category names so as to limit the candidate word names.

(5) The large vocabulary word speech recognition method according to claim (4), characterized in that categories such as phonemes are classified for each segment of the unknown input speech and, according to the reliability of the phoneme classification of each segment, a segment is associated with a fixed category name when the reliability is high and with a grouped category name when the reliability is low, the word standard patterns being classified on the basis of the resulting hierarchical network structure.