JPS58159592A

JPS58159592A - Polysyllabic word recognition system

Info

Publication number: JPS58159592A
Application number: JP57024978A
Authority: JP
Inventors: 佐藤　泰雄; 大山　隆之
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1982-02-18
Filing date: 1982-02-18
Publication date: 1983-09-21

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

(a) Technical Field of the Invention

The present invention relates to a speech recognition system, and in particular to a multi-syllable word recognition method that requires no mode switching to recognize both multi-syllable words and monosyllabic words.

(b) Technical Background

In a speech recognition system that mainly recognizes monosyllabic words, if multi-syllable words could also be recognized, then special symbols such as line break, period, comma, and the corner brackets (「, 」), as well as control terms such as "save" and "print", could be entered by voice alongside ordinary text. Keys for switching between modes such as word mode and kana mode would no longer be needed, which simplifies the configuration of the speech recognition system and improves operability.

A speech recognition system is therefore desired that recognizes monosyllabic words and also recognizes multi-syllable words at high processing speed.

(c) Object of the Invention

In response to the above need, the object of the present invention is to provide a speech recognition system that requires no mode switching, can recognize both multi-syllable words and monosyllabic words, and recognizes multi-syllable words at high processing speed.

(d) Configuration of the Invention

The invention provides means for non-uniformly sampling the feature parameters of speech to obtain a reduced parameter time series, which is a fixed-length feature pattern, and registers the resulting series. The feature parameters of unknown input speech, extracted by the same means, are matched against the registered reduced parameter time series to first sort the input into a multi-syllable word or a monosyllabic word. For an input judged to be a multi-syllable word, the word with the most similar feature parameters is selected as the first candidate, and further candidates are selected in order. The candidate selection unit then narrows down the matching candidates for the multi-syllable word, and DP matching is performed on them.

An input judged to be a monosyllabic word is passed directly to DP matching by the monosyllabic word recognition unit.
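As an illustration of the screening step, the following minimal Python sketch compares an unknown reduced pattern against the two registered stores and either reports a monosyllabic input or returns a short ranked list of multi-syllable candidates. The names `pattern_distance` and `screen`, the city-block distance, and the rule that the nearer store decides the word type are all assumptions made for illustration; the specification does not state the distance measure or the decision rule.

```python
from typing import Dict, List, Tuple

# A reduced pattern: a fixed number of averaged feature vectors.
Pattern = List[List[float]]


def pattern_distance(a: Pattern, b: Pattern) -> float:
    """City-block distance between two reduced patterns of equal length (assumed measure)."""
    return sum(abs(x - y) for fa, fb in zip(a, b) for x, y in zip(fa, fb))


def screen(
    unknown: Pattern,
    mono_store: Dict[str, Pattern],
    multi_store: Dict[str, Pattern],
    n_candidates: int = 2,
) -> Tuple[str, List[str]]:
    """Sort the input into "mono" or "multi"; for "multi", return ranked candidates."""
    best_mono = min(pattern_distance(unknown, p) for p in mono_store.values())
    # Rank multi-syllable words from most to least similar.
    ranked = sorted(multi_store, key=lambda w: pattern_distance(unknown, multi_store[w]))
    best_multi = pattern_distance(unknown, multi_store[ranked[0]])
    # Assumed rule: the store holding the nearest template decides the word type.
    if best_multi < best_mono:
        return "multi", ranked[:n_candidates]
    return "mono", []
```

Only the candidates returned for a "multi" result would go on to DP matching, which is where the reduction in processing time comes from.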

The means for obtaining the reduced parameter time series is described in a previously proposed method (Japanese Patent Application No. Sho 55-062059) that determines recognition candidates efficiently with a relatively simple algorithm, regardless of the content or size of the words to be recognized. Briefly, the speech signal of the unknown input word is analyzed, the input feature parameter time series extracted from that signal is divided into at most 10 sections, and the feature parameter values within each section are averaged.
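The reduction described above can be sketched in Python as follows. The specification states only that the series is divided into at most 10 sections and averaged within each; the way section boundaries are chosen here (equal shares of the cumulative frame-to-frame change, so that rapidly changing speech gets more sections) is an assumption, as are the names `frame_distance` and `reduce_time_series`. The actual rule is given in the referenced application and may differ.

```python
from typing import List

Frame = List[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """City-block distance between two feature vectors (assumed measure)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def reduce_time_series(frames: List[Frame], n_segments: int = 10) -> List[Frame]:
    """Return a fixed-length pattern of n_segments averaged feature vectors."""
    if not frames:
        raise ValueError("at least one frame is required")
    # Cumulative amount of spectral change up to each frame.
    cum = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        cum.append(cum[-1] + frame_distance(prev, cur))
    total = cum[-1]
    n = len(frames)
    segments: List[List[Frame]] = [[] for _ in range(n_segments)]
    for i, frame in enumerate(frames):
        # Position in [0, 1]; fall back to uniform spacing for a constant signal.
        pos = cum[i] / total if total > 0 else i / max(n - 1, 1)
        k = min(int(pos * n_segments), n_segments - 1)
        segments[k].append(frame)
    dim = len(frames[0])
    reduced: List[Frame] = []
    for seg in segments:
        if seg:
            reduced.append([sum(f[d] for f in seg) / len(seg) for d in range(dim)])
        else:
            # Empty section: repeat the previous average so the length stays fixed.
            reduced.append(list(reduced[-1]))
    return reduced
```

Because the output always has `n_segments` vectors, patterns from utterances of different durations can be compared element by element without time alignment.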

(e) Embodiment of the Invention

The figure is a block diagram of a circuit showing one embodiment of the present invention.

First, to register speech in advance, the speaker, under the control of control unit 9, connects switching unit 3 to reduced parameter time series extraction unit 4, multi-syllable word recognition unit 13, and monosyllabic word recognition unit 14, and then supplies monosyllabic words and specific multi-syllable words at the input. Preprocessing unit 1 performs speech level adjustment, analog-to-digital conversion, and the like, and sends the result to parameter extraction unit 2. Parameter extraction unit 2 extracts the feature parameters and sends them to switching unit 3. The feature parameters that pass from switching unit 3 into reduced parameter time series extraction unit 4 are non-uniformly sampled, and the resulting fixed-length reduced parameter time series is sent to switching unit 5. From switching unit 5, the reduced parameter time series for monosyllabic words is stored in monosyllable reduced parameter time series storage unit 6, and the reduced parameter time series for multi-syllable words is stored in multi-syllable reduced parameter time series storage unit 7. At the same time, the feature parameters of monosyllabic words are stored in monosyllabic word recognition unit 14, and the feature parameters of multi-syllable words are stored in multi-syllable word recognition unit 13.
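The registration data flow can be summarized with a small Python sketch. The `Registry` class, its field names, and the `is_multi` flag are hypothetical: the specification does not say how a word is identified as monosyllabic or multi-syllable at registration time, so the sketch assumes the caller supplies that information. The reduction function is passed in as `reducer` rather than fixed here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Frames = List[List[float]]


@dataclass
class Registry:
    """Holds the four stores filled during registration (names are illustrative)."""
    mono_reduced: Dict[str, Frames] = field(default_factory=dict)   # storage unit 6
    multi_reduced: Dict[str, Frames] = field(default_factory=dict)  # storage unit 7
    multi_full: Dict[str, Frames] = field(default_factory=dict)     # kept by recognition unit 13
    mono_full: Dict[str, Frames] = field(default_factory=dict)      # kept by recognition unit 14

    def register(
        self,
        word: str,
        frames: Frames,
        is_multi: bool,
        reducer: Callable[[Frames], Frames],
    ) -> None:
        """Store the reduced pattern and, in parallel, the full feature parameters."""
        reduced = reducer(frames)
        if is_multi:
            self.multi_reduced[word] = reduced
            self.multi_full[word] = frames
        else:
            self.mono_reduced[word] = reduced
            self.mono_full[word] = frames
```

Keeping the full feature parameters next to the reduced patterns mirrors the text: the reduced patterns serve the fast screening, while the full parameters are what DP matching later uses.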

Next, to perform speech recognition, the speaker, under the control of control unit 9, connects switching unit 3 to reduced parameter time series extraction unit 4 and storage unit 10, and connects switching unit 5 to matching determination unit 8. Speech entering at the input passes, as before, through preprocessing unit 1, parameter extraction unit 2, switching unit 3, reduced parameter time series extraction unit 4, and switching unit 5 into matching determination unit 8, where it is matched against the reduced parameter time series from monosyllable reduced parameter time series storage unit 6 and multi-syllable reduced parameter time series storage unit 7. When matching determination unit 8 judges the input to be a multi-syllable word, it selects the word with the most similar feature parameters as the first candidate, selects further candidates in order, and sends them to candidate selection unit 12. Candidate selection unit 12 narrows these down to one or two candidates and sends them to multi-syllable word recognition unit 13, where DP matching is performed and the recognition result is sent to the output via control unit 9. When the input is judged to be a monosyllabic word, selection unit 11 sends the feature parameters held in storage unit 10 to monosyllabic word recognition unit 14, which recognizes the monosyllabic word and sends the recognition result to the output via control unit 9.
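The final DP matching stage can be sketched in Python as a standard dynamic time warping comparison restricted to the narrowed candidates. This is a generic textbook formulation offered as an assumption; the specification does not give the local distance, the path constraints, or any normalization, and the names `dp_distance` and `recognize` are hypothetical.

```python
from typing import Dict, List

Frames = List[List[float]]


def dp_distance(a: Frames, b: Frames) -> float:
    """Dynamic time warping cost between two feature parameter time series."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: best accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = sum(abs(x - y) for x, y in zip(a[i - 1], b[j - 1]))
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(unknown: Frames, templates: Dict[str, Frames], candidates: List[str]) -> str:
    """Run DP matching only against the narrowed candidates and return the best word."""
    return min(candidates, key=lambda w: dp_distance(unknown, templates[w]))
```

Since the cost of DP matching grows with the product of the two series lengths for every template compared, limiting `candidates` to one or two words keeps the expensive step small even when many multi-syllable words are registered.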

(f) Effect of the Invention

As described in detail above, the present invention first classifies multi-syllable words by matching with the reduced parameter time series, uses the classification result to narrow down the multi-syllable word candidates to be matched in detail, and confirms the result by DP matching. It can therefore provide a speech recognition system that has a short processing time and requires no mode switching, and its effect is significant.

[Brief explanation of drawings]

The figure is a block diagram of a circuit showing one embodiment of the present invention. 1 is the preprocessing unit; 2 is the parameter extraction unit; 3 and 5 are switching units; 4 is the reduced parameter time series extraction unit; 6 is the monosyllable reduced parameter time series storage unit; 7 is the multi-syllable reduced parameter time series storage unit; 8 is the matching determination unit; 9 is the control unit; 10 is the storage unit; 11 is the selection unit; 12 is the candidate selection unit; 13 is the multi-syllable word recognition unit; and 14 is the monosyllabic word recognition unit.

Claims

[Claims]

A multi-syllable word recognition method for a speech recognition system that recognizes multi-syllable words and monosyllabic words, characterized in that means is provided for non-uniformly sampling speech feature parameters to obtain a fixed-length reduced parameter time series, together with screening means that uses the reduced parameter time series, and in that a multi-syllable word selected by the screening means is recognized after the matching candidates to be recognized are narrowed down and selected using the information obtained at the time of screening.