JPS5857195A

JPS5857195A - Voice recognition system

Info

Publication number: JPS5857195A
Application number: JP56155654A
Authority: JP
Inventors: 一成畑中
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1981-09-30
Filing date: 1981-09-30
Publication date: 1983-04-05

Abstract

(57) [Summary] This publication contains application data filed before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION The present invention relates to a speech recognition method. In particular, it relates to a method in which a standard dictionary is used to extract a plurality of monosyllables and/or words (collectively referred to as "speech" in this specification) as candidate speech, each with an assigned priority, and in which a plurality of speech candidate sequences are stored in a candidate sequence dictionary with identifiers attached, so that the validity of the extracted result is judged by those identifiers.

Conventionally, a speech recognition device for monosyllables and/or words is provided with a standard dictionary. Feature quantities are extracted from unknown input speech, and the extracted feature quantities are matched against standard feature quantities read from the standard dictionary to determine the category to which the unknown input speech belongs. With this conventional configuration, however, attempts to raise the recognition rate sometimes yield extreme recognition results that should not occur.

The present invention aims to solve this problem by allowing the device to check the validity of its own recognition results. To that end, the speech recognition method of the present invention applies to a speech recognition device that recognizes the speech corresponding to an input speech signal and that has: a feature extraction circuit section, which extracts feature quantities of the input speech signal based on the results of frequency analysis of that signal; a standard dictionary, which stores standard feature quantities corresponding to standard speech; and a matching circuit section, which matches the feature quantities obtained by the feature extraction circuit section against the standard feature quantities read from the standard dictionary.

The method is characterized in that the matching circuit section is configured to extract a plurality of candidate speech items with assigned priorities and to output the extracted speech candidate sequence; in that the device is provided with a candidate sequence dictionary, which stores a plurality of kinds of speech candidate sequences with an identifier assigned to each sequence, and a comparison circuit section, which compares the speech candidate sequence output from the matching circuit section with the speech candidate sequences read from the candidate sequence dictionary; and in that, based on the identifier assigned to the speech candidate sequence for which the comparison circuit section finds a match, it is determined whether or not to adopt the highest-priority speech in the speech candidate sequence output from the matching circuit section as the recognition result. The invention is described below with reference to the drawing.

The figure shows the configuration of one embodiment of the present invention. In the figure, reference numeral 1 denotes a feature extraction circuit section; 2 denotes a standard dictionary, which stores, for example, the standard feature quantities for the spoken digits "0", "1", "2", ..., "9" together with their category names; 3 denotes a matching circuit section; 4 denotes the candidate sequence dictionary provided in the present invention; and 5 denotes the comparison circuit section provided in the present invention.

The candidate sequence dictionary 4 stores candidate sequences such as "0-4-5", "0-1-2", ..., each a combination of m items drawn from the words in the recognition categories, for example "0", "1", "2", ..., "9". Each candidate sequence is assigned an identifier ID obtained, for example, through statistical processing. An ID value of "F" means that, when the corresponding candidate sequence is obtained, the word with the highest priority in that sequence may be taken as the correct answer. An ID value of "R" means that, when that candidate sequence is obtained, the matching result from the preceding matching circuit section 3 should be discarded and "reject" should be returned.
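The specification says only that the identifiers are obtained "for example" through statistical processing and gives no procedure. The following is a minimal sketch of one way this could be done, assuming labeled training utterances are available. The function name `build_candidate_dictionary`, the `min_accuracy` threshold, and the sample data are all hypothetical and do not come from the patent.

```python
from collections import defaultdict


def build_candidate_dictionary(samples, min_accuracy=0.9):
    """Assign "F" or "R" to each observed candidate sequence.

    samples: iterable of (candidate_sequence, true_label) pairs, where
    candidate_sequence is a tuple of category names in priority order.
    The threshold rule is an assumption, not the patent's procedure.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for sequence, true_label in samples:
        totals[sequence] += 1
        # The top-priority candidate is the one the device would answer with.
        if sequence[0] == true_label:
            correct[sequence] += 1
    return {
        sequence: "F" if correct[sequence] / totals[sequence] >= min_accuracy else "R"
        for sequence in totals
    }


# Invented training observations, for illustration only.
samples = [
    (("0", "4", "5"), "0"),
    (("0", "4", "5"), "0"),
    (("0", "1", "2"), "0"),
    (("0", "1", "2"), "7"),
]
dictionary = build_candidate_dictionary(samples)
```

In this sketch a sequence whose top candidate was usually right in training is marked "F", and one whose top candidate was often wrong is marked "R".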

In the figure, as in a conventional speech recognition device, the matching circuit section 3 determines, for example, the distance between the feature quantities extracted by the feature extraction circuit section 1 and the standard feature quantities read from the standard dictionary 2. In the present invention, for example, three categories are selected as candidates in ascending order of matching distance, and a candidate sequence arranged in that order, that is, in priority order, is output, for example (0-1-2) as illustrated.
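The ranking step performed by the matching circuit section can be sketched as follows. This is an illustration under stated assumptions: the patent does not specify the distance measure or the feature representation, so Euclidean distance over short numeric tuples is used here, and the names `euclidean`, `candidate_sequence`, and the toy dictionary values are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def candidate_sequence(features, standard_dictionary, n_candidates=3):
    """Return the n closest category names, ordered by ascending distance.

    The ordering is the priority order: the first element is the
    highest-priority candidate.
    """
    ranked = sorted(
        standard_dictionary,
        key=lambda category: euclidean(features, standard_dictionary[category]),
    )
    return tuple(ranked[:n_candidates])


# Toy standard dictionary: category name -> standard feature vector (invented values).
standard_dictionary = {
    "0": (0.0, 0.0),
    "1": (1.0, 0.0),
    "2": (2.0, 0.0),
    "3": (5.0, 5.0),
}
sequence = candidate_sequence((0.2, 0.0), standard_dictionary)
```

With these toy values the input lies closest to "0", then "1", then "2", so the output sequence corresponds to the (0-1-2) example in the text.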

The candidate sequence (0-1-2) is passed to the comparison circuit section 5. Meanwhile, the candidate sequences are read out in turn from the candidate sequence dictionary 4 together with their identifiers ID, for example (0-4-5, F), (0-1-2, R), ..., and are also passed to the comparison circuit section 5. The comparison circuit section 5 compares the candidate sequence (0-1-2) with (0-4-5), (0-1-2), .... When a match is found, it checks the corresponding identifier ID: if the identifier is "F", the word with the highest priority in the candidate sequence is taken as the correct answer; if the identifier is "R", the matching result from the matching circuit section 3 is rejected. As illustrated, when the matching circuit section 3 outputs the candidate sequence (0-1-2), the comparison circuit section 5 issues a reject. This is because the words "0", "1", and "2" extracted into the candidate sequence (0-1-2) are, phonetically speaking, disparate, which means that the matching result for the highest-priority word "0" has poor reliability.
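The accept-or-reject decision made by the comparison circuit section can be sketched as a dictionary lookup. This is a software illustration, not the circuit itself. The function name `judge` and the status strings are hypothetical, and the "unmatched" outcome is an assumption, since the specification does not say what happens when no stored sequence matches.

```python
def judge(candidate_seq, candidate_dictionary):
    """Decide whether to adopt the top-priority candidate.

    candidate_dictionary maps a candidate sequence (tuple in priority
    order) to its identifier, "F" or "R".
    """
    identifier = candidate_dictionary.get(tuple(candidate_seq))
    if identifier == "F":
        # Adopt the highest-priority word as the recognition result.
        return ("accept", candidate_seq[0])
    if identifier == "R":
        # Discard the matching result.
        return ("reject", None)
    # Assumption: the no-match case is not described in the source.
    return ("unmatched", None)


# The two dictionary entries named in the text.
candidate_dictionary = {
    ("0", "4", "5"): "F",
    ("0", "1", "2"): "R",
}
rejected = judge(("0", "1", "2"), candidate_dictionary)
accepted = judge(("0", "4", "5"), candidate_dictionary)
```

Here (0-1-2) is rejected and (0-4-5) yields "0", matching the behavior described for the illustrated example.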

In the illustrated configuration, a further arrangement may be added so that, when a reject occurs, the utterance is repeated, or matching is redone using feature extraction from a viewpoint different from that of the illustrated feature extraction circuit section 1. Although the above description concerns the recognition of words such as digits, the invention is, needless to say, also applicable to the recognition of monosyllables.
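The optional retry on reject might be organized as in the sketch below. The specification only mentions that such an arrangement can be added, so the attempt limit, the function name `recognize_with_retry`, and the stand-in recognizer are all hypothetical. Each element of `utterances` stands for either a repeated utterance or the same utterance processed with a different feature extraction.

```python
from itertools import islice


def recognize_with_retry(utterances, recognize_once, max_attempts=3):
    """Try successive attempts until one is accepted.

    recognize_once returns the recognized word, or None on reject.
    Returns None if every attempt within the limit is rejected.
    """
    for utterance in islice(utterances, max_attempts):
        result = recognize_once(utterance)
        if result is not None:
            return result
    return None


def fake_recognizer(utterance):
    """Stand-in recognizer for illustration: rejects anything labeled 'noisy'."""
    return None if utterance == "noisy" else utterance


result = recognize_with_retry(["noisy", "noisy", "7"], fake_recognizer)
exhausted = recognize_with_retry(["noisy"] * 5, fake_recognizer)
```

Capping the number of attempts is a design choice of this sketch so that a persistently rejected input does not loop forever.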

As described above, the present invention allows the device to check for itself the validity of the matching result produced by the matching circuit section, which makes it possible to improve the reliability of the recognition result.

[Brief explanation of drawings]

The figure shows the configuration of one embodiment of the present invention. In the figure, 1 denotes the feature extraction circuit section, 2 the standard dictionary, 3 the matching circuit section, 4 the candidate sequence dictionary, and 5 the comparison circuit section.

Patent applicant: Fujitsu Limited

Claims

[Claims]

A speech recognition method for a speech recognition device that recognizes speech corresponding to an input speech signal, the device having a feature extraction circuit section that extracts feature quantities of the input speech signal based on the results of frequency analysis of the input speech signal, a standard dictionary that stores standard feature quantities corresponding to standard speech, and a matching circuit section that matches the feature quantities obtained by the feature extraction circuit section against the standard feature quantities read from the standard dictionary, wherein the matching circuit section is configured to extract a plurality of candidate speech items with assigned priorities and to output the extracted speech candidate sequence; wherein a candidate sequence dictionary, which stores a plurality of kinds of speech candidate sequences with an identifier assigned to each sequence, and a comparison circuit section, which compares the speech candidate sequence output from the matching circuit section with the speech candidate sequences read from the candidate sequence dictionary, are provided; and wherein, based on the identifier assigned to the speech candidate sequence for which a match is found in the comparison circuit section, it is determined whether or not the speech having the highest priority in the speech candidate sequence output from the matching circuit section is to be adopted as the recognition result.