JPH0573094A - Continuous speech recognizing method - Google Patents

Continuous speech recognizing method

Info

Publication number
JPH0573094A
JPH0573094A · JP3232132A · JP23213291A
Authority
JP
Japan
Prior art keywords
signal
phoneme series
phoneme string
phoneme
corrected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP3232132A
Other languages
Japanese (ja)
Inventor
Yoshihiro Matsuura
嘉宏 松浦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Meidensha Corp
Meidensha Electric Manufacturing Co Ltd
Original Assignee
Meidensha Corp
Meidensha Electric Manufacturing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Meidensha Corp, Meidensha Electric Manufacturing Co Ltd filed Critical Meidensha Corp
Priority to JP3232132A priority Critical patent/JPH0573094A/en
Publication of JPH0573094A publication Critical patent/JPH0573094A/en
Pending legal-status Critical Current

Abstract

PURPOSE: To improve processing speed and sentence recognition rate by using a neural network, trained on a teacher signal, to correct the phoneme string signal produced by acoustic processing. CONSTITUTION: The neural network receives the phoneme string signal from an acoustic processing part 1 and, through learning based on that signal and the phoneme string signal of the actually uttered speech, produces a corrected phoneme string, which it supplies to a language processing part 2 as the phoneme string signal. The phoneme string is corrected while the network learns, using the speech signal actually uttered by the speaker as the teacher signal, so its correcting ability improves as the process is repeated. As a result, the number of word candidates in the language processing part 2 decreases, which both speeds up processing and improves the sentence recognition rate.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a continuous speech recognition method, and more particularly to a conversational speech recognition method.

[0002]

[Prior Art] Among speech recognition methods, continuous speech recognition, which recognizes speech in which words are uttered in succession, includes continuous word speech recognition, which handles a relatively small vocabulary, and conversational speech recognition, which adds linguistic knowledge to recognize the meaning of a relatively large vocabulary. FIG. 2 shows the configuration of a conversational speech recognition system. This configuration is a hierarchical model in which processing is distributed across the layers of an acoustic processing part 1 and a language processing part 2. The acoustic processing part 1 extracts features from the input speech, segments the speech signal, and converts it into a phoneme string by phoneme recognition. The language processing part 2 generates word or word-string candidates from a dictionary and phonological rules, corrects the phoneme string using linguistic information such as syntax, semantics, and context, and outputs the result as a sentence.
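As an illustrative sketch of the hierarchical model described above (not code from the patent), the following Python fragment shows a language stage matching an error-containing phoneme string from the acoustic stage against a dictionary with an edit-distance tolerance to generate word candidates. The dictionary, phoneme strings, and function names are all hypothetical.

```python
def edit_distance(a, b):
    # standard Levenshtein distance over phoneme symbols
    d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)]
         for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[len(a)][len(b)]

DICTIONARY = ["akai", "aoi", "kuruma", "kumo"]  # hypothetical lexicon

def word_candidates(phonemes, max_dist=1):
    # words within max_dist edits are kept as candidates, because the
    # phoneme string is assumed to contain recognition errors
    return [w for w in DICTIONARY if edit_distance(phonemes, w) <= max_dist]

print(word_candidates("akai"))  # exact match
print(word_candidates("akei"))  # one substitution error still matches "akai"
```

Raising `max_dist` admits more candidates, which mirrors the patent's observation that poorer phoneme recognition forces the language stage to consider more words.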

[0003] Besides this hierarchical model, there are the blackboard model, which uses a database shared by the acoustic and language levels, and the network model, which uses a network at the language level.

[0004]

[Problems to Be Solved by the Invention] In the conventional method, it is difficult for the acoustic processing part 1 to perform perfect phoneme recognition, so the output phoneme string contains errors. The language processing part therefore works on the assumption that errors are present: it also raises words that do not match perfectly as candidates, and then eliminates wrong candidates using linguistic information such as syntax and semantics.

[0005] Consequently, when the phoneme recognition rate in the acoustic processing part drops, the language processing part must raise more word candidates, which lowers both the processing speed and the sentence recognition rate.

[0006] An object of the present invention is to provide a continuous speech recognition method with improved processing speed and sentence recognition rate.

[0007]

[Means for Solving the Problems] In a method in which a phoneme string signal is extracted by acoustic processing of an input speech signal and sentence recognition is performed by language processing of this phoneme string signal, the language processing is performed on a phoneme string signal corrected by a neural network that learns to correct the phoneme string signal using the speaker's actual speech signal as a teacher signal.

[0008]

[Operation] From the acoustically processed phoneme string signal, a corrected phoneme string signal is obtained by supervised learning with a neural network, and this corrected signal is used as the phoneme string signal for language processing. This reduces acoustic processing errors from the acoustic processing part and thereby reduces the number of word candidates in the language processing part.

[0009]

[Embodiment] FIG. 1 is a block diagram showing an embodiment of the present invention. In the figure, the acoustic processing part 1 and the language processing part 2 perform the same processing as in the conventional hierarchical model.

[0010] The neural network 3 obtains a corrected phoneme string by learning from the phoneme string signal supplied by the acoustic processing part 1 and the phoneme string signal of the actually uttered speech, and supplies this corrected phoneme string signal to the language processing part 2 as the phoneme string signal.

[0011] The phoneme string correction by the neural network 3 uses supervised learning: either the corrected phoneme string signal for the acoustically processed phoneme string is learned with the speech signal actually uttered by the speaker as the teacher signal, or the learning is driven by the error between the output and the actual speech, used as the teacher signal.
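To make the supervised-correction idea concrete, here is a minimal Python sketch. In place of the patent's full neural network it trains a single-layer softmax classifier (the simplest neural building block) to map each phoneme emitted by a hypothetical acoustic stage to the phoneme of the teacher signal; the phoneme inventory, training pairs, and names are all invented for illustration.

```python
import math

# hypothetical phoneme inventory for the sketch
PHONEMES = ["a", "i", "u", "e", "o", "k", "s", "t", "r", "l"]
IDX = {p: i for i, p in enumerate(PHONEMES)}
N = len(PHONEMES)

# single-layer softmax weights: W[class][feature], last feature is a bias term
W = [[0.0] * (N + 1) for _ in range(N)]

def features(obs):
    # one-hot encoding of the observed (acoustically recognized) phoneme
    f = [0.0] * (N + 1)
    f[IDX[obs]] = 1.0
    f[N] = 1.0  # bias
    return f

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(f):
    return [sum(w * x for w, x in zip(W[c], f)) for c in range(N)]

def predict(obs):
    # corrected phoneme = class with the highest score
    z = logits(features(obs))
    return PHONEMES[max(range(N), key=lambda c: z[c])]

def train(pairs, epochs=200, lr=0.5):
    # supervised learning: the teacher signal is the actually uttered phoneme
    for _ in range(epochs):
        for obs, true in pairs:
            f = features(obs)
            p = softmax(logits(f))
            for c in range(N):
                grad = p[c] - (1.0 if c == IDX[true] else 0.0)
                for j in range(N + 1):
                    W[c][j] -= lr * grad * f[j]

# (observed, uttered) pairs: this acoustic stage systematically
# outputs "l" where the speaker actually said "r"
teacher = [("a", "a"), ("i", "i"), ("u", "u"), ("k", "k"),
           ("s", "s"), ("l", "r"), ("l", "r")]
train(teacher)

# correcting a hypothetical acoustic-stage output
corrected = "".join(predict(p) for p in "kalasu")
print(corrected)  # the systematic l/r confusion is undone
```

As in the patent's description, repeating training on more teacher pairs would sharpen the correction; a per-phoneme classifier like this one can only learn systematic confusions, whereas a larger network could also use context.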

[0012] By interposing the neural network 3 in this way, the network learns the tendency of the errors, or the errors themselves, that arise when the acoustic processing part 1 converts speech into a phoneme string. Its correcting ability improves step by step as the process is repeated, so errors in the phoneme string are corrected. As a result, the number of word candidates in the language processing part 2 is reduced, the processing speed is increased, and the sentence recognition rate is also improved.

[0013] The present invention is not limited to the hierarchical model; it can also be applied to acoustic-level processing in the blackboard model or the network model with the same effect.

[0014]

[Effects of the Invention] As described above, according to the present invention, the phoneme string signal obtained by acoustic processing is corrected by supervised learning with a neural network, and this corrected phoneme string signal is used for language processing. Even when the acoustic processing part makes errors, the number of word candidates in the language processing part is reduced, improving both the processing speed and the sentence recognition rate.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention.

FIG. 2 is a block diagram showing the conventional method.

[Explanation of Symbols]

1 ... acoustic processing part, 2 ... language processing part, 3 ... neural network.

Claims (1)

[Claims]

Claim 1: In a method in which a phoneme string signal is extracted by acoustic processing of an input speech signal and sentence recognition is performed by language processing of this phoneme string signal, a continuous speech recognition method characterized in that the language processing is performed on a phoneme string signal corrected by a neural network that learns to correct the phoneme string signal using the speaker's actual speech signal as a teacher signal.
JP3232132A 1991-09-12 1991-09-12 Continuous speech recognizing method Pending JPH0573094A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP3232132A JPH0573094A (en) 1991-09-12 1991-09-12 Continuous speech recognizing method


Publications (1)

Publication Number Publication Date
JPH0573094A true JPH0573094A (en) 1993-03-26

Family

ID=16934502

Family Applications (1)

Application Number Title Priority Date Filing Date
JP3232132A Pending JPH0573094A (en) 1991-09-12 1991-09-12 Continuous speech recognizing method

Country Status (1)

Country Link
JP (1) JPH0573094A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4987203B2 (en) * 1999-11-12 2012-07-25 フェニックス ソリューションズ インコーポレーテッド Distributed real-time speech recognition system


Similar Documents

Publication Publication Date Title
US8019602B2 (en) Automatic speech recognition learning using user corrections
WO2022083083A1 (en) Sound conversion system and training method for same
US6233553B1 (en) Method and system for automatically determining phonetic transcriptions associated with spelled words
US8498857B2 (en) System and method for rapid prototyping of existing speech recognition solutions in different languages
US5787230A (en) System and method of intelligent Mandarin speech input for Chinese computers
US6138099A (en) Automatically updating language models
JP2003316386A (en) Method, device, and program for speech recognition
JP2001100781A (en) Method and device for voice processing and recording medium
KR20070098094A (en) An acoustic model adaptation method based on pronunciation variability analysis for foreign speech recognition and apparatus thereof
US20020087317A1 (en) Computer-implemented dynamic pronunciation method and system
JPH06110494A (en) Pronounciation learning device
Azim et al. Large vocabulary Arabic continuous speech recognition using tied states acoustic models
JPH0573094A (en) Continuous speech recognizing method
Polyakova et al. Learning from errors in grapheme-to-phoneme conversion.
JPH03226785A (en) Linguistic education device with voice recognition device
JP2001188556A (en) Method and device for voice recognition
JPH0736481A (en) Interpolation speech recognition device
JPS6229796B2 (en)
JPH08171396A (en) Speech recognition device
JPH0434499A (en) Vocalization indicating method
US20100161312A1 (en) Method of semantic, syntactic and/or lexical correction, corresponding corrector, as well as recording medium and computer program for implementing this method
Lin et al. A Multi-modal Soft Targets Approach for Pronunciation Erroneous Tendency Detection
CN116013261A (en) Voice recognition method, device, storage medium and equipment
KR19980013825A (en) Speech recognition device with language model adaptation function and control method thereof
CN117079637A (en) Mongolian emotion voice synthesis method based on condition generation countermeasure network