JPH0573094A - Continuous speech recognizing method - Google Patents

Continuous speech recognizing method

Info

Publication number
JPH0573094A
JPH0573094A · JP3232132A · JP23213291A
Authority
JP
Japan
Prior art keywords
signal
phoneme series
phoneme string
phoneme
corrected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP3232132A
Other languages
Japanese (ja)
Inventor
Yoshihiro Matsuura
嘉宏 松浦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Meidensha Corp
Meidensha Electric Manufacturing Co Ltd
Original Assignee
Meidensha Corp
Meidensha Electric Manufacturing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Meidensha Corp, Meidensha Electric Manufacturing Co Ltd filed Critical Meidensha Corp
Priority to JP3232132A priority Critical patent/JPH0573094A/en
Publication of JPH0573094A publication Critical patent/JPH0573094A/en
Pending legal-status Critical Current

Abstract

PURPOSE: To improve processing speed and sentence recognition rate by using a neural network, trained on a teacher signal, to correct the phoneme string signal produced by acoustic processing. CONSTITUTION: The neural network receives the phoneme string signal from an acoustic processing part 1 and, through learning based on that signal and the phoneme string signal of the actually uttered speech, produces a corrected phoneme string, which it supplies to a language processing part 2 as the phoneme string signal. The phoneme string is corrected while the network learns, using the speech signal actually uttered by the speaker as the teacher signal, so its correcting ability improves as the process is repeated. As a result, the number of word candidates in the language processing part 2 decreases, which both speeds up processing and improves the sentence recognition rate.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a continuous speech recognition method, and more particularly to a conversational speech recognition method.

[0002]

[Prior Art] Among speech recognition methods, continuous speech recognition, which recognizes speech in which words are uttered in succession, includes continuous word speech recognition, which handles a relatively small vocabulary, and conversational speech recognition, which adds linguistic knowledge to recognize the meaning of a relatively large vocabulary. FIG. 2 shows the configuration of a conversational speech recognition system. This configuration is a hierarchical model in which processing is distributed across the layers of an acoustic processing part 1 and a language processing part 2. The acoustic processing part 1 extracts features from the input speech, segments the speech signal, and converts it into a phoneme string by phoneme recognition. The language processing part 2 generates word or word-string candidates from a dictionary and phonological rules, corrects the phoneme string using linguistic information such as syntax, semantics, and context, and outputs the result as a sentence.
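As an illustrative sketch of the hierarchical model described above (not code from the patent), the following Python fragment shows a language stage matching an error-containing phoneme string from the acoustic stage against a dictionary with an edit-distance tolerance to generate word candidates. The dictionary, phoneme strings, and function names are all hypothetical.

```python
def edit_distance(a, b):
    # standard Levenshtein distance over phoneme symbols
    d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)]
         for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[len(a)][len(b)]

DICTIONARY = ["akai", "aoi", "kuruma", "kumo"]  # hypothetical lexicon

def word_candidates(phonemes, max_dist=1):
    # words within max_dist edits are kept as candidates, because the
    # phoneme string is assumed to contain recognition errors
    return [w for w in DICTIONARY if edit_distance(phonemes, w) <= max_dist]

print(word_candidates("akai"))  # exact match
print(word_candidates("akei"))  # one substitution error still matches "akai"
```

Raising `max_dist` admits more candidates, which mirrors the patent's observation that poorer phoneme recognition forces the language stage to consider more words.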

[0003] Besides this hierarchical model, there are the blackboard model, which uses a database shared by the acoustic and language levels, and the network model, which uses a network at the language level.

[0004]

[Problems to Be Solved by the Invention] In the conventional method, it is difficult for the acoustic processing part 1 to perform perfect phoneme recognition, so the output phoneme string contains errors. The language processing part therefore works on the assumption that errors are present: it also raises words that do not match perfectly as candidates, and then eliminates wrong candidates using linguistic information such as syntax and semantics.

[0005] Consequently, when the phoneme recognition rate in the acoustic processing part drops, the language processing part must raise more word candidates, which lowers both the processing speed and the sentence recognition rate.

[0006] An object of the present invention is to provide a continuous speech recognition method with improved processing speed and sentence recognition rate.

[0007]

[Means for Solving the Problems] In a method in which a phoneme string signal is extracted by acoustic processing of an input speech signal and sentence recognition is performed by language processing of this phoneme string signal, the language processing is performed on a phoneme string signal corrected by a neural network that learns to correct the phoneme string signal using the speaker's actual speech signal as a teacher signal.

[0008]

[Operation] From the acoustically processed phoneme string signal, a corrected phoneme string signal is obtained by supervised learning with a neural network, and this corrected signal is used as the phoneme string signal for language processing. This reduces acoustic processing errors from the acoustic processing part and thereby reduces the number of word candidates in the language processing part.

[0009]

[Embodiment] FIG. 1 is a block diagram showing an embodiment of the present invention. In the figure, the acoustic processing part 1 and the language processing part 2 perform the same processing as in the conventional hierarchical model.

[0010] The neural network 3 obtains a corrected phoneme string by learning from the phoneme string signal supplied by the acoustic processing part 1 and the phoneme string signal of the actually uttered speech, and supplies this corrected phoneme string signal to the language processing part 2 as the phoneme string signal.

[0011] The phoneme string correction by the neural network 3 uses supervised learning: either the corrected phoneme string signal for the acoustically processed phoneme string is learned with the speech signal actually uttered by the speaker as the teacher signal, or the learning is driven by the error between the output and the actual speech, used as the teacher signal.
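To make the supervised-correction idea concrete, here is a minimal Python sketch. In place of the patent's full neural network it trains a single-layer softmax classifier (the simplest neural building block) to map each phoneme emitted by a hypothetical acoustic stage to the phoneme of the teacher signal; the phoneme inventory, training pairs, and names are all invented for illustration.

```python
import math

# hypothetical phoneme inventory for the sketch
PHONEMES = ["a", "i", "u", "e", "o", "k", "s", "t", "r", "l"]
IDX = {p: i for i, p in enumerate(PHONEMES)}
N = len(PHONEMES)

# single-layer softmax weights: W[class][feature], last feature is a bias term
W = [[0.0] * (N + 1) for _ in range(N)]

def features(obs):
    # one-hot encoding of the observed (acoustically recognized) phoneme
    f = [0.0] * (N + 1)
    f[IDX[obs]] = 1.0
    f[N] = 1.0  # bias
    return f

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(f):
    return [sum(w * x for w, x in zip(W[c], f)) for c in range(N)]

def predict(obs):
    # corrected phoneme = class with the highest score
    z = logits(features(obs))
    return PHONEMES[max(range(N), key=lambda c: z[c])]

def train(pairs, epochs=200, lr=0.5):
    # supervised learning: the teacher signal is the actually uttered phoneme
    for _ in range(epochs):
        for obs, true in pairs:
            f = features(obs)
            p = softmax(logits(f))
            for c in range(N):
                grad = p[c] - (1.0 if c == IDX[true] else 0.0)
                for j in range(N + 1):
                    W[c][j] -= lr * grad * f[j]

# (observed, uttered) pairs: this acoustic stage systematically
# outputs "l" where the speaker actually said "r"
teacher = [("a", "a"), ("i", "i"), ("u", "u"), ("k", "k"),
           ("s", "s"), ("l", "r"), ("l", "r")]
train(teacher)

# correcting a hypothetical acoustic-stage output
corrected = "".join(predict(p) for p in "kalasu")
print(corrected)  # the systematic l/r confusion is undone
```

As in the patent's description, repeating training on more teacher pairs would sharpen the correction; a per-phoneme classifier like this one can only learn systematic confusions, whereas a larger network could also use context.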

[0012] By interposing the neural network 3 in this way, the network learns the tendency of the errors, or the errors themselves, that arise when the acoustic processing part 1 converts speech into a phoneme string. Its correcting ability improves step by step as the process is repeated, so errors in the phoneme string are corrected. As a result, the number of word candidates in the language processing part 2 is reduced, the processing speed is increased, and the sentence recognition rate is also improved.

[0013] The present invention is not limited to the hierarchical model; it can also be applied to acoustic-level processing in the blackboard model or the network model with the same effect.

[0014]

[Effects of the Invention] As described above, according to the present invention, the phoneme string signal obtained by acoustic processing is corrected by supervised learning with a neural network, and this corrected phoneme string signal is used for language processing. Even when the acoustic processing part makes errors, the number of word candidates in the language processing part is reduced, improving both the processing speed and the sentence recognition rate.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention.

FIG. 2 is a block diagram showing the conventional method.

[Explanation of Symbols]

1 ... acoustic processing part, 2 ... language processing part, 3 ... neural network.

Claims (1)

[Claims]

Claim 1: In a method in which a phoneme string signal is extracted by acoustic processing of an input speech signal and sentence recognition is performed by language processing of this phoneme string signal, a continuous speech recognition method characterized in that the language processing is performed on a phoneme string signal corrected by a neural network that learns to correct the phoneme string signal using the speaker's actual speech signal as a teacher signal.
JP3232132A 1991-09-12 1991-09-12 Continuous speech recognizing method Pending JPH0573094A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP3232132A JPH0573094A (en) 1991-09-12 1991-09-12 Continuous speech recognizing method


Publications (1)

Publication Number Publication Date
JPH0573094A true JPH0573094A (en) 1993-03-26

Family

ID=16934502

Family Applications (1)

Application Number Title Priority Date Filing Date
JP3232132A Pending JPH0573094A (en) 1991-09-12 1991-09-12 Continuous speech recognizing method

Country Status (1)

Country Link
JP (1) JPH0573094A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4987203B2 (en) * 1999-11-12 2012-07-25 フェニックス ソリューションズ インコーポレーテッド Distributed real-time speech recognition system


Similar Documents

Publication Publication Date Title
US8019602B2 (en) Automatic speech recognition learning using user corrections
WO2022083083A1 (en) Sound conversion system and training method for same
US6233553B1 (en) Method and system for automatically determining phonetic transcriptions associated with spelled words
US8498857B2 (en) System and method for rapid prototyping of existing speech recognition solutions in different languages
US5787230A (en) System and method of intelligent Mandarin speech input for Chinese computers
US6138099A (en) Automatically updating language models
JP2003316386A (en) Method, device, and program for speech recognition
JP2001100781A (en) Method and device for voice processing and recording medium
KR20070098094A (en) An acoustic model adaptation method based on pronunciation variability analysis for foreign speech recognition and apparatus thereof
US20020087317A1 (en) Computer-implemented dynamic pronunciation method and system
JPH06110494A (en) Pronounciation learning device
Azim et al. Large vocabulary Arabic continuous speech recognition using tied states acoustic models
JPH0573094A (en) Continuous speech recognizing method
Polyakova et al. Learning from errors in grapheme-to-phoneme conversion.
JPH03226785A (en) Linguistic education device with voice recognition device
JP2001188556A (en) Method and device for voice recognition
JPH0736481A (en) Interpolation speech recognition device
JPS6229796B2 (en)
JPH08171396A (en) Speech recognition device
JPH0434499A (en) Vocalization indicating method
US20100161312A1 (en) Method of semantic, syntactic and/or lexical correction, corresponding corrector, as well as recording medium and computer program for implementing this method
Lin et al. A Multi-modal Soft Targets Approach for Pronunciation Erroneous Tendency Detection
CN116013261A (en) Voice recognition method, device, storage medium and equipment
KR19980013825A (en) Speech recognition device with language model adaptation function and control method thereof
CN117079637A (en) Mongolian emotion voice synthesis method based on condition generation countermeasure network