JPH05188998A - Speech recognizing method - Google Patents

Speech recognizing method

Info

Publication number
JPH05188998A
JPH05188998A
Authority
JP
Japan
Prior art keywords
phoneme
phoneme series
neural network
phoneme sequence
dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP4006156A
Other languages
Japanese (ja)
Inventor
Yoshihiro Matsuura
嘉宏 松浦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Meidensha Corp
Meidensha Electric Manufacturing Co Ltd
Original Assignee
Meidensha Corp
Meidensha Electric Manufacturing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Meidensha Corp, Meidensha Electric Manufacturing Co Ltd filed Critical Meidensha Corp
Priority to JP4006156A priority Critical patent/JPH05188998A/en
Publication of JPH05188998A publication Critical patent/JPH05188998A/en
Pending legal-status Critical Current

Abstract

PURPOSE: To shorten matching-process time by employing a neural network for the matching between the phoneme sequence obtained from an acoustic processing unit and the phoneme sequences of a dictionary.

CONSTITUTION: The acoustic processing unit extracts features from the input speech and converts them into a phoneme sequence by segmentation and phoneme recognition. This sequence is supplied to a phoneme sequence input unit 13 of the neural network 12, which performs the matching. The neural network 12 learns from the phoneme sequences of the dictionary input unit 14, which holds previously registered phoneme sequences, and of the phoneme sequence input unit 13, and produces at its output unit 15 a proper phoneme sequence from the possibly erroneous sequence supplied by the input unit 13. In other words, the phoneme sequence obtained from the acoustic processing unit 11 is matched by the neural network 12 against the phoneme sequences from the dictionary input unit 14. The phoneme sequence obtained at the output unit 15 is then supplied to a language processing unit 16, which generates a correct sentence using linguistic information on syntax, meaning, context, and so on.

Description

[Detailed Description of the Invention]

[0001]

[Industrial Field of Application] The present invention relates to a continuous speech recognition method, and more particularly to a speech recognition method with an improved phoneme-sequence matching process.

[0002]

[Prior Art] A continuous speech recognition system generally consists of an acoustic processing unit that obtains a phoneme sequence from input speech data, and a language processing unit that forms words from the phoneme sequence obtained by the acoustic processing unit and generates sentences. The language processing unit cuts out words by comparing the phoneme sequence supplied from the acoustic processing unit with the phoneme sequences registered in a dictionary, and thereby generates sentences. Because the phoneme sequence delivered by the acoustic processing unit may contain errors, it is processed while being matched by brute force against every phoneme sequence registered in the dictionary.
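The brute-force matching described above can be sketched as follows: every dictionary entry is compared against the recognized phoneme sequence, typically by an edit distance, so the cost grows linearly with the dictionary size. The function names, the toy dictionary, and the choice of Levenshtein distance are illustrative assumptions, not taken from the patent.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences (one-row DP)."""
    dp = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, pb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (pa != pb))  # substitution / match
    return dp[-1]

def brute_force_match(observed, dictionary):
    """Compare the observed sequence against EVERY entry (the slow part)."""
    return min(dictionary, key=lambda entry: edit_distance(observed, entry))

# Toy dictionary of registered phoneme sequences.
dictionary = [["a", "k", "a"], ["a", "o", "i"], ["k", "u", "r", "o"]]
print(brute_force_match(["a", "k", "o"], dictionary))  # noisy "aka" -> ['a', 'k', 'a']
```

The `min` over all entries is exactly the total (brute-force) matching whose running time the invention sets out to shorten.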

[0003]

[Problems to Be Solved by the Invention] Since the matching between the phoneme sequence delivered by the acoustic processing unit and the phoneme sequences registered in the dictionary is performed by brute force as described above, the processing time becomes extremely long.

[0004]

The present invention has been made in view of the above circumstances, and its object is to provide a speech recognition method that shortens the phoneme-sequence matching time.

[0005]

[Means for Solving the Problems] To achieve the above object, the present invention converts input speech into a phoneme sequence in an acoustic processing unit and then supplies this phoneme sequence to one input unit of a neural network, while the other input unit is supplied with phoneme sequences from a dictionary storing registered phoneme sequences. The matching of the two phoneme sequences is performed by the neural network, and the output of the neural network is then supplied to a language processing unit to generate sentences.

[0006]

[Operation] The phoneme sequence delivered by the acoustic processing unit is given to the phoneme sequence input unit of the neural network, while the dictionary input unit receives the registered phoneme sequences. The neural network learns from both phoneme sequences and obtains a proper phoneme sequence at its output unit. From this phoneme sequence the language processing unit generates sentences.

[0007]

[Embodiment] An embodiment of the present invention will now be described with reference to the drawing. In FIG. 1, reference numeral 11 denotes an acoustic processing unit that extracts features from the input speech and converts them into a phoneme sequence by segmentation and phoneme recognition. The phoneme sequence delivered by this acoustic processing unit 11 is given to a phoneme sequence input unit 13 of the neural network 12, which performs the matching. Reference numeral 14 denotes a dictionary input unit, likewise part of the neural network, that holds previously registered phoneme sequences. The neural network 12 learns from the phoneme sequences of both input units 13 and 14 and obtains, at an output unit 15, a proper phoneme sequence from the possibly erroneous sequence supplied by the phoneme sequence input unit 13. The phoneme sequence obtained at the output unit 15 is supplied to a language processing unit 16, which generates a correct sentence using linguistic information on syntax, meaning, context, and so on.
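The dataflow of FIG. 1 can be sketched as a toy pipeline, with each numbered block as one function. Every implementation detail below (the stub acoustic stage, the two-word dictionary, the position-count score standing in for the trained network 12) is an illustrative assumption; only the order of the stages mirrors the embodiment.

```python
# Registered phoneme sequences (dictionary input unit 14), as an assumption.
DICTIONARY = {"aka": ["a", "k", "a"], "aoi": ["a", "o", "i"]}

def acoustic_processing(speech):
    """Block 11: stand-in that returns a phoneme sequence with one error."""
    return {"aka": ["a", "k", "o"]}.get(speech, list(speech))

def nn_matching(observed):
    """Blocks 12-15: pick the dictionary entry agreeing in most positions,
    a stand-in for the neural network's matching output."""
    def score(entry):
        return sum(o == d for o, d in zip(observed, entry))
    return max(DICTIONARY, key=lambda word: score(DICTIONARY[word]))

def language_processing(words):
    """Block 16: join the recognized words into a sentence."""
    return " ".join(words)

print(language_processing([nn_matching(acoustic_processing("aka"))]))  # aka
```

Despite the error ("o" for the final "a"), the matching stage recovers the registered entry, which is the role the embodiment assigns to the neural network 12.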

[0008]

In the embodiment configured as above, the phoneme sequence from the acoustic processing unit 11 contains errors. This sequence is therefore learned by the neural network 12 together with the phoneme sequences of the dictionary input unit 14, and the output unit 15 of the neural network 12 produces, for example, "1" at the positions where the phoneme sequences match and "0" elsewhere. Using the proper phoneme sequence obtained at the output unit 15, the language processing unit 16 generates sentences. Matching the two phoneme sequences with the neural network 12 in this way greatly shortens the processing time.
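The "1"/"0" output coding described in [0008] can be illustrated by the training target such a network would be taught to produce: a 1 wherever the observed and dictionary sequences agree and a 0 elsewhere. The helper names below are hypothetical, and summing the 1s as a match score is an assumption about how the output could be used.

```python
def match_target(observed, dictionary_entry):
    """Desired network output: 1 where the aligned phonemes agree, else 0."""
    return [int(o == d) for o, d in zip(observed, dictionary_entry)]

def best_entry(observed, dictionary):
    """Entry whose target has the most 1s, i.e. the closest registered sequence."""
    return max(dictionary, key=lambda e: sum(match_target(observed, e)))

observed = ["a", "k", "o"]                       # noisy acoustic-stage output
print(match_target(observed, ["a", "k", "a"]))   # [1, 1, 0]
print(best_entry(observed, [["a", "k", "a"], ["a", "o", "i"]]))  # ['a', 'k', 'a']
```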

[0009]

[Effects of the Invention] As described above, according to the present invention, a neural network is employed for the matching between the phoneme sequence obtained from the acoustic processing unit and the phoneme sequences of the dictionary. This shortens the matching time and, in addition, realizes matching that is adapted to the error tendencies of the acoustic processing unit.

[Brief Description of Drawings]

[FIG. 1] An explanatory diagram showing the configuration of an embodiment of the present invention.

[Explanation of Symbols]

11 ... acoustic processing unit, 12 ... neural network, 13 ... phoneme sequence input unit, 14 ... dictionary input unit, 15 ... output unit, 16 ... language processing unit.

Claims (1)

[Claims]

[Claim 1] A speech recognition method characterized in that input speech is converted into a phoneme sequence by an acoustic processing unit; this phoneme sequence is supplied to one input unit of a neural network while the other input unit is supplied with phoneme sequences from a dictionary storing registered phoneme sequences; the matching of the two phoneme sequences is performed by the neural network; and the output of the neural network is then supplied to a language processing unit to generate sentences.
JP4006156A 1992-01-17 1992-01-17 Speech recognizing method Pending JPH05188998A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP4006156A JPH05188998A (en) 1992-01-17 1992-01-17 Speech recognizing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP4006156A JPH05188998A (en) 1992-01-17 1992-01-17 Speech recognizing method

Publications (1)

Publication Number Publication Date
JPH05188998A true JPH05188998A (en) 1993-07-30

Family

ID=11630670

Family Applications (1)

Application Number Title Priority Date Filing Date
JP4006156A Pending JPH05188998A (en) 1992-01-17 1992-01-17 Speech recognizing method

Country Status (1)

Country Link
JP (1) JPH05188998A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100486735B1 (en) * 2003-02-28 2005-05-03 삼성전자주식회사 Method of establishing optimum-partitioned classifed neural network and apparatus and method and apparatus for automatic labeling using optimum-partitioned classifed neural network
US10468030B2 (en) 2016-12-19 2019-11-05 Samsung Electronics Co., Ltd. Speech recognition method and apparatus

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100486735B1 (en) * 2003-02-28 2005-05-03 삼성전자주식회사 Method of establishing optimum-partitioned classifed neural network and apparatus and method and apparatus for automatic labeling using optimum-partitioned classifed neural network
US10468030B2 (en) 2016-12-19 2019-11-05 Samsung Electronics Co., Ltd. Speech recognition method and apparatus

Similar Documents

Publication Publication Date Title
US7711105B2 (en) Methods and apparatus for processing foreign accent/language communications
Fry Theoretical aspects of mechanical speech recognition
US20040024585A1 (en) Linguistic segmentation of speech
WO2006097975A1 (en) Voice recognition program
CN109493846B (en) English accent recognition system
EP1460615B1 (en) Voice processing device and method, recording medium, and program
CN110942767B (en) Recognition labeling and optimization method and device for ASR language model
KR100379994B1 (en) Verbal utterance rejection using a labeller with grammatical constraints
CN113470622A (en) Conversion method and device capable of converting any voice into multiple voices
JPH05188998A (en) Speech recognizing method
JP3039634B2 (en) Voice recognition device
CN113160828A (en) Intelligent auxiliary robot interaction method and system, electronic equipment and storage medium
KR100308274B1 (en) Variable vocabulary recognition system
JPH10116093A (en) Voice recognition device
JP2001188556A (en) Method and device for voice recognition
JPH09230889A (en) Speech recognition and response device
JPS6229796B2 (en)
CN113903327B (en) Voice environment atmosphere recognition method based on deep neural network
US8249869B2 (en) Lexical correction of erroneous text by transformation into a voice message
JP2002082691A (en) Automatic recognition method of company name included in uttering
JPH0211919B2 (en)
Macherey et al. Multi-level error handling for tree based dialogue course management
JPH09244692A (en) Uttered word certifying method and device executing the same method
JP2004309654A (en) Speech recognition apparatus
KR100677197B1 (en) Voice recognizing dictation method