JPH05188998A - Speech recognizing method - Google Patents

Speech recognizing method

Info

Publication number
JPH05188998A
JPH05188998A
Authority
JP
Japan
Prior art keywords
phoneme
phoneme series
neural network
phoneme sequence
dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP4006156A
Other languages
Japanese (ja)
Inventor
Yoshihiro Matsuura
嘉宏 松浦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Meidensha Corp
Meidensha Electric Manufacturing Co Ltd
Original Assignee
Meidensha Corp
Meidensha Electric Manufacturing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Meidensha Corp, Meidensha Electric Manufacturing Co Ltd filed Critical Meidensha Corp
Priority to JP4006156A priority Critical patent/JPH05188998A/en
Publication of JPH05188998A publication Critical patent/JPH05188998A/en
Pending legal-status Critical Current

Abstract

PURPOSE: To shorten matching-process time by employing a neural network for the matching between the phoneme sequence obtained from an acoustic processing unit and the phoneme sequences of a dictionary.

CONSTITUTION: The acoustic processing unit extracts features from the input speech and converts them into a phoneme sequence by segmentation and phoneme recognition. This sequence is supplied to a phoneme sequence input unit 13 of the neural network 12, which performs the matching. The neural network 12 learns from the phoneme sequences of the dictionary input unit 14, which holds previously registered phoneme sequences, and of the phoneme sequence input unit 13, and produces at its output unit 15 a proper phoneme sequence from the possibly erroneous sequence supplied by the input unit 13. In other words, the phoneme sequence obtained from the acoustic processing unit 11 is matched by the neural network 12 against the phoneme sequences from the dictionary input unit 14. The phoneme sequence obtained at the output unit 15 is then supplied to a language processing unit 16, which generates a correct sentence using linguistic information on syntax, meaning, context, and so on.

Description

[Detailed Description of the Invention]

[0001]

[Industrial Field of Application] The present invention relates to a continuous speech recognition method, and more particularly to a speech recognition method with an improved phoneme-sequence matching process.

[0002]

[Prior Art] A continuous speech recognition system generally consists of an acoustic processing unit that obtains a phoneme sequence from input speech data, and a language processing unit that forms words from the phoneme sequence obtained by the acoustic processing unit and generates sentences. The language processing unit cuts out words by comparing the phoneme sequence supplied from the acoustic processing unit with the phoneme sequences registered in a dictionary, and thereby generates sentences. Because the phoneme sequence delivered by the acoustic processing unit may contain errors, it is processed while being matched by brute force against every phoneme sequence registered in the dictionary.
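The brute-force matching described above can be sketched as follows: every dictionary entry is compared against the recognized phoneme sequence, typically by an edit distance, so the cost grows linearly with the dictionary size. The function names, the toy dictionary, and the choice of Levenshtein distance are illustrative assumptions, not taken from the patent.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences (one-row DP)."""
    dp = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, pb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (pa != pb))  # substitution / match
    return dp[-1]

def brute_force_match(observed, dictionary):
    """Compare the observed sequence against EVERY entry (the slow part)."""
    return min(dictionary, key=lambda entry: edit_distance(observed, entry))

# Toy dictionary of registered phoneme sequences.
dictionary = [["a", "k", "a"], ["a", "o", "i"], ["k", "u", "r", "o"]]
print(brute_force_match(["a", "k", "o"], dictionary))  # noisy "aka" -> ['a', 'k', 'a']
```

The `min` over all entries is exactly the total (brute-force) matching whose running time the invention sets out to shorten.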

[0003]

[Problems to Be Solved by the Invention] Since the matching between the phoneme sequence delivered by the acoustic processing unit and the phoneme sequences registered in the dictionary is performed by brute force as described above, the processing time becomes extremely long.

[0004]

The present invention has been made in view of the above circumstances, and its object is to provide a speech recognition method that shortens the phoneme-sequence matching time.

[0005]

[Means for Solving the Problems] To achieve the above object, the present invention converts input speech into a phoneme sequence in an acoustic processing unit and then supplies this phoneme sequence to one input unit of a neural network, while the other input unit is supplied with phoneme sequences from a dictionary storing registered phoneme sequences. The matching of the two phoneme sequences is performed by the neural network, and the output of the neural network is then supplied to a language processing unit to generate sentences.

[0006]

[Operation] The phoneme sequence delivered by the acoustic processing unit is given to the phoneme sequence input unit of the neural network, while the dictionary input unit receives the registered phoneme sequences. The neural network learns from both phoneme sequences and obtains a proper phoneme sequence at its output unit. From this phoneme sequence the language processing unit generates sentences.

[0007]

[Embodiment] An embodiment of the present invention will now be described with reference to the drawing. In FIG. 1, reference numeral 11 denotes an acoustic processing unit that extracts features from the input speech and converts them into a phoneme sequence by segmentation and phoneme recognition. The phoneme sequence delivered by this acoustic processing unit 11 is given to a phoneme sequence input unit 13 of the neural network 12, which performs the matching. Reference numeral 14 denotes a dictionary input unit, likewise part of the neural network, that holds previously registered phoneme sequences. The neural network 12 learns from the phoneme sequences of both input units 13 and 14 and obtains, at an output unit 15, a proper phoneme sequence from the possibly erroneous sequence supplied by the phoneme sequence input unit 13. The phoneme sequence obtained at the output unit 15 is supplied to a language processing unit 16, which generates a correct sentence using linguistic information on syntax, meaning, context, and so on.
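The dataflow of FIG. 1 can be sketched as a toy pipeline, with each numbered block as one function. Every implementation detail below (the stub acoustic stage, the two-word dictionary, the position-count score standing in for the trained network 12) is an illustrative assumption; only the order of the stages mirrors the embodiment.

```python
# Registered phoneme sequences (dictionary input unit 14), as an assumption.
DICTIONARY = {"aka": ["a", "k", "a"], "aoi": ["a", "o", "i"]}

def acoustic_processing(speech):
    """Block 11: stand-in that returns a phoneme sequence with one error."""
    return {"aka": ["a", "k", "o"]}.get(speech, list(speech))

def nn_matching(observed):
    """Blocks 12-15: pick the dictionary entry agreeing in most positions,
    a stand-in for the neural network's matching output."""
    def score(entry):
        return sum(o == d for o, d in zip(observed, entry))
    return max(DICTIONARY, key=lambda word: score(DICTIONARY[word]))

def language_processing(words):
    """Block 16: join the recognized words into a sentence."""
    return " ".join(words)

print(language_processing([nn_matching(acoustic_processing("aka"))]))  # aka
```

Despite the error ("o" for the final "a"), the matching stage recovers the registered entry, which is the role the embodiment assigns to the neural network 12.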

[0008]

In the embodiment configured as above, the phoneme sequence from the acoustic processing unit 11 contains errors. This sequence is therefore learned by the neural network 12 together with the phoneme sequences of the dictionary input unit 14, and the output unit 15 of the neural network 12 produces, for example, "1" at the positions where the phoneme sequences match and "0" elsewhere. Using the proper phoneme sequence obtained at the output unit 15, the language processing unit 16 generates sentences. Matching the two phoneme sequences with the neural network 12 in this way greatly shortens the processing time.
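The "1"/"0" output coding described in [0008] can be illustrated by the training target such a network would be taught to produce: a 1 wherever the observed and dictionary sequences agree and a 0 elsewhere. The helper names below are hypothetical, and summing the 1s as a match score is an assumption about how the output could be used.

```python
def match_target(observed, dictionary_entry):
    """Desired network output: 1 where the aligned phonemes agree, else 0."""
    return [int(o == d) for o, d in zip(observed, dictionary_entry)]

def best_entry(observed, dictionary):
    """Entry whose target has the most 1s, i.e. the closest registered sequence."""
    return max(dictionary, key=lambda e: sum(match_target(observed, e)))

observed = ["a", "k", "o"]                       # noisy acoustic-stage output
print(match_target(observed, ["a", "k", "a"]))   # [1, 1, 0]
print(best_entry(observed, [["a", "k", "a"], ["a", "o", "i"]]))  # ['a', 'k', 'a']
```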

[0009]

[Effects of the Invention] As described above, according to the present invention, a neural network is employed for the matching between the phoneme sequence obtained from the acoustic processing unit and the phoneme sequences of the dictionary. This shortens the matching time and, in addition, realizes matching that is adapted to the error tendencies of the acoustic processing unit.

[Brief Description of Drawings]

[FIG. 1] An explanatory diagram showing the configuration of an embodiment of the present invention.

[Explanation of Symbols]

11 ... acoustic processing unit, 12 ... neural network, 13 ... phoneme sequence input unit, 14 ... dictionary input unit, 15 ... output unit, 16 ... language processing unit.

Claims (1)

[Claims]

[Claim 1] A speech recognition method characterized in that input speech is converted into a phoneme sequence by an acoustic processing unit; this phoneme sequence is supplied to one input unit of a neural network while the other input unit is supplied with phoneme sequences from a dictionary storing registered phoneme sequences; the matching of the two phoneme sequences is performed by the neural network; and the output of the neural network is then supplied to a language processing unit to generate sentences.
JP4006156A 1992-01-17 1992-01-17 Speech recognizing method Pending JPH05188998A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP4006156A JPH05188998A (en) 1992-01-17 1992-01-17 Speech recognizing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP4006156A JPH05188998A (en) 1992-01-17 1992-01-17 Speech recognizing method

Publications (1)

Publication Number Publication Date
JPH05188998A true JPH05188998A (en) 1993-07-30

Family

ID=11630670

Family Applications (1)

Application Number Title Priority Date Filing Date
JP4006156A Pending JPH05188998A (en) 1992-01-17 1992-01-17 Speech recognizing method

Country Status (1)

Country Link
JP (1) JPH05188998A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100486735B1 (en) * 2003-02-28 2005-05-03 삼성전자주식회사 Method of establishing optimum-partitioned classifed neural network and apparatus and method and apparatus for automatic labeling using optimum-partitioned classifed neural network
US10468030B2 (en) 2016-12-19 2019-11-05 Samsung Electronics Co., Ltd. Speech recognition method and apparatus

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100486735B1 (en) * 2003-02-28 2005-05-03 삼성전자주식회사 Method of establishing optimum-partitioned classifed neural network and apparatus and method and apparatus for automatic labeling using optimum-partitioned classifed neural network
US10468030B2 (en) 2016-12-19 2019-11-05 Samsung Electronics Co., Ltd. Speech recognition method and apparatus

Similar Documents

Publication Publication Date Title
US7711105B2 (en) Methods and apparatus for processing foreign accent/language communications
Fry Theoretical aspects of mechanical speech recognition
US20040024585A1 (en) Linguistic segmentation of speech
WO2006097975A1 (en) Voice recognition program
CN109493846B (en) English accent recognition system
EP1460615B1 (en) Voice processing device and method, recording medium, and program
CN110942767B (en) Recognition labeling and optimization method and device for ASR language model
KR100379994B1 (en) Verbal utterance rejection using a labeller with grammatical constraints
CN113470622A (en) Conversion method and device capable of converting any voice into multiple voices
JPH05188998A (en) Speech recognizing method
JP3039634B2 (en) Voice recognition device
CN113160828A (en) Intelligent auxiliary robot interaction method and system, electronic equipment and storage medium
KR100308274B1 (en) Variable vocabulary recognition system
JPH10116093A (en) Voice recognition device
JP2001188556A (en) Method and device for voice recognition
JPH09230889A (en) Speech recognition and response device
JPS6229796B2 (en)
CN113903327B (en) Voice environment atmosphere recognition method based on deep neural network
US8249869B2 (en) Lexical correction of erroneous text by transformation into a voice message
JP2002082691A (en) Automatic recognition method of company name included in uttering
JPH0211919B2 (en)
Macherey et al. Multi-level error handling for tree based dialogue course management
JPH09244692A (en) Uttered word certifying method and device executing the same method
JP2004309654A (en) Speech recognition apparatus
KR100677197B1 (en) Voice recognizing dictation method