JPH077273B2 - Syntax processor for continuous speech recognition - Google Patents

Syntax processor for continuous speech recognition

Info

Publication number
JPH077273B2
Authority
JP
Japan
Prior art keywords
word
candidate
syntax
input
word string
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
JP1280442A
Other languages
Japanese (ja)
Other versions
JPH03141398A (en)
Inventor
泰 石川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mitsubishi Electric Corp
Original Assignee
Mitsubishi Electric Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corp
Priority to JP1280442A
Publication of JPH03141398A
Publication of JPH077273B2
Anticipated expiration
Expired - Fee Related

Landscapes

  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Description

DETAILED DESCRIPTION OF THE INVENTION

[Industrial field of use]

The present invention relates to a syntax processing device for a continuous speech recognition apparatus. The device receives input word candidates from an acoustic processing device that acoustically processes the input speech, and applies syntax rules to obtain a speech recognition result.

[Conventional technology]

FIG. 7 is a functional block diagram of a conventional syntax processing device for continuous speech recognition, as shown, for example, in the literature (Kyoritsu Shuppan, Information Science Course E·19·3, Speech Recognition, pp. 139-142, 1979).

In the figure, (20) is a syntax determination means. For an input word candidate (11) obtained by acoustically processing the input speech, it applies a syntax rule (33) retrieved from the syntax rule storage means (60) to a word string candidate (22) read from the word string candidate storage means (50) and performs a syntax check. It then has a new word string candidate (22), formed by connecting the input word candidate (11) to a word string candidate (22) that satisfies the connection condition, additionally stored in the word string candidate storage means (50). (40) is a recognition result determination means that, once all input word candidates (11) have been input, reads the word string candidate (22) with the smallest total acoustic distance value from the word string candidate storage means (50) and outputs it as the speech recognition result (77). (50) is a word string candidate storage means that stores the word string candidates (22) generated by the syntax processing. (60) is a syntax rule storage means that stores the syntax rules (33), which are described as a syntax state transition network.

In the conventional syntax processing device for continuous speech recognition described above, the syntax determination means (20) first works by the forward method (left-to-right method), which processes the input speech from its beginning along the time axis as shown in FIG. 8. It applies a syntax rule (33) retrieved from the syntax rule storage means (60) to check whether the word name w of an input word candidate (11), obtained by acoustically processing the input speech, is accepted in the final state p of a word string candidate (22) read from the word string candidate storage means (50). All word string candidates (sentence hypotheses) accepted and generated at this point are then tested by the beam search method, as shown in FIG. 9: hypotheses whose total acoustic distance value (the overall evaluation of the acoustic distance values of the constituent words) is large, that is, whose likelihood is low, are rejected, and only those whose value is small, that is, whose likelihood is high, are adopted as the final word string candidates at that point before the next input word candidate (11) is processed. In FIG. 9, the numbers attached to the nodes indicate likelihood, circles indicate hypotheses that continue to be tested, and black dots indicate rejected hypotheses. As a result of this syntax check, a new word string candidate (22), formed by connecting the input word candidate (11) to a word string candidate (22) that satisfies the connection condition, is additionally stored in the word string candidate storage means (50).
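The beam-search pruning step described for the conventional device can be sketched as follows. This is only an illustration under stated assumptions, not the implementation in the cited reference: the `Hypothesis` tuple, the `prune` function, and the `beam_width` parameter are hypothetical names, and keeping a fixed number of survivors is an assumed pruning criterion (the text says only that hypotheses with a large total acoustic distance are rejected).

```python
from typing import List, NamedTuple, Tuple


class Hypothesis(NamedTuple):
    words: Tuple[str, ...]   # word names in the sentence hypothesis so far
    total_distance: float    # total acoustic distance (smaller = more likely)


def prune(hypotheses: List[Hypothesis], beam_width: int) -> List[Hypothesis]:
    """Keep the beam_width hypotheses with the smallest total distance."""
    return sorted(hypotheses, key=lambda h: h.total_distance)[:beam_width]


# Three competing hypotheses; with a beam of 2 the worst one is rejected.
candidates = [
    Hypothesis(("a",), 4.0),
    Hypothesis(("b",), 1.5),
    Hypothesis(("c",), 2.5),
]
survivors = prune(candidates, beam_width=2)
```

Once a hypothesis falls out of `survivors` it is never revisited, which is the weakness of the forward method that the problem statement of this patent addresses.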

Next, when the utterance has ended and all input word candidates (11) have been input, the recognition result determination means (40) reads the word string candidate (22) with the smallest total acoustic distance value from the word string candidate storage means (50) and outputs it as the speech recognition result (77).

As an alternative to the forward method as the direction of processing, there is also a method that uses the island-driven method shown in FIG. 10. All input word candidates (11) are first stored, the most probable word candidate among them is taken as the initial hypothesis, and the remaining portions are processed in both directions along the time axis while the initial hypothesis is extended toward the beginning and the end of the sentence.

[Problems to be Solved by the Invention]

In the conventional syntax processing devices for continuous speech recognition described above, the method using the forward method evaluates and verifies hypotheses along the time axis, so once a correct hypothesis has been rejected partway through, a correct recognition result can no longer be obtained. In addition, word candidates near the beginning of the utterance do not necessarily have small acoustic distance values, so many hypotheses must be set up to obtain a correct recognition result. Furthermore, to reduce rejections within the sentence and improve the sentence recognition rate, even word candidates with relatively large acoustic distance values must be handled in the same way, which makes the amount of processing enormous.

The method using the island-driven method, on the other hand, is unlikely to reject the correct hypothesis early in processing, so a high recognition rate can be expected. However, taking one word candidate in the uttered sentence as the initial hypothesis means hypothesizing every syntactic state in which that word can appear, so the amount of processing needed to verify it becomes large. Moreover, because the initial hypothesis is determined only after all word candidates in the uttered sentence have been detected, real-time processing is not possible.

The conventional techniques have the problems described above.

The problem to be solved by this invention is to allow a continuous speech recognition apparatus using the syntax processing device of this invention to reduce the amount of processing without lowering the sentence recognition rate relative to conventional apparatus, and to make real-time processing possible.

[Means for Solving the Problems]

To achieve the above object, the syntax processing device for continuous speech recognition of this invention is characterized by including the following means.

In a syntax processing device that receives input word candidates from an acoustic processing device that acoustically processes input speech in a continuous speech recognition apparatus, and that applies syntax rules to obtain a speech recognition result, the device comprises: a threshold determination means that compares with a threshold the acoustic distance value of an input word candidate, which is the difference between the acoustic features of the input speech and the acoustic features of a standard pattern; a syntax determination means that uses syntax rules in syntax state transition network form to determine whether a word candidate that the threshold determination means found to be at or below the threshold can be connected to a word string candidate read from the word string candidate storage means described below; a reverse-lookup syntax rule storage means that stores, in state transition network form, a word name, the word names that can be connected before it, and the transition states at the time of connection; a word candidate storage means that stores input word candidates that the threshold determination means found to be at or above the threshold and that are therefore unlikely to be connected to a word string candidate; a backward syntax determination means that determines connection to a word string candidate by applying the rules in the reverse-lookup syntax rule storage means to a word candidate for which the syntax determination means found no connection to a word string candidate and to a word candidate from the word candidate storage means; and a word string candidate storage means that stores the word string candidates generated by the syntax determination means or the backward syntax determination means. When acoustic processing of the input speech ends, the word string candidate with the smallest total acoustic distance value is output.

[Operation]

In the syntax processing device for continuous speech recognition configured as described above, the threshold determination means first judges how likely an input word candidate, obtained by acoustically processing the input speech, is to be included in the speech recognition result. Input word candidates with a low likelihood are temporarily stored.

Next, the syntax determination means applies the syntax rules to determine whether an input word candidate that the threshold determination means judged likely to be included in the speech recognition result can be connected after a word string candidate. A connectable input word candidate is connected to that word string candidate, and the result is additionally stored as a new word string candidate.

Further, the backward syntax determination means handles an input word candidate that the syntax determination means judged unable to connect to any word string candidate. By applying the reverse-lookup syntax rules, it determines whether the input word candidate can be connected as the word following a stored word candidate, and whether that word candidate can in turn be connected after a word string candidate. A connectable input word candidate is connected to that word candidate and word string candidate, and the result is additionally stored as a new word string candidate.

Finally, when the acoustic processing end signal is input, the recognition result determination means selects, from the word string candidates generated by the above syntax processing, the one whose end position is near the end position of the input speech and whose total acoustic distance value is smallest, and outputs it as the speech recognition result.

[Embodiment]

FIG. 1 is a functional block diagram of a syntax processing device for continuous speech recognition showing one embodiment of this invention.

In the figure, (1) is a threshold determination means. It performs a threshold judgment on the acoustic distance value of an input word candidate (11) obtained by acoustically processing the input speech, and decides whether to pass the input word candidate (11) on for syntax determination or to store it temporarily as a word candidate.

(2) is a syntax determination means. When the start position of an input word candidate (11) from the threshold determination means is adjacent to the end position of a word string candidate (22) read from the word string candidate storage means (5), and a syntax rule (33) indicating that the word name w of the input word candidate (11) is accepted in the final state p of that word string candidate (22) can be retrieved from the syntax rule storage means (6), it performs a threshold judgment on the total acoustic distance value of the new word string candidate formed by connecting the word string candidate (22) and the input word candidate (11). It then decides whether to have the result additionally stored as a new word string candidate (22) or, if no connectable word string candidate exists, to pass the input word candidate (11) on for backward syntax determination.

(3) is a backward syntax determination means. It acts when the start position of an input word candidate (11) from the syntax determination means is adjacent to the end position of a word candidate (55) read from the word candidate storage means (8), the start position of that word candidate (55) is adjacent to the end position of a word string candidate (22) read from the word string candidate storage means (5), and a reverse-lookup syntax rule (44) can be retrieved from the reverse-lookup syntax rule storage means (7) indicating that, in the final state s of the word string candidate (22), the word name w' of the word candidate (55) and the word name w of the input word candidate (11) that follows it can be connected. In that case it performs a threshold judgment on the total acoustic distance value of the new word string candidate formed by connecting the word string candidate (22), the word candidate (55), and the input word candidate (11), and decides whether to have it additionally stored as a new word string candidate (22).

(4) is a recognition result determination means. When all input word candidates (11) have been input and the acoustic processing end signal (66) is input, it reads from the word string candidate storage means (5) the word string candidate, among those generated by the above syntax processing, whose end position is near the end position of the input speech and whose total acoustic distance value is smallest, and outputs it as the speech recognition result (77).

(5) is a word string candidate storage means. It stores the word string candidates (22) generated by the syntax processing in the layout of FIG. 2: for each word in a word string, its address, word name, word acoustic distance value, word string total acoustic distance value, start position, end position, number of words from the beginning of the sentence, state, and the address at which the preceding word is stored.

(6) is a syntax rule storage means. It stores in advance the syntax rules (33), which describe as a syntax state transition network, as in FIG. 3, a state p, a word name w that can be accepted from that state, and the state q to which a transition is made when that word is accepted. As in FIG. 4, it is organized so that, with a state p_i and a word name w_ij as the key, a comparison against the first two words of an entry retrieves the transition state q_ij.

(7) is a reverse-lookup syntax rule storage means. It stores in advance the reverse-lookup syntax rules (44), which describe as a syntax state transition network, as in FIG. 5, a word name w, a word name w' that can be connected before that word, and the states s, p, and q passed through when they are connected. As in FIG. 6, it is organized so that a lookup going backward along the time axis finds that the word name w'_ij can be connected before a word name w_i and that the resulting transition states are s_ij, p_ij, and q_ij.

(8) is a word candidate storage means that stores word candidates (11) that are unlikely to belong to a word string candidate (22).
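The storage layouts described for means (5), (6), and (7) can be pictured with ordinary Python containers. The sketch below is a reading of the text only, since FIGS. 2 to 6 are not reproduced here: all field names, state labels (`S0`, `S1`, ...), and example words are hypothetical. Deriving the reverse-lookup table from the forward rules is also an assumption made for illustration; the patent states only that both rule sets are stored in advance.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class WordEntry:
    """One row of the word string candidate store (fields listed for FIG. 2)."""
    word: str
    word_distance: float           # acoustic distance of this word
    total_distance: float          # total acoustic distance of the string so far
    start: int
    end: int
    n_words: int                   # number of words from the sentence beginning
    state: str
    prev_address: Optional[int]    # address of the preceding word, None at the start


# Forward rules (33): (state p, word name w) -> next state q.
FORWARD_RULES: Dict[Tuple[str, str], str] = {
    ("S0", "show"): "S1",
    ("S1", "me"): "S2",
    ("S2", "flights"): "S3",
}


def build_reverse_rules(
    rules: Dict[Tuple[str, str], str]
) -> Dict[str, List[Tuple[str, str, str, str]]]:
    """Build reverse-lookup rules (44): w -> [(w', s, p, q)] where s -w'-> p -w-> q."""
    reverse: Dict[str, List[Tuple[str, str, str, str]]] = {}
    for (p, w), q in rules.items():
        for (s, w_prev), p_reached in rules.items():
            if p_reached == p:
                reverse.setdefault(w, []).append((w_prev, s, p, q))
    return reverse


REVERSE_RULES = build_reverse_rules(FORWARD_RULES)
```

Keying the forward table on `(state, word)` mirrors the lookup described for FIG. 4, while keying the reverse table on the later word lets the backward determination find every admissible predecessor in one lookup.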

The syntax processing device for continuous speech recognition of the above embodiment first receives input word candidates (11). These are generated by the acoustic processing device that acoustically processes the input speech in the continuous speech recognition apparatus, and each contains a word name, start and end positions, and an acoustic distance value.

Next, in the threshold determination means (1), when the acoustic distance value of the input word candidate (11) is at or below the threshold, the input word candidate (11) is considered likely to be included in the speech recognition result (77) and is sent to the syntax determination means (2). When the value exceeds the threshold, the candidate is considered unlikely to be included, and the input word candidate (11) is temporarily stored in the word candidate storage means (8).

In the syntax determination means (2), suppose the start position of the input word candidate (11) is adjacent to the end position of a word string candidate (22) read from the word string candidate storage means (5), and a syntax rule (33) indicating that the word name w_ij of the input word candidate (11) is accepted in the final state p_i of that word string candidate (22) can be retrieved from the syntax rule storage means (6). If the total acoustic distance value of the new word string candidate formed by connecting the word string candidate (22) and the input word candidate (11) is at or below the threshold, the input word candidate (11) is judged connectable to the word string candidate (22) that was read, and the result is additionally stored in the word string candidate storage means (5) as a new word string candidate (22). If, on the other hand, the threshold is exceeded and no word string candidate exists to which the input word candidate (11) can be connected, the input word candidate (11) is sent to the backward syntax determination means (3).
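The routing performed by the threshold determination means (1) and the syntax determination means (2) can be sketched as below. This is a minimal illustration under several assumptions that the text does not fix: "adjacent" is modelled as the string's end position equalling the word's start position, the total acoustic distance is taken as a plain sum, the threshold values are arbitrary, and all names (`forward_step`, `RULES`, `WORD_THRESHOLD`, `STRING_THRESHOLD`) are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class WordCandidate(NamedTuple):
    word: str
    start: int
    end: int
    distance: float


class StringCandidate(NamedTuple):
    words: Tuple[str, ...]
    end: int
    state: str
    total_distance: float


RULES: Dict[Tuple[str, str], str] = {("S0", "turn"): "S1", ("S1", "left"): "S2"}
WORD_THRESHOLD = 1.0     # assumed value for the per-word threshold
STRING_THRESHOLD = 3.0   # assumed value for the total-distance threshold


def forward_step(cand: WordCandidate,
                 strings: List[StringCandidate],
                 held: List[WordCandidate]) -> str:
    """Route one input word candidate: 'held', 'connected', or 'backward'."""
    if cand.distance > WORD_THRESHOLD:
        held.append(cand)              # unlikely word: keep it in store (8)
        return "held"
    new = []
    for s in strings:
        next_state = RULES.get((s.state, cand.word))
        total = s.total_distance + cand.distance
        if s.end == cand.start and next_state is not None and total <= STRING_THRESHOLD:
            new.append(StringCandidate(s.words + (cand.word,), cand.end, next_state, total))
    strings.extend(new)                # additionally store the new strings in (5)
    return "connected" if new else "backward"


strings = [StringCandidate((), 0, "S0", 0.0)]   # assumed empty start hypothesis
held: List[WordCandidate] = []
r1 = forward_step(WordCandidate("turn", 0, 5, 0.5), strings, held)
r2 = forward_step(WordCandidate("left", 5, 9, 1.5), strings, held)
r3 = forward_step(WordCandidate("left", 6, 9, 0.25), strings, held)
```

A return value of `"backward"` corresponds to handing the candidate to the backward syntax determination means (3).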

Further, in the backward syntax determination means (3), suppose the start position of the input word candidate (11) is adjacent to the end position of a word candidate (55) read from the word candidate storage means (8), the start position of that word candidate (55) is adjacent to the end position of a word string candidate (22) read from the word string candidate storage means (5), and a reverse-lookup syntax rule (44) can be retrieved from the reverse-lookup syntax rule storage means (7) indicating that, in the final state s_ij of the word string candidate (22), the word name w'_ij of the word candidate (55) and the word name w_i of the input word candidate (11) that follows it can be connected. If the total acoustic distance value of the new word string candidate formed by connecting the word string candidate (22), the word candidate (55), and the input word candidate (11) is at or below the threshold, the input word candidate (11) is judged connectable to the word candidate (55) and word string candidate (22) that were read, and the result is additionally stored in the word string candidate storage means (5) as a new word string candidate (22).
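The backward syntax determination can be sketched in the same style. As before, this is an illustration under assumptions, not the patented implementation: adjacency is modelled as equal positions, the total distance as a sum, and `backward_step`, `REVERSE_RULES`, and `STRING_THRESHOLD` are hypothetical names. Each reverse-lookup entry `(w', s, p, q)` is read as "from state s, w' leads to p, and w then leads to q".

```python
from typing import Dict, List, NamedTuple, Tuple


class WordCandidate(NamedTuple):
    word: str
    start: int
    end: int
    distance: float


class StringCandidate(NamedTuple):
    words: Tuple[str, ...]
    end: int
    state: str
    total_distance: float


# Reverse-lookup rules (44): word w -> [(w', s, p, q)].
REVERSE_RULES: Dict[str, List[Tuple[str, str, str, str]]] = {
    "left": [("turn", "S0", "S1", "S2")],
}
STRING_THRESHOLD = 3.0   # assumed value


def backward_step(cand: WordCandidate,
                  strings: List[StringCandidate],
                  held: List[WordCandidate]) -> List[StringCandidate]:
    """Try to bridge cand to a stored string through one held word candidate."""
    new = []
    for w_prev, s, _p, q in REVERSE_RULES.get(cand.word, []):
        for h in held:
            if h.word != w_prev or h.end != cand.start:
                continue               # held word must directly precede cand
            for sc in strings:
                total = sc.total_distance + h.distance + cand.distance
                if sc.state == s and sc.end == h.start and total <= STRING_THRESHOLD:
                    new.append(StringCandidate(
                        sc.words + (h.word, cand.word), cand.end, q, total))
    strings.extend(new)                # additionally store the new strings in (5)
    return new


strings = [StringCandidate((), 0, "S0", 0.0)]    # assumed empty start hypothesis
held = [WordCandidate("turn", 0, 5, 1.5)]        # set aside earlier by the threshold step
added = backward_step(WordCandidate("left", 5, 9, 0.25), strings, held)
```

In this example the held word "turn" was too distant to be tried on its own, but it is recovered because the well-matched word "left" requires it as a predecessor.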

Once the threshold determination, syntax determination, and backward syntax determination described above have been performed for all input word candidates (11) and all input is finished, the acoustic processing end signal (66) is received from the acoustic processing device.

Finally, the recognition result determination means (4) reads from the word string candidate storage means (5) the word string candidate (22), among those generated by the above processing, whose end position is near the end position of the input speech and whose total acoustic distance value is smallest, and outputs it as the speech recognition result (77).
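The final selection can be sketched as a filter followed by a minimum. The text does not quantify "near the end position", so the `tolerance` parameter below is an assumption, as are the names `select_result` and `StringCandidate`.

```python
from typing import List, NamedTuple, Optional, Tuple


class StringCandidate(NamedTuple):
    words: Tuple[str, ...]
    end: int
    state: str
    total_distance: float


def select_result(strings: List[StringCandidate],
                  input_end: int,
                  tolerance: int) -> Optional[StringCandidate]:
    """Pick the minimum-distance string that ends near the end of the input."""
    near_end = [s for s in strings if abs(input_end - s.end) <= tolerance]
    if not near_end:
        return None                    # no hypothesis spans the whole utterance
    return min(near_end, key=lambda s: s.total_distance)


stored = [
    StringCandidate(("turn",), 5, "S1", 0.5),            # ends too early
    StringCandidate(("turn", "left"), 9, "S2", 1.75),
    StringCandidate(("turn", "right"), 10, "S2", 2.5),
]
best = select_result(stored, input_end=10, tolerance=1)
```

Filtering on the end position first matters because a short partial string such as `("turn",)` would otherwise win on total distance alone.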

[Effects of the Invention]

This invention is configured as described above. Instead of performing syntax determination on every input word candidate as in the prior art, it performs syntax determination only on input word candidates whose acoustic distance value is small and that are therefore likely to be included in the speech recognition result, and temporarily stores the unlikely ones. When a candidate has a small acoustic distance value but is unlikely to be connectable to a word string candidate, syntax determination goes back to the stored word candidates. The enormous amount of processing that was the problem with the conventional forward method can therefore be reduced without rejecting correct hypotheses. In addition, although some processing runs backward, processing basically proceeds along the time axis, so the problem of the conventional island-driven method, that processing cannot start until the utterance has ended, is also eliminated.

Accordingly, a continuous speech recognition apparatus using the syntax processing device of this invention reduces the amount of processing without lowering the sentence recognition rate relative to conventional apparatus, and makes real-time processing possible.

[Brief description of drawings]

FIG. 1 is a functional block diagram of a syntax processing device for continuous speech recognition showing one embodiment of this invention. FIG. 2 shows an example of the storage layout of the word string candidate storage means of FIG. 1. FIGS. 3 and 4 show a syntax state transition network describing the syntax rules of the syntax rule storage means of FIG. 1 and an example of its storage layout. FIGS. 5 and 6 show a syntax state transition network describing the reverse-lookup syntax rules of the reverse-lookup syntax rule storage means of FIG. 1 and an example of its storage layout. FIG. 7 is a functional block diagram of a conventional syntax processing device for continuous speech recognition. FIGS. 8, 9, and 10 illustrate the forward method, the beam search method, and the island-driven method, respectively, as used in the conventional example of FIG. 7.

In the figures, (1) is the threshold determination means, (2) is the syntax determination means, (3) is the backward syntax determination means, (4) is the recognition result determination means, (5) is the word string candidate storage means, (6) is the syntax rule storage means, (7) is the reverse-lookup syntax rule storage means, (8) is the word candidate storage means, (11) is an input word candidate, (22) is a word string candidate, (33) is a syntax rule, (44) is a reverse-lookup syntax rule, (55) is a word candidate, (66) is the acoustic processing end signal, and (77) is the speech recognition result.

In the figures, the same reference numerals denote identical or corresponding parts.

Claims (1)

[Claims]

1. A syntax processing device for continuous speech recognition, comprising: a threshold determination means that compares with a threshold the acoustic distance value of an input word candidate, which is the difference between the acoustic features of input speech and the acoustic features of a standard pattern; a syntax determination means that uses syntax rules in syntax state transition network form to determine whether a word candidate that the threshold determination means found to be at or below the threshold can be connected to a word string candidate read from the word string candidate storage means described below; a reverse-lookup syntax rule storage means that stores, in state transition network form, a word name, the word names that can be connected before it, and the transition states at the time of connection; a word candidate storage means that stores input word candidates that the threshold determination means found to be at or above the threshold and that are therefore unlikely to be connected to a word string candidate; a backward syntax determination means that determines connection to a word string candidate by applying the rules in the reverse-lookup syntax rule storage means to a word candidate for which the syntax determination means found no connection to a word string candidate and to a word from the word candidate storage means; and a word string candidate storage means that stores the word string candidates generated by the syntax determination means or the backward syntax determination means; wherein, when acoustic processing of the input speech ends, the word string candidate with the smallest total acoustic distance value is output.
JP1280442A 1989-10-27 1989-10-27 Syntax processor for continuous speech recognition Expired - Fee Related JPH077273B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP1280442A JPH077273B2 (en) 1989-10-27 1989-10-27 Syntax processor for continuous speech recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP1280442A JPH077273B2 (en) 1989-10-27 1989-10-27 Syntax processor for continuous speech recognition

Publications (2)

Publication Number Publication Date
JPH03141398A JPH03141398A (en) 1991-06-17
JPH077273B2 JPH077273B2 (en) 1995-01-30

Family

ID=17625115

Family Applications (1)

Application Number Title Priority Date Filing Date
JP1280442A Expired - Fee Related JPH077273B2 (en) 1989-10-27 1989-10-27 Syntax processor for continuous speech recognition

Country Status (1)

Country Link
JP (1) JPH077273B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000010160A1 (en) * 1998-08-17 2000-02-24 Sony Corporation Speech recognizing device and method, navigation device, portable telephone, and information processor

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3460723B2 (en) * 1992-05-19 2003-10-27 富士通株式会社 Voice recognition method
JP4104313B2 (en) 2001-10-03 2008-06-18 株式会社デンソー Voice recognition device, program, and navigation system

Also Published As

Publication number Publication date
JPH03141398A (en) 1991-06-17

Similar Documents

Publication Publication Date Title
US6535850B1 (en) Smart training and smart scoring in SD speech recognition system with user defined vocabulary
KR101056511B1 (en) Speech Segment Detection and Continuous Speech Recognition System in Noisy Environment Using Real-Time Call Command Recognition
CN1321401C (en) Speech recognition apparatus, speech recognition method, conversation control apparatus, conversation control method
JP4237713B2 (en) Audio processing device
US6134527A (en) Method of testing a vocabulary word being enrolled in a speech recognition system
CN104978963A (en) Speech recognition apparatus, method and electronic equipment
JP2000122691A (en) Automatic recognizing method for spelling reading type speech speaking
US6662159B2 (en) Recognizing speech data using a state transition model
JPS5991500A (en) Voice analyzer
JPH07334184A (en) Calculating device for acoustic category mean value and adapting device therefor
US20050071161A1 (en) Speech recognition method having relatively higher availability and correctiveness
EP0614169B1 (en) Voice signal processing device
CN114155839A (en) Voice endpoint detection method, device, equipment and storage medium
JPH077273B2 (en) Syntax processor for continuous speech recognition
CN112863496B (en) Voice endpoint detection method and device
JP4661239B2 (en) Voice dialogue apparatus and voice dialogue method
JPH10173769A (en) Voice message retrieval device
KR20210052563A (en) Method and apparatus for providing context-based voice recognition service
JP2000214879A (en) Adaptation method for voice recognition device
JP3357752B2 (en) Pattern matching device
JP2000259177A (en) Voice outputting device
JP7035476B2 (en) Speech processing program, speech processor, and speech processing method
JP2757356B2 (en) Word speech recognition method and apparatus
JPH0997095A (en) Speech recognition device
JP4060015B2 (en) Voice recognition apparatus, voice recognition method, and recording medium on which voice recognition program is recorded

Legal Events

Date Code Title Description
LAPS Cancellation because of no payment of annual fees