JPH0883091A - Voice recognition device - Google Patents

Voice recognition device

Info

Publication number
JPH0883091A
Authority
JP
Japan
Prior art keywords
pattern
recognition
voice
input
standard
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP6215958A
Other languages
Japanese (ja)
Inventor
Kenji Mizutani
研治 水谷
Makoto Hirai
誠 平井
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd
Priority to JP6215958A
Publication of JPH0883091A
Legal status: Pending

Abstract

PURPOSE: To provide a voice recognition device with a high recognition rate. CONSTITUTION: A voice input device 101 converts input voice into an electrical signal, and a voice signal recording device 102 records the electrical signal output by the device 101 as an input pattern. A matching pattern generation device 103 generates the standard voice patterns used to recognize the input pattern. To reliably determine the end of the voice in the input pattern held by the device 102, a recognition section determining device 104 applies a word spotting method. A voice signal matching device 105 matches the input pattern between the previous recognition end point and the end point determined by the device 104 against the standard patterns, computes the degree of coincidence, and outputs the standard patterns in descending order of coincidence. To avoid misrecognition, a context management device 106 verifies the consistency of the pattern recognition results with the context and changes the order of the recognition candidates; it also predicts the next input voice and controls the standard patterns generated by the device 103.

Description

Detailed Description of the Invention

[0001]

Field of Industrial Application: The present invention relates to a device for recognizing human speech, and in particular to an input device for electronic equipment.

[0002]

Prior Art: In recent years, speech recognition has been researched and developed to make electronic equipment quicker and easier to operate. Input speech is converted into an electrical signal and treated as a change in voltage along the time axis. Speech is generally recognized by comparing the input pattern with standard speech patterns learned statistically in advance, allowing for temporal expansion and contraction, and selecting the standard pattern that differs least from the input (see, for example, Japanese Patent Laid-Open No. 4-362698). To improve recognition accuracy, this method connects a silence pattern to each standard pattern before comparing it with the input pattern.
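
As an illustration of a comparison that allows for temporal expansion and contraction, the following minimal Python sketch uses dynamic time warping (DTW). The patent text does not name an alignment algorithm, so DTW, the scalar per-frame features, and the function names `dtw_distance` and `recognize` are all assumptions made for this example.

```python
def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of scalar features."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the best accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # A frame may be stretched (repeat a or b) or advanced on both sides.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(input_pattern, standard_patterns):
    """Return the label of the standard pattern that differs least from the input."""
    return min(
        standard_patterns,
        key=lambda label: dtw_distance(input_pattern, standard_patterns[label]),
    )
```

For instance, `recognize([1, 1, 2, 3], {"up": [1, 2, 3], "down": [3, 2, 1]})` selects `"up"`, because the repeated first frame is absorbed by the warping.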

[0003]

Problems to Be Solved by the Invention: The accuracy-improving technique described above has two problems. First, because the input pattern is compared with the standard patterns by word spotting, the amount of computation grows explosively in proportion to the length and number of the standard patterns. Second, for an input pattern containing a geminate consonant (sokuon), the geminate may be confused with silence, so the recognition rate may actually fall.

[0004] The present invention focuses on the phonological features, governed by grammar, that appear when speech ends in silence. It solves the above problems by combining word spotting with the conventional speech recognition method, in which the start and end points of recognition are fixed before recognition is performed.

[0005]

Means for Solving the Problems: A voice input device converts input speech into an electrical signal, and a voice signal recording device records the electrical signal output by the voice input device as an input pattern. A matching pattern generation device generates the standard speech patterns used to recognize the input pattern. A recognition section determining device determines the end of the speech in the input pattern held by the voice signal recording device. A voice signal matching device then matches the input pattern, from the previous recognition end point to the end point determined by the recognition section determining device, against the standard patterns, calculates the degree of coincidence, and outputs the standard patterns in descending order of coincidence. To prevent misrecognition, a context management device verifies the consistency of the pattern recognition results with the context and reorders the recognition candidates; it also predicts the next input speech and controls the standard patterns that the matching pattern generation device generates.

[0006] The voice signal recording device consists of an information compression device, which compresses the information content of the electrical speech signal output by the voice input device, and an input pattern recording device, which records the compressed output.

[0007] The matching pattern generation device consists of: a vocabulary information storage device, which describes the words making up the speech to be recognized and their pronunciations in phoneme notation; a grammar information storage device, which describes the connection rules for words and pronunciations; a phoneme model storage device, which holds the phoneme models; a silence model storage device, which stores the silence model; and a standard pattern generation device, which refers to the information these devices output and generates the standard patterns specified by the context management device.

[0008] The recognition section determining device consists of an end pattern generation device, which generates the pattern near the end of each standard pattern, and a word spotting device, which detects the presence of such an end pattern in the input pattern.

[0009] The voice signal matching device consists of a matching pattern narrowing device, which selects only the standard patterns that contain the end pattern detected by the word spotting device, and a pattern matching device, which matches the input pattern from the previous recognition end point to the current recognition end point against the standard patterns and outputs them in descending order of coincidence.

[0010] The context management device consists of: a recognition history storage device, which records the recognition results obtained up to the point at which recognition of the input speech begins; a target world information storage device, which stores knowledge about the world in which the speech is uttered; a recognition result correction device, which assigns higher validity to recognition candidates that are more consistent with both, reorders the recognition results accordingly, and updates the contents of the recognition history storage device; and a recognition candidate generation device, which predicts the next utterance by referring to the contents of the recognition history storage device and the target world information storage device.

[0011]

Operation: Because word spotting is applied only to short phoneme sequences, the amount of computation grows little even as their number increases. For speech containing a geminate consonant, a short phoneme sequence is connected to a silence pattern to determine the recognition range, and only then is the whole word recognized, so the rate at which a geminate is mistaken for the end point of recognition falls.

[0012]

Embodiment: FIG. 1 is a block diagram showing the overall configuration of a voice recognition device according to one embodiment of the present invention. The voice input device 101 picks up speech and converts it into an electrical signal. The voice signal recording device 102 records the speech converted into an electrical signal. The matching pattern generation device 103 outputs the standard speech patterns to be matched against the input pattern. The recognition section determining device 104 determines the range of the input pattern to be matched against the standard patterns. The voice signal matching device 105 matches the input pattern, from the previous recognition end point to the end point detected by the recognition section determining device 104, against the standard patterns and outputs the recognition results. The context management device 106 verifies the consistency of the recognition results with the context built up by the recognition results so far, reorders the recognition results, predicts the next utterance, and outputs information about the standard patterns that the matching pattern generation device 103 should generate.

[0013]

[Table 1]

[0014] Next, the operation of the invention is described using an example in which it is applied to a spoken dialogue system for in-house guidance at the company shown in Table 1. When the user says "Eigyoubu no, Hatta-san wa, dochira deshou ka" ("Where is Mr. Hatta of the sales department?"), the voice input device 101 converts the utterance into an electrical signal, and the voice signal recording device 102 records a finite duration of its waveform. FIG. 2 is a block diagram showing the internal configuration of the voice signal recording device 102. The speech converted into an electrical signal by the voice input device 101 is input to the information compression device 201, which compresses its information content to reduce the required recording capacity. The compression uses an approximation method from which the input speech can be restored well enough for a human listener to identify it. The compressed speech signal is recorded as the input pattern in the input pattern recording device 202.

[0015] The matching pattern generation device 103 generates, in advance and as instructed by the context management device 106, the standard patterns of the candidates to be recognized. At the start of the dialogue the user is expected to say a department and a name, so it generates patterns such as "shachoushitsu no" (of the president's office), "eigyoubu no" (of the sales department), and "keiribu no" (of the accounting department) for departments, and "Kawata-san o" (Mr. Kawata, as the object) for personal names. FIG. 3 is a block diagram showing the internal configuration of the matching pattern generation device 103. The vocabulary information storage device 301 stores the vocabulary the user is expected to utter, together with the phoneme notation of its pronunciation. For in-house guidance, it provides department names, surnames, given names, job titles, and genders as nouns; "ga", "no", "o", "wa", and "ni wa" as particles; and phrases such as "onegai shimasu" (please), "irasshaimasu ka" (is ... there?), and "dochira deshou ka" (where is ...?) as verb phrases, each with its phoneme notation. The grammar information storage device 302 stores the connection rules for words and for phonemes. The phoneme model storage device 303 stores the phonological models of the phonemes; hidden Markov models are used as the phonological models. The silence model storage device 304 stores the phonological model of silence. The standard pattern generation device 305 refers to the vocabulary information storage device 301, the grammar information storage device 302, the phoneme model storage device 303, and the silence model storage device 304, and generates, for each recognition candidate specified by the context management device 106, a standard pattern in which silence models are connected before and after the sequence of phonological models.
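
The construction of a standard pattern described in [0015] can be sketched as follows. This is only an illustration under stated assumptions: phoneme labels stand in for the hidden Markov models, and the `LEXICON` contents, the label `sil`, and the function name `generate_standard_pattern` are hypothetical, not taken from the patent.

```python
SILENCE = "sil"  # stand-in for the silence model held by device 304

# Hypothetical vocabulary entries (device 301): word -> phoneme notation.
LEXICON = {
    "eigyoubu": ["e", "i", "gy", "o", "u", "b", "u"],
    "keiribu": ["k", "e", "i", "r", "i", "b", "u"],
    "no": ["n", "o"],
}


def generate_standard_pattern(words, lexicon):
    """Concatenate the phonemes of each word and put silence before and after."""
    phonemes = []
    for word in words:
        phonemes.extend(lexicon[word])
    return [SILENCE] + phonemes + [SILENCE]
```

Calling `generate_standard_pattern(["eigyoubu", "no"], LEXICON)` yields the phoneme sequence for "eigyoubu no" framed by one silence label on each side, which mirrors the role of device 305.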

[0016] The recognition section determining device 104 determines, within the input pattern held by the voice signal recording device 102, "(silence) eigyoubu no (silence) hat (silence) ta-san wa (silence) dochira deshou ka (silence)", the section that the voice signal matching device 105 should recognize; the silence inside "Hatta" is the geminate consonant. FIG. 4 is a block diagram showing the internal configuration of the recognition section determining device 104. The standard patterns generated by the matching pattern generation device 103 are input to the end pattern generation device 401, which generates the pattern near the end of each. As the end pattern, it selects the portion in which the phonological model of a particle is connected to the silence model. The word spotting device 402 calculates the probability that a generated end pattern is present at each point in the input pattern and determines the end point of the speech that the voice signal matching device 105 should recognize. In this example, the word spotting device 402 outputs recognition end point information indicating that a standard pattern ending in "no" ends at the position of "eigyoubu no (silence)".
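
The end point search in [0016] can be illustrated with a deliberately simplified sketch. The device described in the text computes, at each point of the input, the probability that an end pattern is present; the code below instead assumes the input has already been reduced to phoneme labels and uses exact matching. The names `PARTICLES`, `FRAMES`, `end_patterns`, and `spot_end_points` are hypothetical.

```python
SIL = "sil"

# Hypothetical particle entries: particle -> phoneme notation.
PARTICLES = {"no": ["n", "o"], "wa": ["w", "a"]}

# "(silence) eigyoubu no (silence) hat (silence) ta-san wa (silence)" as labels.
FRAMES = [SIL, "e", "i", "gy", "o", "u", "b", "u", "n", "o", SIL,
          "h", "a", SIL, "t", "a", "s", "a", "N", "w", "a", SIL]


def end_patterns(particles):
    """Build one end pattern per particle: particle phonemes followed by silence."""
    return {name: phones + [SIL] for name, phones in particles.items()}


def spot_end_points(frames, patterns):
    """Return (particle, end_index) for every place an end pattern occurs.

    end_index is the index just past the trailing silence of the match.
    """
    hits = []
    for start in range(len(frames)):
        for name, pattern in patterns.items():
            if frames[start:start + len(pattern)] == pattern:
                hits.append((name, start + len(pattern)))
    return hits
```

With these inputs, the silence inside "Hatta" is not reported as an end point, because no particle precedes it; only the positions after "no" and "wa" are returned.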

[0017] The voice signal matching device 105 takes the span from the previous recognition end point to the recognition end point determined by the recognition section determining device 104 as the recognition target and calculates its degree of coincidence with the standard patterns. FIG. 5 is a block diagram showing the internal configuration of the voice signal matching device 105. The matching pattern narrowing device 501 takes the recognition end point information as input and selects, from the standard patterns, those whose degree of coincidence should actually be calculated. In this example it selects the standard patterns ending in "no", that is, the phonological models of "shachoushitsu no", "eigyoubu no", and "keiribu no", each with silence models connected before and after. The pattern matching device 502 calculates the degree of coincidence between these standard patterns and the part of the input pattern indicated by the recognition end point information, "(silence) eigyoubu no (silence)", and outputs them in descending order of coincidence.
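
The two steps in [0017], narrowing by the spotted particle and then ranking by degree of coincidence, can be sketched as below. The similarity measure is an assumption: Python's `difflib.SequenceMatcher` ratio over phoneme labels is used purely as a stand-in for the model-based coincidence score, and `STANDARD_PATTERNS`, `narrow`, and `rank` are hypothetical names.

```python
from difflib import SequenceMatcher

SIL = "sil"

# Hypothetical standard patterns: (label, final particle, phoneme labels).
STANDARD_PATTERNS = [
    ("shachoushitsu no", "no",
     [SIL, "sh", "a", "ch", "o", "u", "sh", "i", "ts", "u", "n", "o", SIL]),
    ("eigyoubu no", "no",
     [SIL, "e", "i", "gy", "o", "u", "b", "u", "n", "o", SIL]),
    ("keiribu no", "no",
     [SIL, "k", "e", "i", "r", "i", "b", "u", "n", "o", SIL]),
    ("kawata-san o", "o",
     [SIL, "k", "a", "w", "a", "t", "a", "s", "a", "N", "o", SIL]),
]


def narrow(patterns, spotted_particle):
    """Keep only the standard patterns that end in the spotted particle."""
    return [p for p in patterns if p[1] == spotted_particle]


def rank(segment, patterns):
    """Score each pattern against the segment; return (score, label), best first."""
    scored = [
        (SequenceMatcher(None, segment, phones).ratio(), label)
        for label, _, phones in patterns
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

Narrowing first means the score is computed for three candidates instead of four here; with a realistic vocabulary the saving is what keeps the full-word matching affordable.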

[0018] The context management device 106 reorders the recognition results by referring to the context and tells the matching pattern generation device 103 which phrases to generate for the next recognition. FIG. 6 is a block diagram showing the configuration of one embodiment of the context management device 106. The recognition result correction device 603 compares the pattern recognition results ranked by the voice signal matching device 105 with the contents of the recognition history storage device 601 and the target world information storage device 602, and raises the rank of the more consistent candidates. In this example, the target world information storage device 602 holds the knowledge about personnel shown in Table 1 and knowledge about the standard dialogue procedure at a reception desk. The recognition result correction device 603 refers to the recognition history storage device 601 and, for example, lowers the rank of department candidates when a department has already been recognized. The corrected recognition results are recorded in the recognition history storage device 601. The recognition candidate generation device 604 refers to the recognition history storage device 601 and the target world information storage device 602, predicts what will be uttered next, and outputs recognition candidate generation information. For example, when "eigyoubu no" is obtained as the recognized department name, it instructs the matching pattern generation device 103 to generate "Ogawa-san o", "Fuchi-san o", "Hatta-san o", "Tagami-san o", "Kawata-san o", "Misawa-san o", and "Tsuruta-san o" as standard patterns for the name candidates.
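
The behaviour of the context management device in [0018] can be sketched as two small functions. Table 1 is not reproduced in this copy, so the `WORLD` data below contains only the sales department members named in the text; the slot names, the dictionary layout, and the functions `rerank` and `predict_next` are assumptions for illustration.

```python
# Hypothetical target world information (device 602): department -> members.
WORLD = {
    "eigyoubu": ["Ogawa", "Fuchi", "Hatta", "Tagami", "Kawata", "Misawa", "Tsuruta"],
}


def rerank(candidates, history):
    """Move candidates whose slot is already filled in the history to the back.

    candidates is a list of (slot, value) pairs in matching order; sorted() is
    stable, so the original order is kept within each group.
    """
    return sorted(candidates, key=lambda candidate: candidate[0] in history)


def predict_next(history, world):
    """Return the phrases to generate next, given what has been recognized."""
    if "department" in history and "person" not in history:
        return [name + "-san o" for name in world[history["department"]]]
    return []
```

For example, once the history holds `{"department": "eigyoubu"}`, a further department candidate is ranked below any person candidate, and `predict_next` returns the seven "-san o" phrases listed in the paragraph above.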

[0019] By appropriately setting the short phoneme sequences to be word-spotted according to the field in which speech recognition is performed, the present invention enables highly accurate speech recognition in any field.

[0020]

Effects of the Invention: With this device, even when many words must be recognized, the amount of computation grows only gradually and speech can be recognized with high accuracy. In addition, because the recognition range is fixed at grammatically meaningful units, the vocabulary and grammar information are highly regular, and the amount of description they require also grows only gradually.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the overall configuration of a voice interactive information retrieval device according to one embodiment of the present invention.

FIG. 2 is a block diagram showing the internal configuration of the voice signal recording device 102 of FIG. 1.

FIG. 3 is a block diagram showing the internal configuration of the matching pattern generation device 103 of FIG. 1.

FIG. 4 is a block diagram showing the internal configuration of the recognition section determining device 104 of FIG. 1.

FIG. 5 is a block diagram showing the internal configuration of the voice signal matching device 105 of FIG. 1.

FIG. 6 is a block diagram showing the internal configuration of the context management device 106 of FIG. 1.

[Explanation of Reference Signs]

101 voice input device
102 voice signal recording device
103 matching pattern generation device
104 recognition section determining device
105 voice signal matching device
106 context management device
201 information compression device
202 input pattern recording device
301 vocabulary information storage device
302 grammar information storage device
303 phoneme model storage device
304 silence model storage device
305 standard pattern generation device
401 end pattern generation device
402 word spotting device
501 matching pattern narrowing device
502 pattern matching device
601 recognition history storage device
602 target world information storage device
603 recognition result correction device
604 recognition candidate generation device

Claims (8)

[Claims]

1. A voice recognition device comprising: a voice input device that takes speech as input and outputs an electrical signal; a voice signal recording device that records the electrical signal as an input pattern; a matching pattern generation device that outputs standard patterns of the speech for matching against the input pattern; a recognition section determining device that takes the standard patterns and the input pattern as input and determines the section of the input pattern to be matched against the standard patterns; a voice signal matching device that calculates the degree of coincidence between the input pattern in the range indicated by the recognition section determining device and the standard patterns, and outputs the standard patterns arranged in descending order of the degree of coincidence; and a context management device that reorders the standard patterns so arranged according to their consistency with the context of the dialogue and outputs them, and that predicts the next input speech and controls the standard patterns generated by the matching pattern generation device.

2. The voice recognition device according to claim 1, wherein the voice signal recording device comprises an information compression device that compresses the information content of the electrical speech signal output by the voice input device, and an input pattern recording device that records the output of the information compression device.

3. The voice recognition device according to claim 1, wherein the matching pattern generation device comprises: a vocabulary information storage device that holds the words making up the speech to be recognized and the phoneme notation of the words; a grammar information storage device that holds the connection rules of the words and the connection rules of the phonemes; a phoneme model storage device that holds phonological models of the phonemes; a silence model storage device that holds a phonological model of silence; and a standard pattern generation device that refers to the information output by the vocabulary information storage device, the grammar information storage device, the phoneme model storage device, and the silence model storage device, and outputs the standard patterns specified by the context management device.

4. The voice recognition device according to claim 1, wherein the standard pattern generation device connects silence models before and after the phonological model of the phrase whose generation is specified by the context management device, and outputs the result as a standard pattern.

5. The voice recognition device according to claim 1, wherein the recognition section determining device comprises: an end pattern generation device that takes a standard pattern as input and outputs the pattern near the end of the standard pattern; and a word spotting device that detects the presence, in the input pattern, of the pattern near the end of the standard pattern, and outputs the type of the pattern near the end and its position in the input pattern as recognition end point information.

6. The voice recognition device according to claim 1, wherein the end pattern generation device generates, as the pattern near the end of the standard pattern, a pattern in which the phonological model of a particle is connected to the phonological model of silence.

7. The voice recognition device according to claim 1, wherein the voice signal matching device comprises: a matching pattern narrowing device that takes as input the recognition end point information output by the recognition section determining device and the standard patterns output by the matching pattern generation device, and selects and outputs the standard patterns having the pattern near the end indicated by the recognition end point information; and a pattern matching device that takes as input the standard patterns output by the matching pattern narrowing device and the input pattern output by the voice signal recording device, calculates the degree of coincidence between each standard pattern and the input pattern from the most recent recognition end position to the position of the pattern near the end indicated by the recognition end point information, and outputs the standard patterns arranged in descending order of the degree of coincidence as the pattern recognition result.

8. The voice recognition device according to claim 1, wherein the context management device comprises: a recognition history management device that records the recognition results obtained up to the point at which recognition of the input speech begins; a target world knowledge storage device that stores knowledge about the world in which the speech is uttered; a pattern recognition result correction device that takes as input the output of the recognition history management device, the output of the target world knowledge storage device, and the pattern recognition result output by the voice signal matching device, raises the rank of the recognition candidates in the pattern recognition result that are consistent with the context of the dialogue and with the world the dialogue concerns, outputs the result, and appends the corrected pattern recognition result to the contents held by the recognition history storage device; and a recognition candidate generation device that takes as input the output of the recognition history storage device and the output of the target world information storage device, predicts the next input speech, and outputs recognition candidate generation information.
JP6215958A 1994-09-09 1994-09-09 Voice recognition device Pending JPH0883091A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP6215958A JPH0883091A (en) 1994-09-09 1994-09-09 Voice recognition device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP6215958A JPH0883091A (en) 1994-09-09 1994-09-09 Voice recognition device

Publications (1)

Publication Number Publication Date
JPH0883091A true JPH0883091A (en) 1996-03-26

Family

ID=16681067

Family Applications (1)

Application Number Title Priority Date Filing Date
JP6215958A Pending JPH0883091A (en) 1994-09-09 1994-09-09 Voice recognition device

Country Status (1)

Country Link
JP (1) JPH0883091A (en)

Cited By (53)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPWO2008001486A1 (en) * 2006-06-29 2009-11-26 日本電気株式会社 Audio processing apparatus and program, and audio processing method
JP5223673B2 (en) * 2006-06-29 2013-06-26 日本電気株式会社 Audio processing apparatus and program, and audio processing method
US8751226B2 (en) 2006-06-29 2014-06-10 Nec Corporation Learning a verification model for speech recognition based on extracted recognition and language feature information
US11750969B2 (en) 2016-02-22 2023-09-05 Sonos, Inc. Default playback device designation
US11736860B2 (en) 2016-02-22 2023-08-22 Sonos, Inc. Voice control of a media playback system
US11556306B2 (en) 2016-02-22 2023-01-17 Sonos, Inc. Voice controlled media playback system
US11947870B2 (en) 2016-02-22 2024-04-02 Sonos, Inc. Audio response playback
US11514898B2 (en) 2016-02-22 2022-11-29 Sonos, Inc. Voice control of a media playback system
US11405430B2 (en) 2016-02-22 2022-08-02 Sonos, Inc. Networked microphone device control
US11863593B2 (en) 2016-02-22 2024-01-02 Sonos, Inc. Networked microphone device control
US11832068B2 (en) 2016-02-22 2023-11-28 Sonos, Inc. Music service selection
US11545169B2 (en) 2016-06-09 2023-01-03 Sonos, Inc. Dynamic player selection for audio signal processing
US11531520B2 (en) 2016-08-05 2022-12-20 Sonos, Inc. Playback device supporting concurrent voice assistants
US11641559B2 (en) 2016-09-27 2023-05-02 Sonos, Inc. Audio playback settings for voice interaction
US11308961B2 (en) 2016-10-19 2022-04-19 Sonos, Inc. Arbitration-based voice recognition
US11727933B2 (en) 2016-10-19 2023-08-15 Sonos, Inc. Arbitration-based voice recognition
CN110073326A (en) * 2016-10-19 2019-07-30 搜诺思公司 Speech recognition based on arbitration
US11900937B2 (en) 2017-08-07 2024-02-13 Sonos, Inc. Wake-word detection suppression
US11500611B2 (en) 2017-09-08 2022-11-15 Sonos, Inc. Dynamic computation of system response volume
US11646045B2 (en) 2017-09-27 2023-05-09 Sonos, Inc. Robust short-time fourier transform acoustic echo cancellation during audio playback
US11769505B2 (en) 2017-09-28 2023-09-26 Sonos, Inc. Echo of tone interferance cancellation using two acoustic echo cancellers
US11538451B2 (en) 2017-09-28 2022-12-27 Sonos, Inc. Multi-channel acoustic echo cancellation
US11893308B2 (en) 2017-09-29 2024-02-06 Sonos, Inc. Media playback system with concurrent voice assistance
US11689858B2 (en) 2018-01-31 2023-06-27 Sonos, Inc. Device designation of playback and network microphone device arrangements
US11343614B2 (en) 2018-01-31 2022-05-24 Sonos, Inc. Device designation of playback and network microphone device arrangements
US11797263B2 (en) 2018-05-10 2023-10-24 Sonos, Inc. Systems and methods for voice-assisted media content selection
US11792590B2 (en) 2018-05-25 2023-10-17 Sonos, Inc. Determining and adapting to changes in microphone performance of playback devices
US11696074B2 (en) 2018-06-28 2023-07-04 Sonos, Inc. Systems and methods for associating playback devices with voice assistant services
US11563842B2 (en) 2018-08-28 2023-01-24 Sonos, Inc. Do not disturb feature for audio notifications
US11482978B2 (en) 2018-08-28 2022-10-25 Sonos, Inc. Audio notifications
CN110895602A (en) * 2018-09-13 2020-03-20 中移(杭州)信息技术有限公司 Identity authentication method and device, electronic equipment and storage medium
CN110895602B (en) * 2018-09-13 2021-12-14 中移(杭州)信息技术有限公司 Identity authentication method and device, electronic equipment and storage medium
US11778259B2 (en) 2018-09-14 2023-10-03 Sonos, Inc. Networked devices, systems and methods for associating playback devices based on sound codes
US11790937B2 (en) 2018-09-21 2023-10-17 Sonos, Inc. Voice detection optimization using sound metadata
US11790911B2 (en) 2018-09-28 2023-10-17 Sonos, Inc. Systems and methods for selective wake word detection using neural network models
US11899519B2 (en) 2018-10-23 2024-02-13 Sonos, Inc. Multiple stage network microphone device with reduced power consumption and processing load
US11741948B2 (en) 2018-11-15 2023-08-29 Sonos Vox France Sas Dilated convolutions and gating for efficient keyword spotting
US11557294B2 (en) 2018-12-07 2023-01-17 Sonos, Inc. Systems and methods of operating media playback systems having multiple voice assistant services
US11538460B2 (en) 2018-12-13 2022-12-27 Sonos, Inc. Networked microphone devices, systems, and methods of localized arbitration
US11540047B2 (en) 2018-12-20 2022-12-27 Sonos, Inc. Optimization of network microphone devices using noise classification
US11646023B2 (en) 2019-02-08 2023-05-09 Sonos, Inc. Devices, systems, and methods for distributed voice processing
US11798553B2 (en) 2019-05-03 2023-10-24 Sonos, Inc. Voice assistant persistence across multiple network microphone devices
US11501773B2 (en) 2019-06-12 2022-11-15 Sonos, Inc. Network microphone device with command keyword conditioning
US11854547B2 (en) 2019-06-12 2023-12-26 Sonos, Inc. Network microphone device with command keyword eventing
US11714600B2 (en) 2019-07-31 2023-08-01 Sonos, Inc. Noise classification for event detection
US11710487B2 (en) 2019-07-31 2023-07-25 Sonos, Inc. Locally distributed keyword detection
US11862161B2 (en) 2019-10-22 2024-01-02 Sonos, Inc. VAS toggle based on device orientation
US11869503B2 (en) 2019-12-20 2024-01-09 Sonos, Inc. Offline voice control
US11562740B2 (en) 2020-01-07 2023-01-24 Sonos, Inc. Voice verification for media playback
US11961519B2 (en) 2020-02-07 2024-04-16 Sonos, Inc. Localized wakeword verification
US11482224B2 (en) 2020-05-20 2022-10-25 Sonos, Inc. Command keywords with input detection windowing
US11694689B2 (en) 2020-05-20 2023-07-04 Sonos, Inc. Input detection windowing
US11698771B2 (en) 2020-08-25 2023-07-11 Sonos, Inc. Vocal guidance engines for playback devices

Similar Documents

Publication Publication Date Title
JPH0883091A (en) Voice recognition device
US7013276B2 (en) Method of assessing degree of acoustic confusability, and system therefor
JP4221379B2 (en) Automatic caller identification based on voice characteristics
US6856956B2 (en) Method and apparatus for generating and displaying N-best alternatives in a speech recognition system
Juang et al. Automatic recognition and understanding of spoken language-a first step toward natural human-machine communication
EP1936606B1 (en) Multi-stage speech recognition
EP2048655B1 (en) Context sensitive multi-stage speech recognition
JP3180655B2 (en) Word speech recognition method by pattern matching and apparatus for implementing the method
JP2965537B2 (en) Speaker clustering processing device and speech recognition device
EP0533491B1 (en) Wordspotting using two hidden Markov models (HMM)
US6192337B1 (en) Apparatus and methods for rejecting confusible words during training associated with a speech recognition system
US20180137109A1 (en) Methodology for automatic multilingual speech recognition
EP1355295B1 (en) Speech recognition apparatus, speech recognition method, and computer-readable recording medium in which speech recognition program is recorded
JPH09500223A (en) Multilingual speech recognition system
JPH0394299A (en) Voice recognition method and method of training of voice recognition apparatus
US20070136060A1 (en) Recognizing entries in lexical lists
Boite et al. A new approach towards keyword spotting.
US20040006469A1 (en) Apparatus and method for updating lexicon
Hirschberg et al. Generalizing prosodic prediction of speech recognition errors
US10854196B1 (en) Functional prerequisites and acknowledgments
JP3285704B2 (en) Speech recognition method and apparatus for spoken dialogue
JP2921059B2 (en) Continuous speech recognition device
JP3465334B2 (en) Voice interaction device and voice interaction method
JP3075250B2 (en) Speaker recognition method and apparatus
JPH08190470A (en) Information providing terminal