JPH04253099A - Continuous voice recognition system

Info

Publication number: JPH04253099A
Application number: JP3010234A
Authority: JP
Inventors: Atsushi Noguchi (野口 淳); Akitoshi Okumura (奥村 明俊)
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1991-01-30
Filing date: 1991-01-30
Publication date: 1992-09-08
Anticipated expiration: 2013-10-30
Also published as: JP2817406B2

Abstract

PURPOSE: To provide a continuous speech recognition system that can recognize diverse, natural sentences and output their meaning representations. CONSTITUTION: A speech input unit 101 analyzes the input speech and outputs the result to a continuous speech recognition unit 104. A network storage unit 102 stores the sentences or word strings to be recognized. The continuous speech recognition unit 104 selects a word string by pattern matching against the analysis result of the input speech received from the speech input unit 101. An intermediate representation generation unit 106 generates an intermediate representation from the arc ID sequence received as the recognition result from the continuous speech recognition unit 104 and from the contents of a semantic relationship storage unit 105, and outputs the generated intermediate representation.

Description

[Detailed description of the invention]

[0001]

[Industrial Field of Application] The present invention relates to a continuous speech recognition system that recognizes continuously uttered speech in automatic interpretation systems, voice question-answering (QA) systems, and the like.

[0002]

[Prior Art] A man-machine interface that uses speech is easy to use and natural because, unlike a keyboard, it requires no training, so its practical realization is strongly desired. It is also desirable that such an interface handle continuous speech, such as conversational sentences, and not only speech uttered word by word, because this is easier to use and more natural.

[0003] When such a speech interface is used in an automatic interpretation system, a voice QA system, or the like, the main purpose is not to determine every word of the input speech but to extract its meaning. In other words, the meaning of the input speech must be understood.

[0004] This is because a question-answering system must know the meaning of the user's question in order to reply to it appropriately, and an automatic interpretation system must clarify the meaning of the input sentence in order to convert the language used for speech input appropriately into another language such as English.

[0005] One speech recognition method that extracts the natural-language meaning of the input speech without performing separate syntactic and semantic analysis on the recognition result is the "Continuous Speech Recognition System" described in Japanese Patent Application No. Hei 2-72889 (hereinafter, Document 1). Document 1 proposes a method in which the semantic relationships between words in a word string accepted by the speech recognition automaton are stored in advance as semantic relationships between the state transition sequences to which the respective words correspond when the word string is accepted. This allows the natural-language meaning of the input speech to be extracted without performing separate syntactic and semantic analysis on the speech recognition result.

[0006]

[Problem to be Solved by the Invention] However, the speech recognition method of Document 1 gives no consideration to representing in the intermediate representation, for example, the tense of the recognized sentence or the sentence type, such as polite or interrogative.

[0007] In addition, "A Method of Continuous Speech Recognition Using an Augmented Transition Network" (Kazunaga Yoshida and Takao Watanabe), presented at the 1989 Spring National Convention of the Institute of Electronics, Information and Communication Engineers (hereinafter, Document 2), describes a speech recognition method that can handle context-free grammars with a processing load comparable to that of methods using finite automata. Speech recognition using an augmented transition network (hereinafter, ATN) has a high capability for describing natural language: (1) context-free grammars can be handled by calling subnetworks; and (2) history-dependent processing is carried out by registers and a mechanism that tests those registers, so free word order, co-occurrence relations, and dependency relations can be handled.

[0008] However, the speech recognition method of Document 1 does not consider the case where the recognition network has subnetworks, as in Document 2.

[0009] Furthermore, in spoken dialogue in Japanese, for example, the subject is often omitted, as in "(Are you) going to school?" or "(I) want a ticket to Basia's concert." The omitted elements must therefore be supplied when the intermediate representation is created, but the speech recognition method of Document 1 does not consider this.

[0010] An object of the present invention is to overcome these drawbacks and provide a continuous speech recognition system that can recognize more natural and more diverse sentences and output their meaning representations.

[0011]

[Means for Solving the Problems] The continuous speech recognition system according to the present invention comprises: first storage means for storing a network of words that expresses the grammar to be recognized; continuous speech recognition means for recognizing continuous speech by concatenating standard patterns of words according to the network; second storage means for storing the semantic relationships between one word and another word in the network, together with feature information for each word; and output means for outputting an intermediate representation from the recognition result produced by the continuous speech recognition means and from the second storage means.

[0012] The continuous speech recognition means recognizes continuous speech by concatenating standard patterns of predetermined recognition units according to the network and the subnetworks called from each network. The first storage means stores the network expressing the grammar to be recognized and the subnetworks called from each network. The output means outputs the intermediate representation from the recognition result produced by the continuous speech recognition means, the semantic relationships between one word and another word held in the second storage means, and the feature information for each word in the network and subnetworks.

[0013] The system is further characterized in that the second storage means stores feature information to be given to another word when a certain word is recognized, and in that adding means is provided for adding that information to the intermediate representation.

[0014] Furthermore, when a certain word is recognized, the feature information to be given to that word is retrieved from a word dictionary storage unit, and the adding means adds that information to the intermediate representation.

[0015] The word dictionary storage unit stores words that do not exist in the network expressing the grammar to be recognized but are always required to create the intermediate representation, and the output means creates and outputs the intermediate representation on the basis of that information.

[0016] The word dictionary storage unit also stores the words to be supplied when a specific word in the network expressing the grammar to be recognized is recognized, and the output means creates and outputs the intermediate representation on the basis of that information.

[0017] Furthermore, the word dictionary storage unit stores the specific words in the network expressing the grammar to be recognized that should be deleted when a certain specific word in that network is recognized, and the output means creates and outputs the intermediate representation on the basis of that information.

[0018]

[Operation] Next, the operation of the present invention will be described.

[0019] FIG. 3 shows an example of the semantic relationship storage unit in an embodiment of the present invention; FIG. 4 shows a first example of a speech recognition network; FIG. 5 shows an example of the contents of the semantic relationship storage unit for the main network in FIG. 4; FIG. 6 shows an example of the contents of the semantic relationship storage unit for the subnetworks in FIG. 4; FIG. 7 shows a second example of a speech recognition network; FIG. 8 shows examples of intermediate representations for the network in FIG. 7; FIG. 9 shows an example of the contents of the semantic relationship storage unit for the network in FIG. 7; FIG. 10 shows a third example of a speech recognition network; FIG. 11 shows an example of the contents of the semantic relationship storage unit for the network in FIG. 10; FIG. 12 shows a fourth example of a speech recognition network; FIG. 13 shows examples of intermediate representations for the network in FIG. 12; FIG. 15 shows a fifth example of a speech recognition network; FIG. 16 shows a sixth example of a speech recognition network; and FIG. 17 shows an example of an intermediate representation for the network in FIG. 16.

[0020] For example, suppose that the speech "I want a ticket to Basia's concert the day after tomorrow" has been recognized with a speech recognition network such as that in FIG. 16. The numbers attached to nodes such as "I" and "ticket" in FIG. 16 are called ID numbers. The intermediate representation of this sentence is as shown in FIG. 17. The notation in the intermediate representation that corresponds to a word in the speech recognition network, such as "want" or "the day after tomorrow" in FIG. 17, is called a CP. The notation that expresses the relationship between words, such as "beneficiary" or "object", is called a CASE. The information attached beside each CP in FIG. 17, such as #style and #noun-semantic-feature, is called feature information. #style denotes the sentence type, #tense the time, #verb-semantic-class the semantic class of the verb, #noun-semantic-feature the semantic feature of the noun, #quantity singular or plural, and #politeness that the sentence is polite. The characters and numerals inside each { } give the value of the corresponding feature. In the present invention, as in FIG. 3 for example, the semantic relationships between one word and another word in the network and the feature information for each word in the network are stored, so that an intermediate representation like that of FIG. 17 is created from the speech recognition result. As a result, the tense of the recognized sentence and the sentence type, such as polite or interrogative, can be represented in the intermediate representation.
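The CP, CASE, and feature-information terminology can be pictured as a small labeled tree. The following is only a minimal sketch under stated assumptions: the class name `CP`, the helper `render`, the feature values for "want", and the choice of "ticket" as the "object" child are all hypothetical, since FIG. 17 itself is not reproduced here. Only the "beneficiary" link between "want" and "I" and the feature value 111 on "I" come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class CP:
    """One node of the intermediate representation (a CP in the text)."""
    word: str
    features: dict = field(default_factory=dict)   # feature name -> value
    children: list = field(default_factory=list)   # (CASE label, child CP)

    def attach(self, case, child):
        """Link a child CP to this CP under the given CASE label."""
        self.children.append((case, child))
        return child


def render(cp, case=None, depth=0):
    """Return the tree as a list of indented text lines."""
    label = f"{case}: " if case else ""
    feats = ", ".join(f"{k}{{{v}}}" for k, v in sorted(cp.features.items()))
    lines = ["  " * depth + label + cp.word + (f" [{feats}]" if feats else "")]
    for child_case, child in cp.children:
        lines.extend(render(child, child_case, depth + 1))
    return lines


# Hypothetical fragment; the feature values on "want" are placeholders.
root = CP("want", {"#tense": "present", "#politeness": "plain"})
root.attach("beneficiary", CP("I", {"#noun-semantic-feature": "111"}))
root.attach("object", CP("ticket"))   # assumed pairing of CASE and word
print("\n".join(render(root)))
```

A tree of this shape keeps the sentence-level information (tense, politeness) on the root CP and the word-level information on each child, which is the distinction the paragraph above draws.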

[0021] Next, consider the case where a speech recognition network such as an ATN is used, as in Document 2. Suppose that recognition is performed using a main network and subnetworks such as those in FIG. 4. In the main network of FIG. 4(a), the point labeled "sub date/time", for example, jumps to the subnetwork shown in FIG. 4(b). In the invention of claim 2, the table shown in FIG. 5 is prepared for the network of FIG. 4(a), and the tables shown in FIGS. 6(a), (b), and (c) are prepared for the networks of FIGS. 4(b), (c), and (d), respectively. The table for a subnetwork is referenced by specifying that subnetwork in the CP column of the table in FIG. 5. As a result, an intermediate representation like that of FIG. 17 can also be created from the result of speech recognition that uses a network, such as an ATN, with a main network and subnetworks called from it.

[0022] In a speech recognition network such as that in FIG. 7, when the input speech is "I want one ticket", for example, the intermediate representation is as shown in FIG. 8(a). Because there is one ticket, the value of #quantity in the feature information of the CP "ticket" is SIN, denoting singular. If the input speech is "I want two tickets", however, the intermediate representation is as shown in FIG. 8(b). Because there are two tickets, the value of #quantity in the feature information of the CP "ticket" is PL, denoting plural.

[0023] In the invention of claim 3, because the information of a CP such as "one" or "two" determines the feature information of its parent CP "ticket", the information to be added to the feature information of the parent CP is stored, as in the "parent-assigned information" column of FIG. 9. As a result, an intermediate representation for a network such as that in FIG. 7 can be created even when recognizing one word causes feature information to be given to another word.

[0024] Incidentally, among the entries in the feature information column of FIG. 3, for example, the values of #noun-semantic-feature and #verb-semantic-class are information inherent to words such as "I" and "ticket". In the invention of claim 4, therefore, the feature information of each word is described in a dictionary, and this dictionary is consulted to supply information such as #noun-semantic-feature and #verb-semantic-class. As a result, the table entries become simpler and the storage required for the table can be reduced.

[0025] In Japanese, for example, the subject of a sentence is often omitted. Suppose that the speech "(I) want a ticket to Basia's concert the day after tomorrow", one of the sentences recognized by the network of FIG. 10, is input. Its intermediate representation must be the same as FIG. 17, but no word corresponding to the CP "I" exists in the recognition result. In the invention of claim 5, a CP that is omitted in the speech but needed in the intermediate representation must be supplied, so rows whose ID number column is D, as in FIG. 11 for example, are supplied by default. As a result, words that are absent from the speech recognition result but always required to create the intermediate representation are supplied, and an intermediate representation like that of FIG. 17 can be created.

[0026] Suppose also that there is a speech recognition network such as that in FIG. 12, and consider the input sentence "I want a Basia ticket". Semantically this means "I want a ticket to Basia's concert", with the word "concert" omitted. The intermediate representation in this case is as shown in FIG. 13(a).

[0027] Suppose instead that the input sentence is "I want a Yume no Yuminsha ticket". Semantically this means "I want a ticket to Yume no Yuminsha's play", with the word "play" omitted. The intermediate representation in this case is as shown in FIG. 13(b). In the invention of claim 6, when a specific word is recognized, a different meaning is supplied depending on that word. As a result, the appropriate word can be supplied when a specific word is recognized, and a correct intermediate representation can be created.

[0028] Next, consider a speech recognition network such as that in FIG. 15. This network accepts both the input sentence "I want a ticket to Basia's concert" and the input sentence "(I) want a ticket to Basia's concert", in which the word "I" is omitted. The intermediate representation in both cases is as shown in FIG. 13(a).

[0029] If the invention of claim 5 is used here to supply the CP "I" by default, there is no problem when "I" is omitted from the speech recognition result. When "I" is present in the recognition result, however, an erroneous intermediate representation is created that has two branches indicating that the relationship between "want" and "I" is beneficiary.

[0030] In the invention of claim 7, therefore, part of the intermediate representation is deleted when a specific word is recognized, and the meaning representation is then created. As a result, a correct intermediate representation can be created even when a given word may either appear in or be omitted from the speech recognition result.
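The interaction between default rows and deletion can be illustrated with a short sketch. This is an assumption-heavy illustration, not the mechanism of the patent: the row layout, the `delete_when` field, the function name `select_rows`, and the arc number 1 for the spoken "I" are all hypothetical, because the table for FIG. 15 is not shown in this section. The sketch simply suppresses the default "I" row whenever the arc for a spoken "I" is present, so that only one beneficiary branch remains.

```python
# Hypothetical table rows. "D" marks a row that is used by default;
# delete_when lists arc IDs whose recognition removes the row again.
ROWS = [
    {"id": "D", "cp": "I", "case": "beneficiary", "parent_cp": "want",
     "delete_when": {1}},
    {"id": 1, "cp": "I", "case": "beneficiary", "parent_cp": "want",
     "delete_when": set()},
]


def select_rows(arc_ids, rows):
    """Pick the rows that contribute to the intermediate representation."""
    recognized = set(arc_ids)
    kept = []
    for row in rows:
        active = row["id"] == "D" or row["id"] in recognized
        deleted = bool(row["delete_when"] & recognized)
        if active and not deleted:
            kept.append(row)
    return kept


# "I" spoken (arc 1 present): only the recognized row survives.
print([row["id"] for row in select_rows([1, 2, 3], ROWS)])
# "I" omitted: the default row fills the gap.
print([row["id"] for row in select_rows([2, 3], ROWS)])
```

Either way exactly one "I" CP ends up under "want", which is the outcome the paragraph above requires.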

[0031]

[Embodiment] Next, embodiments of the continuous speech recognition system according to the present invention will be described with reference to the drawings.

[0032] First, an embodiment of the invention of claim 1 is described. FIG. 1 is a block diagram showing an embodiment of the present invention. In FIG. 1, the speech input unit 101 converts the input speech into a digital signal, analyzes it to obtain a time series of feature vectors, and outputs the result to the continuous speech recognition unit 104. The network storage unit 102 stores the sentences or word strings to be recognized during speech recognition. The numbers attached in the network are the ID numbers of the arcs that represent the words in the network. FIG. 16 shows an example of a recognition network stored in the network storage unit 102 of FIG. 1. When such a network can be traced from its start to its end, the input speech is regarded as accepted by the network. In the case of FIG. 16, for example, input speech such as "I want a ticket to Basia's live show tomorrow" or "I'd like a ticket to Van Halen's concert three days from now" can be accepted. The word standard pattern storage unit 103 stores in advance the standard speech patterns of the words to be recognized, that is, of the words in the network. A standard pattern may also be formed by concatenating smaller units such as syllables.

[0033] The continuous speech recognition unit 104 concatenates the standard patterns stored in the word standard pattern storage unit 103 according to the word strings generated by tracing the network, and performs pattern matching against the time series of feature vectors of the input speech received from the speech input unit 101, thereby selecting the word string with the highest acoustic similarity to the input speech. The continuous speech recognition unit 104 outputs the sequence of arc IDs of the words in this word string to the intermediate representation generation unit 106 as the recognition result. The intermediate representation generation unit 106 creates and outputs an intermediate representation from the input arc ID sequence and the contents of the semantic relationship storage unit 105.

[0034] Next, the processing of the intermediate representation generation unit 106 is described. Suppose that the input speech is "I want a ticket to Basia's concert the day after tomorrow". From the arc ID numbers shown in FIG. 16, the arc ID sequence "1, 3, 5, 6, 10, 11, 13, 15, 16, 17" is passed to the intermediate representation generation unit 106 as the speech recognition result. The intermediate representation generation unit 106 creates the intermediate representation from the information in FIG. 3 and the arc ID sequence. First, the table of FIG. 3 is searched for the first element of the sequence, 1; the parent ID numbers in the row for ID number 1 are 17, 18, and 19. Of these, 17 is the one present in the arc ID sequence, so the parent in the intermediate representation is "want", and the CASE column shows that the relationship is "beneficiary". In addition, from the feature information column, the feature "#noun-semantic-feature, 111" is attached to "I" in the intermediate representation of this sentence. Repeating this processing yields an intermediate representation like that of FIG. 17.
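The table lookup described above can be sketched in a few lines. This is a sketch under assumptions, not the patented implementation: the dictionary layout, the key names, and the function `build_links` are hypothetical, and the table holds only the two rows the text mentions (arc 1 for "I" with parent candidates 17, 18, 19, and arc 17 for "want"). The empty parent list for arc 17 is also an assumption.

```python
# Hypothetical excerpt of a FIG. 3-style table, keyed by arc ID.
TABLE = {
    1: {"cp": "I", "parents": [17, 18, 19], "case": "beneficiary",
        "features": {"#noun-semantic-feature": "111"}},
    17: {"cp": "want", "parents": [], "case": None, "features": {}},
}


def build_links(arc_ids, table):
    """For each recognized arc, find its CP, CASE, parent CP and features."""
    recognized = set(arc_ids)
    links = []
    for arc in arc_ids:
        row = table.get(arc)
        if row is None:
            continue  # this excerpt has no row for the arc
        # The parent is the first candidate that was actually recognized.
        parent = next((p for p in row["parents"] if p in recognized), None)
        parent_cp = table.get(parent, {}).get("cp")
        links.append((row["cp"], row["case"], parent_cp, row["features"]))
    return links


ARC_IDS = [1, 3, 5, 6, 10, 11, 13, 15, 16, 17]   # sequence given in the text
for link in build_links(ARC_IDS, TABLE):
    print(link)
```

Listing several parent candidates per row lets one table serve every path through the network: whichever candidate appears in the recognized sequence becomes the parent.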

[0035] Next, an embodiment of the invention of claim 2 is described. In this embodiment, the semantic relationship storage unit 105 of FIG. 1 stores the semantic information and feature information of the words in the speech recognition network and in the subnetworks called from each network, and the intermediate representation is created from them. FIG. 4(a) is the main network used for speech recognition, and FIGS. 4(b), 4(c), and 4(d) are subnetworks called from the main network. For example, FIG. 4(b) is the subnetwork "date/time", which is called at arc ID number 2 in the main network.

[0036] Suppose that the input sentence is "I want a ticket to Basia's concert the day after tomorrow". According to FIGS. 4(a) to (d), the sequence of arc numbers in the speech recognition result is then "1, 2(2), 3, 4(1), 5, 6, 8, 10, 11, 12(1)". Here "4(1)", for example, means that arc 1 of a subnetwork is called from arc 4 of the main network. In this case, the sub column of the row for ID number 4 in FIG. 5 shows that the artist-name subnetwork is called, and the contents of FIG. 6(b) show that the CP is "Basia" and the feature information is #noun-semantic-feature, 111. In addition, the row for ID number 4 in FIG. 5 lists 11 and 12 as parent ID numbers, and "12(1)" is present in the ID sequence of the recognition result, so FIG. 6(c) shows that the parent is the CP "want", and the CASE column shows that "Basia" and "want" are in a possession relationship. Repeating this processing yields an intermediate representation like that of FIG. 17. The remaining processing is exactly the same as in the embodiment of the invention of claim 1. As a result, an intermediate representation can be created from the result of speech recognition that uses a network and the subnetworks called from it.
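The "4(1)" notation and the two-level table lookup can be sketched as follows. This is an illustrative sketch only: the function names, the table layout, and the subnetwork names `artist` and `desire` are hypothetical. Only the CP "Basia" for arc 4(1) and the CP "want" for arc 12(1) come from the text; the figures themselves are not available here.

```python
import re


def parse_arc_sequence(text):
    """Split '1,2(2),3' into (main arc, sub arc or None) pairs."""
    pairs = []
    for token in text.split(","):
        match = re.fullmatch(r"\s*(\d+)(?:\((\d+)\))?\s*", token)
        if match is None:
            raise ValueError(f"unrecognized arc token: {token!r}")
        main, sub = match.group(1), match.group(2)
        pairs.append((int(main), int(sub) if sub else None))
    return pairs


# Hypothetical excerpts: main table (FIG. 5 style) and subnetwork tables
# (FIG. 6 style). The subnetwork names are invented for illustration.
MAIN = {4: {"sub": "artist"}, 12: {"sub": "desire"}}
SUBS = {
    "artist": {1: {"cp": "Basia",
                   "features": {"#noun-semantic-feature": "111"}}},
    "desire": {1: {"cp": "want", "features": {}}},
}


def subnet_cps(pairs, main, subs):
    """Resolve each 'main(sub)' pair to the CP stored in the subnetwork table."""
    found = []
    for main_arc, sub_arc in pairs:
        row = main.get(main_arc)
        if row is None or sub_arc is None:
            continue  # plain arc, or no row in this excerpt
        found.append((main_arc, sub_arc, subs[row["sub"]][sub_arc]["cp"]))
    return found


PAIRS = parse_arc_sequence("1,2(2),3,4(1),5,6,8,10,11,12(1)")
print(subnet_cps(PAIRS, MAIN, SUBS))
```

Keeping one table per subnetwork means the same subnetwork can be called from several places in the main network without duplicating its rows.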

[0037] Next, an embodiment of the invention of claim 3 is described. In this embodiment, the semantic relationship storage unit 105 of FIG. 1 also stores feature information to be given to other words, and this information is also used to create the intermediate representation.

[0038] For example, suppose that the input speech is "I want one ticket". The arc ID sequence of the speech recognition result in FIG. 7 is then "1, 2, 4, 5, 8". From the row for ID number 5 in FIG. 9, the CP of arc 5 is "one", and because "2" is present in the arc ID sequence, its parent is "ticket". The parent-assigned information column further shows that the feature information "#NUMBER, {SIN}" is to be given to the parent CP "ticket". As a result of this processing, an intermediate representation like that of FIG. 8(a) can be created. The remaining processing is exactly the same as in the embodiment of the invention of claim 1.
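The parent-assigned information step can be sketched as a second pass over the recognized arcs. This is a minimal sketch under assumptions: the table layout and the function `collect_features` are hypothetical, and the row for arc 6 ("two", value PL) is invented to mirror the "two tickets" case, since the text gives arc numbers only for the "one ticket" sentence.

```python
# Hypothetical excerpt of a FIG. 9-style table, keyed by arc ID.
TABLE = {
    2: {"cp": "ticket", "parents": [], "parent_features": {}},
    5: {"cp": "one", "parents": [2], "parent_features": {"#NUMBER": "SIN"}},
    6: {"cp": "two", "parents": [2], "parent_features": {"#NUMBER": "PL"}},
}


def collect_features(arc_ids, table):
    """Return CP -> features after applying parent-assigned information."""
    recognized = set(arc_ids)
    features = {arc: {} for arc in arc_ids if arc in table}
    for arc in arc_ids:
        row = table.get(arc)
        if row is None:
            continue
        parent = next((p for p in row["parents"] if p in recognized), None)
        if parent in features:
            # The child contributes features to its parent CP.
            features[parent].update(row["parent_features"])
    return {table[arc]["cp"]: feats for arc, feats in features.items()}


print(collect_features([1, 2, 4, 5, 8], TABLE))   # sequence from the text
print(collect_features([1, 2, 4, 6, 8], TABLE))   # assumed "two tickets" path
```

Storing the quantity on the counter word rather than on "ticket" lets a single "ticket" row serve both the singular and the plural sentence.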

【００３９】Next, an embodiment of the invention according to claim 4 will be described. FIG. 2 is a block diagram showing another embodiment of the present invention. In this embodiment, the intermediate representation is created by attaching feature information to each word using not only the semantic relationship storage unit 205 but also the contents of the word dictionary storage unit 207. For example, feature information such as the #noun semantic feature and the #verb semantic classification is always the same for a given word, so it is attached as the word's feature information from the word dictionary storage unit 207 without having to be written in the semantic relationship storage unit 205. The remaining processing is exactly the same as in the embodiment of the invention according to claim 1.
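One way to picture the split between the two storage units is a simple merge of per-word and context-dependent features. The sketch below is illustrative only. The dictionary entries, the feature names and values, and the rule that relation-table features take precedence are all assumptions that the description does not specify.

```python
# Hypothetical contents of the word dictionary storage unit 207:
# features that are always the same for a given word.
WORD_DICTIONARY = {
    "ticket": {"#noun_semantic_feature": "OBJECT"},
    "want": {"#verb_semantic_class": "DESIRE"},
}


def attach_features(cp, relation_features, dictionary):
    """Merge fixed dictionary features with features from the relation table."""
    features = dict(dictionary.get(cp, {}))  # fixed, per-word features first
    features.update(relation_features)       # context-dependent features (assumed to win)
    return features


ticket = attach_features("ticket", {"#NUMBER": "{SIN}"}, WORD_DICTIONARY)
unknown = attach_features("concert", {}, WORD_DICTIONARY)
```

Keeping the fixed features in the dictionary means the relation table only has to list what depends on the recognized path.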

【００４０】Next, an embodiment of the invention according to claim 5 will be described.

【００４１】For example, assume that in a network such as that of FIG. 10 the input sentence is "I want a ticket for Basia's concert the day after tomorrow." The arc ID sequence is then "2, 4, 5, 9, 10, 12, 14, 15, 16". Assume also that the contents of the semantic relationship storage unit are as shown in FIG. 11. Here, a row whose ID number is "D" is used unconditionally to create the intermediate representation, regardless of the speech recognition result. Because the ID number column of the row whose CP is "I" contains "D", the contents of this row are also used to create the intermediate representation. As a result, a correct intermediate representation such as that shown in FIG. 17 can be created. The remaining processing is exactly the same as in the embodiment of the invention according to claim 1.
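The unconditional "D" rows can be sketched as a filter over the relation table. The rows below are hypothetical stand-ins for FIG. 11, which is not reproduced in the text. The mapping of arc numbers to words is an assumption, and only the "D" marker on the row whose CP is "I" comes from the description.

```python
DEFAULT = "D"  # marker in the ID-number column for rows that are always used

# Hypothetical rows standing in for FIG. 11.
ROWS = [
    {"id": DEFAULT, "cp": "I"},
    {"id": 16, "cp": "want"},
    {"id": 14, "cp": "ticket"},
    {"id": 3, "cp": "tomorrow"},
]


def select_rows(arc_ids, rows):
    """Keep rows whose arc was recognized, plus every row marked "D"."""
    recognized = set(arc_ids)
    return [r for r in rows if r["id"] == DEFAULT or r["id"] in recognized]


selected = select_rows([2, 4, 5, 9, 10, 12, 14, 15, 16], ROWS)
cps = [r["cp"] for r in selected]
```

With these toy rows, "I" is selected even though no arc for it appears in the recognized sequence, while the row for arc 3 is dropped.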

【００４２】Next, an embodiment of the invention according to claim 6 will be described.

【００４３】For example, assume that in a network such as that of FIG. 12 the input sentence is "I want a Basia ticket." The arc ID sequence is then "1, 2, 4, 5, 7, 8". Assume also that the contents of the semantic relationship storage unit are as shown in FIG. 14. Here, the on-node column means that when the arc whose number is written in this column appears in the speech recognition result, the contents of that row are used to create the intermediate representation. Because "2" appears in the arc ID sequence, the contents of the row whose ID number is "onl" are also used, and as a result a correct intermediate representation such as that shown in FIG. 13(a) can be created. The remaining processing is exactly the same as in the embodiment of the invention according to claim 1.
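The on-node rule can be sketched as a conditional row selection. The rows below are hypothetical stand-ins for FIG. 14. The description gives only the "onl" ID and the trigger arc 2, so the supplemented CP ("concert") and the other rows are assumptions made for illustration.

```python
# Hypothetical rows standing in for FIG. 14.
ROWS = [
    {"id": 2, "cp": "Basia", "on_node": None},
    {"id": 5, "cp": "ticket", "on_node": None},
    {"id": "onl", "cp": "concert", "on_node": 2},  # supplemented when arc 2 is seen
]


def select_rows(arc_ids, rows):
    """Keep recognized rows, plus on-node rows whose trigger arc was recognized."""
    recognized = set(arc_ids)
    selected = []
    for row in rows:
        if row["on_node"] is not None:
            if row["on_node"] in recognized:
                selected.append(row)
        elif row["id"] in recognized:
            selected.append(row)
    return selected


with_trigger = [r["cp"] for r in select_rows([1, 2, 4, 5, 7, 8], ROWS)]
without_trigger = [r["cp"] for r in select_rows([1, 4, 5, 7, 8], ROWS)]
```

Unlike a "D" row, an on-node row is added only when its trigger arc is on the recognized path.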

【００４４】Next, an embodiment of the invention according to claim 7 will be described. FIG. 18 is a diagram showing an example of the contents of the semantic relationship storage unit for the network in FIG. 15, and FIG. 19 is a diagram showing an example of the intermediate representation for the network in FIG. 15.

【００４５】For example, assume that in a network such as that of FIG. 15 the input sentence is "I want a ticket for Basia's concert." The arc ID sequence is then "1, 2, 4, 5, 7, 8, 10, 11". Assume also that the contents of the semantic relationship storage unit are as shown in FIG. 18. Here, the off-node column means that when the arc whose number is written in this column appears in the speech recognition result, the contents of that row are not used to create the intermediate representation, even if the ID number in the same row is "D". The top row of FIG. 18 has the ID number "D" but has "1" in its off-node column, and "1" appears in the arc ID sequence of the speech recognition result, so the information in this row is not used. As a result, an intermediate representation such as that shown in FIG. 19 is obtained.

【００４６】Next, assume that the input sentence is "I want a ticket for Basia's concert" uttered without the subject. The arc ID sequence in FIG. 15 is then "2, 4, 5, 7, 8, 10, 11". Because the ID number of the top row of FIG. 18 is "D" and "1" does not appear in the arc ID sequence, the contents of this row are also used, and as a result an intermediate representation such as that shown in FIG. 19 is obtained. In this way, this embodiment can create the intermediate representation correctly even when the network contains elements that may or may not be omitted. The remaining processing is exactly the same as in the embodiment of the invention according to claim 1.
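The interaction between "D" rows and the off-node column in the two cases above can be sketched as follows. The rows are hypothetical stand-ins for FIG. 18. The description states only that the top row has ID "D" and off-node "1", so its CP ("I"), the meaning of arc 1 and the remaining rows are assumptions.

```python
DEFAULT = "D"

# Hypothetical rows standing in for FIG. 18.
ROWS = [
    {"id": DEFAULT, "cp": "I", "off_node": 1},  # default subject, suppressed by arc 1
    {"id": 1, "cp": "I", "off_node": None},     # subject actually uttered
    {"id": 11, "cp": "want", "off_node": None},
]


def select_rows(arc_ids, rows):
    """Apply the off-node rule first, then the "D" and recognized-arc rules."""
    recognized = set(arc_ids)
    selected = []
    for row in rows:
        if row["off_node"] is not None and row["off_node"] in recognized:
            continue  # suppressed even when the ID number is "D"
        if row["id"] == DEFAULT or row["id"] in recognized:
            selected.append(row)
    return selected


spoken_subject = select_rows([1, 2, 4, 5, 7, 8, 10, 11], ROWS)
omitted_subject = select_rows([2, 4, 5, 7, 8, 10, 11], ROWS)
```

In this toy setup both inputs yield the same CPs, which mirrors the statement that both utterances lead to the intermediate representation of FIG. 19. The subject comes from the uttered arc in one case and from the default row in the other.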

【００４７】[0047]

[Effects of the Invention] As described above, the continuous speech recognition system of the present invention makes it possible to recognize more natural and diverse sentences during speech recognition and to output their semantic representations.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention.

FIG. 2 is a block diagram showing another embodiment of the present invention.

FIG. 3 is a diagram showing an example of the contents of the semantic relationship storage unit in FIG. 1.

FIG. 4 is a diagram showing a first example of a speech recognition network.

FIG. 5 is a diagram showing an example of the contents of the semantic relationship storage unit for the main network in FIG. 4.

FIG. 6 is a diagram showing an example of the contents of the semantic relationship storage unit for the sub-network in FIG. 4.

FIG. 7 is a diagram showing a second example of a speech recognition network.

FIG. 8 is a diagram showing an example of the intermediate representation for the network in FIG. 7.

FIG. 9 is a diagram showing an example of the contents of the semantic relationship storage unit for the network in FIG. 7.

FIG. 10 is a diagram showing a third example of a speech recognition network.

FIG. 11 is a diagram showing an example of the contents of the semantic relationship storage unit for the network in FIG. 10.

FIG. 12 is a diagram showing a fourth example of a speech recognition network.

FIG. 13 is a diagram showing an example of the intermediate representation for the network in FIG. 12.

FIG. 14 is a diagram showing an example of the contents of the semantic relationship storage unit for the network in FIG. 12.

FIG. 15 is a diagram showing a fifth example of a speech recognition network.

FIG. 16 is a diagram showing a sixth example of a speech recognition network.

FIG. 17 is a diagram showing an example of the intermediate representation for the network in FIG. 16.

FIG. 18 is a diagram showing an example of the contents of the semantic relationship storage unit for the network in FIG. 15.

FIG. 19 is a diagram showing an example of the intermediate representation for the network in FIG. 15.

[Explanation of symbols]

101, 201 Speech input unit
102, 202 Network storage unit
103, 203 Word standard pattern storage unit
104, 204 Continuous speech recognition unit
105, 205 Semantic relationship storage unit
106, 206 Intermediate representation generation unit
207 Word dictionary storage unit

Claims

[Claims]

1. A continuous speech recognition system comprising: first storage means for storing a network of words expressing the grammar of the continuous speech to be recognized; continuous speech recognition means for recognizing continuous speech by combining standard patterns of words according to the network; second storage means for storing semantic relationships between one word and another word in the network and feature information for each word; and output means for outputting an intermediate representation based on the recognition result generated by the continuous speech recognition means and the contents of the second storage means.

2. The continuous speech recognition system according to claim 1, wherein the continuous speech recognition means recognizes continuous speech by combining standard patterns of predetermined recognition units according to the network and sub-networks called from each network; the first storage means stores the network expressing the grammar to be recognized and the sub-networks called from each network; and the output means outputs the intermediate representation based on the recognition result generated by the continuous speech recognition means and on the semantic relationships between one word and another word and the feature information for each word, in the network and the sub-networks, stored in the second storage means.

3. The continuous speech recognition system according to claim 1, wherein the second storage means stores feature information to be attached to another word when a certain word is recognized, the system further comprising adding means for adding the information to the intermediate representation.

4. The continuous speech recognition system according to claim 2 or 3, wherein, when a certain word is recognized, feature information to be attached to the word is retrieved from a word dictionary storage unit, and the adding means adds the information to the intermediate representation.

5. The continuous speech recognition system according to claim 1, wherein the word dictionary storage unit stores words that do not exist in the network expressing the grammar of the continuous speech to be recognized but are always required to create the intermediate representation, and the output means creates and outputs the intermediate representation.

6. The continuous speech recognition system according to claim 1, wherein the word dictionary storage unit stores words to be supplemented when a specific word in the network expressing the grammar of the continuous speech to be recognized is recognized, and the output means creates and outputs the intermediate representation based on that information.

7. The continuous speech recognition system according to claim 1, wherein the word dictionary storage unit stores words to be deleted when a specific word in the network expressing the grammar of the continuous speech to be recognized is recognized, and the output means creates and outputs the intermediate representation based on that information.