JPH0772891A - Device for recognizing voice

Info

Publication number: JPH0772891A
Application number: JP5218327A
Authority: JP
Inventors: Tatsuro Ito; 達朗伊藤; Masakatsu Hoshimi; 昌克星見; Maki Yamada; 麻紀山田; Mitsuru Endo; 充遠藤
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1993-09-02
Filing date: 1993-09-02
Publication date: 1995-03-17
Anticipated expiration: 2019-10-13
Also published as: JP3577725B2

Abstract

PURPOSE: To provide a speech recognition device capable of correctly recognizing sentence speech in which the predicate is omitted, and inverted expressions in which no predicate appears at the end of the sentence. CONSTITUTION: A speech signal is input to a speech analysis unit 1, and a phrase lattice is created via a frame similarity calculation unit 3 and a spotting unit 5. A keyword estimation unit 8 first finds several phrase hypotheses with high similarity scores (hereinafter called keywords) in the phrase lattice and, by referring to the dependency rules in a syntax information storage unit 6, estimates the predicate that receives each keyword. An input sentence estimation unit 7 then uses the estimated predicate, the keyword, and the dependency rules in the syntax information storage unit 6 to search the phrase lattice for a phrase sequence, which is output as the recognition result.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech recognition device that automatically recognizes human speech, in particular continuous speech such as spoken sentences.

[0002]

[Prior Art] One conventional method of realizing a device that recognizes continuous speech such as spoken sentences is to use syntactic information to estimate the words or phrases actually spoken from a set of hypotheses about the words or phrases that may exist in the input speech interval (hereinafter called a lattice).

[0003] As an example of the prior art, the following describes a speech recognition device that first obtains a lattice of phrases (bunsetsu) and then estimates the input sentence using syntactic information consisting of the connection relations between phrases. FIG. 6 shows the configuration of the conventional speech recognition device. In FIG. 6, 1 is a speech analysis unit that analyzes the input speech frame by frame, the frame being the unit of analysis, and obtains feature parameters; 2 is a standard pattern storage unit that stores standard patterns representing the features of the "basic unit of recognition" (for example, the phoneme; phonemes are used in the description below); 3 is a frame similarity calculation unit that calculates, for each frame, the similarity between the feature parameters and the standard patterns; 4 is a pronunciation information storage unit in which the pronunciation of each phrase, the "linguistic unit", is written in phonetic symbols representing the basic unit of recognition; 5 is a spotting unit that creates a phrase lattice, a set of hypotheses about the phrases present in the speech interval; 6 is a syntax information storage unit that stores connection information between the phrases of the sentences to be recognized; and 7 is an input sentence estimation unit that estimates the input sentence from the phrase lattice created by the spotting unit 5.

[0004] The operation of the speech recognition device configured as above is as follows. First, a speech signal is input to the speech analysis unit 1. The speech analysis unit 1 analyzes the input speech and outputs a time series of feature parameters representing the features of the speech, one set per frame, the frame being the basic unit of analysis. The frame similarity calculation unit 3 calculates the similarity between the feature parameter time series obtained by the speech analysis unit 1 and the standard pattern prepared for each phoneme in the standard pattern storage unit 2, and outputs the resulting frame similarities for the entire speech interval. The spotting unit 5 hypothesizes every conceivable phrase throughout the speech interval and, using the phoneme transcription of each phrase stored in the pronunciation information storage unit 4 and the frame similarities obtained by the frame similarity calculation unit 3, performs pattern matching in which the start and end points of each phrase are free.
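
The endpoint-free matching performed by the spotting unit can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the frame similarities are held as one dictionary per frame keyed by phoneme, and that a phrase is scored on an interval by the best in-order alignment of its phonemes to the frames. The names `phrase_score` and `spot` and the unnormalized sum are assumptions and are not taken from the patent.

```python
NEG = float("-inf")

def phrase_score(sim, phonemes, start, end):
    """Best sum of frame similarities when `phonemes` (non-empty) are aligned,
    in order, onto frames start..end; each phoneme covers at least one frame."""
    if end - start + 1 < len(phonemes):
        return None  # interval too short to hold every phoneme
    best = [NEG] * len(phonemes)
    best[0] = sim[start][phonemes[0]]
    for t in range(start + 1, end + 1):
        new = [NEG] * len(phonemes)
        for j, ph in enumerate(phonemes):
            # Either stay in phoneme j or advance from phoneme j-1.
            prev = best[j] if j == 0 else max(best[j], best[j - 1])
            if prev > NEG:
                new[j] = prev + sim[t][ph]
        best = new
    return best[-1]

def spot(sim, lexicon):
    """Score every phrase on every interval: start and end are both free."""
    lattice = []
    for phrase, phonemes in lexicon.items():
        for start in range(len(sim)):
            for end in range(start, len(sim)):
                score = phrase_score(sim, phonemes, start, end)
                if score is not None:
                    lattice.append((phrase, start, end, score))
    return lattice
```

A practical scorer would normally compensate for interval length and prune low-scoring intervals; both are omitted here to keep the sketch short.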

[0005] FIG. 7 shows an example of a phrase lattice obtained in this way. Each phrase hypothesis in the phrase lattice has an interval in which it is assumed to exist and a similarity score. The similarity score is a measure of how likely the phrase hypothesis is to exist, and is calculated, for example, by optimally summing the frame similarities within the interval of the phrase hypothesis. The syntax information storage unit 6 stores syntactic information, that is, connection information between phrases based on a framework such as a context-free grammar, an automaton, or dependency structure rules. FIG. 8 shows an example of syntactic information based on dependency structure rules. The syntactic information in FIG. 8 concerns the verbs "nomitai" (want to drink) and "tabetai" (want to eat), which act as receiving phrases. Two kinds of phrase can depend on "nomitai": a phrase consisting of an independent word in the category "drink" followed by the attached word (particle) "wo", and a phrase consisting only of an independent word such as "mou ippai" (another glass) or "ippai" (a glass). Likewise, two kinds of phrase can depend on "tabetai": a phrase consisting of an independent word in the category "food" followed by the attached word "wo", and a phrase consisting only of an independent word such as "mou ichimai" (another slice) or "hitotsu" (one).
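
As a rough illustration of the two data sets described above, a phrase hypothesis and the dependency rules of FIG. 8 might be represented as follows in Python. The class and function names, the English phrase labels, and the members of the "drink" and "food" categories are all hypothetical; the patent names only the categories.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PhraseHypothesis:
    phrase: str    # phrase label
    start: int     # first frame of the assumed interval
    end: int       # last frame of the assumed interval (inclusive)
    score: float   # similarity score

# Hypothetical category members, for illustration only.
DRINKS = ("beer", "juice")
FOODS = ("pizza", "cake")

# Receiving predicate -> phrases allowed to depend on it (modelled on FIG. 8).
DEPENDENCY_RULES = {
    "want-to-drink": {f"{d}+wo" for d in DRINKS} | {"another-glass", "a-glass"},
    "want-to-eat": {f"{f}+wo" for f in FOODS} | {"another-slice", "one"},
}

def can_receive(predicate, phrase):
    """True if `phrase` may depend on `predicate` under the rules above."""
    return phrase in DEPENDENCY_RULES.get(predicate, set())

def receivers_of(phrase):
    """Predicates that can receive `phrase`, sorted for determinism."""
    return sorted(p for p, deps in DEPENDENCY_RULES.items() if phrase in deps)
```

Keeping the rules keyed by the receiving predicate makes both directions cheap: checking whether a predicate accepts a phrase, and finding which predicates could follow a given phrase.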

[0006] The input sentence estimation unit 7 searches the phrase lattice for phrase hypotheses that form a sentence and outputs them as the recognition result. In doing so, it searches for a phrase sequence forming a sentence under two constraints: the temporal continuity between phrase hypotheses, and the connection information between phrases stored in the syntax information storage unit 6. The phrase sequence search that uses dependency rules and takes the predicate, the receiving phrase located at the end of the sentence, as its clue is described below with reference to the PAD diagram of FIG. 9. First, phrase hypotheses that can serve as a predicate and lie near the end of the speech interval are looked up in the phrase lattice (step S1). The predicate hypotheses found become the initial set of partial sentences (step S2). The following processing (steps S4 to S12) is repeated until the set of partial sentences becomes empty (step S3). For each partial sentence, the phrase hypotheses temporally adjacent to its front (hereinafter called pre-adjacent hypotheses) are searched for in the phrase lattice (step S5), and it is checked whether the partial sentence contains a phrase that can receive each pre-adjacent hypothesis in a dependency relation (step S7). Specifically, the syntax information storage unit 6 is queried for the dependency structure rules of each phrase making up the partial sentence. If the partial sentence contains a phrase that can receive the pre-adjacent hypothesis, the pre-adjacent hypothesis joined to the partial sentence is added to the set of new partial sentences (step S8). Each partial sentence in the resulting set of new partial sentences is checked to see whether it covers the speech interval from beginning to end (step S10). Partial sentences that do are removed from the set of new partial sentences and added to the set of sentence candidates (step S11). Finally, the set of new partial sentences becomes the set of partial sentences. If this set is not empty, steps S4 to S12 are repeated. This processing yields a set of sentence candidates. In most cases more than one sentence candidate is found. To select one of them as the recognition result, an evaluation function that can be computed from the similarity scores of the phrase hypotheses making up a sentence candidate is prepared, and the candidate with the highest value is taken as the recognition result. In addition, to reduce the amount of search processing, in step S12 the same evaluation function is used to keep only the new partial sentences with high values as the partial sentences for the next round (step S4). This search technique is generally called beam search (a pruning method), and the number of partial sentences kept is called the beam width.
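
The conventional search of steps S1 to S12 can be sketched in Python as follows. This is a simplified illustration under stated assumptions: "near the end of the speech interval" is reduced to "ends exactly at the last frame", the evaluation function is assumed to be the plain sum of similarity scores, the initial predicate set is also pruned to the beam width, and the names `Hyp`, `backward_search`, `RULES` and `LATTICE`, together with the example scores, are invented for the example.

```python
from collections import namedtuple

Hyp = namedtuple("Hyp", "phrase start end score")

def total(seq):
    """Assumed evaluation function: plain sum of similarity scores."""
    return sum(h.score for h in seq)

def keep_best(partials, beam):
    return sorted(partials, key=total, reverse=True)[:beam]

def backward_search(lattice, rules, first_frame, last_frame, beam=1):
    # S1-S2: predicate hypotheses that end at the end of the speech interval.
    partials = [[h] for h in lattice
                if h.phrase in rules and h.end == last_frame]
    partials = keep_best(partials, beam)
    candidates = []
    while partials:                                   # S3
        new_partials = []
        for part in partials:                         # S4
            for h in lattice:                         # S5: pre-adjacent hypotheses
                if h.end + 1 != part[0].start:
                    continue
                # S7: can some phrase in the partial sentence receive h?
                if any(h.phrase in rules.get(p.phrase, ()) for p in part):
                    new_partials.append([h] + part)   # S8
        # S10-S11: partial sentences covering the whole interval become candidates.
        candidates += [p for p in new_partials if p[0].start == first_frame]
        new_partials = [p for p in new_partials if p[0].start != first_frame]
        partials = keep_best(new_partials, beam)      # S12
    return max(candidates, key=total, default=None)

# Hypothetical rules and lattice, with invented scores.
RULES = {
    "want-to-drink": {"beer+wo", "another-glass", "a-glass"},
    "want-to-eat": {"pizza+wo", "another-slice", "one"},
}
LATTICE = [
    Hyp("beer+wo", 0, 9, 0.9), Hyp("another-glass", 10, 19, 0.6),
    Hyp("want-to-drink", 20, 29, 0.3), Hyp("pizza+wo", 0, 9, 0.5),
    Hyp("another-slice", 10, 19, 0.5), Hyp("want-to-eat", 20, 29, 0.7),
]
```

With this invented lattice, `backward_search(LATTICE, RULES, 0, 29, beam=1)` starts from the higher-scoring predicate "want-to-eat" and can only reach the pizza sentence, whereas `beam=2` keeps both predicates alive and lets the beer sentence win on total score.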

[0007]

[Problems to Be Solved by the Invention] However, the conventional speech recognition device described above cannot recognize the input correctly when, for example because it was not uttered clearly, the sentence-final predicate does not appear in the phrase lattice as a phrase hypothesis with a high similarity score. Moreover, because the sentence-final predicate is the direct clue for the phrase lattice search, the device cannot recognize sentences whose sentence-final predicate is omitted, or inverted sentences that have no predicate at the end. Furthermore, because the conventional device links pre-adjacent phrase hypotheses satisfying the dependency rules from the end of the sentence back to the beginning, the search stalls partway and no result is obtained whenever no pre-adjacent phrase hypothesis satisfying the dependency rules can be found.

[0008] The present invention solves these problems. Its object is to provide a speech recognition device that can correctly recognize sentence speech whose sentence-final predicate does not appear in the phrase lattice as a phrase hypothesis with a high similarity score, sentence speech whose sentence-final predicate is omitted, and inverted sentence speech that has no predicate at the end, and that can continue the search even when no pre-adjacent phrase hypothesis satisfying the dependency rules is found.

[0009]

[Means for Solving the Problems] To achieve this object, the speech recognition device of the present invention comprises: a speech analysis unit that analyzes input speech frame by frame, the frame being the unit of analysis, and obtains feature parameters; a standard pattern storage unit that stores standard patterns representing the features of a "basic unit of recognition" such as a phoneme or syllable; a frame similarity calculation unit that calculates, for each frame, the similarity between the feature parameters and the standard patterns; a pronunciation information storage unit in which the pronunciation of each "linguistic unit", such as a word or phrase, is written in phonetic symbols representing the basic unit of recognition; a spotting unit that creates a lattice by calculating, for every linguistic unit stored in the pronunciation information storage unit, its similarity in arbitrary intervals of the input speech; a syntax information storage unit that stores connection information between linguistic units, such as words or phrases, in the sentences to be recognized; an input sentence estimation unit that estimates the spoken input sentence from the lattice created by the spotting unit and outputs it as the recognition result; and a keyword estimation unit that estimates, from the lattice created by the spotting unit, the keyword used as a clue when the input sentence estimation unit estimates the input sentence.

[0010]

[Operation] With this configuration, a phrase hypothesis with a high similarity score in the phrase lattice is taken as a keyword, and the keyword together with the predicate estimated from it serves as the clue for the phrase lattice search. This makes it possible to recognize sentence speech whose sentence-final predicate does not appear in the phrase lattice as a phrase hypothesis with a high similarity score, sentence speech whose sentence-final predicate is omitted, and inverted sentence speech that has no predicate at the end. In addition, the search can proceed even if some phrases of the input sentence are not captured as phrase hypotheses in the phrase lattice, so that a phrase sequence forming part of the input sentence is obtained.

[0011]

[Embodiments]

(Embodiment 1) A first embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram of the speech recognition device of this embodiment. Components identical to those of the conventional speech recognition device shown in FIG. 6 are given the same reference numerals, and their detailed description is omitted.

[0012] In FIG. 1, 1 is a speech analysis unit that analyzes the input speech frame by frame, the frame being the unit of analysis, and obtains feature parameters; 2 is a standard pattern storage unit that stores standard patterns representing the features of the "basic unit of recognition" (for example, the phoneme; phonemes are used in the description below); 3 is a frame similarity calculation unit that calculates, for each frame, the similarity between the feature parameters and the standard patterns; 4 is a pronunciation information storage unit in which the pronunciation of each phrase, the "linguistic unit", is written in phonetic symbols representing the basic unit of recognition; 5 is a spotting unit that creates a phrase lattice, a set of hypotheses about the phrases present in the speech interval; 6 is a syntax information storage unit that stores connection information between the phrases of the sentences to be recognized; 7 is an input sentence estimation unit that estimates the input sentence from the phrase lattice created by the spotting unit 5; and 8 is a keyword estimation unit that estimates, from the lattice created by the spotting unit, the keyword used as a clue when the input sentence estimation unit 7 estimates the input sentence.

[0013] The operation of the speech recognition device configured as above is as follows. Up to the point where the speech signal is input to the speech analysis unit 1 and the phrase lattice is created via the frame similarity calculation unit and the spotting unit, the operation is the same as in the conventional example. The keyword estimation unit 8 first finds several phrase hypotheses with high similarity scores (hereinafter called keywords) in the phrase lattice. Next, by referring to the dependency rules in the syntax information storage unit 6, it estimates the predicate that receives each keyword and checks whether that predicate exists near the end of the speech interval in the phrase lattice. If the estimated predicate exists in the phrase lattice, the input sentence estimation unit 7 uses the predicate and the keyword as clues and searches for a phrase sequence that connects the two, using the dependency rules in the syntax information storage unit 6.

[0014] FIG. 2 shows an example of this operation. It is a schematic diagram of the phrase lattice obtained for the utterance "biiru wo mou ippai nomitai" ("I want to drink another glass of beer"), in which the similarity score of the phrase hypothesis "nomitai" (want to drink), which should be part of the correct answer, is low. Assume that the search is a beam search with a beam width of 1. First, the phrase hypothesis with the highest similarity score in the phrase lattice, "biiru wo" (beer, with the particle "wo"), is found as the keyword. Next, the dependency rules in the syntax information storage unit 6 are searched for a predicate that can take "biiru wo" as a dependent phrase. With the dependency rules of FIG. 8, "nomitai" is estimated as the predicate. A check shows that a hypothesis for this predicate exists near the end of the speech interval in the phrase lattice, so, with the predicate and the keyword as clues, the phrase sequence is searched by linking pre-adjacent phrase hypotheses according to the dependency rules of FIG. 8. The phrase sequence "biiru wo / mou ippai / nomitai" is thus found correctly and output as the recognition result.
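
The keyword-driven search of this embodiment might be sketched as below. The sketch makes several simplifying assumptions that are not in the patent: only the single best-scoring hypothesis is used as the keyword (the patent takes several), only the predicate acts as the receiving phrase, "near the end" means "ends exactly at the last frame", and candidates are ranked by the plain sum of scores. All names and scores are hypothetical.

```python
from collections import namedtuple

Hyp = namedtuple("Hyp", "phrase start end score")

def total(seq):
    return sum(h.score for h in seq)

def chains_back(lattice, deps, seq, first_frame):
    """Yield every sequence that reaches first_frame, built by prepending
    pre-adjacent hypotheses that the predicate can receive."""
    if seq[0].start == first_frame:
        yield seq
        return
    for h in lattice:
        if h.end + 1 == seq[0].start and h.phrase in deps:
            yield from chains_back(lattice, deps, [h] + seq, first_frame)

def keyword_search(lattice, rules, first_frame, last_frame):
    keyword = max(lattice, key=lambda h: h.score)      # best-scoring hypothesis
    results = []
    for predicate, deps in rules.items():
        if keyword.phrase not in deps:                 # predicate must receive the keyword
            continue
        for pred in lattice:
            if pred.phrase == predicate and pred.end == last_frame:
                results += [s for s in chains_back(lattice, deps, [pred], first_frame)
                            if keyword in s]           # sequence must pass through the keyword
    return max(results, key=total, default=None)

# Hypothetical rules and a lattice in the spirit of FIG. 2 (invented scores).
RULES = {
    "want-to-drink": {"beer+wo", "another-glass", "a-glass"},
    "want-to-eat": {"pizza+wo", "another-slice", "one"},
}
LATTICE = [
    Hyp("beer+wo", 0, 9, 0.9), Hyp("another-glass", 10, 19, 0.6),
    Hyp("want-to-drink", 20, 29, 0.3), Hyp("pizza+wo", 0, 9, 0.5),
    Hyp("another-slice", 10, 19, 0.5), Hyp("want-to-eat", 20, 29, 0.7),
]
```

Because the predicate is chosen from the keyword rather than from its own score, the low-scoring "want-to-drink" hypothesis is still used as the end of the sentence in this example.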

[0015] By contrast, when the conventional speech recognition device searches with a beam width of 1, the search for the sentence-final predicate (step S1 in FIG. 9) picks "tabetai" (want to eat), which has a high similarity score, and the subsequent search using the dependency rules of FIG. 8 yields the phrase sequence "piza wo / mou ichimai / tabetai" ("I want to eat another slice of pizza").

[0016] As described above, this embodiment provides a keyword estimation unit that estimates a keyword other than the predicate in the phrase lattice and estimates the predicate from that keyword. The input can therefore be recognized correctly even when, for example because it was not uttered clearly, the sentence-final predicate does not appear in the phrase lattice as a phrase hypothesis with a high similarity score.

[0017] (Embodiment 2) A second embodiment of the present invention is described below with reference to the drawings. The configuration of the speech recognition device of this embodiment is the same as that of the first embodiment, so its detailed description is omitted and only its operation is described.

[0018] Up to the point where the speech signal is input to the speech analysis unit 1, the phrase lattice is created via the frame similarity calculation unit and the spotting unit, and the keyword estimation unit 8 estimates a keyword and the predicate that receives it, the operation is the same as in the first embodiment. The keyword estimation unit 8 checks whether the estimated predicate exists near the end of the speech interval in the phrase lattice. If it does, the input sentence estimation unit 7 uses the predicate and the keyword as clues and searches for a phrase sequence connecting the two, using the dependency rules in the syntax information storage unit 6. In some cases no phrase hypothesis satisfying the dependency rules can be found in the interval between the keyword and the predicate. In that case, the speech interval before the keyword is searched for phrase hypotheses that satisfy the dependency rules. Even in the worst case, the phrase sequence consisting of the keyword and the predicate is output as the recognition result.

[0019] FIG. 3 shows an example of this operation. It is a schematic diagram of the phrase lattice obtained for the utterance "biiru wo mou ippai nomitai" ("I want to drink another glass of beer"), in which the phrase hypothesis "mou ippai" (another glass), which should be part of the correct answer, is missing from the phrase lattice. Assume that the search is a beam search with a beam width of 1. First, the phrase hypothesis with the highest similarity score in the phrase lattice, "biiru wo", is found as the keyword. Next, the dependency rules in the syntax information storage unit 6 are searched for a predicate that can take "biiru wo" as a dependent phrase. With the dependency rules of FIG. 8, "nomitai" (want to drink) is estimated as the predicate. A check shows that a hypothesis for this predicate exists near the end of the speech interval in the phrase lattice, so, with the predicate and the keyword as clues, the phrase sequence is searched by linking pre-adjacent phrase hypotheses according to the dependency rules of FIG. 8. However, the phrase hypothesis for the predicate "nomitai" has no pre-adjacent phrase hypothesis that satisfies the dependency rules, so the speech interval before the keyword is searched for phrase hypotheses that satisfy the dependency rules. In the example of FIG. 3 the keyword lies at the beginning of the speech interval, so the search ends here and the phrase sequence "biiru wo / nomitai" ("beer / want to drink") is obtained. Although this is not the correct phrase sequence, it captures part of the input utterance, and when the input is a sentence the result is often close in meaning.
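
The fallback behaviour of this embodiment can be illustrated with the following Python sketch. It assumes that the keyword and the predicate hypothesis have already been chosen, that only the predicate acts as the receiving phrase, and that a gap counts as filled only when adjacent hypotheses tile it exactly; hypotheses are tried in descending score order, which is greedy rather than globally optimal. The names `fill` and `search_with_fallback` and the example data are hypothetical.

```python
from collections import namedtuple

Hyp = namedtuple("Hyp", "phrase start end score")

def fill(lattice, deps, left_end, right_start):
    """Return hypotheses that exactly tile frames left_end+1 .. right_start-1
    using only phrases in `deps`, or None if no such tiling exists."""
    if left_end + 1 == right_start:
        return []
    for h in sorted(lattice, key=lambda h: h.score, reverse=True):
        if h.start == left_end + 1 and h.end < right_start and h.phrase in deps:
            rest = fill(lattice, deps, h.end, right_start)
            if rest is not None:
                return [h] + rest
    return None

def search_with_fallback(lattice, rules, keyword, pred, first_frame):
    deps = rules[pred.phrase]
    middle = fill(lattice, deps, keyword.end, pred.start)
    if middle is None:
        middle = []      # gap between keyword and predicate stays unexplained
    # Then look in front of the keyword; give up quietly if nothing fits.
    front = fill(lattice, deps, first_frame - 1, keyword.start) or []
    return front + [keyword] + middle + [pred]

# Hypothetical rules and a lattice in the spirit of FIG. 3:
# the "another-glass" hypothesis is missing.
RULES = {
    "want-to-drink": {"beer+wo", "another-glass", "a-glass"},
    "want-to-eat": {"pizza+wo", "another-slice", "one"},
}
LATTICE = [
    Hyp("beer+wo", 0, 9, 0.9), Hyp("want-to-drink", 20, 29, 0.3),
    Hyp("pizza+wo", 0, 9, 0.5), Hyp("another-slice", 10, 19, 0.5),
    Hyp("want-to-eat", 20, 29, 0.7),
]
```

In the worst case the function returns just the keyword and the predicate, which mirrors the partial result "biiru wo / nomitai" described in the text.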

[0020] By contrast, when the conventional speech recognition device searches with a beam width of 1, the search for the sentence-final predicate (step S1 in FIG. 9) picks "tabetai" (want to eat), which has a high similarity score, and the subsequent search using the dependency rules of FIG. 8 yields the phrase sequence "piza wo / mou ichimai / tabetai" ("I want to eat another slice of pizza"). Moreover, even if the similarity score of the phrase hypothesis "nomitai" (want to drink) were higher than that of "tabetai", the conventional device, which links pre-adjacent phrase hypotheses satisfying the dependency rules from the end of the sentence back to the beginning, would stall partway and produce no result, because no pre-adjacent phrase hypothesis satisfying the dependency rules can be found.

[0021] As described above, by providing a keyword estimation unit that estimates a keyword other than the predicate in the phrase lattice and estimates the predicate from that keyword, and an input sentence estimation unit that can continue estimating the input sentence even when the lattice contains no suitable pre-adjacent hypothesis, the search can proceed even if some phrases of the input sentence are not captured as phrase hypotheses in the phrase lattice, and a phrase sequence forming part of the input sentence is obtained.

[0022] (Embodiment 3) A third embodiment of the present invention is described below with reference to the drawings. The configuration of the speech recognition device of this embodiment is the same as that of the first embodiment, so its detailed description is omitted and only its operation is described.

[0023] Up to the point where the speech signal is input to the speech analysis unit 1, the phrase lattice is created via the frame similarity calculation unit and the spotting unit, and the keyword estimation unit 8 estimates a keyword and the predicate that receives it, the operation is the same as in the first embodiment. The keyword estimation unit 8 checks whether the estimated predicate exists near the end of the speech interval in the phrase lattice. If the estimated predicate does not exist near the end of the speech interval in the phrase lattice, the predicate is regarded as omitted; the input sentence estimation unit 7 assumes that the predicate exists just after the speech interval and, with that predicate and the keyword as clues, searches for a phrase sequence connecting the two, using the dependency rules in the syntax information storage unit 6.

[0024] FIG. 4 shows an example of the above operation. It is a schematic diagram of the phrase lattice obtained for the utterance 「ビールをもう一杯」 ("one more beer"), in which the predicate is omitted. First, the phrase hypothesis 「ビールを」 ("beer", marked as the object), which has the highest similarity score in the phrase lattice, is found as the keyword. Next, the dependency relation rules in the syntax information storage unit 6 are searched for a predicate that can take 「ビールを」 as a dependent phrase. With the dependency relation rules of FIG. 8, 「飲みたい」 ("want to drink") is estimated as the predicate. A check shows that no hypothesis for this predicate exists near the end of the speech section in the phrase lattice, so the predicate 「飲みたい」 is regarded as omitted. Using that predicate and the keyword as clues, the phrase sequence is searched by linking preceding adjacent phrase hypotheses according to the dependency relation rules of FIG. 8. As a result, the phrase sequence 「ビールを・もう一杯」 is found correctly and is output as the recognition result.
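
The search in the FIG. 4 example can be sketched as a backward walk from a virtual predicate placed just after the speech section. The sketch below is a simplification under stated assumptions: it is greedy (it keeps only the best candidate at each step), it requires every linked phrase to be able to depend directly on the estimated predicate, and the `Hyp` record, the `RULES` pairs, the `gap` tolerance and all frame numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyp:
    """One phrase hypothesis in the lattice (hypothetical layout)."""
    text: str
    start: int
    end: int
    score: float


# Assumed stand-in for the FIG. 8 rules: (dependent phrase, receiving predicate).
RULES = {("ビールを", "飲みたい"), ("もう一杯", "飲みたい")}


def search_omitted(lattice, keyword, predicate, speech_end, gap=3):
    """Link preceding adjacent hypotheses from the speech end back to the keyword.

    The omitted predicate is assumed to start right after `speech_end`.
    Returns the phrase sequence in spoken order, or None if the chain breaks.
    """
    path = []
    boundary = speech_end + 1  # frame at which the following phrase starts
    while True:
        candidates = [
            h for h in lattice
            if h.start < boundary                    # guarantees progress
            and abs(boundary - 1 - h.end) <= gap     # ends next to the boundary
            and (h.text, predicate) in RULES         # may depend on the predicate
        ]
        if not candidates:
            return None
        best = max(candidates, key=lambda h: h.score)
        path.append(best)
        if best == keyword:
            return [h.text for h in reversed(path)]
        boundary = best.start


lattice = [
    Hyp("ビールを", 0, 40, 0.9),
    Hyp("ビルを", 0, 40, 0.5),     # acoustically similar distractor with no rule
    Hyp("もう一杯", 41, 90, 0.7),
]
result = search_omitted(lattice, lattice[0], "飲みたい", speech_end=90)
```

A fuller search would keep several partial paths and rank them by accumulated score; the greedy choice here only keeps the example short.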

[0025] As described above, by providing a keyword estimation unit that estimates a keyword other than the predicate in the speech section and determines whether the predicate has been omitted, together with an input sentence estimation unit that can estimate an input sentence whose predicate is omitted, such sentences can be recognized correctly.

[0026] (Embodiment 4) A fourth embodiment of the present invention is described below with reference to the drawings. The configuration of the voice recognition device of this embodiment is the same as that of the first embodiment, so its detailed description is omitted and only its operation is described.

[0027] First, the voice signal is input to the voice analysis unit 1, a phrase lattice is created via the frame similarity calculation unit 3 and the spotting unit 5, and the keyword estimation unit 8 estimates a keyword and the predicate that receives it; up to this point the operation is the same as in the first embodiment. The keyword estimation unit 8 then checks whether a hypothesis for the estimated predicate exists near the end of the speech section in the phrase lattice. If it does not, the predicate is regarded as omitted; the input sentence estimation unit 7 assumes that the predicate lies after the speech section and searches for a phrase sequence that connects that predicate and the keyword, using both as clues. The dependency relation rules in the syntax information storage unit 6 are used for this search. If, during the search, a hypothesis for the estimated predicate is found as a preceding adjacent phrase hypothesis, the input is interpreted as an inverted expression, and that predicate hypothesis is linked in and the search continues.

[0028] FIG. 5 shows an example of the above operation. It is a schematic diagram of the phrase lattice obtained for the utterance 「ビールを飲みたいもう一杯」 ("I want to drink beer, one more glass"), which is an inverted expression. First, the phrase hypothesis 「ビールを」 ("beer", marked as the object), which has the highest similarity score in the phrase lattice, is found as the keyword. Next, the dependency relation rules in the syntax information storage unit 6 are searched for a predicate that can take 「ビールを」 as a dependent phrase. With the dependency relation rules of FIG. 8, 「飲みたい」 ("want to drink") is estimated as the predicate. A check shows that no hypothesis for this predicate exists near the end of the speech section in the phrase lattice, so the predicate 「飲みたい」 is regarded as omitted, and using that predicate and the keyword as clues, the phrase sequence is searched by linking preceding adjacent phrase hypotheses according to the dependency relation rules of FIG. 8. First, the phrase hypothesis 「もう一杯」 ("one more glass") is found, and then 「飲みたい」 is found as its preceding adjacent phrase hypothesis. Because this is the estimated predicate, the input is interpreted as an inverted expression, and that predicate hypothesis is linked in and the search continues. In FIG. 5, the phrase hypothesis 「飲みたい」 connects the phrase hypotheses 「もう一杯」 and 「ビールを」, and the search ends. As a result, the phrase sequence 「ビールを・飲みたい・もう一杯」 is found correctly and is output as the recognition result.
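
The FIG. 5 case differs from the predicate-omitted case only in that the backward search may meet a hypothesis for the estimated predicate itself. A minimal sketch of that extension follows, again greedy and under assumptions: the `Hyp` record, the `RULES` pairs, the `gap` tolerance and the frame numbers are hypothetical, and the returned `inverted` flag is added purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyp:
    """One phrase hypothesis in the lattice (hypothetical layout)."""
    text: str
    start: int
    end: int
    score: float


# Assumed stand-in for the FIG. 8 rules: (dependent phrase, receiving predicate).
RULES = {("ビールを", "飲みたい"), ("もう一杯", "飲みたい")}


def search_with_inversion(lattice, keyword, predicate, speech_end, gap=3):
    """Backward search that also accepts the estimated predicate inside the speech.

    Returns (phrase sequence in spoken order or None, inverted flag).
    """
    path, inverted = [], False
    boundary = speech_end + 1  # the predicate is first assumed to follow the speech
    while True:
        candidates = [
            h for h in lattice
            if h.start < boundary
            and abs(boundary - 1 - h.end) <= gap
            # Either a phrase that depends on the predicate, or the predicate itself.
            and (h.text == predicate or (h.text, predicate) in RULES)
        ]
        if not candidates:
            return None, inverted
        best = max(candidates, key=lambda h: h.score)
        if best.text == predicate:
            inverted = True  # predicate found mid-utterance: inverted expression
        path.append(best)
        if best == keyword:
            return [h.text for h in reversed(path)], inverted
        boundary = best.start


lattice = [
    Hyp("ビールを", 0, 40, 0.9),
    Hyp("飲みたい", 41, 80, 0.6),
    Hyp("もう一杯", 81, 130, 0.7),
]
sequence, inverted = search_with_inversion(lattice, lattice[0], "飲みたい", 130)
```

With a lattice that contains no predicate hypothesis, the same function returns the predicate-omitted sequence with `inverted` left false.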

[0029] As described above, by providing a keyword estimation unit that estimates a keyword other than the predicate in the speech section and determines whether the predicate has been omitted, together with an input sentence estimation unit that can estimate an input sentence with an inverted expression, such sentences can be recognized correctly.

[0030] Although the above embodiments describe examples that use a phrase lattice, the present invention naturally also applies when a word lattice is used.

[0031]

[Effects of the Invention] As described above, the present invention provides: a voice analysis unit that analyzes input speech frame by frame, the frame being the unit of analysis, to obtain feature parameters; a standard pattern storage unit that stores standard patterns representing the features of "basic units of recognition" such as phonemes and syllables; a frame similarity calculation unit that calculates, for each frame, the similarity between the feature parameters and the standard patterns; a pronunciation information storage unit in which information on the pronunciation of "language units" such as words and phrases is written in phonetic symbols representing the basic units of recognition; a spotting unit that creates a lattice by calculating, for every language unit stored in the pronunciation information storage unit, the similarity in arbitrary sections of the input speech; a syntax information storage unit that stores connection information between language units such as words and phrases in the sentences to be recognized; an input sentence estimation unit that estimates the input sentence from the lattice created by the spotting unit; and a keyword estimation unit that estimates, from the lattice created by the spotting unit, a keyword to be used as a clue when the input sentence estimation unit estimates the input sentence. With this configuration, the device can also recognize sentence speech whose sentence-final predicate is not obtained in the phrase lattice as a phrase hypothesis with a high similarity score, sentence speech in which the predicate is omitted, and sentence speech with an inverted expression, so that an excellent voice recognition device with few constraints on utterances can be realized.

[Brief description of drawings]

FIG. 1 is a configuration diagram of an embodiment of the voice recognition device according to the present invention.

FIG. 2 is a conceptual diagram of the lattice search operation of the voice recognition device in the first embodiment.

FIG. 3 is a conceptual diagram of the lattice search operation of the voice recognition device in the second embodiment.

FIG. 4 is a conceptual diagram of the lattice search operation of the voice recognition device in the third embodiment.

FIG. 5 is a conceptual diagram of the lattice search operation of the voice recognition device in the fourth embodiment.

FIG. 6 is a configuration diagram of a voice recognition device according to the prior art.

FIG. 7 is a diagram showing part of a phrase lattice obtained as a result of spotting.

FIG. 8 is a diagram showing the contents of the syntax information.

FIG. 9 is a PAD diagram explaining the phrase sequence search process of the conventional example.

[Explanation of symbols]

1 voice analysis unit
2 standard pattern storage unit
3 frame similarity calculation unit
4 pronunciation information storage unit
5 spotting unit
6 syntax information storage unit
7 input sentence estimation unit
8 keyword estimation unit

Continuation of the front page: (72) Inventor Mitsuru Endo, c/o Matsushita Giken Co., Ltd., 3-10-1 Higashimita, Tama-ku, Kawasaki-shi, Kanagawa

Claims

[Claims]

1. A voice recognition device comprising: a voice analysis unit that analyzes input speech frame by frame, the frame being the unit of analysis, to obtain feature parameters; a standard pattern storage unit that stores standard patterns representing the features of "basic units of recognition" such as phonemes and syllables; a frame similarity calculation unit that calculates, for each frame, the similarity between the feature parameters and the standard patterns; a pronunciation information storage unit in which information on the pronunciation of "language units" such as words and phrases is written in phonetic symbols representing the basic units of recognition; a spotting unit that creates a lattice by calculating, for every language unit stored in the pronunciation information storage unit, the similarity in arbitrary sections of the input speech; a syntax information storage unit that stores connection information between language units in the sentences to be recognized; an input sentence estimation unit that estimates, as the recognition result, the sentence represented by the input speech from the lattice created by the spotting unit; and a keyword estimation unit that estimates, from the lattice created by the spotting unit, a keyword to be used as a clue when the input sentence estimation unit estimates the input sentence.

2. The voice recognition device according to claim 1, wherein the keyword estimation unit estimates a keyword other than the predicate in the speech section and estimates the predicate from that keyword.

3. The voice recognition device according to claim 2, comprising a keyword estimation unit that estimates a keyword other than the predicate in the speech section and determines whether the predicate has been omitted, and an input sentence estimation unit that estimates an input sentence in which the predicate is omitted.

4. The voice recognition device according to claim 2, comprising a keyword estimation unit that estimates a keyword other than the predicate in the speech section and determines whether the predicate has been omitted, and an input sentence estimation unit that estimates an input sentence with an inverted expression.

5. The voice recognition device according to claim 2, comprising an input sentence estimation unit that continues the input sentence estimation process even when no suitable preceding adjacent hypothesis exists in the lattice.

6. The voice recognition device according to claim 2, wherein the keyword estimation unit estimates a keyword other than the predicate in the speech section and determines whether the predicate has been omitted, and the input sentence estimation unit estimates both an input sentence in which the predicate is omitted and an input sentence with an inverted expression.

7. The voice recognition device according to claim 2, comprising a keyword estimation unit that estimates a keyword other than the predicate in the speech section and determines whether the predicate has been omitted, and an input sentence estimation unit that estimates an input sentence in which the predicate is omitted and continues the input sentence estimation process even when no suitable preceding adjacent hypothesis exists in the lattice.

8. The voice recognition device according to claim 2, comprising a keyword estimation unit that estimates a keyword other than the predicate in the speech section and determines whether the predicate has been omitted, and an input sentence estimation unit that estimates an input sentence with an inverted expression and continues the input sentence estimation process even when no suitable preceding adjacent hypothesis exists in the lattice.

9. The voice recognition device according to claim 2, comprising a keyword estimation unit that estimates a keyword other than the predicate in the speech section and determines whether the predicate has been omitted, and an input sentence estimation unit that estimates both an input sentence in which the predicate is omitted and an input sentence with an inverted expression, and continues the input sentence estimation process even when no suitable preceding adjacent hypothesis exists in the lattice.