JP2009116107A

JP2009116107A - Information processing device and method

Info

Publication number: JP2009116107A
Application number: JP2007289964A
Authority: JP
Inventors: Hideo Kuboyama; 英生久保山
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2007-11-07
Filing date: 2007-11-07
Publication date: 2009-05-28

Abstract

PROBLEM TO BE SOLVED: To present confirmation information about a recognition result in a manner that is easy for a user to understand, while allowing recognition of an arbitrary utterance by voice recognition using a subword grammar. SOLUTION: A voice recognition part 102 performs, on an input voice, voice recognition using a keyword grammar based on keywords for the state to be registered and voice recognition using a subword grammar based on subwords, and outputs the recognition result of whichever has the better recognition score. When the voice recognition part 102 outputs the recognition result obtained with the keyword grammar, a confirmation information generation part 106 generates confirmation information composed of the keyword string of the recognition result. When the voice recognition part 102 outputs the recognition result obtained with the subword grammar, the generation part 106 instead generates confirmation information composed of attribute information associated with the state to be registered. COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to speech recognition technology.

Speech recognition technology is being put to practical use in devices such as car navigation systems. Speech recognition has the advantages that one item can be selected by voice from among multiple choices, and that the number of buttons on a device and the number of operation steps can be reduced.

One conceivable speech recognition application registers the state or settings of another application by voice and later recalls the registered state by voice. FIG. 15 shows an example of an application that displays an electronic program guide on a television. In the figure, 1501 denotes the program guide, 1502 a program (broadcast slot) in the guide, and 1503 the program that currently has the focus. In the program guide, programs are arranged per channel in order of broadcast time. Each program can be focused and selected, and the focus can be moved to a neighboring program. Using speech recognition, the user can register the state in which a certain program in the guide has the focus. For example, when the user utters “Taro Yamada's King David” (山田太郎のダビデ王), that state is registered in association with the speech recognition result. Afterwards, making the same utterance from a different state recalls the registered state, and the application transitions to it.

In speech recognition, one way to register by utterance and recall the registered information by voice is to compare the speech features at registration and at recall by DP matching and decide based on how close the two utterances are. At present, however, speech recognition based on HMMs (hidden Markov models) is mainstream because of its recognition accuracy, its ability to recognize speech regardless of the speaker, and the fact that user enrollment is not required.

In HMM-based speech recognition, registration and recall by voice proceed as follows. At registration, speech recognition is executed with a grammar of words or subwords (phonemes), the building blocks of the speech to be registered, and the recognized word string or subword string is registered as a single word in a registration grammar. At recall, speech recognition is executed with the registration grammar, and the state or settings corresponding to the recognized word are recalled.

Two kinds of grammar can be used at registration. One is a keyword grammar, a grammar built from words prepared in advance or from words extracted from the state or setting information. The other is a subword grammar, for example a grammar of phonemes, the smallest units of speech sound.

FIG. 4 shows an example of a keyword grammar, and FIG. 5 an example of a subword grammar. The keyword grammar in FIG. 4 is generated by extracting keywords from the program information of the focused program 1503 in the program guide of FIG. 15. The grammar in FIG. 5 does not depend on program information; it consists of a predetermined phoneme set that covers every sound used in speaking the language. FIG. 16 shows the recognition results obtained with these grammars. When the user utters “Taro Yamada's King David” (山田太郎のダビデ王), recognition with the keyword grammar yields the keyword string “Taro Yamada's King David”, a combination of words in the keyword grammar. With the subword grammar, the result is the subword string “y a m a d a t a r o o n o d a b i d e o o”, a combination of subwords. In either case the recognition result is registered in the registration grammar as a single word. When the same utterance is made at recall, it matches the word in the registration grammar (“Taro Yamada's King David” or “y a m a d a t a r o o n o d a b i d e o o”), and the registered word is obtained as the recognition result.

Recognition with the keyword grammar, however, can register only word strings contained in the keyword grammar. For example, the utterance “Yama-chan's King David” (山ちゃんのダビデ王) can be converted by the subword grammar into “y a m a ch a X n o d a b i d e o o”. The keyword grammar has no word corresponding to “Yama-chan”, so an incorrect recognition result is registered or no recognition result is obtained. In other words, a subword grammar is needed for the user to register with an arbitrary utterance.

Patent Document 1: JP-A-10-097284
Non-Patent Document 1: Acoustical Society of Japan, 2006 Spring Meeting, 1-1-6

However, a recognition result from the subword grammar is difficult to present to the user. An application should preferably present the recognition result to the user, by display or by voice, for confirmation at registration and at recall. With keyword-grammar recognition, the word string “Taro Yamada's King David” is obtained from the recognition result, so it can be displayed to the user or presented by speech synthesis. By contrast, the user can hardly understand the subword string “y a m a d a t a r o o n o d a b i d e o o” when it is displayed, and because it carries no linguistic information, it cannot be synthesized with correct intonation.

It is therefore desirable to make arbitrary user utterances recognizable with the subword grammar while using the keyword-grammar result whenever that result is preferable, so that information matching the user's utterance as closely as possible is presented in a natural form.

Patent Document 1 and Non-Patent Document 1 describe speech recognition that combines a subword grammar with a keyword grammar. In Patent Document 1, the scores of the keyword-grammar result and the subword-grammar result are compared; if the keyword-grammar score is higher the recognition result is output, and if the subword-grammar score is higher the recognition result is rejected. In Non-Patent Document 1, an interface for entering a full name uses name-dictionary recognition together with syllable-string recognition, outputs both recognition results, and lets the user choose among them.

However, Patent Document 1 does not disclose using the subword-grammar recognition result for presentation. The approach of Non-Patent Document 1 is adequate for entering syllables, but when given information is to be presented to the user, a syllable string is difficult for the user to understand.

An object of the present invention is to make an arbitrary utterance recognizable by speech recognition using a subword grammar while presenting confirmation information about the recognition result to the user in an easily understood form.

According to one aspect of the present invention, there is provided an information processing apparatus capable of registering a state of an application selected by a user in accordance with a registration operation by the user, and of recalling the registered state and transitioning to that state in accordance with a recall operation by the user, the apparatus comprising: voice input means for inputting the user's voice; speech recognition means for performing, on the voice input through the voice input means, speech recognition using a keyword grammar based on keywords relating to the state and speech recognition using a subword grammar based on subwords, and for outputting the recognition result of whichever has the better recognition score; presentation means for presenting to the user confirmation information composed of the keyword string of the recognition result when the speech recognition means outputs the recognition result obtained with the keyword grammar, and for presenting to the user confirmation information composed of attribute information associated with the state when the speech recognition means outputs the recognition result obtained with the subword grammar; and registration means for registering the recognition result output by the speech recognition means in storage means in association with the state.

According to the present invention, an arbitrary utterance can be recognized by speech recognition using a subword grammar, and confirmation information about the recognition result can still be presented to the user in an easily understood form.

Preferred embodiments of the present invention are described in detail below with reference to the drawings. The present invention is not limited to the following embodiments, which merely show specific examples advantageous for carrying out the invention. Moreover, not every combination of features described in the following embodiments is essential to the means by which the invention solves the problem.

(Embodiment 1)
The following describes, as an example, an information processing apparatus in which the state of an application that handles a television electronic program guide can be registered by voice and a registered state can be recalled by voice. The apparatus is configured so that, for example, the state in which the user has selected a desired program slot in the displayed electronic program guide can be registered in accordance with the user's registration operation. After registration, the system can quickly be brought to that state in accordance with the user's recall operation. Speech recognition can be used for both the registration operation and the recall operation. Specifically, to register a state, the user utters a word or phrase that serves as a title identifying the state. Speech recognition is performed on this utterance, and the recognition result is registered in association with the state being registered. To recall a state registered in this way, the user utters the word or phrase chosen as its title. Speech recognition is performed on that utterance, the state associated with the recognition result is read out, and the system transitions to that state.

Needless to say, the present invention is not limited to applications that handle electronic program guides.

FIG. 1A is a block diagram showing the configuration of the speech recognition function in the information processing apparatus according to Embodiment 1 of the present invention. In the figure, 101 is a voice input unit that inputs voice. 102 is a speech recognition unit that recognizes the voice. 103 is a keyword grammar generation unit that generates a keyword grammar from predetermined information about the state to be registered. 104 is the keyword grammar, a speech recognition grammar generated by the keyword grammar generation unit 103. 105 is the subword grammar, a speech recognition grammar based on subwords such as phonemes. 106 is a confirmation information generation unit that generates confirmation information about the recognition result. 107 is a presentation unit that presents the confirmation information created by the confirmation information generation unit 106 to the user by display or by voice. 108 is a registration unit that registers a speech recognition result in association with a state. 109 is the registration grammar, in which the registration unit 108 registers the recognition result of the speech recognition unit 102 as a word of a recognition grammar. 110 is the registration state information, which associates each word registered in the registration grammar 109 with a state.

FIG. 1B is a block diagram showing the hardware configuration of the information processing apparatus according to this embodiment. The apparatus can be realized by a computer. Specifically, as illustrated, it includes a CPU 1 that controls the entire apparatus, a RAM 2 that functions as main memory and provides a work area for the CPU 1, and a ROM 3 that stores a boot program and the like, together with the following components.

The HDD 4 is a hard disk drive on which an OS 41 and a speech recognition program 42 are installed. The speech recognition program 42 includes a speech recognition module 43, a keyword grammar generation module 44, a confirmation information generation module 45, and a registration module 46. These modules contain the program code that implements the functions of the speech recognition unit 102, the keyword grammar generation unit 103, the confirmation information generation unit 106, and the registration unit 108, respectively. The keyword grammar 104, the subword grammar 105, the registration grammar 109, and the registration state information 110 described above are also stored on the HDD 4. In addition, a TV application 47 for receiving and displaying television programs, providing the electronic program guide, and so on is installed on the HDD 4.

5 is a microphone, which constitutes the voice input unit 101. 6 is a speaker that outputs audio and 7 is a display that shows text and images; together they constitute the presentation unit 107. 8 is a communication interface (I/F) for receiving television program content data, electronic program guide data, and the like. 9 is an operation unit, realized by operation buttons, a keyboard, a mouse, or the like. The operation unit 9 may instead be realized by a remote controller. Through the operation unit 9, the user can start and exit applications, operate the electronic program guide shown on the display 7, and so on.

The configuration of the information processing apparatus in this embodiment is broadly as described above.

The keyword grammar generation unit 103 generates a keyword grammar from the attribute information associated with the state to be registered. FIG. 2 illustrates how a keyword grammar is generated from a state in which a program in the television electronic program guide has the focus. In the figure, 201 is the electronic program guide, 202 is the focused program in the electronic program guide 201, 203 is the program information serving as the attribute information associated with the program 202, and 204 denotes the keywords obtained from the program information 203. The keyword grammar generation unit 103 extracts the keywords 204 from the program information 203 of the program 202. It then performs language analysis on the extracted keywords 204 to obtain their readings and generates the keyword grammar 104.
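The generation step above can be sketched as follows. This is only an illustration under stated assumptions: the `GrammarWord` type, the function name, and the `READINGS` lookup table (a stand-in for the language analysis that yields readings) are hypothetical and do not come from the publication; the readings themselves are taken from the subword string quoted in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GrammarWord:
    word_id: int
    notation: str  # written form shown to the user
    reading: str   # space-separated phonemes


# Hypothetical stand-in for the language analysis that yields readings.
READINGS = {
    "山田太郎": "y a m a d a t a r o o",  # Taro Yamada
    "ダビデ王": "d a b i d e o o",        # King David
    "の": "n o",                          # particle "no"
}
GENERAL_WORDS = ["の"]  # state-independent words prepared in advance


def generate_keyword_grammar(keywords, readings=READINGS,
                             general_words=GENERAL_WORDS):
    """Build the word list of a keyword grammar.

    Duplicates are skipped, as is any word for which the (assumed)
    analysis cannot supply a reading.
    """
    words = []
    seen = set()
    for notation in [*keywords, *general_words]:
        if notation in seen or notation not in readings:
            continue
        seen.add(notation)
        words.append(GrammarWord(len(words) + 1, notation, readings[notation]))
    return words
```

How keywords are picked out of the program information is left to the caller here, since the text does not spell out the extraction rule.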

FIG. 3 is a flowchart showing the processing in this embodiment for registering the state in which the program 202 has the focus.

First, in step S301, the user's utterance is input through the voice input unit 101. Next, in step S302, the speech recognition unit 102 performs speech recognition using the keyword grammar 104 and speech recognition using the subword grammar 105 on the input voice, and outputs the recognition result of whichever has the better recognition score. In step S303, the type of recognition result is determined. If the result is a keyword string obtained from the keyword grammar 104, then in step S304 the confirmation information generation unit 106 generates confirmation information based on the keywords obtained as the recognition result. If the result is a subword string obtained from the subword grammar 105, then in step S305 the confirmation information generation unit 106 generates confirmation information based on the attribute information associated with the state to be registered (the program information 203 in FIG. 2). Next, in step S306, the presentation unit 107 presents the confirmation information. Then, in step S307, the registration unit 108 associates the recognition result with the state and registers them in the registration grammar 109 and the registration state information 110.
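The branch in steps S302 to S305 can be sketched as below. This is a minimal illustration, not the publication's implementation: the `Recognition` type, the function names, the "higher score is better" convention, and the tie-break in favor of the keyword result are all assumptions. The Japanese template string means roughly "'...', is that right?".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Recognition:
    grammar: str        # "keyword" or "subword"
    tokens: List[str]   # keyword string or subword string
    score: float        # higher is better (assumed convention)


def select_result(keyword: Optional[Recognition],
                  subword: Recognition) -> Recognition:
    """S302: keep whichever result has the better score.

    `keyword` is None when the keyword grammar yields no result.
    Ties go to the keyword result (an assumption).
    """
    if keyword is not None and keyword.score >= subword.score:
        return keyword
    return subword


def confirmation_text(result: Recognition, attribute_text: str) -> str:
    """S303 to S305: use the keyword string when available, otherwise
    text that the caller built from the state's attribute information."""
    if result.grammar == "keyword":
        shown = "".join(result.tokens)
    else:
        shown = attribute_text
    return f"「{shown}」ですね？"
```

Passing the attribute text in as a parameter keeps this sketch independent of how that text is assembled from the program information.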

After the confirmation information is presented in step S306, the user may also be asked to confirm whether the state may be registered. If the user approves, registration is performed in step S307. If the user declines, the process either ends without registering or requests a new utterance and returns to step S301.

FIG. 4 shows an example of the keyword grammar 104. In the figure, 401 is the word ID assigned to each word, 402 is the notation of the word, and 403 is the reading of the word. 404 is the start node of the grammar and 405 is its end node. The grammar consists of words and a network, and it accepts as a recognized sentence any word string on a path from the start node 404 to the end node 405.

The keyword grammar contains the keywords 204 and predetermined general words prepared in advance (words used regardless of the state, such as the particle “の” (no)), and it is a grammar in which this word list can loop. The grammar of FIG. 4 can therefore accept sentences composed of keywords, such as “Friday movie” (金曜映画), “Taro Yamada's King David” (山田太郎のダビデ王), and “Hanako Sato” (佐藤花子).
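The effect of a looping word list can be illustrated on written strings: a sentence is accepted exactly when it can be split into one or more words of the vocabulary. The sketch below is an assumption-laden simplification (a real recognizer matches audio against readings, not text), and the function name and the exact vocabulary split are hypothetical.

```python
# Vocabulary assumed for illustration, based on the examples in the text.
VOCABULARY = ["金曜映画", "山田太郎", "ダビデ王", "佐藤花子", "の"]


def accepts(sentence: str, vocabulary=VOCABULARY) -> bool:
    """Return True if `sentence` is a concatenation of one or more
    vocabulary words, i.e. a path through a looping word-list grammar."""
    # reachable[i] is True when sentence[:i] can be covered by words.
    reachable = [True] + [False] * len(sentence)
    for end in range(1, len(sentence) + 1):
        for word in vocabulary:
            start = end - len(word)
            if start >= 0 and reachable[start] and sentence[start:end] == word:
                reachable[end] = True
                break
    return len(sentence) > 0 and reachable[len(sentence)]
```

With this vocabulary, 山田太郎のダビデ王 is accepted, while 山ちゃんのダビデ王 is not, because no word covers 山ちゃん.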

FIG. 5 shows an example of the subword grammar 105. In the figure the subwords are phonemes; because the notation and the reading are identical, the notation is omitted. A subword grammar is a recognition grammar whose vocabulary consists of subwords such as phonemes or triphones. By using a grammar of subwords that are the units making up the language, such as phonemes, any utterance can be converted into a subword string. For example, the utterance “Yama-chan” (山ちゃん) yields the subword string “y a m a ch a X”. The subword grammar can be used independently of the keywords that appear in each state.

FIGS. 6 and 7 illustrate the operation of steps S301 to S306.

In FIG. 6, the user utters “Taro Yamada's King David” (山田太郎のダビデ王) into the voice input unit 101. The speech recognition unit 102 outputs the recognition result and score obtained with the keyword grammar 104 and the recognition result and score obtained with the subword grammar 105. Because recognition with the subword grammar 105 has more freedom than recognition with keywords, its score tends to come out ahead, so it is desirable to add an insertion score for each subword. Then, when every word of the utterance is contained in the keyword grammar, the keyword-grammar result receives the higher score. In the illustrated example, the keyword string “Taro Yamada's King David” is thus obtained as the recognition result. The confirmation information generation unit 106 uses this keyword string to generate the confirmation information “'Taro Yamada's King David', is that right?” (「山田太郎のダビデ王」ですね？), and the presentation unit 107 presents it to the user by display, speech synthesis, or other means.
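One way to read the insertion score is as a fixed value added to the subword-grammar score once per recognized subword, chosen negative so that it offsets the subword grammar's advantage. The sketch below assumes exactly that; the function names, the value -0.5, and the example scores are hypothetical and serve only to show the arithmetic.

```python
def adjusted_subword_score(raw_score: float, subwords: list,
                           insertion_score: float = -0.5) -> float:
    """Add the insertion score once per subword (assumed to be negative,
    i.e. a penalty that grows with the length of the subword string)."""
    return raw_score + insertion_score * len(subwords)


def choose_grammar(keyword_score, subword_raw_score: float, subwords: list,
                   insertion_score: float = -0.5) -> str:
    """Return which grammar's result wins; higher scores are better.

    `keyword_score` is None when the keyword grammar gives no result.
    """
    subword_score = adjusted_subword_score(
        subword_raw_score, subwords, insertion_score)
    if keyword_score is not None and keyword_score >= subword_score:
        return "keyword"
    return "subword"
```

For a 21-subword string with a raw score of -95.0, the adjusted score is -95.0 + 21 × (-0.5) = -105.5, so a keyword result scoring -100.0 wins, whereas without the insertion score the subword result would have won.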

On the other hand, suppose that the user utters “Yama-chan's King David” (山ちゃんのダビデ王) into the voice input unit 101, as shown in FIG. 7. The keyword grammar 104 does not contain the word “Yama-chan”. Recognition with the keyword grammar therefore either outputs an incorrect result with a low score, as in FIG. 7, or outputs no result. Recognition with the subword grammar, by contrast, can fit any utterance to a subword string. The subword-grammar result thus receives the higher score, and the subword string “y a m a ch a X n o d a b i d e o o” is obtained as the recognition result. In this case, the confirmation information generation unit 106 generates the confirmation information not from the recognized subword string but from the information about the state to be registered. In FIG. 7, following a predetermined rule that concatenates the character strings in the broadcast station, title, and subtitle fields of the program information 203, the confirmation information “'Maruko TV Friday Wide Theater King David', is that right?” (「丸子テレビ金曜ワイド劇場ダビデ王」ですね？) is generated and presented. The present invention is not limited to this, however. For example, other fields of the program information 203 may be used, or the fields used may be varied according to the amount of information. Alternatively, information may be retrieved from a database using the program information 203, or the keywords 204 extracted from it, as a search key and used as the confirmation information.
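The concatenation rule described for FIG. 7 can be sketched as follows. The dictionary keys (`station`, `title`, `subtitle`, `performer`) and the function name are assumptions made for illustration; the publication only names the broadcast station, title, and subtitle fields. The template string means roughly "'...', is that right?".

```python
def confirmation_from_attributes(program_info: dict,
                                 fields=("station", "title", "subtitle")) -> str:
    """Concatenate the chosen program-information fields, skipping any
    that are missing or empty, and wrap them in the confirmation phrase."""
    text = "".join(program_info[f] for f in fields if program_info.get(f))
    return f"「{text}」ですね？"


# Hypothetical program information for the focused program.
program_info = {
    "station": "丸子テレビ",      # Maruko TV
    "title": "金曜ワイド劇場",    # Friday Wide Theater
    "subtitle": "ダビデ王",       # King David
    "performer": "山田太郎",      # Taro Yamada (not used by the default rule)
}
```

Passing a different `fields` tuple corresponds to the variation mentioned in the text, where other fields of the program information are used instead.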

FIGS. 8 and 9 illustrate the state registration processing in step S307. The registration unit 108 registers two things: the registration grammar 109, which represents the words registered by voice, and the registration state information 110, which represents the registered states. For each registered word, the registration grammar 109 contains (1) a word ID identifying the registered word, (2) the notation of the registered word, and (3) the reading of the registered word. For each registered state, the registration state information 110 contains (1) a registration ID identifying the registered state, (2) a state attribute describing the content of the state, and (3) the word ID of the corresponding registered word in the registration grammar 109.

FIG. 8 shows the registration processing when a keyword string is obtained as the recognition result. Because a keyword-grammar recognition result consists entirely of keywords, the recognition result is used as is for the notation of the registered word. The reading is obtained from the keyword grammar and concatenated. In the figure, for the recognized keyword string “Taro Yamada's King David” (山田太郎のダビデ王), the keyword string itself is registered as the notation of the registered word. The readings of “Taro Yamada” (山田太郎), “no” (の), and “King David” (ダビデ王) are each obtained from the keyword grammar 104 as subword strings and concatenated, and the result, “y a m a d a t a r o o n o d a b i d e o o”, is registered as the reading of the registered word.

FIG. 9 shows the registration processing when a subword string is obtained as the recognition result. For a subword-grammar recognition result, the recognized subword string is used as is for the reading of the registered word. As the notation of the registered word, the information that the confirmation information generation unit 106 generates from the state being registered is registered, as shown in the figure.
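The two tables and the two registration cases of FIGS. 8 and 9 can be sketched as below. The class and function names, the sequential ID scheme, and the shape of the state attribute are assumptions for illustration only; the three fields per record follow the description above.

```python
from dataclasses import dataclass, field


@dataclass
class RegisteredWord:           # one entry of the registration grammar 109
    word_id: int
    notation: str
    reading: str                # space-separated subwords


@dataclass
class RegisteredState:          # one entry of the registration state information 110
    registration_id: int
    state_attribute: dict       # hypothetical description of the state
    word_id: int


@dataclass
class Registry:
    grammar: list = field(default_factory=list)
    states: list = field(default_factory=list)

    def register(self, notation: str, reading: str, state_attribute: dict) -> int:
        word_id = len(self.grammar) + 1
        self.grammar.append(RegisteredWord(word_id, notation, reading))
        self.states.append(
            RegisteredState(len(self.states) + 1, state_attribute, word_id))
        return word_id


def register_keyword_result(registry, keywords, readings, state_attribute):
    """FIG. 8 case: notation is the keyword string itself; the reading is
    the concatenation of each keyword's reading from the keyword grammar."""
    notation = "".join(keywords)
    reading = " ".join(readings[k] for k in keywords)
    return registry.register(notation, reading, state_attribute)


def register_subword_result(registry, subwords, confirmation_text, state_attribute):
    """FIG. 9 case: reading is the recognized subword string; notation is
    the text generated from the state's attribute information."""
    return registry.register(confirmation_text, " ".join(subwords), state_attribute)
```

Storing the confirmation text as the notation in the subword case is what lets the same text be shown again later without regenerating it.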

As a result, when a registered state is called up by voice recognition, the same confirmation information as at registration can be presented. FIG. 10 shows the processing flow at the time of calling in this embodiment. When speech is input in step S1001, the voice recognition unit 102 performs voice recognition in step S1002. Unlike at registration, the registration grammar 109 is used for this recognition. In step S1003, the confirmation information generation unit 106 acquires the confirmation information from the recognition result. Because the confirmation information was registered as the notation in the registration grammar 109, as shown in FIGS. 8 and 9, it is read from there. At this point, the confirmation information generation unit 106 may append further confirmation information, depending on the situation at the time of calling. In step S1004, the presentation unit 107 presents the confirmation information thus obtained. After these steps, the CPU 1, acting as control means, reads the state attribute from the registration state information that contains the word ID corresponding to the recognition result of step S1002, and transitions the system to the called state that the user requested.
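The lookup part of this call flow (steps S1003 onward) can be sketched as follows. The speech recognition itself is outside the sketch: the function simply receives the word ID of the recognized registered word. The function name, the dictionary layout, and the `extra_info` parameter are assumptions for illustration.

```python
def recall_state(recognized_word_id, registration_grammar, registration_state_info,
                 extra_info=None):
    """Return (confirmation_text, state_attribute) for a recognized registered word.

    registration_grammar: dict mapping word ID -> {"notation": ..., "reading": ...}
    registration_state_info: list of {"registration_id", "state_attribute", "word_id"}
    """
    # Step S1003: the confirmation information is the notation stored at registration.
    confirmation = registration_grammar[recognized_word_id]["notation"]
    # Optionally append information that depends on the situation at call time.
    if extra_info:
        confirmation = f"{confirmation} ({extra_info})"
    # Find the registered state whose word ID matches the recognition result;
    # its state attribute tells the system which state to transition to.
    for entry in registration_state_info:
        if entry["word_id"] == recognized_word_id:
            return confirmation, entry["state_attribute"]
    raise KeyError(f"no registered state for word ID {recognized_word_id}")
```

The caller would present the returned text (step S1004) and then apply the returned state attribute to the application.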

In this embodiment, the confirmation information generated at registration is registered as the notation in the registration grammar 109. However, the present invention is not limited to this. For example, a confirmation information item may be added for each state in the registration state information 110, and the confirmation information may be registered there.

According to Embodiment 1 described above, the method of generating the confirmation information changes depending on whether the recognition result based on the keyword string or the recognition result based on the subword string is selected. This makes it possible to present information about the registered state to the user in a natural form while still allowing arbitrary words to be registered.

(Embodiment 2)
In Embodiment 1 described above, as shown in the processing flow of FIG. 3, the method of generating the confirmation information is switched, when a state is registered, according to whether the recognition result is a keyword string or a subword string. However, the present invention is not limited to this; the method of generating the confirmation information may instead be switched when the state is called up.

In this case, the registration grammar 109 or the registration state information 110 is provided with a registration-time grammar flag indicating whether the registered word was obtained from the keyword grammar or from the subword grammar (for example, 0 for the keyword grammar and 1 for the subword grammar). At registration, the flag is set according to the recognition result.

FIG. 11 shows the processing flow at the time of calling. First, in step S1101, the voice input unit 101 inputs speech. Next, in step S1102, the voice recognition unit 102 performs voice recognition using the registration grammar 109. In step S1103, the registration-time grammar flag is acquired. If the flag is found in step S1104 to indicate the keyword grammar, the confirmation information generation unit 106 generates the confirmation information in step S1105 based on the notation of the recognition result. If the flag indicates the subword grammar, the confirmation information generation unit 106 generates the confirmation information in step S1106 based on information obtained from the state corresponding to the recognition result.
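The branch in steps S1104 to S1106 can be sketched as a small function. The flag values 0 and 1 follow the example given in the description; the function names, the dictionary keys, and the `describe_state` formatter are hypothetical stand-ins for whatever the confirmation information generation unit 106 actually does.

```python
KEYWORD_GRAMMAR = 0  # registered word came from the keyword grammar
SUBWORD_GRAMMAR = 1  # registered word came from the subword grammar


def describe_state(state_attribute):
    """Hypothetical formatter that turns a state attribute into display text."""
    return ", ".join(f"{key}: {value}" for key, value in state_attribute.items())


def confirmation_at_recall(registered_word, state_attribute):
    """Choose how to build the confirmation information when a state is called up."""
    flag = registered_word["grammar_flag"]  # step S1103: read the flag
    if flag == KEYWORD_GRAMMAR:
        # Step S1105: use the notation of the recognition result.
        return registered_word["notation"]
    if flag == SUBWORD_GRAMMAR:
        # Step S1106: use information obtained from the corresponding state.
        return describe_state(state_attribute)
    raise ValueError(f"unknown grammar flag: {flag}")
```

Storing the flag at registration means the notation field no longer has to carry the confirmation text for subword registrations.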

(Embodiment 3)
In the embodiments described above, when a subword string is obtained as the recognition result, the confirmation information is generated from the program information 203. In this embodiment, by contrast, when a subword string is obtained, the speech uttered by the user is played back as the confirmation information. In this embodiment, the presentation unit 107 presents by audio output through the speaker 6.

FIG. 12 shows the processing flow for registering a state by voice recognition in this embodiment. Steps S1201 to S1203 are the same as steps S301 to S303 in FIG. 3. If the recognition result in step S1203 is a keyword string, then in step S1204 the audio of the confirmation information is generated by speech synthesis from the recognized keyword string. If the recognition result is a subword string, then in step S1205 the input speech that was captured by the voice input unit 101 and used by the voice recognition unit 102 is used as the confirmation information. For example, when the user utters “Yama-chan's King David”, the input speech “Yama-chan's King David” is used as the confirmation information as-is. Synthesized speech may additionally be joined before and after it. Then, in step S1207, the registration unit 108 associates the recognition result with the state and registers them in the registration grammar 109 and the registration state information 110.

FIG. 13 shows how the confirmation information is generated from the input speech. In the figure, “Yama-chan's King David” is an arbitrary utterance by the user, so the input speech is used as-is. The surrounding phrases, “Program” before it and “Is this correct?” after it, are fixed phrases, so they are either recorded audio held by the device in advance or speech synthesized from text. These are joined to the input speech, and in step S1206 the presentation unit 107 outputs the resulting audio as the confirmation information.
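The audio handling of this embodiment can be sketched as a branch that either synthesizes the keyword string or splices the user's own recording between fixed phrases. This is only a sketch under strong assumptions: audio is modeled as raw `bytes` in one common format so that concatenation is valid, and `synthesize` is a caller-supplied, hypothetical text-to-speech function, not a real library API.

```python
def build_confirmation_audio(result_type, keywords, input_audio, synthesize,
                             prefix_text="", suffix_text=""):
    """Return the confirmation audio as bytes.

    result_type: "keyword" or "subword" (hypothetical labels).
    synthesize: callable mapping text -> audio bytes (assumed to exist).
    """
    if result_type == "keyword":
        # Step S1204: synthesize speech from the recognized keyword string.
        return synthesize("".join(keywords))
    if result_type == "subword":
        # Step S1205: reuse the user's input speech as-is, optionally joined
        # with fixed phrases before and after it (FIG. 13).
        prefix = synthesize(prefix_text) if prefix_text else b""
        suffix = synthesize(suffix_text) if suffix_text else b""
        return prefix + input_audio + suffix
    raise ValueError(f"unknown result type: {result_type}")
```

In a real device the fixed phrases could equally be pre-recorded clips, as the description notes; the sketch uses the synthesizer for both only to stay short.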

(Embodiment 4)
In the embodiments described above, voice recognition is performed with both the keyword grammar and the subword grammar, and the method of generating the confirmation information is switched according to which grammar yields the recognition result with the highest score.

However, because the subword grammar can recognize any utterance, the subword string may obtain a higher score even for an utterance that fits the keyword grammar. For example, for the utterance “Taro Yamada's King David” in FIG. 6, the subword string may obtain the higher score.

In this embodiment, therefore, the score of the recognition result from the keyword grammar and the score of the recognition result from the subword grammar are weighted according to past history. For example, if, among past recognition results, the ratio of the number of times the voice recognition unit 102 output a keyword string to the number of times it output a subword string exceeds a predetermined ratio, a weight of a predetermined constant is added to the recognition result. The past history is not limited to a ratio of counts; frequency information, for example, may be used instead. Likewise, the weighting is not limited to a constant, and any method may be used.

Weighting the scores in this way makes it possible to judge from the past history whether the user tends to register with keyword strings or with subword strings, and to select the recognition result accordingly.
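One way to picture the history-based weighting is the sketch below. It makes several assumptions that the description leaves open: a higher score is better, the constant weight is added to the keyword-grammar score, ties go to the keyword result, and the threshold and bonus values are arbitrary. The function and parameter names are hypothetical.

```python
def select_recognition_result(keyword_score, subword_score,
                              keyword_count, subword_count,
                              ratio_threshold=2.0, bonus=5.0):
    """Pick the grammar whose (weighted) score is better.

    keyword_count / subword_count: how often each kind of result was output
    in the past. Returns (grammar, keyword_score, subword_score) after weighting.
    """
    if subword_count > 0:
        favors_keywords = keyword_count / subword_count > ratio_threshold
    else:
        # No subword results so far: treat any keyword history as exceeding
        # the ratio (assumption, to avoid dividing by zero).
        favors_keywords = keyword_count > 0
    if favors_keywords:
        keyword_score += bonus  # add the predetermined constant weight
    grammar = "keyword" if keyword_score >= subword_score else "subword"
    return grammar, keyword_score, subword_score
```

With scores 10.0 (keyword) and 12.0 (subword) and a history of 9 keyword results to 3 subword results, the ratio 3.0 exceeds the threshold, the keyword score becomes 15.0, and the keyword result is selected; with a history of 1 to 3 the subword result wins unchanged.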

(Embodiment 5)
When the user inputs speech, the keywords registered in the keyword grammar may be displayed to the user. Displaying the keywords encourages the user to speak using keywords wherever possible, while still accepting arbitrary utterances through subwords.

(Embodiment 6)
In the embodiments described above, the keywords 204 are acquired using all of the program information 203 obtained from the program 202 selected on the electronic program guide 201, and the keyword grammar is generated from them. Alternatively, the keyword grammar may be generated using only the keywords currently displayed to the user, even in the same state; that is, the keywords acquired may be varied according to what is displayed to the user.

FIG. 14 shows how the acquired keywords vary according to what is displayed to the user. In the figure, 1401 is a state in which a program is selected on the program guide screen. 1402 is a state in which detailed information about the program is displayed on the screen while the same program is selected. 1403 denotes the keywords acquired in 1401, and 1404 denotes the keywords acquired in 1402. As shown in the figure, in 1401 the program information the user can obtain from the display is the channel number, broadcasting station name, date and time, and program name, so only that program information is acquired as the keywords 1403. In 1402, on the other hand, the genre and performers are also displayed as program information, so these are acquired as well, as the keywords 1404.

By acquiring keywords according to the information displayed to the user and registering them in the keyword grammar in this way, the information that the user is likely to utter as keywords is narrowed down, and registration can be performed accurately.
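The display-dependent keyword selection of FIG. 14 can be sketched as a field filter. The field names, the boolean `detailed_view` switch, and the dictionary layout of the program information are assumptions for illustration; only the split between the summary fields (channel number, station name, date and time, program name) and the additional detail fields (genre, performers) follows the description.

```python
# Fields visible on the program guide screen (1401) and on the detail screen (1402).
SUMMARY_FIELDS = ("channel", "station", "datetime", "title")
DETAIL_FIELDS = SUMMARY_FIELDS + ("genre", "performers")


def collect_keywords(program_info, detailed_view):
    """Return the keywords to put in the keyword grammar for the current display."""
    fields = DETAIL_FIELDS if detailed_view else SUMMARY_FIELDS
    keywords = []
    for name in fields:
        value = program_info.get(name)
        if value is None:
            continue  # this piece of program information is not available
        if isinstance(value, (list, tuple)):
            keywords.extend(value)  # e.g. several performers
        else:
            keywords.append(value)
    return keywords
```

Calling `collect_keywords(info, detailed_view=False)` for the same program returns a shorter keyword list than `detailed_view=True`, which is the narrowing effect this embodiment relies on.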

(Embodiment 7)
In the embodiments described above, the keyword grammar generation unit 103 acquires keywords from the state to be registered and generates the keyword grammar 104. However, the keyword grammar generation unit 103 is not essential; the keyword grammar 104 may be generated in advance for each state, for example by a separate device. The important feature of the present invention is that the confirmation information is generated from the keywords when the recognition result of the voice recognition unit 102 is a keyword string, and from the state information when the recognition result is a subword string.

(Other embodiments)
Embodiments of the present invention have been described in detail above. The present invention may be applied to a system composed of a plurality of devices, or to an apparatus consisting of a single device.

The present invention is also achieved by supplying a program that implements the functions of the embodiments described above to a system or apparatus, directly or remotely, and having a computer included in that system or apparatus read and execute the supplied program code.

Accordingly, the program code itself, installed on a computer in order to implement the functions and processing of the present invention on that computer, also implements the present invention. In other words, the computer program for implementing those functions and processing is itself part of the present invention.

In that case, the program may take any form as long as it has the functions of the program, such as object code, a program executed by an interpreter, or script data supplied to the OS.

Computer-readable recording media for supplying the program include, for example, flexible disks, hard disks, optical disks, magneto-optical disks, MOs, CD-ROMs, CD-Rs, and CD-RWs. Other recording media include magnetic tape, non-volatile memory cards, ROMs, and DVDs (DVD-ROM, DVD-R).

The program may also be downloaded from a web page on the Internet using a browser on a client computer. That is, the computer program of the present invention itself, or a compressed file that includes an automatic installation function, may be downloaded from the web page to a recording medium such as a hard disk. The program code constituting the program of the present invention may also be divided into a plurality of files, with each file downloaded from a different web page. In other words, a WWW server that allows a plurality of users to download the program files for implementing the functions and processing of the present invention on a computer may also be a constituent element of the present invention.

The program of the present invention may also be encrypted, stored on a computer-readable storage medium such as a CD-ROM, and distributed to users. In that case, only users who satisfy predetermined conditions may be allowed to download key information for decryption from a web page via the Internet, use that key information to decrypt and execute the encrypted program, and install the program on a computer.

The functions of the embodiments described above may be realized by a computer executing the program it has read. An OS or the like running on the computer may also perform part or all of the actual processing based on the instructions of the program; in this case, too, the functions of the embodiments described above can be realized.

Furthermore, the program read from the recording medium may be written to a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer. A CPU or the like provided on the function expansion board or function expansion unit may then perform part or all of the actual processing based on the instructions of the program. The functions of the embodiments described above may also be realized in this way.

A block diagram showing the configuration of the voice recognition function in the information processing apparatus according to Embodiment 1 of the present invention.
A block diagram showing the hardware configuration of the information processing apparatus according to Embodiment 1 of the present invention.
A diagram showing the process of generating the keyword grammar in Embodiment 1 of the present invention.
A flowchart showing the flow of the state registration process in Embodiment 1 of the present invention.
A diagram showing the keyword grammar.
A diagram showing the subword grammar.
A diagram explaining the process, in Embodiment 1 of the present invention, of generating and presenting confirmation information based on the recognition result of the keyword grammar.
A diagram explaining the process, in Embodiment 1 of the present invention, of generating and presenting confirmation information based on program information when the recognition result is a subword string.
A diagram explaining the process, in Embodiment 1 of the present invention, of registering the recognition result of the keyword grammar and the state in the registration grammar and the registration state information.
A diagram explaining the process, in Embodiment 1 of the present invention, of registering the recognition result of the subword grammar and the state in the registration grammar and the registration state information.
A flowchart showing the flow of the state calling process in Embodiment 1 of the present invention.
A flowchart showing the flow of the state calling process in Embodiment 2 of the present invention.
A flowchart showing the flow of the state registration process in Embodiment 3 of the present invention.
A diagram explaining the process of generating confirmation information based on the input speech in Embodiment 3 of the present invention.
A diagram explaining the process, in Embodiment 6 of the present invention, of changing the keywords acquired according to what is displayed to the user.
A diagram showing an application that registers programs on a television's electronic program guide by voice.
A diagram showing voice recognition results obtained using the keyword grammar and the subword grammar.

Explanation of symbols

101 voice input unit
102 voice recognition unit
103 keyword grammar generation unit
104 keyword grammar
105 subword grammar
106 confirmation information generation unit
107 presentation unit
108 registration unit
109 registration grammar
110 registration state information

Claims

An information processing apparatus capable of registering a state of an application selected by a user in response to a registration operation by the user, and of calling up the registered state and transitioning to that state in response to a call operation by the user, the apparatus comprising:
voice input means for inputting the user's speech;
voice recognition means for performing, on the speech input from the voice input means, voice recognition using a keyword grammar based on keywords related to the state and voice recognition using a subword grammar based on subwords, and for outputting the recognition result of whichever recognition obtained the better recognition score;
presentation means for presenting to the user confirmation information composed of the keyword string of the recognition result when the voice recognition means outputs a recognition result of the voice recognition using the keyword grammar, and for presenting to the user confirmation information composed of attribute information associated with the state when the voice recognition means outputs a recognition result of the voice recognition using the subword grammar; and
registration means for registering the recognition result output by the voice recognition means in association with the state.

The information processing apparatus according to claim 1, further comprising keyword grammar generation means for generating the keyword grammar based on keywords extracted from the attribute information.

The information processing apparatus according to claim 1 or 2, wherein the presentation means outputs synthesized speech of the keyword string of the recognition result when the voice recognition means outputs a recognition result of the voice recognition using the keyword grammar, and outputs the speech that was input to the voice input means when the voice recognition means outputs a recognition result of the voice recognition using the subword grammar.

The information processing apparatus according to any one of claims 1 to 3, wherein the voice recognition means weights the recognition score based on the frequency with which recognition results of the voice recognition using the keyword grammar have been output and the frequency with which recognition results of the voice recognition using the subword grammar have been output.

The information processing apparatus according to claim 1, further comprising display means for displaying the keywords registered in the keyword grammar to the user when speech is input by the voice input means.

The information processing apparatus according to claim 2, wherein the keyword grammar generation means extracts keywords corresponding to the attribute information displayed by the application in the state.

The information processing apparatus according to any one of claims 1 to 6, wherein the registration means includes:
a registration grammar containing, for each registered word corresponding to a recognition result output by the voice recognition means, a word ID identifying the registered word, a notation, and a reading; and
registration state information containing a registration ID identifying the registered state, a state attribute representing the content of the state, and the word ID of the registered word in the registration grammar that corresponds to the state.

The information processing apparatus described above, wherein the registration means:
registers the keyword string of the recognition result as the notation of the registered word in the registration grammar when the voice recognition means outputs a recognition result of the voice recognition using the keyword grammar; and
registers the subword string of the recognition result as the reading of the registered word in the registration grammar when the voice recognition means outputs a recognition result of the voice recognition using the subword grammar.

The information processing apparatus described above, wherein, when the registered state is called up:
the voice recognition means performs voice recognition using the registration grammar on the speech input by the voice input means;
the presentation means presents confirmation information at the time of calling based on the recognition result output by the voice recognition means; and
the information processing apparatus transitions to the state based on the state attribute read from the registration state information containing the word ID corresponding to the recognition result output by the voice recognition means.

The information processing apparatus according to claim 9, wherein, when the state is called up, the presentation means presents to the user confirmation information composed of the keyword string of the recognition result if the voice recognition means output a recognition result of the voice recognition using the keyword grammar when the state was registered, and presents to the user confirmation information composed of attribute information associated with the state if the voice recognition means output a recognition result of the voice recognition using the subword grammar when the state was registered.

An information processing method in an information processing apparatus capable of registering a state of an application selected by a user in response to a registration operation by the user, and of calling up the registered state and transitioning to that state in response to a call operation by the user, the method comprising:
a voice input step of inputting the user's speech;
a voice recognition step of performing, on the speech input in the voice input step, voice recognition using a keyword grammar based on keywords related to the state and voice recognition using a subword grammar based on subwords, and of outputting the recognition result of whichever recognition obtained the better recognition score;
a presentation step of presenting to the user confirmation information composed of the keyword string of the recognition result when a recognition result of the voice recognition using the keyword grammar is output in the voice recognition step, and of presenting to the user confirmation information composed of attribute information associated with the state when a recognition result of the voice recognition using the subword grammar is output in the voice recognition step; and
a registration step of registering, in a storage unit, the recognition result output in the voice recognition step in association with the state.

A program for causing a computer to execute the information processing method according to claim 11.

A computer-readable storage medium storing the program according to claim 12.