JP2001282285A5

Info

Publication number: JP2001282285A5
Application number: JP2000097336A
Authority: JP
Filing date: 2000-03-31
Publication date: 2004-08-19
Anticipated expiration: 2020-03-31

Description

【０００３】
FIG. 5 shows the configuration of a conventional speech recognition device, which is described below.
Before speech recognition starts, the words to be recognized are divided into a plurality of predefined genres and stored as word sets 50. A dictionary generation unit 51 extracts the speech features of the word sets 50 and stores them in dictionary files 52. Before speech input, a dictionary loading unit 53 loads the dictionary file 52 indicated by a dictionary load instruction unit 54 into a dictionary memory 55.

【０００５】
[Problems to be Solved by the Invention]
However, even with these conventional techniques, the recognition target vocabulary is built according to predetermined genres and therefore does not necessarily match what the user intends. In addition, the dictionary file must be reloaded every time the selected genre changes, so obtaining a recognition result that matches the user's intention requires more user operations.

【０００６】
An object of the present invention is to improve speech recognition performance by narrowing down the recognition target vocabulary according to the user's preferences with few user operations.

【００１０】
[Embodiments of the Invention]
The invention according to claim 1 creates in advance, from a recognition target candidate vocabulary that stores all words to be subjected to speech recognition, a recognition target vocabulary based on user preference information acquired in advance, and recognizes a word from the input speech and the recognition target vocabulary. Narrowing the recognition target vocabulary to match the user's preferences improves speech recognition performance. The method is also simpler and faster than performing speech recognition over every word that could be a recognition candidate.
The invention according to claim 2 is a speech recognition method that determines the recognition result using two scores: a recognition score obtained by speech recognition, which recognizes a word from the input speech and the recognition target vocabulary (the words subject to speech recognition), and a preference score of the recognition target vocabulary based on user preference information acquired in advance. Taking the preference score into account alongside the recognition score weights the output toward what the user is more likely to say. This greatly reduces cases in which a result the user did not intend at all appears and, at the same time, makes the intended result appear with high probability.
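The two ideas in claims 1 and 2 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `Candidate` record, the per-genre preference scores, the `threshold` cut-off, and the weighted-sum rule with `weight` are all assumptions chosen for illustration, and the recognition scores are made-up values standing in for a real recognizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    word: str    # entry in the recognition target candidate vocabulary
    genre: str   # hypothetical attribute used to look up a preference score


def build_target_vocabulary(candidates, preference, threshold=0.5):
    """Claim 1 (sketch): keep only candidates the user is likely to want."""
    return [c for c in candidates if preference.get(c.genre, 0.0) >= threshold]


def decide_result(recognition_scores, vocabulary, preference, weight=0.3):
    """Claim 2 (sketch): pick the word with the best weighted sum of scores.

    Raises ValueError if the vocabulary is empty.
    """
    def combined(c):
        return ((1.0 - weight) * recognition_scores.get(c.word, 0.0)
                + weight * preference.get(c.genre, 0.0))
    return max(vocabulary, key=combined).word


candidates = [
    Candidate("news at nine", "news"),
    Candidate("blues at nine", "music"),
    Candidate("cooking hour", "cooking"),
]
preference = {"news": 0.2, "music": 0.9, "cooking": 0.6}

# Narrow the vocabulary before recognition.
vocabulary = build_target_vocabulary(candidates, preference)

# Made-up acoustic scores: the two similar-sounding titles are nearly tied.
recognition_scores = {"news at nine": 0.62, "blues at nine": 0.60, "cooking hour": 0.10}
result = decide_result(recognition_scores, vocabulary, preference)

# Even without narrowing, the preference score breaks the near tie.
result_unfiltered = decide_result(recognition_scores, candidates, preference)
```

In this example the acoustically slightly better "news at nine" loses to "blues at nine" because the assumed profile favors music. A weighted sum is only one plausible way to combine the scores; the source does not fix a formula.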

【００１１】
The invention according to claim 3 is the speech recognition method according to claim 1 or 2, wherein the user's preference information is at least one piece of information selected from a user identification symbol input by the user, the user's utterance, the user's image, or the time at which the user's preference information is selected. It relates to the learning of preference information in the present invention: by using a simple binary choice between like and dislike, preferences can be learned without burdening the user. Furthermore, learning based on likes and dislikes allows preference information to be learned without impairing the responsiveness of voice operation or the ease of handling.
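Learning from a binary like/dislike answer could look like the following Python sketch. The update rule (a fixed step clamped to the range 0 to 1, with unseen items starting at a neutral 0.5) and the name `update_preference` are assumptions for illustration; the source only states that a like-or-dislike choice is used.

```python
def update_preference(preference, key, liked, step=0.1):
    """Nudge the score for `key` up on 'like' and down on 'dislike'.

    Returns a new dict; scores are clamped to the range [0, 1].
    """
    current = preference.get(key, 0.5)   # unseen items start neutral
    delta = step if liked else -step
    updated = dict(preference)
    updated[key] = min(1.0, max(0.0, current + delta))
    return updated


prefs = {}
prefs = update_preference(prefs, "music", liked=True)
prefs = update_preference(prefs, "music", liked=True)
prefs = update_preference(prefs, "news", liked=False)
```

Each answer costs the user a single yes/no response, which matches the stated aim of learning preferences without slowing down voice operation.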

【００１２】
The invention according to claim 4 is a program designation method that designates a program using the speech recognition method according to any one of claims 1 to 3. It is applied to the use of digital television broadcasting and on-demand movie/karaoke distribution services, and allows a program that suits the user's preferences to be selected.
The invention according to claim 5 is the program designation method according to claim 4, wherein the user's preference information is information learned when the program is interrupted by the viewing side, when it is interrupted by the distribution side, or when the program ends. Because the user's preference information is learned after the program content has been viewed, programs that better suit the user's preferences can be selected.
The invention according to claim 6 is the program designation method according to claim 4 or 5, wherein the user's preference information is information learned using the program viewing history. Because the preference information is learned from the viewing history, the user is spared extra effort, and learning that matches the user's preferences is easy.
The invention according to claim 7 is the program designation method according to claim 6, wherein the program viewing history is information having as a component at least one of the time slot, genre, performer, performing group name, program name, program content, theme, music, content keyword, and user name provided by the electronic program guide. Specifying the content of the user's preference information allows programs that better suit the user's preferences to be selected.
The invention according to claim 8 is a speech recognition device including: a speech input unit that inputs speech uttered by a user; a recognition target vocabulary creation unit that creates, based on the user's preference information, a recognition target vocabulary from a recognition target candidate vocabulary storing all words to be subjected to speech recognition; and a recognition unit that recognizes a word from the speech input from the speech input unit and the recognition target vocabulary. Narrowing the recognition target vocabulary to match the user's preferences improves speech recognition performance.
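Learning preference information from a viewing history, as in claims 6 and 7, might be sketched in Python as follows. The record layout, the field names (`time_slot`, `genre`, `performer`, `program_name`), and the use of relative viewing frequency as the preference score are assumptions made for this example; the source lists the kinds of electronic program guide fields but not a learning rule.

```python
from collections import Counter

# A subset of the electronic program guide fields listed in claim 7.
# The field names and record layout are hypothetical.
EPG_FIELDS = ("time_slot", "genre", "performer", "program_name")


def learn_from_history(history, fields=EPG_FIELDS):
    """Return {field: {value: relative frequency}} from viewed-program records."""
    profile = {}
    for field in fields:
        counts = Counter(rec[field] for rec in history if field in rec)
        total = sum(counts.values())
        profile[field] = {v: n / total for v, n in counts.items()} if total else {}
    return profile


history = [
    {"time_slot": "evening", "genre": "music", "performer": "A", "program_name": "Live Stage"},
    {"time_slot": "evening", "genre": "music", "performer": "B", "program_name": "Top Hits"},
    {"time_slot": "morning", "genre": "news", "program_name": "Daily Brief"},
    {"time_slot": "evening", "genre": "drama", "performer": "A", "program_name": "Night Story"},
]
profile = learn_from_history(history)
```

Because the profile is derived entirely from what was watched, the user does not have to enter anything, which is the benefit the description attributes to history-based learning.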

【００１３】
The invention according to claim 9 is a speech recognition device including: a speech input unit that inputs speech uttered by a user; a recognition target vocabulary creation unit that creates, based on the user's preference information, a recognition target vocabulary from a recognition target candidate vocabulary storing all words to be subjected to speech recognition; a preference score calculation unit that calculates a preference score of the recognition target vocabulary based on the user's preference information; a recognition unit that calculates a recognition score of a word from the speech input from the speech input unit and the recognition target vocabulary; and a recognition result determination unit that determines and outputs a recognition result using the preference score and the recognition score. Narrowing the recognition target vocabulary to match the user's preferences improves speech recognition performance.

【００１４】
The invention according to claim 10 is the speech recognition device according to claim 8 or 9, further including a preference information storage unit that holds preference information for one or more users, wherein the user's preference information is selected by at least one of a user identification symbol input by the user, the user's utterance, the user's image, or the time at which the user's preference information is selected. Selecting, from one or more sets of preference information, the set used to choose the recognition target vocabulary narrows that vocabulary to match the user's preferences and thereby improves speech recognition performance.
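Selecting one profile from a preference information storage unit can be sketched in Python as below. Only two of the four selection cues named in claim 10 are covered (the user identification symbol and the time of selection); selection by utterance or image would need a speaker or face recognizer and is left out. The `schedule` table that maps time-of-day ranges to users is a hypothetical structure introduced for this example.

```python
from datetime import time


def select_profile(store, user_id=None, now=None, schedule=()):
    """Pick a stored preference profile by user ID, else by time of day."""
    if user_id is not None and user_id in store:
        return store[user_id]
    if now is not None:
        for start, end, scheduled_user in schedule:
            if start <= now < end and scheduled_user in store:
                return store[scheduled_user]
    return {}   # no match: fall back to an empty (neutral) profile


store = {
    "parent": {"news": 0.8, "drama": 0.6},
    "child": {"animation": 0.9},
}
# Hypothetical household schedule: who usually watches at which time.
schedule = [
    (time(16, 0), time(19, 0), "child"),
    (time(19, 0), time(23, 0), "parent"),
]

by_id = select_profile(store, user_id="parent")
by_time = select_profile(store, now=time(17, 30), schedule=schedule)
```

An explicit user ID takes priority here, with the time of day used only as a fallback; that ordering is a design choice of the sketch, not something the source specifies.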

【００１９】
The invention according to claim 11 is a program designation device that designates a program using the speech recognition device according to any one of claims 8 to 10. It is applied to a program designation device for using digital television broadcasting and on-demand movie/karaoke distribution services, and allows a program that suits the user's preferences to be selected.

【００２０】
The invention according to claim 12 is the program designation device according to claim 11, further including a preference information creation unit that prompts the user to input preference information when the program is interrupted by the viewing side, when it is interrupted by the distribution side, or when the program ends. When the program designation device learns preference information from the user, it actively asks about the user's preferences at moments such as breaks in the program, so that programs that better suit the user's preferences can be selected.
The invention according to claim 13 is the program designation device according to claim 11 or 12, wherein the user's preference information is information learned using the program viewing history. Because the preference information is learned from the viewing history, the user is spared extra effort, and learning that matches the user's preferences is easy.
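The prompting behavior of claim 12 can be illustrated with a small Python event handler. The `ProgramEvent` names, the `ask` callback, and the fixed-step score update are assumptions for this sketch; the source states only that the user is prompted at a viewer-side interruption, a distributor-side interruption, or the end of the program.

```python
from enum import Enum, auto


class ProgramEvent(Enum):
    VIEWER_INTERRUPT = auto()       # viewer stops or switches the program
    DISTRIBUTOR_INTERRUPT = auto()  # break inserted by the distribution side
    PROGRAM_END = auto()
    PLAYING = auto()                # ordinary playback, never prompts


PROMPT_EVENTS = {
    ProgramEvent.VIEWER_INTERRUPT,
    ProgramEvent.DISTRIBUTOR_INTERRUPT,
    ProgramEvent.PROGRAM_END,
}


def on_event(event, program_genre, ask, preference, step=0.1):
    """Ask for like/dislike only at a break, then return the updated preference."""
    if event not in PROMPT_EVENTS:
        return preference
    liked = ask(f"Did you like this {program_genre} program?")
    current = preference.get(program_genre, 0.5)
    updated = dict(preference)
    updated[program_genre] = min(1.0, max(0.0, current + (step if liked else -step)))
    return updated


prefs = {"music": 0.5}
# The lambda stands in for a real user prompt and always answers "like".
prefs = on_event(ProgramEvent.PLAYING, "music", lambda q: True, prefs)
prefs = on_event(ProgramEvent.PROGRAM_END, "music", lambda q: True, prefs)
```

Passing the prompt in as a callback keeps the timing logic separate from the user interface, so the same handler could be driven by a remote-control button or a spoken answer.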

【００２１】
The invention according to claim 14 is the program designation device according to claim 13, wherein the program viewing history is information having as a component at least one of the time slot, genre, performer, performing group name, program name, program content, theme, music, content keyword, and user name provided by the electronic program guide. Specifying the content of the preference information allows programs that better suit the user's preferences to be selected.

【００２２】
The invention according to claim 15 is a program for causing a computer to execute a speech recognition method that creates in advance, from a recognition target candidate vocabulary storing all words to be subjected to speech recognition, a recognition target vocabulary based on user preference information acquired in advance, and recognizes a word from the input speech and the recognition target vocabulary. Narrowing the recognition target vocabulary to match the user's preferences improves speech recognition performance. The method is also simpler and faster than performing speech recognition over every word that could be a recognition candidate.
The invention according to claim 16 is a program for causing a computer to execute a speech recognition method that determines the recognition result using a recognition score obtained by speech recognition, which recognizes a word from the input speech and the recognition target vocabulary (the words subject to speech recognition), and a preference score of the recognition target vocabulary based on user preference information acquired in advance. Taking the preference score into account alongside the recognition score weights the output toward what the user is more likely to say. This greatly reduces cases in which a result the user did not intend at all appears and, at the same time, makes the intended result appear with high probability.

[Explanation of Reference Numerals]
1 Speech input unit
2 Preference information creation unit
3 Preference information storage unit
4 Recognition target candidate vocabulary storage unit
5 Recognition target vocabulary creation unit
6 Recognition unit
7 Recognition result determination unit
8 Set-top box
9 Digital television broadcast
10 On-demand movie/karaoke distribution service
11 Video
12 Display monitor
13 Acoustic template storage unit
14 Acoustic template selection unit
15 Preference selection information input unit
16 Number determination unit
17 Preference score calculation unit

Claims

1. A speech recognition method characterized by creating in advance a recognition target vocabulary based on user preference information acquired in advance, from a recognition target candidate vocabulary in which all words to be subjected to speech recognition are stored, and recognizing a word from the input speech and the recognition target vocabulary.

2. A speech recognition method characterized by determining a recognition result using a recognition score obtained by speech recognition that recognizes a word from the input speech and a recognition target vocabulary, which consists of the words subject to speech recognition, and a preference score of the recognition target vocabulary based on user preference information acquired in advance.

3. The speech recognition method according to claim 1 or 2, wherein the preference information of the user is at least one piece of information selected from a user identification symbol input by the user, an utterance of the user, an image of the user, or a time of selecting the preference information of the user.

4. A program designation method for designating a program using the speech recognition method according to any one of claims 1 to 3.

5. The program designation method according to claim 4, wherein the preference information of the user is information learned at the time of interruption by the program viewing side, at the time of interruption by the program distribution side, or at the end of the program.

6. The program designation method according to claim 4, wherein the preference information of the user is information learned using a program viewing history.

7. The program designation method according to claim 6, wherein the program viewing history is information having as a component at least one of the time slot, genre, performer, performing group name, program name, program content, theme, music, content keyword, and user name provided by the electronic program guide.

8. A speech recognition device comprising: a speech input unit for inputting speech uttered by a user; a recognition target vocabulary creation unit for creating, based on user preference information, a recognition target vocabulary from a recognition target candidate vocabulary storing all words to be subjected to speech recognition; and a recognition unit that recognizes a word from the speech input from the speech input unit and the recognition target vocabulary.

9. A speech recognition device comprising: a speech input unit for inputting speech uttered by a user; a recognition target vocabulary creation unit for creating, based on user preference information, a recognition target vocabulary from a recognition target candidate vocabulary storing all words to be subjected to speech recognition; a preference score calculation unit for calculating a preference score of the recognition target vocabulary based on the preference information of the user; a recognition unit that calculates a recognition score of a word from the speech input from the speech input unit and the recognition target vocabulary; and a recognition result determination unit that determines and outputs a recognition result using the preference score and the recognition score.

10. The speech recognition device according to claim 8 or 9, further comprising a preference information storage unit holding preference information of one or more users, wherein the preference information of the user is information selected by at least one of a user identification symbol input by the user, an utterance of the user, an image of the user, and a time of selecting the preference information of the user.

11. A program designation device characterized by designating a program using the speech recognition device according to any one of claims 8 to 10.

12. The program designation device according to claim 11, further comprising a preference information creation unit that prompts input of the user's preference information at the time of interruption by the program viewing side, at the time of interruption by the program distribution side, or at the end of the program.

13. The program designation device according to claim 11, wherein the preference information of the user is information learned using a program viewing history.

14. The program designation device according to claim 13, wherein the program viewing history is information having as a component at least one of the time slot, genre, performer, performing group name, program name, program content, theme, music, content keyword, and user name provided by the electronic program guide.

15. A program for causing a computer to execute a speech recognition method characterized by creating in advance a recognition target vocabulary based on user preference information acquired in advance, from a recognition target candidate vocabulary in which all words to be subjected to speech recognition are stored, and recognizing a word from the input speech and the recognition target vocabulary.

16. A program for causing a computer to execute a speech recognition method characterized by determining a recognition result using a recognition score obtained by speech recognition that recognizes a word from the input speech and a recognition target vocabulary, which consists of the words subject to speech recognition, and a preference score of the recognition target vocabulary based on user preference information acquired in advance.