JP2006106451A

JP2006106451A - Speech input method of television broadcast receiver

Info

Publication number: JP2006106451A
Application number: JP2004294429A
Authority: JP
Inventors: Konagi Uchibe; こなぎ内部; Yasutsugu Morimoto; 康嗣森本
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2004-10-07
Filing date: 2004-10-07
Publication date: 2006-04-20

Abstract

PROBLEM TO BE SOLVED: To enable keywords for program search, recording reservation, and the like to be input easily and reliably.

SOLUTION: A speech recognition dictionary is generated by extracting keywords from program information and registering them together with speech recognition information. At the user's request, keywords are displayed on a screen based on a partial dictionary of the speech recognition dictionary, and the user chooses one of the displayed keywords and inputs it by voice. Because the keyword is chosen from those presented, the user's utterance is reliable; and because the input is determined using the partial dictionary underlying the presented keywords, which has a small target vocabulary, a high recognition rate is secured for the user's speech input.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a speech input method for a television broadcast receiver.

With the digitalization of television broadcasting, EPG (electronic program guide) distribution services have become widespread, and television broadcast receivers equipped with EPG-based program search and recording reservation functions are becoming common.

EPG-based program search and recording reservation are performed on the basis of the genre, performers, keywords, and the like entered by the user. When entering the information needed for a search or recording, the user can enter genre names (including sub-genres) and performer names by selecting from a group of words prepared in advance by the device. To enter a keyword other than a genre or performer, however, the usual method is to type and convert it one character at a time with the buttons of the remote control. This input method is complicated to operate and very difficult for inexperienced users.

Patent Document 1 proposes a method in which keywords are extracted from the EPG to generate a speech recognition dictionary and the user inputs a keyword by voice.

Patent Document 1: JP 2001-309257 A

However, because the number of registered vocabulary items is large, a sufficient speech recognition rate cannot be secured, and keywords cannot be input easily and reliably when searching for programs or reserving recordings.

To make keyword input easy when searching for programs or reserving recordings, the present invention is chiefly characterized in that: keywords are extracted from the EPG; a dictionary containing speech recognition information is generated as a set of partial dictionaries, each holding an appropriate number of keywords grouped by the genre to which they belong and ordered, for example, by the Japanese syllabary; the keywords contained in each partial dictionary are displayed on the screen and presented to the user; and the user can select a keyword from those presented and input it by voice.

With the keyword input means of the present invention, the user selects a keyword from those presented, so the utterance made when inputting the keyword by voice is reliable. In addition, the input is determined using the partial dictionary on which the presented keywords are based, which has a small target vocabulary, so a high recognition rate is secured for the user's speech input. The user can therefore input keywords easily and reliably without troublesome operations.

The best mode for carrying out the invention is described below by way of an embodiment.

An embodiment of the present invention is described below with reference to FIGS. 1 to 7.

FIG. 1 illustrates the configuration of this embodiment. The overall configuration consists of a speech recognition dictionary generation unit 101 and a keyword speech input unit 107. The speech recognition dictionary generation unit 101 and the keyword speech input unit 107 may be provided in separate computers or in the same computer.

The speech recognition dictionary generation unit 101 takes program information 102 as input and comprises a program information storage unit 103, a keyword extraction unit 104, and a dictionary registration unit 105. These three units are realized by executing a dedicated, prestored program on the processing device of the computer that constitutes the speech recognition dictionary generation unit 101. The program information storage unit 103 stores the information contained in the input program information 102. The program information 102 is input, for example, as electronic data including character strings from a server to which the speech recognition dictionary generation unit 101 is connected via a network; alternatively, it may be read from a storage medium such as a CD-ROM. The keyword extraction unit 104 extracts keywords from the stored program information using techniques such as morphological analysis. Techniques for extracting keywords from text are widely known; see, for example, "Introduction to Text Mining Learned with Excel" (Nikkei BP Planning). The dictionary registration unit 105 registers each extracted keyword in the speech recognition dictionary 106 (a storage device) together with the information needed for speech recognition, such as its kana reading.

The keyword speech input unit 107, in turn, comprises an input unit 108 having voice input means and a keyword display/input determination unit 109 (executed on the processing device of the computer). When the user inputs a keyword by voice, the unit refers to the speech recognition dictionary 106 generated by the speech recognition dictionary generation unit 101, detects the keyword in the dictionary that matches the user's input, and outputs keyword information 110, which is electronic data such as text or a code number representing the detected keyword, to a control unit that performs program search and recording reservation.

FIG. 2 illustrates the structure of the speech recognition dictionary (106 in FIG. 1). Genre names serve as the index, and for each genre the keywords are registered in partial dictionaries holding a fixed number of keywords each (N in FIG. 2). Reference numerals 201, 202, 203, and 204 in FIG. 2 are examples of partial dictionaries.
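The structure in FIG. 2 can be pictured as a mapping from genre name to an ordered list of partial dictionaries. The following is a minimal sketch under stated assumptions, not the patent's implementation: the names `Entry`, `build_partial_dictionaries`, and `recognition_dictionary` are hypothetical, the sample keywords are invented, and sorting hiragana readings by code point is used only as an approximation of Japanese syllabary order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    keyword: str   # surface form shown on the screen
    reading: str   # kana reading used for speech recognition


def build_partial_dictionaries(entries, n):
    """Sort entries by reading and split them into chunks of at most n entries.

    Assumption: code-point order of hiragana approximates syllabary order.
    """
    ordered = sorted(entries, key=lambda e: e.reading)
    return [ordered[i:i + n] for i in range(0, len(ordered), n)]


# Hypothetical sample data for one genre.
sports = [
    Entry("野球", "やきゅう"),
    Entry("サッカー", "さっかー"),
    Entry("相撲", "すもう"),
    Entry("テニス", "てにす"),
    Entry("柔道", "じゅうどう"),
]

# Genre name -> ordered list of partial dictionaries (N = 2 here).
recognition_dictionary = {"Sports": build_partial_dictionaries(sports, n=2)}
```

With N = 2, the five sample keywords fall into three partial dictionaries, the last holding a single entry.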

The processing procedure is described below with reference to FIGS. 3 to 7.

FIG. 3 illustrates the processing of the speech recognition dictionary generation unit. In step 301, information about each program, such as the program name, genre name, performers, broadcast date and time, broadcast station, and program content, is stored from the distributed program information (102 in FIG. 1). The distributed program information is assumed to contain this information in a common format. Step 301 corresponds to the processing of the program information storage unit 103 in FIG. 1. After step 301, keyword extraction processing 302 and dictionary registration processing 303 are performed.

FIG. 4 illustrates the keyword extraction processing 302 of FIG. 3. In step 401, each item of program information stored in step 301 of FIG. 3 is analyzed by morphological analysis, and a list of all words contained in the text is output. Text analysis by morphological analysis is known from, for example, the method described in Patent Document 1 mentioned above. In step 402, the words that are to serve as keywords, such as performer names, names of sports and other teams, program names, and other common nouns, are selected from the word list produced in step 401. In step 403, the output is a pair consisting of the list of all keywords selected in step 402 and the genre name obtained by referring to the program information of the program from which the keywords were selected.
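Steps 401 to 403 can be sketched as follows. This is an illustration under stated assumptions, not the method of the patent or of Patent Document 1: the function `analyze` is a hypothetical stand-in for a real morphological analyzer and simply parses text that is already tagged as `surface/pos` tokens, and the tag names in `KEYWORD_POS` and the program record fields are invented for the example.

```python
# Assumed part-of-speech tags that qualify a word as a keyword (step 402).
KEYWORD_POS = {"noun", "proper_noun"}


def analyze(text):
    """Stand-in for step 401.

    A real system would call a morphological analyzer here. In this sketch
    the text is assumed to be pre-tagged as space-separated 'surface/pos'.
    """
    return [tuple(token.rsplit("/", 1)) for token in text.split()]


def extract_keywords(program):
    """Return (keyword list, genre name) for one stored program record."""
    words = analyze(program["description"])            # step 401
    keywords = []
    for surface, pos in words:                         # step 402
        if pos in KEYWORD_POS and surface not in keywords:
            keywords.append(surface)
    return keywords, program["genre"]                  # step 403


# Hypothetical program record.
program = {
    "genre": "Sports",
    "description": "Giants/proper_noun versus/other Tigers/proper_noun "
                   "live/other baseball/noun",
}
result = extract_keywords(program)
```

Keeping the genre alongside the keyword list mirrors the pair output in step 403, which the registration step needs in order to pick the target set of partial dictionaries.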

FIG. 5 illustrates the dictionary registration processing 303 of FIG. 3. In step 501, for every keyword in the keyword list obtained by the keyword extraction processing 302 of FIG. 3, the set of partial dictionaries of the genre obtained by the keyword extraction processing 302 (call it genre A) is searched. The speech recognition dictionary is organized as a set of partial dictionaries, each a subset of keywords obtained by dividing the keywords of a genre into groups of an appropriate size, ordered for example by the Japanese syllabary (see FIG. 2). That is, a set of partial dictionaries is associated with each genre, and the partial dictionaries within each genre are ordered. Each keyword is registered in a partial dictionary together with its kana reading. If the search in step 501 finds that the target keyword is not in any partial dictionary of genre A, the partial dictionaries of genres other than A are searched in step 502. If the keyword is found in a partial dictionary of another genre (call it B), then in step 503 the keyword and the kana reading registered with it in the partial dictionary of genre B are registered in a partial dictionary of genre A. If the target keyword is not found in any partial dictionary of any other genre either, a kana reading for the keyword is generated in step 504. Kana readings can be generated from text by, for example, applying the text-to-speech (TTS) techniques described in Japanese Patent Laid-Open Nos. 8-30287 and 8-95597, which automatically generate pronunciation data for a recognition target from the text representing it. In step 505, the keyword and the kana reading generated in step 504 are registered in a partial dictionary of genre A.
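The registration flow of FIG. 5 can be sketched as below. This is a simplified illustration under stated assumptions: each partial dictionary is modeled as a Python `dict` from keyword to reading, `make_reading` is a hypothetical callable standing in for TTS-style reading generation, and the genre's partial dictionaries are rebuilt by sorting on the reading and re-chunking, which is one possible way to keep them ordered and is not specified by the source.

```python
def find_in_other_genres(dictionary, genre, keyword):
    """Step 502: look for the keyword's reading in genres other than `genre`."""
    for other, partials in dictionary.items():
        if other == genre:
            continue
        for partial in partials:
            if keyword in partial:
                return partial[keyword]
    return None


def register(dictionary, genre, keywords, make_reading, n=3):
    """Register keywords under `genre`, reusing or generating readings."""
    # Flatten genre A's partial dictionaries so every one is searched (step 501).
    entries = {}
    for partial in dictionary.get(genre, []):
        entries.update(partial)

    for keyword in keywords:
        if keyword in entries:                                      # step 501: already present
            continue
        reading = find_in_other_genres(dictionary, genre, keyword)  # step 502
        if reading is None:
            reading = make_reading(keyword)                         # step 504
        entries[keyword] = reading                                  # step 503 or 505

    # Assumption: re-chunk in reading order, n keywords per partial dictionary.
    items = sorted(entries.items(), key=lambda kv: kv[1])
    dictionary[genre] = [dict(items[i:i + n]) for i in range(0, len(items), n)]


# Hypothetical usage with romanized readings.
generated = []


def fake_reading(keyword):
    generated.append(keyword)
    return keyword.lower()


dictionary = {"News": [{"tokyo": "toukyou"}]}
register(dictionary, "Travel", ["tokyo", "Kyoto"], fake_reading, n=1)
```

Here "tokyo" reuses the reading already stored under "News", so only "Kyoto" reaches the reading generator.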

FIG. 6 illustrates the processing of the keyword speech input unit. Processing is performed in response to user input from the input unit (108 in FIG. 1). User input falls into four broad types (606 in FIG. 6): an input request indicating that the user wants to enter a keyword; an input that determines the genre to which the desired keyword belongs; an input requesting a screen change (these first three need not be voice input and may be made with buttons or the like); and voice input of the keyword itself. When the user requests keyword input, as a search key or for a recording reservation, a genre selection screen is displayed in step 601. This screen shows the genre names, which are the names of the partial dictionary categories, and lets the user choose among them. When the user specifies a genre, the keywords in the first partial dictionary of that genre are displayed on the screen in step 602. The genre need not be specified by voice, but if it is, a genre dictionary storing the genre names displayed in step 601 and their kana readings is held on a recording medium; the user reads aloud a genre name shown on the screen, and the spoken genre name is matched against the genre dictionary to determine the genre. Because the partial dictionaries in the set for each genre are ordered, step 602 displays the keywords of the first partial dictionary of the target genre. Reference numeral 201 in FIG. 2 denotes the first partial dictionary of genre X. When the user enters a screen switching instruction, such as scrolling or moving to the genre selection screen, processing 603 switches to the appropriate screen and the unit waits for voice input of a keyword. When the user has finished speaking a keyword and makes an input to confirm it, step 604 determines the input keyword using the partial dictionary on which the current screen display is based. In this determination, the user's input is matched against the kana readings of the keywords in that partial dictionary, and the matching keyword is extracted. In step 605, information on the keyword determined in step 604 is output.
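The determination in step 604 only has to choose among the keywords of the partial dictionary currently on screen. The sketch below illustrates that idea under loud assumptions: the recognizer is assumed to have already produced a reading string for the utterance, and string similarity from the standard library `difflib` is used purely as a stand-in for acoustic matching, which the source does not describe. The function name `determine_keyword` and the sample data are hypothetical.

```python
import difflib


def determine_keyword(recognized_reading, partial, cutoff=0.6):
    """Pick the keyword in the displayed partial dictionary that best matches.

    partial: dict mapping keyword -> reading (only the on-screen keywords).
    Returns the matched keyword, or None if nothing is similar enough.
    """
    by_reading = {reading: keyword for keyword, reading in partial.items()}
    matches = difflib.get_close_matches(
        recognized_reading, list(by_reading), n=1, cutoff=cutoff
    )
    return by_reading[matches[0]] if matches else None


# Hypothetical on-screen partial dictionary with romanized readings.
displayed = {"baseball": "yakyuu", "soccer": "sakkaa"}
hit = determine_keyword("yakyu", displayed)    # slightly misrecognized input
miss = determine_keyword("zzz", displayed)     # nothing comparable on screen
```

Because the candidate set is limited to the handful of displayed keywords, even an imperfect hypothesis has few competitors, which is the reason the description gives for the higher recognition rate.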

FIG. 7 illustrates the screen display processing 603 of FIG. 6. Processing follows the content of the screen switching instruction (606 in FIG. 6), that is, which screen is to be displayed after switching. If the requested screen is the previous keyword display screen, step 701 displays the keywords in the immediately preceding partial dictionary of the target genre. For example, if the current keyword display screen is based on partial dictionary 203 in FIG. 2, the keywords of partial dictionary 202 are displayed. If the requested screen is the next keyword display screen, step 702 displays the keywords in the immediately following partial dictionary of the target genre. For example, if the current keyword display screen is based on partial dictionary 203 in FIG. 2, the keywords of partial dictionary 204 are displayed. If the requested screen is neither of these, step 703 displays the screen specified by the instruction.
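The branching of steps 701 to 703 amounts to moving an index over the ordered partial dictionaries of the current genre. A minimal sketch follows; the function name, the instruction strings, and the return convention are hypothetical, and clamping at the first and last partial dictionary is an assumption, since the source does not say what happens at the ends.

```python
def switch_screen(current_index, partial_count, instruction):
    """Return which screen to show after a switching instruction.

    Returns ("keywords", index) for a keyword screen, or ("other", instruction)
    when some other screen (e.g. genre selection) was requested.
    """
    if instruction == "previous":                        # step 701
        return ("keywords", max(current_index - 1, 0))
    if instruction == "next":                            # step 702
        return ("keywords", min(current_index + 1, partial_count - 1))
    return ("other", instruction)                        # step 703


# Four partial dictionaries (201..204 in FIG. 2) map to indices 0..3 here,
# so partial dictionary 203 is index 2.
previous_screen = switch_screen(2, 4, "previous")
next_screen = switch_screen(2, 4, "next")
```

Whatever screen is shown, the index returned for a keyword screen also identifies the partial dictionary to use for the subsequent input determination.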

FIG. 8 illustrates an example of the keyword display screen. The keywords belonging to the specified genre are displayed one partial dictionary at a time. If the partial dictionaries are generated so that keywords are registered in the order of the user's preference, taken from a user profile, the keywords can also be displayed in order of user preference. The user selects the desired keyword from those displayed and inputs it by voice.
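Preference-ordered partial dictionaries could be built along the following lines. This is only a sketch: the source does not describe the user profile's format, so the profile is assumed here to be a mapping from keyword to a numeric preference score, and the function name and sample data are hypothetical.

```python
def build_by_preference(keywords, profile, n):
    """Split keywords into partial dictionaries, most preferred first.

    profile: assumed mapping keyword -> preference score (missing means 0).
    Ties keep their input order because Python's sort is stable.
    """
    ordered = sorted(keywords, key=lambda kw: -profile.get(kw, 0))
    return [ordered[i:i + n] for i in range(0, len(ordered), n)]


# Hypothetical profile: this user watches baseball most, then tennis.
profile = {"baseball": 5, "tennis": 2}
screens = build_by_preference(["golf", "tennis", "baseball", "sumo"], profile, n=2)
```

The first screen then shows the keywords the user is most likely to want, so fewer screen switches are needed before speaking.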

FIG. 1 is a diagram explaining the configuration of the present invention.
FIG. 2 is a diagram explaining the structure of the speech recognition dictionary.
FIG. 3 is a diagram explaining the speech recognition dictionary generation processing.
FIG. 4 is a diagram explaining the keyword extraction processing.
FIG. 5 is a diagram explaining the dictionary registration processing.
FIG. 6 is a diagram explaining the keyword speech input processing.
FIG. 7 is a diagram explaining the screen switching processing.
FIG. 8 is a diagram showing an example of the keyword display screen.

Explanation of symbols

101: speech recognition dictionary generation unit; 102: program information; 103: program information storage unit; 104: keyword extraction unit; 105: dictionary registration unit; 106: speech recognition dictionary; 107: keyword speech input unit; 108: input unit; 109: keyword display/input determination unit; 110: keyword information
201, 202, 203, 204: partial dictionaries of the speech recognition dictionary
302: keyword extraction processing; 303: dictionary registration processing; 603: screen switching processing

Claims

A method for generating a partial dictionary for speech recognition, used in a television program selection method that includes a step of displaying on a screen the keywords included in a partial dictionary of a speech recognition dictionary and determining a user's input using that partial dictionary, and in which the user selects a program by choosing one of the keywords displayed on the screen and inputting it by voice,
the method comprising, in a processing device: a step of extracting keywords from input program information; and a step of classifying the extracted keywords based on the genre associated with each keyword and registering them, together with speech recognition information, in partial dictionaries provided for each genre within a speech recognition dictionary that is generated and stored in a storage device.

A speech input method for a television broadcast receiver, in which a partial dictionary provided for each genre and containing keywords related to program information is read from a storage device, the keywords included in the partial dictionary are displayed on a screen, a processing device determines a user's input using the partial dictionary, and a program search or recording reservation is performed based on the user's input,
wherein the step of determining the user's input includes a step of accepting voice input by the user from a voice input unit, and a step of determining the user's input by comparing the keywords displayed on the screen with the voice input by the user.

The method according to claim 1 or 2, wherein the keywords are displayed in the order of the user's preference by referring to a user profile.