JP2016029495A

JP2016029495A - Image display device and image display method

Info

Publication number: JP2016029495A
Application number: JP2015200509A
Authority: JP
Inventors: 智弘小金井; Toshihiro Koganei
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2015-10-08
Filing date: 2015-10-08
Publication date: 2016-03-03

Abstract

PROBLEM TO BE SOLVED: To provide an image display device that allows a user to easily select a desired piece of information from plural pieces of selectable information by using voice recognition. SOLUTION: A voice recognition apparatus 100 includes: a voice acquisition section that acquires a voice; a recognition result acquisition section 103 that acquires a recognition result of the voice; an extraction section 105 that, when the recognition result includes a keyword and a selection command for selecting one of plural pieces of selectable information, extracts selection candidates, which are the pieces of selectable information that include the keyword; a selection mode change section 106 that, when plural selection candidates are extracted, changes the selection mode from a first selection mode in which all pieces of the selectable information are selectable to a second selection mode in which the plural selection candidates are selectable; and a display control section 107 that changes the display mode of the display information according to the second selection mode. When the selection mode is the second selection mode, the display control section displays, for each of the plural selection candidates, an identifier for identifying that selection candidate. SELECTED DRAWING: Figure 2

Description

The present disclosure relates to a video display device and a video display method in which a user selects any one of a plurality of pieces of information by having the voice uttered by the user recognized.

Conventionally, there is a voice input device that receives voice input uttered by a user, recognizes a command by analyzing the received voice input, and controls a device in accordance with the recognized command (see, for example, Patent Document 1). That is, the voice input device of Patent Document 1 performs voice recognition on the voice uttered by the user and controls the device in accordance with the command obtained as the recognition result.

Meanwhile, there is a need to use such a voice input device so that, when a user is operating a browser on, for example, a television or a PC (personal computer), the hypertext displayed on the browser screen can be selected by voice recognition. In other words, voice recognition is used to select information such as hypertext that, when selected, gives access to the related information referenced by the hyperlink (reference information) embedded in it (hereinafter referred to as “selectable information”).

Patent Document 1: Japanese Patent No. 4812941

However, when selectable information is selected using voice recognition, selectable information that the user did not intend to select may be selected by mistake.

Therefore, the present disclosure provides a video display device and the like with which the selectable information that a user intends to select can easily be selected, by voice recognition, from among a plurality of pieces of selectable information.

The video display device according to the present disclosure includes: a display unit that displays video; and a processor that, when a plurality of pieces of selectable information in which reference information for referring to related information is embedded are displayed on the display unit, assists a user in selecting any one of the plurality of pieces of selectable information. The processor includes: a voice acquisition unit that acquires a voice uttered by the user; a recognition result acquisition unit that acquires a recognition result of the voice acquired by the voice acquisition unit; an extraction unit that, when the recognition result includes a keyword and a selection command for selecting one of the plurality of pieces of selectable information, extracts selection candidates, which are the pieces of selectable information that include the keyword; a selection mode change unit that, when the extraction unit extracts a plurality of selection candidates, changes the selection mode for selecting the selectable information from a first selection mode, in which all of the selectable information is selectable, to a second selection mode, in which the plurality of selection candidates are selectable; and a display control unit that changes the display mode of the display information in accordance with the second selection mode set by the selection mode change unit. When the selection mode is the second selection mode, the display control unit displays, for each of the plurality of selection candidates, an identifier for identifying that selection candidate.

With the video display device according to the present disclosure, the selectable information that the user intends to select can easily be selected using voice recognition.

FIG. 1 is a diagram illustrating a voice recognition system according to Embodiment 1.
FIG. 2 is a block diagram showing the configuration of the voice recognition system.
FIG. 3 is a diagram for explaining dictation.
FIG. 4 is a flowchart showing the flow of the selection process of the voice recognition device according to the present embodiment.
FIG. 5A is a diagram showing a search result image of an Internet search.
FIG. 5B is a diagram illustrating an example in which the selection mode in the selection process is the second selection mode.
FIG. 5C is a diagram for describing the second selection mode.
FIG. 6 is a diagram showing search results from the program guide.
FIG. 7 is a diagram showing an example in which search results from the program guide are displayed as a list.
FIG. 8 is a diagram for explaining a case where the type of search command is not specified.

Hereinafter, embodiments will be described in detail with reference to the drawings as appropriate. However, unnecessarily detailed description may be omitted. For example, detailed descriptions of already well-known matters and repeated descriptions of substantially identical configurations may be omitted. This is to keep the following description from becoming unnecessarily redundant and to make it easier for those skilled in the art to understand.

The inventor provides the accompanying drawings and the following description so that those skilled in the art can fully understand the present disclosure, and does not intend them to limit the subject matter described in the claims.

The voice recognition device of the present disclosure is built into a television 10 as illustrated in FIG. 1. It recognizes voice uttered by a user and controls the television 10 according to the result of the voice recognition. FIG. 1 is a diagram illustrating the voice recognition system according to Embodiment 1. FIG. 2 is a block diagram showing the configuration of the voice recognition system.

<Voice recognition system>
As shown in FIG. 1 and FIG. 2, the voice recognition system 1 in Embodiment 1 includes a television 10 serving as a video display device, a remote controller 20 (labeled “remote control” in FIG. 2), a mobile terminal 30, a network 40, and a keyword recognition unit 50.

The television 10 includes a voice recognition device 100, a built-in camera 120, a built-in microphone 130, a display unit 140, a transmission/reception unit 150, a tuner 160, and a storage unit 170.

The voice recognition device 100 acquires voice from the user, recognizes keywords and commands by analyzing the acquired voice, and controls the television 10 according to the recognition result. Its specific configuration is described later.

The built-in camera 120 is a camera mounted on the exterior of the television 10 that captures images in the direction in which the display unit 140 faces. That is, the built-in camera 120 points toward a user facing the display unit 140 of the television 10 and can photograph that user.

The built-in microphone 130 is a microphone mounted on the exterior of the television 10 that, like the built-in camera 120, mainly collects sound coming from the direction in which the display unit 140 faces. That is, the built-in microphone 130 points toward a user facing the display unit 140 of the television 10 and can collect the voice uttered by that user.

The remote controller 20 is a controller with which the user operates the television 10 from a position away from it, and includes a microphone 21 and an input unit 22. The microphone 21 can collect voice uttered by the user. The input unit 22 is an input device, such as a touch pad, a keyboard, or buttons, on which the user makes inputs. An audio signal representing the voice collected by the microphone 21, or an input signal entered through the input unit 22, is transmitted to the television 10 by wireless communication.

The display unit 140 is a display device such as a liquid crystal display, a plasma display, or an organic EL display, and displays images generated by the display control unit 107. The display unit 140 also displays broadcast images of broadcasts received by the tuner 160.

The transmission/reception unit 150 is connected to the network 40 and transmits and receives information through the network 40.

The tuner 160 receives broadcasts.

The storage unit 170 is a nonvolatile or volatile memory or a hard disk, and stores information such as that used to control each unit of the television 10. For example, the storage unit 170 stores the voice command information referred to by the command recognition unit 102, described later.

The mobile terminal 30 is, for example, a smartphone on which an application for operating the television 10 is running, and includes a microphone 31 and an input unit 32. The microphone 31 is built into the mobile terminal 30 and, like the microphone 21 of the remote controller 20, can collect voice uttered by the user. The input unit 32 is an input device, such as a touch panel, a keyboard, or buttons, on which the user makes inputs. As with the remote controller 20, an audio signal representing the voice collected by the microphone 31, or an input signal entered through the input unit 32, is transmitted from the mobile terminal 30 to the television 10 by wireless communication.

Note that the television 10 is connected to the remote controller 20 or the mobile terminal 30 by wireless communication such as a wireless LAN or Bluetooth (registered trademark), and data such as voice acquired by the remote controller 20 or the mobile terminal 30 is transmitted to the television 10 over that wireless connection.

The network 40 is a network based on the Internet.

The keyword recognition unit 50 is a dictionary server in the cloud that is connected to the television 10 via the network 40. Specifically, the keyword recognition unit 50 receives voice information transmitted from the television 10 and converts the voice indicated by that information into a character string (which may be a single character). The keyword recognition unit 50 then transmits the resulting character information to the television 10 via the network 40 as the voice recognition result.

<Voice recognition device>
The voice recognition device 100 includes a voice acquisition unit 101, a command recognition unit 102, a recognition result acquisition unit 103, a command processing unit 104, an extraction unit 105, a selection mode change unit 106, a display control unit 107, a selection unit 108, a search unit 109, an operation reception unit 110, and a gesture recognition unit 111.

The voice acquisition unit 101 acquires voice uttered by the user. It may acquire the voice directly through the built-in microphone 130 of the television 10, or it may acquire voice that was collected by the microphone 21 built into the remote controller 20 or by the microphone 31 built into the mobile terminal 30.

The command recognition unit 102 analyzes the voice acquired by the voice acquisition unit 101 and identifies a preset command. Specifically, the command recognition unit 102 checks the voice acquired by the voice acquisition unit 101 against the voice command information stored in advance in the storage unit 170. The voice command information associates voices with commands, which are instructions to the television 10. There are multiple types of commands, and a different voice is associated with each command. If, by referring to the voice command information, the command recognition unit 102 can identify the command that corresponds to the voice from among the plurality of commands, it recognizes the voice as that command. The command recognition unit 102 also transmits the part of the acquired voice other than the command from the transmission/reception unit 150 to the keyword recognition unit 50 via the network 40.

The recognition result acquisition unit 103 acquires the recognition result produced when the voice acquired by the voice acquisition unit 101 is recognized by the command recognition unit 102 or the keyword recognition unit 50. The recognition result acquisition unit 103 obtains the recognition result of the keyword recognition unit 50 from the transmission/reception unit 150, which receives it via the network 40.

Here, the keyword recognition unit 50 acquires, from the voice acquired by the voice acquisition unit 101, the voice other than the command. The keyword recognition unit 50 recognizes this voice as a keyword and converts it into the corresponding character string (hereinafter referred to as “dictation”).

When the recognition result acquired by the recognition result acquisition unit 103 includes a command, the command processing unit 104 causes the processing units to perform the processing corresponding to that command. The command processing unit 104 likewise causes the processing units to perform the processing for a command that corresponds to a user operation received by the operation reception unit 110 or to a user gesture recognized by the gesture recognition unit 111. Specifically, when the command includes a keyword and a selection command, the command processing unit 104 causes the extraction unit 105 to perform the extraction process described later. When the command includes a keyword and a search command, it causes the search unit 109 to perform the search process described later. When the command includes an operation command, it causes the selection unit 108 to perform the selection process described later. On the other hand, when the recognition result acquired by the recognition result acquisition unit 103 consists of a keyword only, the command processing unit 104 causes the display control unit 107 to output the keyword to the display unit 140.
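The routing performed by the command processing unit 104 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the dictionary shape of the recognition result, the command names (`SELECT`, `SEARCH`, `OPERATE`), and the returned step names are all assumptions introduced here.

```python
def dispatch(result):
    """Return the name of the processing step for a recognition result.

    `result` is a dict with optional "keyword" and "command" entries.
    This shape and the command names are assumptions for illustration.
    """
    keyword = result.get("keyword")
    command = result.get("command")
    if command == "SELECT" and keyword:
        return "extract"          # handled by the extraction unit 105
    if command == "SEARCH" and keyword:
        return "search"           # handled by the search unit 109
    if command == "OPERATE":
        return "select"           # handled by the selection unit 108
    if command is None and keyword:
        return "display_keyword"  # display control unit 107 shows the keyword
    return "ignore"               # nothing recognizable (case not covered in the text)
```

For example, `dispatch({"keyword": "ABC", "command": "SELECT"})` routes to the extraction step, while a keyword with no command is simply displayed.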

In the present embodiment, the keyword recognition unit 50 receives the voice other than the command recognized by the command recognition unit 102, recognizes the keyword, and transmits the dictation result to the recognition result acquisition unit 103. Alternatively, the keyword recognition unit 50 may receive all of the voice acquired by the voice acquisition unit 101 and transmit the dictation result for all of it to the recognition result acquisition unit 103. In that case, the recognition result acquisition unit 103 refers to the voice command information stored in advance in the storage unit 170, separates the dictation result received from the keyword recognition unit 50 into a keyword and a command, and outputs them to the command processing unit 104.
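As a rough sketch of the alternative configuration, in which the whole utterance is dictated and then split into a command and a keyword, the following assumes a small command table and assumes that the command phrase comes first in the utterance. The phrases, command names, and word order are hypothetical; the text specifies none of them.

```python
# Hypothetical voice command table: spoken phrase -> command name.
# The phrases and command names are illustrative, not taken from the patent.
VOICE_COMMANDS = {
    "search the internet for": "SEARCH_INTERNET",
    "search the program guide for": "SEARCH_PROGRAM_GUIDE",
    "search for": "SEARCH",
    "jump to": "SELECT",
}


def split_dictation(text):
    """Split a dictation result into (command, keyword).

    Returns (None, text) when no known command phrase is found,
    which corresponds to a keyword-only recognition result.
    """
    # Lowercase and collapse runs of whitespace before matching.
    normalized = " ".join(text.lower().split())
    # Try longer phrases first so the most specific command phrase wins.
    for phrase in sorted(VOICE_COMMANDS, key=len, reverse=True):
        if normalized.startswith(phrase):
            keyword = normalized[len(phrase):].strip()
            return VOICE_COMMANDS[phrase], keyword
    return None, normalized
```

A real system would match on recognized speech rather than on normalized text prefixes; the prefix match is only there to make the separation step concrete.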

When the recognition result acquired by the recognition result acquisition unit 103 includes a keyword and a selection command for selecting one of the plurality of pieces of selectable information, the extraction unit 105 performs an extraction process that extracts selection candidates, which are the pieces of selectable information that include the keyword.
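The extraction process can be illustrated with a short sketch. It assumes that each piece of selectable information is represented by its visible text plus its embedded link, and that “includes the keyword” means a case-insensitive substring match. The `Selectable` type and the matching rule are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Selectable:
    text: str  # visible text of the hypertext item
    href: str  # embedded reference information (hyperlink)


def extract_candidates(items, keyword):
    """Return the selectable items whose text contains the keyword."""
    key = keyword.casefold()
    return [item for item in items if key in item.text.casefold()]
```

With items titled “ABC News”, “ABC Sports”, and “Weather”, the keyword “abc” yields the first two as selection candidates.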

When the extraction unit 105 extracts a plurality of selection candidates, the selection mode change unit 106 changes the selection mode, which governs selection among the plurality of pieces of selectable information contained in the image that the display control unit 107 displays on the display unit 140, from a first selection mode in which all of the selectable information is selectable to a second selection mode in which only the plurality of selection candidates are selectable.
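The mode change amounts to narrowing the set of items that can be selected. The sketch below is a simplified model under that reading. The string mode names are assumptions, and the single-candidate case simply stays in the first mode here because this passage does not say what happens then.

```python
def selectable_in_mode(all_items, candidates):
    """Return (mode, items the user can currently select).

    The second selection mode is entered only when more than one
    candidate was extracted; otherwise the first mode is kept.
    """
    if len(candidates) > 1:
        return "second", list(candidates)
    return "first", list(all_items)
```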

The display control unit 107 causes the display unit 140 to display the images output by the selection mode change unit 106, the selection unit 108, and the search unit 109, at a preset display resolution. Specifically, for example, the display control unit 107 causes the display unit 140 to display the following images. When the selection unit 108 selects one of the plurality of pieces of selectable information, the display control unit 107 displays the related information that is the reference destination of the reference information embedded in the selected selectable information. When the selection mode is the second selection mode, the display control unit 107 changes the display form of the plurality of selection candidates to one indicating that they are selection candidates. When the selection mode is the second selection mode, the display control unit 107 further displays, for each of the plurality of selection candidates, an identifier for identifying that selection candidate in the area where the selection candidate is displayed. When the selection mode is the second selection mode, the display control unit 107 also displays, in accordance with an operation received by the operation reception unit 110, one of the pieces of selectable information extracted as selection candidates in a display form different from that of the other selectable information. The display control unit 107 displays the results of a search by the search unit 109 on the display unit 140 as a plurality of pieces of selectable information. It also displays the results of a keyword search in the Internet search application, in the program guide application, or in the searchable applications as a plurality of pieces of selectable information. The display control unit 107 may also display, as a plurality of pieces of selectable information, not only keyword search results but also a plurality of hypertext items displayed as a web page.
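One simple way to attach an identifier to each selection candidate in the second selection mode is to number the candidates in display order. The numbering scheme below is an assumption; the passage requires only that each candidate gets an identifier that distinguishes it from the others.

```python
def label_candidates(candidates):
    """Map a 1-based numeric identifier (as a string) to each candidate."""
    return {str(i): c for i, c in enumerate(candidates, start=1)}


def render_labels(labeled):
    """Format each candidate as '[identifier] text' for on-screen display."""
    return [f"[{ident}] {text}" for ident, text in labeled.items()]
```

For two candidates this produces lines such as `[1] ABC News` and `[2] ABC Sports`, which the user can then refer to by number.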

The selection unit 108 selects one of the plurality of pieces of selectable information in accordance with a user operation received by the operation reception unit 110 or a gesture operation by the user recognized by the gesture recognition unit 111. When the selection mode is the second selection mode and the recognition result acquired by the recognition result acquisition unit 103 includes the selection command together with either a keyword indicating an identifier attached to one of the selection candidates or a keyword that can identify one of the selection candidates, the selection unit 108 selects the selection candidate identified by that keyword, thereby selecting one of the plurality of selection candidates. When the operation reception unit 110 receives an operation indicating confirmation, the selection unit 108 selects the selectable information that the display control unit 107 is displaying on the display unit 140 in a display form different from that of the other selectable information, thereby selecting one of the plurality of selection candidates.
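Selection by voice in the second selection mode can be sketched as a two-step lookup: first treat the keyword as an identifier, then fall back to a keyword that matches exactly one candidate. The sketch assumes the selection command has already been detected, and the mapping from identifier strings to candidate text is a hypothetical representation.

```python
def select_candidate(labeled, keyword):
    """Pick one candidate from `labeled` (identifier -> candidate text).

    Returns the candidate text, or None when the keyword neither names
    an identifier nor matches exactly one candidate.
    """
    key = keyword.strip().casefold()
    if key in labeled:  # keyword is an identifier such as "2"
        return labeled[key]
    matches = [text for text in labeled.values() if key in text.casefold()]
    return matches[0] if len(matches) == 1 else None
```

Returning `None` for an ambiguous keyword leaves the device in the second selection mode so that the user can try again, which is one possible design choice rather than something the text prescribes.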

When the recognition result acquired by the recognition result acquisition unit 103 includes a keyword and a search command associated with a preset application, the search unit 109 searches for the keyword in that application. If the search command in the recognition result is associated with the Internet search application, which is one of the preset applications, the search unit 109 searches for the keyword in the Internet search application. If the search command is associated with the program guide application, which is another of the preset applications, the search unit 109 searches for the keyword in the program guide application. If the search command is not associated with any preset application, the search unit 109 searches for the keyword in the searchable applications, that is, all applications capable of searching by the keyword.
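The search routing can be sketched as follows, assuming each application is represented by a callable that takes a keyword and returns a list of results. The application names and the callable interface are hypothetical.

```python
def route_search(command_app, keyword, apps):
    """Run the keyword search in one application or in all of them.

    `apps` maps an application name to a search callable.
    `command_app` is the application tied to the search command,
    or None when the command names no particular application.
    """
    if command_app is not None:
        return {command_app: apps[command_app](keyword)}
    # No application specified: search every searchable application.
    return {name: search(keyword) for name, search in apps.items()}
```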

The operation reception unit 110 receives operations performed by the user. Specifically, it receives an input signal indicating a user operation on the input unit 22 of the remote controller 20 or on the input unit 32 of the mobile terminal 30, over the wireless communication between the television 10 and the remote controller 20 or the mobile terminal 30, and thereby accepts the operation performed by the user.

The gesture recognition unit 111 recognizes a gesture made by the user by performing image processing on the moving images captured by the built-in camera 120. Specifically, for example, it recognizes the user's hand and compares the movement of the hand with preset commands to identify the command that matches the movement.

<Operation>
Next, the operation of the voice recognition device 100 of the television 10 according to the present embodiment will be described.

<Activation of voice recognition device>
First, the ways of starting voice recognition processing by the voice recognition device 100 of the television 10 will be described. There are three main methods, as follows.

The first method is to press a microphone button (not shown), which is one of the input units 22 of the remote controller 20. Specifically, when the user presses the microphone button of the remote controller 20, the operation reception unit 110 of the television 10 accepts the button press, and the television 10 sets the volume of the sound output from its speaker (not shown) to a preset volume that is low enough for the microphone 21 to pick up voice for recognition easily. Once the speaker volume has been set to the preset volume, the voice recognition device 100 starts voice recognition processing. If the volume output from the speaker is already low enough for voice recognition to be easy, the television 10 does not need to make this adjustment and leaves the volume as it is. This method is not limited to the remote controller 20 and may be carried out in the same way with the mobile terminal 30. With the mobile terminal 30 (for example, a smartphone with a touch panel), instead of pressing the microphone button of the remote controller 20, the user presses a microphone button displayed on the touch panel by the application that is installed and running on the mobile terminal 30, and the voice recognition device 100 then starts voice recognition.

The second of the three methods is to say “Hi, TV”, a preset start command for the voice recognition process, to the built-in microphone 130 of the television 10, as shown in FIG. 1. “Hi, TV” is only one example of a start command, and other wording may be used. When the voice collected by the built-in microphone 130 is recognized as the preset start command, the volume of the sound output from the speaker of the television 10 is set to the preset volume as described above, and the voice recognition process by the voice recognition device 100 starts.

The third of the three methods is to perform a preset gesture (for example, swinging a hand downward from top to bottom) toward the built-in camera 120 of the television 10. When the gesture recognition unit 111 recognizes the gesture, the volume of the sound output from the speaker of the television 10 is set to the preset volume as described above, and the voice recognition process by the voice recognition device 100 starts.

The methods are not limited to the above. The voice recognition process by the voice recognition device 100 may also be started by combining the third method with the first or second method.
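
The start sequence described above can be sketched roughly as follows. This is an illustrative Python sketch only, not part of the disclosure: the class name `Television`, the trigger names, and the value of `PRESET_VOLUME` are assumptions, since the description states only that the volume is preset and low enough for easy recognition.

```python
from dataclasses import dataclass

# Hypothetical value: the description gives no number for the preset volume.
PRESET_VOLUME = 5

# Hypothetical names for the three start methods (button, "Hi, TV", gesture).
TRIGGERS = {"mic_button", "start_command", "gesture"}


@dataclass
class Television:
    volume: int
    recognizing: bool = False

    def start_voice_recognition(self, trigger: str) -> None:
        """Lower the speaker volume if needed, then start recognition."""
        if trigger not in TRIGGERS:
            raise ValueError(f"unknown trigger: {trigger}")
        # Only a volume above the preset level is lowered; a television
        # that is already quiet enough keeps its current volume.
        if self.volume > PRESET_VOLUME:
            self.volume = PRESET_VOLUME
        self.recognizing = True
```

In this sketch all three triggers share one code path, which mirrors the statement that the second and third methods set the volume "as described above" for the first.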

As described above, when the voice recognition process by the voice recognition device 100 starts, the display control unit 107 displays, in the lower part of the image 200 shown on the display unit 140, a voice recognition icon 201 indicating that voice recognition has started and an indicator 202 indicating the volume of the voice being collected, as shown in FIG. 1. Although the start of the voice recognition process is indicated here by displaying the voice recognition icon 201, it may instead be indicated by displaying a message to that effect or by outputting such a message as audio.

<Voice recognition>
Next, the voice recognition process by the voice recognition device 100 of the television 10 according to the present embodiment will be described. Two types of voice recognition are performed in this process. One is a voice recognition process for recognizing preset commands (hereinafter referred to as the “command recognition process”), and the other is a voice recognition process for recognizing voice other than commands as keywords (hereinafter referred to as the “keyword recognition process”).

As described above, the command recognition process is performed by the command recognition unit 102 of the voice recognition device 100; that is, it is performed inside the voice recognition device 100. The command recognition unit 102 identifies a command by comparing the user's voice directed at the television 10 with the voice command information stored in advance in the storage unit 170. The “command” here is a command for operating the television 10.

As described above, the keyword recognition process is performed by the keyword recognition unit 50, a dictionary server connected to the television 10 via the network 40 (see FIG. 3); that is, it is performed outside the voice recognition device 100. The keyword recognition unit 50 acquires, from the voice acquired by the voice acquisition unit 101, the portion other than the command. It then recognizes that portion as a keyword and performs dictation. For dictation, the keyword recognition unit 50 uses a database that associates voice with character strings and converts the voice into a character string by comparing the voice against the database. In the present embodiment, only the voice other than the command is recognized as a keyword and dictated, but the keyword recognition unit 50 may instead receive all of the voice acquired by the voice acquisition unit 101 and dictate all of it.

Specifically, as shown in FIG. 3, when the user starts the voice recognition process by the voice recognition device 100 while the cursor is in the input field 203 for entering a browser search keyword, the image 210 is displayed on the display unit 140. When the user then says “ABC”, voice information representing the utterance is transmitted to the keyword recognition unit 50 connected to the television 10 via the network 40. The keyword recognition unit 50 converts the received voice information “ABC” into the character string “ABC” by comparing it with the database, and transmits character information representing the converted string to the television 10 via the network 40. The television 10 acquires the character information received from the keyword recognition unit 50 and enters the character string “ABC” into the input field 203 by way of the recognition result acquisition unit 103, the command processing unit 104, and the display control unit 107.

In this way, by performing the voice recognition process, the voice recognition device 100 can acquire the voice uttered by the user and input it to the television 10 as a character string. When the acquired voice contains a command, as in “search”, the device causes the television 10 to perform processing according to that command. When the acquired voice contains both a command and a keyword, as in “search for ‘ABC’”, it causes the television 10 to perform the processing of that command using that keyword. The voice contains both a command and a keyword when, for example, the command is a search command associated with a preset application, in which case a keyword search is performed by that application. Examples of preset applications include an Internet search application that launches a web browser as described above and a program search application that searches the program guide by keyword. Search processing by such a search command is performed by the search unit 109 described above.
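
The division of an utterance into a command and a keyword can be illustrated with the following Python sketch. It is a simplification under stated assumptions: it works on already transcribed text, whereas the embodiment matches the command against stored voice command information and has the remaining voice dictated by the keyword recognition unit 50. The table `COMMANDS`, its phrases, and the function name `split_utterance` are hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical command table: spoken phrase -> internal command name.
COMMANDS = {
    "search for": "search",
    "jump to": "select",
}


def split_utterance(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (command, keyword); either element may be None."""
    for phrase, command in COMMANDS.items():
        match = re.search(re.escape(phrase), text, flags=re.IGNORECASE)
        if match:
            # Everything outside the command phrase is treated as the keyword.
            rest = (text[:match.start()] + " " + text[match.end():]).strip()
            return command, rest or None
    # No known command: the whole utterance is a keyword candidate.
    return None, text.strip() or None
```

With this table, `split_utterance("Jump to ABC")` yields the command `"select"` and the keyword `"ABC"`, corresponding to the selection command and keyword used in the selection process.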

<Selection process>
Next, the selection process by the voice recognition device 100 of the television 10 according to the present embodiment will be described.

The selection process is a process that, for example, when a plurality of search results 221a, 221b, 221c, 221d, ... obtained by an Internet search are being output by the display control unit 107 as shown in FIG. 5A, lets the most suitable search result be selected from among the search results 221 according to the voice uttered by the user. Here, the search results 221a, 221b, 221c, 221d, ... include not only the search results 221a to 221d contained in the image 220a displayed on the display unit 140 but also search results that do not fit on the display unit 140. In other words, they are the search results contained in the same page, which can be displayed merely by scrolling without moving to another page.

Hereinafter, the selection process will be described with reference to FIG. 4 and FIGS. 5A to 5C. FIG. 4 is a flowchart showing the flow of the selection process of the voice recognition device according to the present embodiment. FIG. 5A shows a search result image of an Internet search. FIG. 5B shows an example in which the selection mode in the selection process is the second selection mode. FIG. 5C is a diagram for explaining the second selection mode.

First, the selection process can be started when, as shown in FIG. 5A, the display unit 140 displays a plurality of search results 221a, 221b, 221c, 221d, ..., which are results of an Internet keyword search and are one kind of selectable information. Suppose that the user wants to select the search result 221c by voice recognition and notices the character string “ABC” contained in it. As shown in FIG. 5B, with the voice recognition process started, the user says “Jump to ABC”. This starts the selection process. That is, the voice acquisition unit 101 acquires the user's voice through the built-in microphone 130, the microphone 21 of the remote controller 20, or the microphone 31 of the mobile terminal 30 (S101).

The command recognition unit 102 then recognizes the command by comparing “jump”, the command portion of the voice “Jump to ‘ABC’” acquired by the voice acquisition unit 101, with the voice command information stored in advance in the storage unit 170 (S102). In the present embodiment, the command “jump” is a selection command for selecting one of a plurality of pieces of selectable information.

The command recognition unit 102 identifies as the keyword the portion “ABC” of the voice “Jump to ABC” that remains after excluding “jump”, which was recognized as the command, and transfers the voice identified as the keyword from the transmission/reception unit 150 to the keyword recognition unit 50 via the network 40 (S103).

The keyword recognition unit 50 performs dictation on the voice information representing the voice “ABC” to convert it into the character string “ABC”, and transmits character information representing the converted string, as the recognition result, to the television 10 from which the voice information was sent.

The recognition result acquisition unit 103 acquires the command recognized in step S102 and the keyword, which is the character string indicated by the character information transmitted from the keyword recognition unit 50 (S104).

Based on the command and keyword acquired by the recognition result acquisition unit 103, the extraction unit 105 extracts selection candidates, that is, the selectable information that contains the keyword (S105). Specifically, from among the search results 221a, 221b, 221c, 221d, ... shown in FIG. 5A, it extracts as selection candidates the search results 221a, 221c, and 221e, which are the selectable information containing the character string “ABC” 225 recognized as the keyword.

The extraction unit 105 determines whether more than one search result has been extracted as a selection candidate (S106).

If the extraction unit 105 determines that more than one search result has been extracted as a selection candidate (S106: Yes), the selection mode changing unit 106 changes the selection mode, which governs selection among the search results contained in the image that the display control unit 107 displays on the display unit 140, from the first selection mode, in which all search results are selectable, to the second selection mode, in which only the selection candidates are selectable (S107). Specifically, as shown in FIG. 5B, the selection candidates extracted by the extraction unit 105 are the three search results 221a, 221c, and 221e, so the selection mode is changed from the first selection mode to the second selection mode. The first selection mode here is, for example, a free cursor mode in which the cursor can be moved freely with a mouse or the like.
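
Steps S105 to S107 can be sketched as follows. This is a minimal Python illustration, not the implementation of the embodiment: the sample texts in `RESULTS`, the plain substring match, and the names `extract_candidates` and `mode_after_extraction` are assumptions made for the example. Only the reference numerals 221a to 221e come from the description.

```python
from enum import Enum
from typing import Dict, List


class SelectionMode(Enum):
    FIRST = "all search results selectable"
    SECOND = "only selection candidates selectable"


# Hypothetical search result texts keyed by the reference numerals of FIG. 5A.
RESULTS: Dict[str, str] = {
    "221a": "ABC official site",
    "221b": "XYZ shop",
    "221c": "About ABC",
    "221d": "Weather",
    "221e": "ABC news",
}


def extract_candidates(results: Dict[str, str], keyword: str) -> List[str]:
    """S105: keep the results whose text contains the keyword."""
    return [rid for rid, text in results.items() if keyword in text]


def mode_after_extraction(candidates: List[str]) -> SelectionMode:
    """S106/S107: switch modes only when more than one candidate remains."""
    if len(candidates) > 1:
        return SelectionMode.SECOND
    # Otherwise the mode is left as the first selection mode.
    return SelectionMode.FIRST
```

With the sample data, the keyword "ABC" matches 221a, 221c, and 221e, so the mode becomes the second selection mode, as in FIG. 5B.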

When the selection mode changing unit 106 has changed the selection mode to the second selection mode, an image 220b such as that shown in FIG. 5B is displayed on the display unit 140. Specifically, the image 220b contains the search results 221a, 221c, and 221e extracted as selection candidates; frames 222 and 223, which mark each of the search results 221a, 221c, and 221e as a selection candidate; and identifiers 224a, 224b, and 224c for identifying the search results 221a, 221c, and 221e, respectively. There are two kinds of frame: the first frame 222, which indicates the focus used to select one of the selection candidates, and the second frame 223, which indicates that a candidate is not focused.

Once the selection mode changing unit 106 has changed the selection mode to the second selection mode, one of the selection candidates, namely the search results 221a, 221c, and 221e, is selected according to input from the user (S108). In the second selection mode, the user has several ways to select one of the selection candidates.

The first way, shown in FIG. 5C, is to operate the input unit 22 of the remote controller 20 or the input unit 32 of the mobile terminal 30 so as to move the first frame 222 from one selection candidate to another and then select. Specifically, when the user, with the image 220b of FIG. 5B displayed, performs a downward swipe on the input unit 22 of the remote controller 20 as shown in FIG. 5C, the first frame 222 indicating the focus, which was on the search result 221a before the input, is displayed on the search result 221c, as in the image 220c of FIG. 5C. After moving the first frame 222 to the desired search result in this way, the user makes an input indicating confirmation with the input unit 22 or 32 of the remote controller 20 or the mobile terminal 30, which selects the search result 221c on which the first frame 222 is displayed. The first frame 222 moves only to search results on which the second frame 223 is displayed. The focus may be moved not only by input through the input units 22 and 32 but also by a command given through the voice recognition process. Specifically, the user starts the voice recognition process and says “move down”; the command recognition unit 102 then recognizes the command “move down”, and the focus is moved.
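
The focus movement described above, in which the first frame 222 moves only among the selection candidates, can be sketched as follows. This Python sketch is illustrative only. The function name `move_focus` and the direction names are hypothetical, and stopping at the first and last candidate is an assumption, because the description does not say what happens at the ends of the list.

```python
from typing import List

# Hypothetical mapping from an input direction to a step in the candidate list.
STEPS = {"down": 1, "up": -1}


def move_focus(candidates: List[str], focused: str, direction: str) -> str:
    """Return the candidate that receives the first frame after the input."""
    index = candidates.index(focused) + STEPS[direction]
    # Assumption: the focus stops at either end instead of wrapping around.
    index = min(max(index, 0), len(candidates) - 1)
    return candidates[index]
```

Because the function steps through the candidate list rather than through all search results, a downward input from 221a lands on 221c and skips the non-candidate 221b, as in FIG. 5C.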

The second way is to press the button bearing the number displayed as one of the identifiers 224a to 224c. For example, using a remote controller that has a numeric keypad, or a numeric keypad displayed on the mobile terminal 30, the user presses the button for the number shown as an identifier; the press is accepted as an operation command from the user, and the desired search result is selected.

Considering the convenience of deciding with a single press of the remote controller's numeric keypad and the readability of a list that fits on the display unit 140, the number used as an identifier is preferably a single digit. That is, when there are 11 or more selection candidates, it is preferable to assign some priority order and narrow the candidates down to the ten search results with the highest priority. Ordering the search results from highest priority is not restricted to the case of narrowing them down to ten; the results may be arranged in descending order of priority without being narrowed down. The priority may be determined by the proportion of the characters in a search result that is accounted for by the keyword used together with the selection command (“ABC” 225 above).
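
One possible reading of this prioritization is sketched below in Python. Several points are assumptions rather than statements of the description: counting every occurrence of the keyword toward the ratio, keeping the original order for equal ratios, and assigning the digits in the order 1 to 9 followed by 0. The names `keyword_ratio` and `rank_candidates` are hypothetical.

```python
from typing import List, Tuple

# Assumed labelling order: ten single-digit identifiers.
DIGITS = "1234567890"


def keyword_ratio(text: str, keyword: str) -> float:
    """Share of the characters in `text` that belong to the keyword."""
    if not text or not keyword:
        return 0.0
    return text.count(keyword) * len(keyword) / len(text)


def rank_candidates(candidates: List[str], keyword: str) -> List[Tuple[str, str]]:
    """Sort by descending keyword ratio and label at most ten candidates."""
    ranked = sorted(
        candidates,
        key=lambda text: keyword_ratio(text, keyword),
        reverse=True,
    )
    # zip stops after ten items because DIGITS has ten characters.
    return list(zip(DIGITS, ranked))
```

A result consisting only of the keyword has a ratio of 1.0 and is therefore listed first, while long results that mention the keyword once fall toward the end of the list.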

The identifier is not limited to a number and may be a character such as a letter of the alphabet. In this case as well, the voice recognition process may be used: when it is recognized that the user has spoken the identifier attached to the desired search result, the search result corresponding to that identifier is selected. When the voice recognition process is used, the identifiers are chosen from those included in the voice command information stored in advance in the storage unit 170, so that they are recognized as operation commands.

If the extraction unit 105 determines that the search results extracted as selection candidates are not plural (S106: No), the selection unit 108 selects the search result that is the single selection candidate (S109).

When a selection candidate has been selected in step S108 or step S109, the display jumps to the related information referenced by the reference information embedded in the selected search result, and the selection process ends. The reference information here is, for example, a URL (Uniform Resource Locator), and the related information is the web page referenced by the URL.
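
Selection by identifier followed by the jump to the embedded reference information can be sketched as follows. This Python sketch is illustrative: the class `SearchResult`, the function `url_for_identifier`, the sample texts, and the `example.com` URLs are all hypothetical, and returning `None` for an identifier that no candidate carries is an assumption, since the description does not cover that case.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SearchResult:
    text: str
    url: str  # reference information embedded in the search result


# Hypothetical candidates labelled with single-digit identifiers.
CANDIDATES: Dict[str, SearchResult] = {
    "1": SearchResult("ABC official site", "https://example.com/abc"),
    "2": SearchResult("About ABC", "https://example.com/about-abc"),
    "3": SearchResult("ABC news", "https://example.com/abc-news"),
}


def url_for_identifier(
    candidates: Dict[str, SearchResult], pressed: str
) -> Optional[str]:
    """Return the URL to jump to for the pressed or spoken identifier."""
    result = candidates.get(pressed)
    # Assumption: an identifier with no candidate selects nothing.
    return result.url if result is not None else None
```

The same lookup serves both a numeric key press and a spoken identifier, since in either case the input resolves to one identifier.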

The voice recognition device 100 according to the above embodiment has been described using selection among Internet search results as an example, but the selection process is not limited to Internet search results. For example, the results may be those of a search in a program guide application. FIG. 6 shows search results in a program guide (EPG: Electronic Program Guide).

As shown in FIG. 6, the image 300 showing the results of a keyword search by the program guide application consists of time information 301 indicating the times at which programs are broadcast, channel information 302 indicating the channels on which programs are broadcast, program information 303 indicating the program broadcast at each time on each channel, search results 304 and 305, which are the results of the search by the program guide application, and identifiers 306 and 307 for identifying the search results 304 and 305.

The search results 304 and 305, which are the selection candidates extracted by searching the program guide with a keyword such as an actor's name, are displayed with background and text colors inverted relative to the program information 303. That is, the search results 304 and 305 are displayed in a form different from that of the program information 303 that is not a selection candidate. In FIG. 6, the program of the search result 304 is focused, and the search result 304 is selected when an operation indicating confirmation is performed. As with Internet search results, when an input indicating one of the identifiers 306 and 307 is made, the identifier corresponding to that input, and hence its search result, is selected. When one of the search results is selected, the details of the program information corresponding to that search result are displayed.

In the search results of the program guide application shown in FIG. 6, the selection candidates are indicated by changing the display form of the matching programs within the program guide, but this is not the only option. For example, as shown in FIG. 7, the program search results may be displayed as a list. The image 400 showing the search results as a list consists of channel information 401, identifiers 402, time information 403, and program information 404. In this case too, as described above, the user can select one of the selection candidates.

Although not specifically mentioned above, when the voice uttered by the user in the voice recognition process contains a search command and a keyword and the search command specifies a search by the Internet search application, the voice recognition device 100 has the Internet search application search for the keyword. For example, when the user says “Search ABC on the Internet”, the voice “search on the Internet” is recognized as a search command for the Internet search application. The user can therefore have an Internet search performed for the keyword merely by speaking.

Likewise, when the voice uttered by the user in the voice recognition process contains a search command and a keyword and the search command specifies a search by the program guide application, the program guide application searches for the keyword. For example, when the user says “Search ABC in the program guide”, the voice “search in the program guide” is recognized as a search command for the program guide application. The user can therefore have a program guide search performed for the keyword merely by speaking.

When the voice uttered by the user in the voice recognition process contains a search command and a keyword but the type of search command is not specified, a screen that lets the user choose the application to perform the search may be displayed, as shown in FIG. 8. FIG. 8 is a diagram for explaining the case where the type of search command is not specified. When a search command is recognized without its type being specified, icons 501 to 507 of all applications that can perform a keyword search are displayed on the image 500.

In this state, when the user selects the desired application, either by operating the input unit 22 or 32 of the remote controller 20 or the mobile terminal 30 or by the voice recognition process, the keyword search is performed by the selected application. The icons 501 to 507 in the image 500 represent, respectively, an Internet search application, an Internet image search application, an Internet news search application, a video posting site application, an Internet encyclopedia application, a program guide application, and a recording list application.

Alternatively, when the voice uttered by the user in the voice recognition process contains a search command and a keyword but the type of search command is not specified, the keyword may be searched for in every application that contains the keyword, and the search results from all of the applications searched may be displayed.
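
The handling of search commands with and without a specified type can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the application names in `SEARCH_APPS` are hypothetical labels for the seven applications of icons 501 to 507, the `search_all` flag stands in for whichever of the two alternatives for an unspecified type the device is configured to use, and searching every listed application is a simplification of the second alternative.

```python
from typing import Dict, List, Optional, Union

# Hypothetical labels for the applications shown as icons 501 to 507.
SEARCH_APPS: List[str] = [
    "internet", "image", "news", "video",
    "encyclopedia", "program_guide", "recordings",
]

Plan = Dict[str, Union[str, List[str]]]


def dispatch_search(
    keyword: str, app: Optional[str] = None, search_all: bool = False
) -> Plan:
    """Decide what to do with a recognized search command and keyword."""
    if app is not None:
        if app not in SEARCH_APPS:
            raise ValueError(f"unknown application: {app}")
        # Type specified, e.g. a search in the program guide.
        return {"action": "search", "apps": [app], "keyword": keyword}
    if search_all:
        # Type unspecified, alternative: search in the applications at once.
        return {"action": "search", "apps": list(SEARCH_APPS), "keyword": keyword}
    # Type unspecified: show the application selection screen of FIG. 8.
    return {"action": "choose_app", "apps": list(SEARCH_APPS), "keyword": keyword}
```

Returning a plan instead of performing the search keeps the sketch self-contained; in the embodiment the search itself is carried out by the search unit 109.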

Because the voice recognition process can be started by the methods described above, the searches described above can be performed even while a program is being watched on the television 10, simply by starting the voice recognition process.

According to the voice recognition device 100 of the present embodiment, when the extraction unit 105 extracts selection candidates according to the keyword and selection command obtained by voice recognition and more than one candidate is extracted, the selection mode is changed from the first selection mode, in which all selectable information can be selected, to the second selection mode, in which only the extracted selection candidates can be selected. In other words, when the keyword obtained by voice recognition is used to select one piece of selectable information but several candidates match and the choice cannot be narrowed to one, the mode is changed to the second selection mode, in which only those candidates can be selected.

The user can therefore narrow the selectable information down to the pieces that contain the keyword and choose only from among those selection candidates. This makes it easier for the user to select the intended selectable information than choosing one from all of the selectable information.

In addition, according to the voice recognition device 100 of the present embodiment, the selection candidates are displayed in a form different from that of the other selectable information, so the user can easily tell which pieces of selectable information are selection candidates.

Further, according to the speech recognition apparatus 100 of the present embodiment, an identifier is displayed for each of the extracted selection candidates. Therefore, when selecting the intended selectable information from among the selection candidates, the user can easily have that selectable information selected by designating its identifier.

Further, according to the speech recognition apparatus 100 of the present embodiment, the user can select the intended selectable information merely by uttering a voice that includes a keyword indicating an identifier attached to one of the selection candidates, or a keyword that can specify one of the selection candidates, together with a selection command that causes selection based on that keyword.
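
Selection by identifier or by a further keyword can be sketched as follows. This is a hedged illustration only: the function names `assign_identifiers` and `select_by_utterance` are hypothetical, the use of consecutive numbers as identifiers is an assumption, and the keyword is assumed to have already been separated from the selection command by the recognizer.

```python
from typing import Optional


def assign_identifiers(candidates: list[str]) -> dict[str, str]:
    """Attach an identifier ("1", "2", ...) to each selection candidate."""
    return {str(i): c for i, c in enumerate(candidates, start=1)}


def select_by_utterance(labelled: dict[str, str], keyword: str) -> Optional[str]:
    """Return the single candidate designated by the keyword, else None."""
    if keyword in labelled:  # the keyword names a displayed identifier
        return labelled[keyword]
    # Otherwise treat the keyword as text that must specify exactly one candidate.
    needle = keyword.casefold()
    matches = [c for c in labelled.values() if needle in c.casefold()]
    return matches[0] if len(matches) == 1 else None
```

Returning `None` for an ambiguous keyword is a design choice of this sketch; it leaves the second mode in place so that the user can try again with an identifier.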

Further, according to the speech recognition apparatus 100 of the present embodiment, one of the plurality of selection candidates is selectively displayed in a display form different from that of the other selection candidates in accordance with a user operation received by the operation reception unit 110. When the operation received by the operation reception unit 110 is an operation indicating a decision, the selection candidate that is displayed in the different display form at the time the operation is received is selected. That is, one of the selection candidates is selectively focused based on the operation performed by the user, and the selection candidate that is focused when the decision operation is received is selected. The user can therefore easily select the intended selectable information from among the plurality of selection candidates.
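
The focus-and-decide behaviour described above can be sketched as follows. The class name `CandidateFocus`, the operation names "next", "previous", and "decide", and the wrap-around at either end of the candidate list are all assumptions made for illustration; the embodiment does not prescribe them.

```python
from typing import Optional


class CandidateFocus:
    """Tracks which selection candidate is focused in the second mode."""

    def __init__(self, candidates: list[str]) -> None:
        if not candidates:
            raise ValueError("at least one selection candidate is required")
        self._candidates = list(candidates)
        self._index = 0  # the first candidate is focused initially (assumed)

    @property
    def focused(self) -> str:
        return self._candidates[self._index]

    def handle(self, operation: str) -> Optional[str]:
        """Move the focus, or return the focused candidate on "decide"."""
        if operation == "next":
            self._index = (self._index + 1) % len(self._candidates)
        elif operation == "previous":
            self._index = (self._index - 1) % len(self._candidates)
        elif operation == "decide":
            return self.focused
        return None  # focus moved or operation ignored; nothing selected yet
```

A display layer would render `focused` in a display form different from the other candidates after every call to `handle`.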

Further, according to the speech recognition apparatus 100 of the present embodiment, the plurality of pieces of selectable information are the results of a keyword search performed by a predetermined application. That is, even when the selectable information consists of keyword search results from a predetermined application, the user can easily select the intended selectable information from among the search results.

Further, according to the speech recognition apparatus 100 of the present embodiment, the plurality of pieces of selectable information are the results of a keyword search on the Internet. That is, even when the selectable information consists of Internet keyword search results, the user can easily select the intended selectable information from among the search results.

Further, according to the speech recognition apparatus 100 of the present embodiment, the plurality of pieces of selectable information are the results of a keyword search performed by a program guide application. That is, even when the selectable information consists of keyword search results in a program guide, the user can easily select the intended selectable information from among the search results.

Further, according to the speech recognition apparatus 100 of the present embodiment, the plurality of pieces of selectable information are the results of a keyword search performed by every searchable application among all of the applications. That is, even when the selectable information consists of keyword search results from all of the searchable applications, the user can easily select the intended selectable information from among the search results.

Further, according to the speech recognition apparatus 100 of the present embodiment, the plurality of pieces of selectable information are a plurality of hypertexts. That is, even when the selectable information consists of a plurality of hypertexts, the user can easily select the intended selectable information from among them.

Although the present invention has been described based on the above embodiment, the present invention is of course not limited to the above embodiment. The following cases are also included in the present invention.

(1) Specifically, each of the above devices can be realized as a system including a microprocessor, a ROM, a RAM, a hard disk unit, a display unit, and the like. A program is stored in the RAM or the hard disk unit. Each device achieves its functions when the microprocessor operates in accordance with the program. Here, the program is composed of a combination of a plurality of instruction codes, each indicating an instruction to the microprocessor, in order to achieve a predetermined function.

(2) Some or all of the constituent elements of each of the above devices may be configured as a single system LSI (Large Scale Integration). A system LSI is a super-multifunctional LSI manufactured by integrating a plurality of components on a single chip; specifically, it is a system that includes a microprocessor, a ROM, a RAM, and the like. A program is stored in the ROM. The system LSI achieves its functions when the microprocessor loads the program from the ROM into the RAM and performs operations such as arithmetic in accordance with the loaded program.

(3) Some or all of the constituent elements of each of the above devices may be configured as an IC card or a stand-alone module that can be attached to and detached from the device. The IC card or the module is a system including a microprocessor, a ROM, a RAM, and the like. The IC card or the module may include the super-multifunctional LSI described above. The IC card or the module achieves its functions when the microprocessor operates in accordance with the program. The IC card or the module may be tamper resistant.

(4) The present invention may be realized as the methods described above. It may also be realized as a program that implements these methods on a computer, or as a digital signal composed of the program.

The present invention may also be realized as the program or the digital signal recorded on a computer-readable recording medium, for example a flexible disk, a hard disk, a CD-ROM, an MO, a DVD, a DVD-ROM, a DVD-RAM, a BD (Blu-ray (registered trademark) Disc), or a semiconductor memory. It may also be realized as the digital signal recorded on such a recording medium.

The present invention may also transmit the program or the digital signal via a telecommunication line, a wireless or wired communication line, a network typified by the Internet, data broadcasting, or the like.

The present invention may also be a system including a microprocessor and a memory, in which the memory stores the program and the microprocessor operates in accordance with the program.

The present invention may also be carried out by another independent computer system, either by recording the program or the digital signal on a recording medium and transferring it, or by transferring the program or the digital signal via a network or the like.

(5) The above embodiment and the above modifications may be combined with one another.

As described above, the embodiment has been described as an example of the technology in the present disclosure. The accompanying drawings and the detailed description have been provided for that purpose.

Accordingly, the constituent elements described in the accompanying drawings and the detailed description may include not only elements that are essential for solving the problem but also elements that are not essential for solving the problem and are included only to illustrate the above technology. The mere fact that such non-essential elements appear in the accompanying drawings or the detailed description should therefore not be taken as an immediate finding that they are essential.

Moreover, since the above embodiment is intended to illustrate the technology in the present disclosure, various changes, replacements, additions, omissions, and the like can be made within the scope of the claims or their equivalents.

The present disclosure is applicable to a video display device with which a user can easily select, by speech recognition, the selectable information that the user intends to select. Specifically, the present disclosure is applicable to a television and the like.

DESCRIPTION OF SYMBOLS
1 Speech recognition system
10 Television
20 Remote controller
21, 31 Microphone
22, 32 Input unit
30 Portable terminal
40 Network
50 Keyword recognition unit
100 Speech recognition apparatus
101 Voice acquisition unit
102 Command recognition unit
103 Recognition result acquisition unit
104 Command processing unit
105 Extraction unit
106 Selection mode changing unit
107 Display control unit
108 Selection unit
109 Search unit
110 Operation reception unit
111 Gesture recognition unit
120 Built-in camera
130 Built-in microphone
140 Display unit
150 Transmission/reception unit
160 Tuner
170 Storage unit

Claims

A video display device comprising:
a display unit that displays video; and
a processor that, when display information including a plurality of pieces of selectable information, in each of which reference information for referring to related information is embedded, is displayed on the display unit, assists a user in selecting any one of the plurality of pieces of selectable information,
wherein the processor includes:
a voice acquisition unit that acquires a voice uttered by the user;
a recognition result acquisition unit that acquires a recognition result of the voice acquired by the voice acquisition unit;
an extraction unit that, when the recognition result includes a keyword and a selection command for selecting one of the plurality of pieces of selectable information, extracts, from among the plurality of pieces of selectable information, selection candidates that are pieces of selectable information including the keyword;
a selection mode changing unit that, when the extraction unit extracts a plurality of selection candidates, changes a selection mode for selecting among the plurality of pieces of selectable information from a first selection mode, in which all of the pieces of selectable information are selectable, to a second selection mode, in which the plurality of selection candidates are selectable; and
a display control unit that changes a display mode of the display information in accordance with the second selection mode to which the selection mode changing unit has changed the selection mode,
wherein, when the selection mode is the second selection mode, the display control unit displays, for each of the plurality of selection candidates, an identifier for identifying that selection candidate.

A video display device comprising:
a display unit that displays video; and
a processor that, when display information including a plurality of pieces of selectable information, in each of which reference information for referring to related information is embedded, is displayed on the display unit, assists a user in selecting any one of the plurality of pieces of selectable information,
wherein the processor includes:
a voice acquisition unit that acquires a voice uttered by the user;
a recognition result acquisition unit that acquires a recognition result of the voice acquired by the voice acquisition unit;
an extraction unit that, when the recognition result includes a keyword and a selection command for selecting one of the plurality of pieces of selectable information, extracts, from among the plurality of pieces of selectable information, selection candidates that are pieces of selectable information including the keyword;
a selection mode changing unit that, when the extraction unit extracts a plurality of selection candidates, changes a selection mode for selecting among the plurality of pieces of selectable information from a first selection mode, in which all of the pieces of selectable information are selectable, to a second selection mode, in which the plurality of selection candidates are selectable;
a display control unit that changes a display mode of the display information in accordance with the second selection mode to which the selection mode changing unit has changed the selection mode; and
a search unit that, when the recognition result acquired by the recognition result acquisition unit includes a keyword and a search command associated with a preset application, searches for the keyword with the application,
wherein the display control unit displays a result of the search by the search unit as the plurality of pieces of selectable information.

A video display device comprising:
a display unit that displays video; and
a processor that, when display information including a plurality of pieces of selectable information, in each of which reference information for referring to related information is embedded, is displayed on the display unit, assists a user in selecting any one of the plurality of pieces of selectable information,
wherein the processor includes:
a voice acquisition unit that acquires a voice uttered by the user;
a recognition result acquisition unit that acquires a recognition result of the voice acquired by the voice acquisition unit;
an extraction unit that, when the recognition result includes a keyword and a selection command for selecting one of the plurality of pieces of selectable information, extracts, from among the plurality of pieces of selectable information, selection candidates that are pieces of selectable information including the keyword;
a selection mode changing unit that, only when the extraction unit extracts a plurality of selection candidates, changes a selection mode for selecting among the plurality of pieces of selectable information from a first selection mode, in which all of the pieces of selectable information are selectable, to a second selection mode, in which the plurality of selection candidates are selectable; and
a display control unit that changes a display mode of the display information in accordance with the second selection mode to which the selection mode changing unit has changed the selection mode.

The video display device according to claim 3, wherein, when the extraction unit extracts one selection candidate, the display control unit displays the related information referred to by the reference information embedded in the selectable information extracted as the selection candidate.

The video display device according to claim 3, further comprising
an operation reception unit that receives an operation from the user,
wherein the first selection mode is a mode in which the operation reception unit receives a free cursor operation, and the second selection mode is a mode in which the operation reception unit receives a predetermined command operation or a swipe operation in a predetermined direction.

A video display method for a video display device that includes a display unit that displays video and a processor that, when display information including a plurality of pieces of selectable information, in each of which reference information for referring to related information is embedded, is displayed on the display unit, assists a user in selecting any one of the plurality of pieces of selectable information, the video display method comprising:
a voice acquisition step of acquiring a voice uttered by the user;
a recognition result acquisition step of acquiring a recognition result of the voice acquired in the voice acquisition step;
an extraction step of, when the recognition result includes a keyword and a selection command for selecting one of the plurality of pieces of selectable information, extracting, from among the plurality of pieces of selectable information, selection candidates that are pieces of selectable information including the keyword;
a selection mode changing step of, when a plurality of selection candidates are extracted in the extraction step, changing a selection mode for selecting among the plurality of pieces of selectable information from a first selection mode, in which all of the pieces of selectable information are selectable, to a second selection mode, in which the plurality of selection candidates are selectable; and
a display control step of changing a display mode of the display information in accordance with the second selection mode to which the selection mode has been changed in the selection mode changing step,
wherein, in the display control step, when the selection mode is the second selection mode, an identifier for identifying the selection candidate is displayed for each of the plurality of selection candidates.

A video display method for a video display device that includes a display unit that displays video and a processor that, when display information including a plurality of pieces of selectable information, in each of which reference information for referring to related information is embedded, is displayed on the display unit, assists a user in selecting any one of the plurality of pieces of selectable information, the video display method comprising:
a voice acquisition step of acquiring a voice uttered by the user;
a recognition result acquisition step of acquiring a recognition result of the voice acquired in the voice acquisition step;
an extraction step of, when the recognition result includes a keyword and a selection command for selecting one of the plurality of pieces of selectable information, extracting, from among the plurality of pieces of selectable information, selection candidates that are pieces of selectable information including the keyword;
a selection mode changing step of, when a plurality of selection candidates are extracted in the extraction step, changing a selection mode for selecting among the plurality of pieces of selectable information from a first selection mode, in which all of the pieces of selectable information are selectable, to a second selection mode, in which the plurality of selection candidates are selectable;
a display control step of changing a display mode of the display information in accordance with the second selection mode to which the selection mode has been changed in the selection mode changing step; and
a search step of, when the recognition result acquired in the recognition result acquisition step includes a keyword and a search command associated with a preset application, searching for the keyword with the application,
wherein, in the display control step, a result of the search in the search step is displayed as the plurality of pieces of selectable information.

A video display method for a video display device that includes a display unit that displays video and a processor that, when display information including a plurality of pieces of selectable information, in each of which reference information for referring to related information is embedded, is displayed on the display unit, assists a user in selecting any one of the plurality of pieces of selectable information, the video display method comprising:
a voice acquisition step of acquiring a voice uttered by the user;
a recognition result acquisition step of acquiring a recognition result of the voice acquired in the voice acquisition step;
an extraction step of, when the recognition result includes a keyword and a selection command for selecting one of the plurality of pieces of selectable information, extracting, from among the plurality of pieces of selectable information, selection candidates that are pieces of selectable information including the keyword;
a selection mode changing step of, only when a plurality of selection candidates are extracted in the extraction step, changing a selection mode for selecting among the plurality of pieces of selectable information from a first selection mode, in which all of the pieces of selectable information are selectable, to a second selection mode, in which the plurality of selection candidates are selectable; and
a display control step of changing a display mode of the display information in accordance with the second selection mode to which the selection mode has been changed in the selection mode changing step.