JP2019215393A

JP2019215393A - Image display device and television receiver

Info

Publication number: JP2019215393A
Application number: JP2018111058A
Authority: JP
Inventors: Yoshifumi Ishikawa (石川 善文); Osamu Koshimizu (輿水 修)
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2018-06-11
Filing date: 2018-06-11
Publication date: 2019-12-19
Anticipated expiration: 2038-06-11
Also published as: JP7041589B2

Abstract

To provide a technique for changing the voice recognition range to suit a user's viewing style in a display device with a voice recognition function. SOLUTION: A display device with a voice recognition function has a display section and a voice input section. The voice input section has directivity and performs a plurality of voice acquisitions in parallel, each with its directivity direction set to a different prescribed angle; selects candidate voice data from the acquired voice data according to a feature amount of the voice data; sets the directivity direction of the candidate voice data as the speaker direction; determines whether that voice data is a keyword voice; and then performs voice acquisition with the speaker direction as the directivity direction to acquire a command voice. The display device has a narrow angle mode, in which the range covered by the plurality of voice acquisitions is narrow, and a wide angle mode, in which that range is wide, and can be switched between the two modes. SELECTED DRAWING: Figure 1

Description

The present invention relates to a television with a voice recognition function. More specifically, it relates to improving the accuracy of voice recognition in such a television.

In recent years, many products equipped with a voice recognition function have been released. The function is convenient because a device can be operated, and necessary information easily obtained, without using a remote controller or the like. Televisions equipped with a voice recognition function are also coming onto the market.

When the voice recognition function is used on a television, it may take the place of the remote controller, for example to raise or lower the volume or to switch channels, or it may be used to check information such as tomorrow's weather. In either case, the user is normally assumed to face the television from a position suitable for viewing. It is therefore considered appropriate to limit the direction of voice recognition to the direction directly facing the television. Suppressing noise from other directions raises the voice recognition rate.

However, this does not always hold, depending on the user's viewing style. For example, a user may be watching the living-room television from the kitchen. A user may also have the television on as background sound while doing other tasks.
Therefore, there has been a demand for a technique that can change the voice recognition range to match the user's viewing style.

Various techniques have been proposed to address this problem. For example, a technique has been disclosed in which the position of a viewer is captured by a camera and the direction of the directivity of voice recognition is changed accordingly (Patent Document 1).

Patent Document 1: Japanese Patent No. 6250297

However, the directivity is difficult to set when there are a plurality of viewers, and also when the camera image is unclear. Therefore, the problem of changing the voice recognition range to match the user's viewing style has not been solved.

The image display device according to the present invention comprises a voice input unit that has directivity and that: performs a plurality of voice acquisitions in parallel, each with its directivity direction set to a different predetermined angle; selects candidate voice data from the voice data acquired by the voice acquisitions according to a feature amount of the voice data; sets the directivity direction of the candidate voice data as the speaker direction; determines whether the candidate voice data (the valid voice data) is a keyword voice; and performs voice acquisition with the speaker direction as the directivity direction to acquire a command voice. The device has a narrow angle mode, in which the voice acquisition range covered by the plurality of voice acquisitions is narrow, and a wide angle mode, in which that range is wide, and can be switched between the narrow angle mode and the wide angle mode.

According to the present invention, the voice recognition range can be changed to match the user's viewing style, so the voice recognition rate can be improved.

FIG. 1 is a block diagram of the display device according to the present invention.
FIG. 2 is a diagram illustrating one embodiment of the display device according to the present invention.
FIG. 3 is a flowchart illustrating one embodiment of the display device according to the present invention.
FIG. 4 is a diagram showing a menu display of the display device according to the present invention.
FIG. 5 is a diagram illustrating a screen display of the display device according to the present invention.
FIG. 6 is a diagram illustrating another embodiment of the display device according to the present invention.
FIG. 7 is a flowchart illustrating another embodiment of the display device according to the present invention.

The greatest feature of the image display device of the present invention is that the voice recognition range can be changed to match the user's viewing style. Embodiments are described below with reference to the drawings.
Note that the overall shape of the image display device and the shape of each part are not limited to those of the embodiments described below, and can be changed within the scope of the technical idea of the present invention, that is, within the range of shapes and dimensions that provide the same operation and effect.
(Example 1)

The configuration of the present invention will be described with reference to FIGS. 1 to 4.
FIG. 1 is a block diagram of the display device according to the present invention. FIG. 2(a) is a diagram illustrating the narrow angle mode in one embodiment of the display device according to the present invention. FIG. 2(b) is likewise a diagram illustrating the wide angle mode. FIG. 3 is a flowchart illustrating one embodiment of the display device according to the present invention. FIG. 4(a) is a diagram showing a menu display of the display device according to the present invention.

(Explanation of block diagram)
The display device 1 is a television receiver with a voice recognition function, that is, a general television receiver equipped with a voice recognition function.
The display device 1 mainly comprises a voice input function, a control function, a video display function, and an audio output function.

The voice input function comprises a microphone array 100 and a voice input processing unit 200. The microphone array 100 consists of a plurality of omnidirectional microphones, four in this example, arranged in a straight line. In a later stage, the sound picked up by each microphone is delayed by a per-microphone amount and the delayed signals are summed; the result is used as voice information having directivity. Changing the delay amounts changes the directivity direction. This technique is also called beamforming. The operation is carried out in parallel for several directions, generating a plurality of pieces of voice information.
In other words, a plurality of voice acquisitions, each having directivity and a settable directivity direction, are performed in parallel.
By giving the voice acquisitions different directions, voice can be acquired over a wide range that a single voice acquisition could not cover.
In addition, by giving one of the voice acquisitions directivity in the direction normal to the display surface of the television's display unit, that is, the front direction of the television, voice can be acquired with emphasis on the front of the television.
The voice input processing unit 200 processes the voice input through the microphone array 100: it removes noise, recognizes the keyword, determines the command voice, and so on. It mainly comprises a preprocessing unit 210, a voice recognition unit 220, and a voice recognition parameter unit 230.
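The delay-and-sum operation described for the microphone array can be sketched as follows. This is a minimal illustration rather than the implementation in the patent: the microphone spacing, sample rate, speed of sound, rounding to whole samples, and the sign convention (positive angles point toward the higher-numbered microphones) are all assumptions introduced here, as are the function names.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not given in the source


def steering_delays(num_mics, spacing_m, angle_deg, sample_rate):
    """Whole-sample delays that steer a linear array toward angle_deg (0 = front)."""
    # Extra travel time between neighbouring microphones, expressed in samples.
    step = spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND * sample_rate
    raw = [i * step for i in range(num_mics)]
    base = min(raw)  # shift so that every delay is non-negative
    return [round(r - base) for r in raw]


def delay_and_sum(channels, delays):
    """Delay each channel by its sample count, then average the channels."""
    n = len(channels[0])
    out = [0.0] * n
    for samples, d in zip(channels, delays):
        for t in range(d, n):
            out[t] += samples[t - d]
    return [v / len(channels) for v in out]
```

Running `delay_and_sum` once per set of delays on the same four input channels gives one directional signal per steering angle, which corresponds to the parallel voice acquisitions described in the text.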

The preprocessing unit 210 removes noise, echo components, and the like in order to raise the recognition rate of the voice input from the microphones. For example, it performs echo cancellation and noise reduction. Echo cancellation uses the audio information from the audio processing unit 600 to remove the sound components emitted by the television itself. This eliminates recognition failures and misrecognition caused by the television's own audio. Noise reduction removes low-frequency and high-frequency noise and stationary noise.

The voice recognition unit 220 performs two operations: keyword detection and command detection.
During keyword detection, it generates voice data for each of a plurality of angles from the voice data produced by the preprocessing unit 210. It identifies the valid voice data according to a feature amount of the voice data, and takes the directivity direction in which the valid voice data was acquired as the speaker direction. It then determines whether the valid voice data is the keyword voice. The utterance volume can serve as the feature amount, because the volume is highest when the directivity direction matches the direction of the speaker.
If the utterance is the keyword voice, the voice recognition unit 220 notifies the control unit 300 that the keyword has been received.
During command detection, it receives voice at the set angle and transmits the voice data to the control unit 300 as command voice data.
The voice recognition parameter unit 230 holds a group of parameters that set the voice reception angles used during keyword detection. Changing these values changes the range or the direction of voice acquisition. On an instruction from the control unit 300, the appropriate parameters are sent from the voice recognition parameter unit 230 to the voice recognition unit 220.
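Selecting the valid voice data by utterance volume can be illustrated with a short sketch. The source names volume as a possible feature amount but does not say how it is measured; the use of RMS level here, and the names `rms` and `select_candidate`, are assumptions made for illustration.

```python
import math


def rms(samples):
    """Root-mean-square level of one beam's samples (assumed loudness measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def select_candidate(beams):
    """Return (label, samples) of the loudest beam.

    beams maps a beam label such as "B2" to the samples acquired in that direction.
    The returned label plays the role of the speaker direction.
    """
    label = max(beams, key=lambda name: rms(beams[name]))
    return label, beams[label]
```

The label returned by `select_candidate` would then be stored as the speaker direction, and the selected samples passed on to the keyword check.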

The control function controls the display device 1 as a whole. It mainly comprises a control unit 300, a communication unit 310, and an RC light receiving unit 320.
The control unit 300 controls the television's video and audio, performs voice-recognition-related control, and communicates with the outside. It includes a microcomputer and memory (ROM and RAM).
The control unit 300 sends voice recognition angle parameters to the voice input processing unit 200 via the voice recognition parameter unit 230 to change the angle.
It sends the command voice acquired by the voice input processing unit 200 to the voice analysis unit 800 via the communication unit 310 and the Internet 700, and receives the analysis result from the voice analysis unit 800 as text information via the communication unit 310. It displays the resulting text via the image processing unit 430, or converts the text into speech and sends it to the audio processing unit 600 as audio data.

The communication unit 310 performs network communication with the outside. It connects to the Internet and can exchange data with the voice analysis unit 800 on the network. The connection may be wired or wireless.
The RC light receiving unit 320 is the receiver for the television's remote control; it receives data from an infrared remote control and passes it to the control unit 300.

The video display function handles the television's video: it receives broadcast waves and superimposes text data from the control unit 300 on the video. It mainly comprises a tuner 400, an antenna 410, a decoding unit 420, an image processing unit 430, and a display unit 500.
The tuner 400 is an ordinary broadcast-reception tuner; it receives broadcast waves from the antenna 410 and selects a program. The decoding unit 420 converts the received data into a TS (transport stream) and separates video, audio, meta information, and the like. The video is sent to the image processing unit 430 and the audio to the audio processing unit 600.
The control unit 300 sends the timing of the video/audio source to the image processing unit 430 and the audio processing unit 600. It also creates a sub-screen on which information such as the effects of user operations and subtitles is displayed at appropriate positions, and outputs it to the image processing unit 430. The sub-screen is a layer to be superimposed on the video. The control unit 300 also displays a mark indicating that the voice recognition mode has been entered when there is a voice input requesting information, the text of the voice recognition result, the current recognition range mode, and so on.
The image processing unit 430 adjusts the luminance, saturation, hue, and the like of the video, and performs high-definition processing, scaling, and the like. It also superimposes the sub-screen and similar content on the video.

The audio output function handles the television's audio output. It mainly comprises an audio processing unit 600 and a speaker 610. The audio processing unit 600 generates an audio waveform from audio data using a DA converter or the like and outputs the sound to the speaker 610 via an amplifier.
It also sends the audio data to the preprocessing unit 210 for echo cancellation. Data corresponding to the volume emitted from the speaker 610, or the volume level, is added to the audio data sent to the preprocessing unit 210.

(Operation example in narrow angle mode)
Next, the operation in the narrow angle mode and the wide angle mode will be described with reference to FIGS. 2 and 3.
FIGS. 2(a) and 2(b) show an operation example in the narrow angle mode. FIGS. 2(c) and 2(d) show an operation example in the wide angle mode. FIG. 3 is a flowchart that covers both cases.
In this operation example there are five voice detection directions, each covering a fan-shaped range with a voice detection angle of 60 degrees. The voice acquisition ranges are denoted B1, B2, B3, B4, and B5, as shown in FIG. 2(a). The direction of B3, at the center, is called the voice acquisition area center direction C. In this example, the center direction C is the direction facing the display surface of the television.
The step angle between adjacent voice acquisition ranges is 10 degrees in the narrow angle mode and 30 degrees in the wide angle mode.
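The geometry implied by these numbers can be worked out with a few lines of code. The sketch below assumes that each 60-degree fan is centered on its own direction, that the five directions are spaced symmetrically about the center direction C, and that negative angles mean the B1 side; the source does not state these details, and the function names are made up for illustration.

```python
def beam_centers(num_beams, step_deg, center_deg=0.0):
    """Center angle of each beam, spaced step_deg apart around center_deg."""
    half = (num_beams - 1) / 2
    return [center_deg + (i - half) * step_deg for i in range(num_beams)]


def coverage(centers, beam_width_deg=60.0):
    """(leftmost, rightmost) angle covered by the union of the beams."""
    return (min(centers) - beam_width_deg / 2, max(centers) + beam_width_deg / 2)


narrow = beam_centers(5, 10.0)  # B1..B5 in the narrow angle mode
wide = beam_centers(5, 30.0)    # B1..B5 in the wide angle mode
print(narrow, coverage(narrow))
print(wide, coverage(wide))
```

Under these assumptions the narrow angle mode places B1 to B5 at -20, -10, 0, 10, and 20 degrees and spans 100 degrees in total, while the wide angle mode places them at -60, -30, 0, 30, and 60 degrees and spans 180 degrees.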

First, the operation in the narrow angle mode will be described with reference to FIGS. 2(a) and 2(b). The display device 1 is shown as viewed from above; on the page, the left side is the video display side and the right side is the back of the television. The microphone array 100 is placed at the center of the television's video display position. Unless otherwise specified, right and left are given from the viewpoint of a user looking at the television from the front: the right side is the direction of B5 on the page and the left side is the direction of B1.
After the television starts up, the voice recognition parameters are set (S101): the number of search directions is set to 5, the search direction to the front, and the search step angle to 10 degrees.
The keyword is "OK bagel".

The user H says "OK bagel" (FIG. 2(a)). The microphone array 100 and the voice input processing unit 200 acquire voice in parallel for each of the five directions (S102).
The user H is slightly to the left of the voice acquisition area center direction C. The loudest direction is that of the voice acquisition range B2, so the speaker direction is set to B2 and the voice data acquired from this direction is taken as the candidate voice data.
B2 is stored as the direction of the user H (S103).

It is then checked whether the utterance is the keyword voice. If the keyword is confirmed, the device enters the command voice acquisition state; if the utterance is not the keyword voice, the process returns to the voice acquisition operation (S102).

In the command voice acquisition state, the voice acquisition direction is first set to B2, the direction stored in S103 (FIG. 2(b)). Because the step angle between voice acquisition directions is small, fine adjustment is possible, and the user H can be placed almost at the center of the voice acquisition range.
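The keyword phase of the flowchart (acquire in parallel, pick the loudest direction, check for the keyword, otherwise acquire again) can be sketched as a simple loop. The three callables are hypothetical stand-ins: the source does not define how beams are read, how loudness is measured, or how the keyword is matched.

```python
def wait_for_keyword(acquire_beams, loudness, is_keyword):
    """Repeat S102 and S103 until the loudest beam carries the keyword.

    acquire_beams() returns a dict mapping beam labels to acquired voice data (S102).
    loudness(data) returns the feature amount used to rank the beams.
    is_keyword(data) reports whether the data is the keyword voice.
    Returns the stored speaker direction, to be used for command acquisition.
    """
    while True:
        beams = acquire_beams()                                        # S102
        direction = max(beams, key=lambda name: loudness(beams[name]))  # S103
        if is_keyword(beams[direction]):
            return direction
        # Not the keyword: go back to voice acquisition.


# Usage with made-up data: each beam is a (level, recognized text) pair.
frames = iter([
    {"B2": (0.9, "hello"), "B3": (0.2, "hello")},
    {"B2": (0.8, "OK bagel"), "B3": (0.3, "OK bagel")},
])
direction = wait_for_keyword(
    acquire_beams=lambda: next(frames),
    loudness=lambda beam: beam[0],
    is_keyword=lambda beam: beam[1] == "OK bagel",
)
print(direction)
```

After the loop returns, command acquisition would use only the returned direction instead of all five.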

The device waits for a command voice (S105). When the user H says "Turn down the volume", that voice is acquired (S106, S107). The acquired voice data is sent to the voice analysis unit 800 via the control unit 300 and the communication unit 310 (S108). The voice analysis unit 800 analyzes the voice data using a database, a large-scale computer, and the like, and converts the voice into text. In this example, the text is "Onryo sagete" ("turn down the volume"). Because the utterance instructs an operation of the television, the television operation command "VOLDWN" is generated. If the utterance is an inquiry, an answer to it is generated as text data.
The voice analysis unit 800 transmits the text of the command voice, "Onryo sagete", and the reply data, "VOLDWN", to the display device 1 (S109).

The control unit 300 sends the image of a sub-screen showing the text of the command voice to the image processing unit 430, which superimposes it on the main video. If the reply data is television operation data, the television operation is performed. If the reply data is text data, the control unit 300 creates the image of a sub-screen showing the answer and sends it to the image processing unit 430, which superimposes it on the main video. At the same time, the control unit 300 generates audio data from the text and sends it to the audio processing unit 600, which outputs the text as speech through the speaker 610 (S110).
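The branching in S110 between an operation command and a text answer can be sketched as below. The data layout of the reply is not specified in the source, so `AnalysisResult`, its field names, and the `RecordingTv` stand-in (which only logs what a real device would do) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class AnalysisResult:
    """Hypothetical shape of the reply from the voice analysis unit."""
    transcript: str                     # text of the command voice
    command: Optional[str] = None       # television operation, e.g. "VOLDWN"
    answer_text: Optional[str] = None   # answer to an inquiry


@dataclass
class RecordingTv:
    """Stand-in for the display device that records the requested actions."""
    log: List[Tuple[str, str]] = field(default_factory=list)

    def show_overlay(self, text):
        self.log.append(("overlay", text))

    def execute(self, command):
        self.log.append(("execute", command))

    def speak(self, text):
        self.log.append(("speak", text))


def handle_result(result, tv):
    """Show the recognized text, then either operate the television or answer."""
    tv.show_overlay(result.transcript)
    if result.command is not None:
        tv.execute(result.command)
    elif result.answer_text is not None:
        tv.show_overlay(result.answer_text)
        tv.speak(result.answer_text)
```

Keeping the transcript overlay outside the branch mirrors the description, in which the text of the command voice is displayed regardless of the kind of reply.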

Through this series of operations, the direction of a user H near the front of the television is identified, the voice acquisition angle is adjusted, and the recognition rate for the user H's utterances can be raised. The narrow angle mode is therefore best suited to users whose lifestyle keeps them near the front of the television.

In the description of the narrow angle mode above, a plurality of voice acquisitions are performed when acquiring the keyword voice. Alternatively, only a single voice acquisition whose directivity direction is the normal to the display surface of the television's display unit may be performed, with the command voice also acquired from the same front direction. When the user is in front of the television, the keyword voice and the command voice can be acquired without any problem.

As another alternative, voice may be acquired over a wide range, as in the wide angle mode, when acquiring the keyword voice, and only from the front when acquiring the command voice. This makes the keyword voice easier to acquire.

(Operation example in wide angle mode)
Next, the operation in the wide angle mode will be described with reference to FIGS. 2(c) and 2(d).
After the television starts up, the voice recognition parameters are set (S101): the number of search directions is set to 5, the search direction to the front, and the search step angle to 30 degrees. The five voice acquisition directions cover almost the full 180 degrees on the display-surface side of the television. The user H is assumed to be well to the left of the front of the television.

The user H says "OK bagel" (FIG. 2(c)). The microphone array 100 and the voice input processing unit 200 acquire voice in parallel for each of the five directions (S102).
The user H is far to the left of the voice acquisition area center direction C. The loudest direction is that of the voice acquisition range B2, so the speaker direction is set to B2 and the voice data acquired from this direction is taken as the candidate voice data. B2 is stored as the direction of the user H (S103).

In the command voice acquisition state, the voice acquisition direction is first set to B2, the direction stored in S103 (FIG. 2(d)). Because the step angle between voice acquisition directions is large, the user H can be covered even when far off to the side. However, the user H may end up away from the center of the selected acquisition range, in which case the voice recognition rate is lower.

The device waits for a command voice (S105). When the user H says "Turn down the volume", that voice is acquired (S106, S107). The acquired voice data is sent to the voice analysis unit 800 via the control unit 300 and the communication unit 310 (S108). The voice analysis unit 800 analyzes the voice data using a database, a large-scale computer, and the like, and converts the voice into text. In this example, the text is "Onryo sagete" ("turn down the volume"). Because the utterance instructs an operation of the television, the television operation command "VOLDWN" is generated. If the utterance is an inquiry, an answer to it is generated as text data.
The voice analysis unit 800 transmits the text of the command voice, "Onryo sagete", and the reply data, "VOLDWN", to the display device 1 (S109).

The control unit 300 sends the image of a sub-screen showing the text of the command voice to the image processing unit 430, which superimposes it on the main video. The sub-screen is a graphics plane on which content to be superimposed on the main video, such as other image data, is drawn. If the reply data is television operation data, the television operation is performed. If the reply data is text data, the control unit 300 creates the image of a sub-screen showing the answer and sends it to the image processing unit 430, which superimposes it on the main video. At the same time, the control unit 300 generates audio data from the text and sends it to the audio processing unit 600, which outputs the text as speech through the speaker 610 (S110).

Through this series of operations, the direction of a user H far from the front of the television is identified, the voice acquisition angle is adjusted, and the user H's utterances can be recognized. The wide angle mode is therefore best suited to users whose lifestyle often keeps them away from the front of the television.

(Menu display)
As described above, the display device 1 needs to have at least the narrow angle mode and the wide angle mode as voice recognition modes. A menu is therefore provided that lets the user switch easily between the two modes (FIG. 4(a)). A menu screen 510 is displayed on the display unit 500 of the display device 1. The menu has a narrow angle mode selection button 520 and a wide angle mode selection button 530, which are selected and confirmed with, for example, the up/down keys and the enter button of the remote control. FIG. 4(a) shows the screen with the wide angle mode selected. When the wide angle mode is selected, the control unit 300 selects the wide angle mode parameters from those of the voice recognition parameter unit 230 and sets them in the voice recognition unit 220.
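Switching modes from the menu amounts to looking up a stored parameter set and handing it to the recognizer. The sketch below uses the values given for S101 in each mode; the class and function names, the mode keys, and the choice of the narrow angle mode as the initial setting are assumptions, since the source does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchParams:
    num_directions: int   # number of search directions
    center_deg: float     # search direction (0 = front)
    step_deg: float       # search step angle


# Plays the role of the voice recognition parameter unit 230.
PARAMETER_SETS = {
    "narrow": SearchParams(num_directions=5, center_deg=0.0, step_deg=10.0),
    "wide": SearchParams(num_directions=5, center_deg=0.0, step_deg=30.0),
}


class VoiceRecognizer:
    """Stand-in for the voice recognition unit 220; it only stores the parameters."""

    def __init__(self):
        self.params = PARAMETER_SETS["narrow"]  # assumed default

    def set_params(self, params):
        self.params = params


def on_menu_selection(mode, recognizer):
    """Called when the user confirms a mode button on the menu screen."""
    if mode not in PARAMETER_SETS:
        raise ValueError(f"unknown voice recognition mode: {mode!r}")
    recognizer.set_params(PARAMETER_SETS[mode])
```

Keeping the parameter sets in one table means further modes could be added without changing the selection logic.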

(Display of voice recognition angle, etc.)
The voice recognition angle display will be described with reference to FIG. 5. When there are multiple voice recognition modes, the user may need to check which mode is currently active. FIG. 5A shows the case where the wide angle mode is set. A content video 550 is displayed on the display unit 500, and a voice recognition angle display area 560 is allocated at the bottom of the screen. To represent the wide angle mode, the effective voice recognition angle 570 is displayed as a long, thin bar. Seeing the long, thin bar, the user understands that the device is in the mode that recognizes voice over a wide angle.

FIG. 5B shows the case where the narrow angle mode is set. A content video 550 is displayed on the display unit 500, and a voice recognition angle display area 560 is allocated at the bottom of the screen. To represent the narrow angle mode, the effective voice recognition angle 570 is displayed as a short bar. Seeing the short bar, the user understands that the device is in the mode that recognizes voice over a narrow angle.

FIG. 5C shows additional information. The voice acquisition display 580 indicates that a keyword voice or a command voice is being acquired. Because this indicator appears in the upper center of the voice recognition angle display area 560, the user can confirm that the voice is being acquired. In addition, by showing on the screen the voice text 590, which is the text of the voice input and of the answer, the user can reliably grasp what was uttered and what the answer is.

As described above, according to the present invention, a television having a voice recognition function with a plurality of voice recognition ranges can change the voice recognition range to suit the viewing style of the user, which improves convenience for the user.

In other words, the configuration of this example is a display device with a voice recognition function having a display unit. The device has directivity, sets the directivity direction (the direction of the directivity) to predetermined angles, and performs voice acquisition a plurality of times in parallel. The predetermined angles are, for example, directions that differ by a fixed step angle for each voice acquisition. Candidate voice data is selected according to a feature amount of the voice data, the directivity direction of the candidate voice data is taken as the speaker direction, and it is determined whether the candidate voice data is a keyword voice.
The device includes a voice input unit that performs the voice acquisition with the speaker direction, which is the angle at which the candidate voice data was acquired, as the directivity direction, and acquires a command voice.
The device has a narrow angle mode, in which the voice acquisition range covered by the plurality of voice acquisitions is narrow (the horizontal acquisition angle is narrow), and a wide angle mode, in which that range is wide (the horizontal acquisition angle is wide), and can switch between the narrow angle mode and the wide angle mode.

The configuration can also be described as a display device with a voice recognition function in which, with the direction normal to the display surface of the display unit taken as the front direction, the directivity direction (the direction of the directivity) is set to the front direction, voice acquisition is performed, and it is determined whether the voice data acquired by the voice acquisition is a keyword voice. Voice acquisition is performed in one direction only.
The voice acquisition is then performed with the directivity direction set to the front direction, and a command voice is acquired.
This mode is the narrow angle mode.
Alternatively, the device has directivity, sets the directivity direction to predetermined angles, and performs voice acquisition a plurality of times in parallel, with a different angle for each voice acquisition. From the voice data acquired by the voice acquisitions, candidate voice data is selected according to a feature amount of the voice data, the directivity direction of the candidate voice data is taken as the speaker direction, and it is determined whether the candidate voice data is a keyword voice.
The voice acquisition is then performed with the speaker direction as the directivity direction, and a command voice is acquired. This mode is the wide angle mode.
The device thus has a narrow angle mode, in which the voice acquisition range is narrow (the horizontal acquisition angle is narrow), and a wide angle mode, in which the voice acquisition range is wide (the horizontal acquisition angle is wide), and can switch between the narrow angle mode and the wide angle mode.
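The candidate selection used in the wide angle mode can be sketched as follows, assuming that the feature amount is the utterance volume and that volume is measured as the root mean square (RMS) of the samples acquired in each direction. The function names and the RMS measure are assumptions made for illustration; the specification does not say how volume is computed.

```python
import math


def rms(samples):
    """Root-mean-square level of one acquisition (assumed volume measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def select_speaker_direction(beams):
    """Pick the acquisition angle whose voice data is loudest.

    beams maps a directivity angle in degrees to the samples acquired
    in that direction. The returned angle is used as the speaker direction.
    """
    if not beams:
        raise ValueError("at least one acquisition is required")
    return max(beams, key=lambda angle: rms(beams[angle]))
```

The voice data acquired at the returned angle would then be checked against the keyword, and the same angle would be used as the directivity direction when acquiring the command voice.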

(Example 2)
Example 1 described switching between a plurality of voice recognition ranges through a menu. However, displaying the menu and changing the setting every time the mode is switched can be troublesome.
A technique for changing the mode without changing the setting in the menu is therefore desirable.

This example will be described with reference to FIGS. 4B, 6, and 7. FIG. 4B shows an example of the menu display in this example. FIGS. 6A and 6B show the first voice recognition operation after power-on. FIGS. 6C and 6D show the second and subsequent voice recognition operations. FIG. 7 is a flowchart of the first and second voice recognition operations after power-on.
The mode of this example is referred to as the auto mode.

The menu shown in FIG. 4B is displayed, and the auto mode selection button 540 is selected and executed to put the device in the auto mode.
After the television is turned on, the voice recognition parameters are set to the wide angle mode (S201). As shown in FIG. 6A, the device is then in a mode that picks up voice over a wide range.
The user H says "OK bagel", and voice data is acquired (S202). The voice data from the five directions are compared, the direction with the highest volume is identified, and that direction is stored in the variable SCN1. In the example of FIG. 6A, B2 is stored in SCN1 (S203).
If the voice is not the keyword, the process returns to S202. If it is the keyword, the process proceeds to S205 (S204). The voice acquisition direction is set to B2, the value stored in SCN1 (S205). The device waits for a command voice and acquires it (S206).
The data is sent to the voice analysis unit 800, the analysis result is received, and the control unit 300 displays the result on the display unit 500 or the like (S207).

Parameters are then set in preparation for the second voice recognition. The narrow angle mode is set, with its center direction set to the direction B2 stored in SCN1 (S208).
The voice acquisition area center direction C is therefore not the front of the television but approximately the direction of the user H.
The user H says "OK bagel", and voice data is acquired (S209). The voice data from the five directions are compared, the direction with the highest volume is identified, and that direction is stored in the variable SCN2. In the example of FIG. 6C, B2 is stored in SCN2 (S210).
If the voice is not the keyword, the process returns to S209. If it is the keyword, the process proceeds to S212 (S211). The voice acquisition direction is set to B2, the value stored in SCN2 (S212). The device waits for a command voice and acquires it (S213).
The data is sent to the voice analysis unit 800, the analysis result is received, and the control unit 300 displays the result on the display unit 500 or the like (S214).
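The auto mode flow (S201 to S214) can be sketched as a small controller that starts with a wide scan and narrows the scan around the first detected speaker direction. This is a hedged sketch, not the patented implementation: the class name, the angle values, the plain string comparison used for keyword detection, and the decision to narrow only once are all assumptions for illustration.

```python
KEYWORD = "OK bagel"


class AutoModeController:
    def __init__(self, wide_angles, narrow_offsets):
        self.narrow_offsets = list(narrow_offsets)
        self.scan_angles = list(wide_angles)  # S201: start in the wide angle mode
        self.listen_angle = None              # direction used for the command voice
        self.narrowed = False

    def on_utterance(self, volumes, text):
        """Handle one utterance; volumes holds one level per current scan angle.

        Returns the direction to use for command acquisition, or None when
        the utterance is not the keyword (S204 / S211).
        """
        if len(volumes) != len(self.scan_angles):
            raise ValueError("one volume per scan angle is required")
        # S203 / S210: the loudest direction becomes the speaker direction.
        loudest = max(range(len(volumes)), key=volumes.__getitem__)
        if text != KEYWORD:
            return None
        # S205 / S212: point the acquisition at the speaker.
        self.listen_angle = self.scan_angles[loudest]
        return self.listen_angle

    def on_command_done(self):
        """S208: after the first command, narrow the scan around the speaker."""
        if self.listen_angle is None or self.narrowed:
            return
        self.scan_angles = [self.listen_angle + o for o in self.narrow_offsets]
        self.narrowed = True
```

For example, with wide angles of -60 to +60 degrees in 30-degree steps and narrow offsets of -10 to +10 degrees in 5-degree steps, a keyword heard loudest at -30 degrees leads to a second scan covering -40 to -20 degrees.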

With this operation, the voice recognition range can be accurately aligned with the direction of the user even when the user is not in the front direction. The user can improve the voice recognition rate without making any settings in the menu.
In addition, even when the user is far off the front of the television, the direction of the user can be captured accurately, so the recognition rate of voice recognition can be improved.

1 Display device
100 Microphone array
200 Voice input processing unit
210 Preprocessing unit
220 Voice recognition unit
230 Voice recognition parameter unit
300 Control unit
310 Communication unit
320 RC light receiving unit
400 Tuner
410 Antenna
420 Decoding unit
430 Image processing unit
500 Display unit
510 Menu screen
520 Narrow angle mode selection button
530 Wide angle mode selection button
540 Auto mode selection button
550 Content video
560 Voice recognition angle display area
570 Effective voice recognition angle
580 Voice acquisition display
590 Voice text
600 Voice processing unit
610 Speaker
700 Internet
800 Voice analysis unit
H User
B1, ..., B5 Voice acquisition range
C Voice acquisition area center direction

Claims

A display device with a voice recognition function having a display unit, the display device comprising a voice input unit that:
has directivity, sets the directivity direction, which is the direction of the directivity, to predetermined angles, and performs voice acquisition a plurality of times in parallel,
the angle of each voice acquisition being different;
selects, from the voice data acquired by the voice acquisition, candidate voice data according to a feature amount of the voice data, sets the directivity direction of the candidate voice data as the speaker direction, and determines whether the candidate voice data is a keyword voice; and
performs the voice acquisition with the speaker direction as the directivity direction and acquires a command voice,
wherein the display device has a narrow angle mode in which the voice acquisition range covered by the plurality of voice acquisitions is narrow, and
a wide angle mode in which the voice acquisition range covered by the plurality of voice acquisitions is wide,
and is capable of switching between the narrow angle mode and the wide angle mode.

A display device with a voice recognition function, the display device having:
a narrow angle mode in which, with the direction normal to the display surface of a display unit taken as the front direction, the directivity direction, which is the direction of the directivity, is set to the front direction and voice acquisition is performed, it is determined whether the voice data acquired by the voice acquisition is a keyword voice,
and the voice acquisition is performed with the directivity direction set to the front direction to acquire a command voice; and
a wide angle mode in which the directivity direction is set to predetermined angles and voice acquisition is performed a plurality of times in parallel,
the angle of each voice acquisition being different,
candidate voice data is selected from the voice data acquired by the voice acquisition according to a feature amount of the voice data, the directivity direction of the candidate voice data is set as the speaker direction, it is determined whether the candidate voice data is a keyword voice,
and the voice acquisition is performed with the speaker direction as the directivity direction to acquire a command voice,
wherein the display device is capable of switching between the narrow angle mode, in which the voice acquisition range is narrow, and the wide angle mode, in which the voice acquisition range is wide.

The display device according to claim 1, wherein the feature amount of the voice data is the magnitude of the utterance volume.

The display device according to claim 1, wherein the mode is switched by selecting a menu item.

The display device according to claim 1, wherein a display corresponding to the currently set mode is shown on the display unit.

The display device according to any one of claims 1 to 5, wherein the voice acquisition range in the wide angle mode is 60 degrees or more.

The display device according to claim 6, wherein, when the state of the display device changes,
the first acquisition of the voice data is performed in the wide angle mode, and
the second and subsequent acquisitions of the voice data are performed in the narrow angle mode with the direction of the voice acquisition set to the speaker direction determined in the first acquisition of the voice data.

A television receiver comprising the display device according to claim 1.