JP2021015180A

JP2021015180A - Electronic apparatus, program and voice recognition method

Info

Publication number: JP2021015180A
Application number: JP2019129339A
Authority: JP
Inventors: 丈次山下; Joji Yamashita
Original assignee: Toshiba Visual Solutions Corp
Current assignee: Toshiba Visual Solutions Corp
Priority date: 2019-07-11
Filing date: 2019-07-11
Publication date: 2021-02-12
Anticipated expiration: 2039-07-11
Also published as: CN112243588B; JP7216621B2; WO2021004511A1; CN112243588A

Abstract

PROBLEM TO BE SOLVED: To provide sound collection sections on both an external terminal and an electronic apparatus so as to improve the instruction operability of a speaker, while selectively using the plurality of microphones according to the situation of the speaker and making use of the voice collected by each microphone. SOLUTION: An electronic apparatus comprises a first voice acquisition section, a second sound collection section, a second voice acquisition section, a voice recognition section, and a control section. The first voice acquisition section acquires, from an external terminal, a first voice collected by a first sound collection section of the external terminal. The second sound collection section collects a second voice around the apparatus itself. The second voice acquisition section acquires the second voice collected by the second sound collection section. The voice recognition section performs voice recognition processing on the voice input to it. The control section inputs, to the voice recognition section, whichever of the first voice and the second voice matches a previously set condition, and causes it to perform voice recognition processing. SELECTED DRAWING: Figure 1

Description

Embodiments of the present invention relate to an electronic device, a program, and a voice recognition method.

In recent years, there has been growing demand for services that operate devices and search for information and content by voice (AI-based voice-interactive content search services). Such search services are spreading rapidly because of the convenience of being able to operate a device and search for information simply by talking to it, without picking up a remote controller (hereinafter referred to as "remote control").

Since the devices to be operated include not only the device the speaker talks to but every device in the home, the number of service providers and device manufacturers offering such search services is expected to increase.

On the other hand, when instructions are given from a distance to equipment that has a device capable of displaying information, such as a television apparatus (hereinafter referred to as "TV") or a personal computer (hereinafter referred to as "PC"), remote control operation is the basic method. When searching for content or entering text, however, the remote control could also be used as a sound collecting means.

Specific examples of making use of collected voice on a TV include building a microphone into the remote control, collecting the voice uttered by the speaker with that microphone, and transmitting it from the remote control to the TV main body by wireless communication for processing (voice recognition), or building a microphone into the TV main body so that the TV main body directly collects and processes the voice uttered by the user.

In the former example, where the microphone is built into the remote control, the microphone is close to the speaker, so high-quality voice can be collected and recognized with high accuracy. The disadvantage is that the speaker has to hold the remote control in hand.

The latter example, where the microphone is built into the TV main body, is the opposite: the speaker can speak without holding the remote control, but because the microphone is far from the speaker, high quality cannot be expected of the voice it collects.

It is therefore conceivable to combine the advantages of both, that is, to provide microphones in both the remote control and the TV main body.

Japanese Unexamined Patent Publication No. 2006-319797

However, when microphones (sound collecting units) are provided in both the remote control (external terminal) and the TV main body (electronic device), the voices collected by the two microphones may be input to the TV main body at the same time (a voice collision), and the collected voice cannot be used effectively.

For example, when the speaker is holding the remote control, it is better to use the voice collected by the microphone of the remote control, and when the speaker is not holding the remote control, it is better to use the voice collected by the microphone on the TV main body. The microphones thus need to be used selectively according to the situation of the speaker.

The problem to be solved by the present invention is to provide an electronic device, a program, and a voice recognition method that provide sound collecting units in both an external terminal and an electronic device to improve the instruction operability of the speaker, while selectively using the plurality of sound collecting units according to the situation of the speaker and making use of the voice collected by each sound collecting unit.

The electronic device of the embodiment is an electronic device that is connected, wirelessly or by wire, to an external terminal having a first sound collecting unit that collects a first voice around the external terminal, and includes a first voice acquisition unit, a second sound collecting unit, a second voice acquisition unit, a voice recognition unit, and a control unit. The first voice acquisition unit acquires, from the external terminal, the first voice collected by the first sound collecting unit of the external terminal. The second sound collecting unit collects a second voice around the electronic device itself. The second voice acquisition unit acquires the second voice collected by the second sound collecting unit. The voice recognition unit performs voice recognition processing on the voice input to it. The control unit inputs, to the voice recognition unit, whichever of the first voice and the second voice matches a preset condition, and causes it to perform voice recognition processing.

The drawings show, in order: a diagram of the configuration of the recording/playback device of the embodiment; a flowchart of a first operation example of the recording/playback device; a flowchart of a second operation example of the recording/playback device; and a flowchart of a third operation example of the recording/playback device.

Hereinafter, embodiments will be described in detail with reference to the drawings.
FIG. 1 is a diagram showing an example of a schematic configuration of a recording/playback device 1, which is one embodiment of the electronic device. The present embodiment describes the recording/playback device 1 provided with a video display unit 14, but the video display unit 14 is not an essential component. When the electronic device is, for example, a digital recorder or the main body of a computer, the electronic device does not include the video display unit 14 and outputs display information to an external video display unit (display) via various cables or the like. The electronic device may also be, for example, an air conditioner or a refrigerator.

The configuration of the recording/playback device 1 will be described with reference to FIG. 1. As shown in FIG. 1, the recording/playback device 1 is an electronic device wirelessly connected to a remote controller 20 (hereinafter referred to as "remote control 20") serving as an external terminal. It includes a recording/playback device main body 100 that is connected, via a network NTW, to service servers (servers 200, 201, etc.), which are one or more computers providing voice-based content search services on the network. The recording/playback device 1 may also be connected to the remote control 20 by wire.

The recording/playback device main body 100 is connected to the remote control 20 by wireless communication such as Bluetooth (registered trademark) and infrared communication. The remote control 20 may be dedicated to the recording/playback device 1 as in this example, or may be an information terminal such as a smartphone or tablet, or a unit having a microphone and a communication function.

The remote control 20 includes a plurality of buttons 21 for operating the functions of the recording/playback device main body 100, a signal processing unit 22, an IR transmission unit 23 as a first transmission unit, a microphone 24 as a first sound collecting unit, a voice processing unit 25, a Bluetooth communication unit 26 (hereinafter referred to as "BT communication unit 26") as a second transmission unit, and the like. The plurality of buttons 21 include a setting button 21a for calling the setting function and a voice button 21b for activating the voice function.

The signal processing unit 22 generates signals in response to presses of the plurality of buttons 21. The IR transmission unit 23 outputs, by infrared communication, the signal generated by the signal processing unit 22 in response to operation of the voice button 21b. When the voice button 21b is pressed, the signal processing unit 22 generates a signal that causes the voice function of the recording/playback device main body 100 to start a recording operation, that is, an instruction signal (a specific trigger signal) instructing the recording/playback device main body 100 to start recording.

The microphone 24 has a narrow sound collecting range (directivity of about 90° and a sound collecting distance of about several tens of centimeters). It becomes active when the voice button 21b is operated and collects the first voice around itself (mainly the voice the speaker utters toward the microphone 24), so relatively high-quality voice is obtained.

The voice processing unit 25 digitizes the analog voice collected by the microphone 24 and passes it to the BT communication unit 26. The BT communication unit 26 transmits the voice digitized by the voice processing unit 25 by Bluetooth communication. In other words, the BT communication unit 26 and the voice processing unit 25 transmit the voice collected by the microphone 24 to the recording/playback device main body 100.

The recording/playback device main body 100 includes an antenna 50 for receiving terrestrial digital broadcasting, a tuner 51, an OFDM demodulator 52, a signal processing unit 53, a graphic processing unit 58, an audio processing unit 59, an OSD signal generation unit 61, a video display unit 14, a speaker 15, an operation unit 16, various terminals not shown (video output terminal, audio output terminal, etc.), various interfaces (an IR receiving unit 18, a BT communication unit 19, and a communication interface 73 for a LAN and the external network NTW (hereinafter referred to as "communication I/F 73")), a main body microphone 81, a control module 65, a hard disk drive 101 (hereinafter referred to as "HDD 101"), and the like. The HDD 101 provided inside the device is also called the internal HDD.

The antenna 50 supplies the received terrestrial digital television broadcast signal to the tuner 51 for terrestrial digital broadcasting. The tuner 51 selects the broadcast signal of a designated channel from the supplied broadcast signals and supplies it to the OFDM (orthogonal frequency division multiplexing) demodulator 52.

The OFDM demodulator 52 demodulates the broadcast signal of the input channel into a digital video signal and a digital audio signal, and outputs them to the signal processing unit 53.

The signal processing unit 53 applies predetermined digital signal processing to the digital video signal and audio signal input from the OFDM demodulator 52, and outputs them to the graphic processing unit 58 and the audio processing unit 59.

The graphic processing unit 58 superimposes the OSD signal generated by the OSD (on screen display) signal generation unit 61 on the digital video signal supplied from the signal processing unit 53 and outputs the result to the video processing unit 62. The graphic processing unit 58 either selectively outputs the output video signal of the signal processing unit 53 or the output OSD signal of the OSD signal generation unit 61, or outputs a combination of the two.

The video processing unit 62 applies processing such as lightness, luminance, and saturation adjustment to the digital video signal input from the graphic processing unit 58, and supplies the video signal to the video display unit 14 and the video output terminal (not shown). The video processing unit 62 functions as an output unit that outputs the video of the content to the screen.

The video display unit 14 is, for example, a display or a display panel, and displays video based on the video signal. When an external device is connected to the video output terminal, the video signal supplied to the video output terminal is output to the external device.

The audio processing unit 59 converts the input digital audio signal into an analog audio signal that the speaker 15 can reproduce, and outputs it to the speaker 15 to produce sound. The analog audio signal is also output to the outside via an audio output terminal (not shown) such as a headphone terminal.

The operation unit 16 consists of the buttons and switches provided on the recording/playback device main body 100, and allows operations substantially equivalent to those of the remote control 20 for each function of the recording/playback device main body 100.

More specifically, the operation unit 16 inputs to the control module 65 control commands corresponding to direct operations by the user, for example: displaying the EPG (electronic program guide) for viewing programs and making recording reservations; selecting a television broadcast channel (broadcasting station) from the EPG; starting recording of a program (REC); displaying a list of programs (past program guide) for playing back recorded programs; selecting a recorded program from the past program guide for playback (up/down/left/right direction input); and starting playback (PLAY).

The main body microphone 81 is a second sound collecting unit that collects the second voice (the voice of the speaker) around itself, within a range of several meters with directivity at a certain angle in front of the screen of the video display unit 14. It collects sound over a wider range (directivity of about 120° and a sound collecting distance of about several meters) than the microphone 24 of the remote control 20.

The input voice processing unit 64 digitizes the analog voice collected by the main body microphone 81 and outputs it to the control module 65. The input voice processing unit 64 functions as a second voice acquisition unit that acquires the second voice collected by the main body microphone 81.

Normally, while the recording/playback device main body 100 is operating, the main body microphone 81 is in a sound-collectable state (active state) and collects sound continuously. When the voice button 21b of the remote control 20 is pressed, the main body microphone 81 is switched to an inactive state (sound collection stopped), the microphone 24 of the remote control 20 is activated, and the voice collected by the microphone 24 (the first voice) is acquired from the remote control 20.
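The default switching behaviour described above can be sketched as a small state model. This is only an illustrative sketch: the names `MicState`, `on_voice_button`, and `active_source` are hypothetical and do not appear in the source, and the source does not state here when the main body microphone 81 becomes active again, so no such transition is modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicState:
    """Hypothetical model of which microphone is currently collecting sound."""
    body_mic_active: bool = True     # main body microphone 81 (always on by default)
    remote_mic_active: bool = False  # microphone 24 of the remote control 20


def on_voice_button(state: MicState) -> MicState:
    """Voice button 21b pressed: stop the body mic and activate the remote mic."""
    return MicState(body_mic_active=False, remote_mic_active=True)


def active_source(state: MicState) -> str:
    """Return 'first' (remote control voice) or 'second' (main body voice)."""
    return "first" if state.remote_mic_active else "second"
```

In this model the second voice is used until the voice button is pressed, after which only the first voice is used.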

Alternatively, the main body microphone 81 may be kept in the sound-collectable state (active state) even when the voice button 21b of the remote control 20 is pressed. In that case, of the sounds collected by the two microphones 24 and 81, the voice from the microphone with the stronger sound pressure or with the more clearly collected voice (higher intelligibility), that is, the one that results in the higher voice recognition rate, or a recording of that voice, may be output to the voice recognition unit 71.

The intelligibility of a voice can be evaluated, for example, by an intelligibility index such as the SII (Speech Intelligibility Index). The SII is standardized as "ANSI S3.5-1997". Basically, an intelligibility index is obtained for each frequency band from the signal-to-noise ratio of that band and a per-band coefficient (the contribution of that band to intelligibility), and the overall intelligibility index is obtained as the sum of the per-band values.
This may be simplified by limiting the calculation to the frequency band that contributes most to speech intelligibility (for example, 1,000 Hz to 3,000 Hz).
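The per-band summation described above can be sketched as follows. This is a simplified illustration, not an implementation of ANSI S3.5-1997: the function name `simplified_sii` is hypothetical, the band weights are supplied by the caller rather than taken from the standard's tables, and the mapping of signal-to-noise ratio onto a 0 to 1 audibility value over a 30 dB range is assumed here as a rough stand-in for the standard's band audibility function. Masking and level distortion are ignored.

```python
def simplified_sii(band_snr_db, band_importance):
    """Importance-weighted sum of per-band audibility (illustrative only).

    band_snr_db     -- signal-to-noise ratio in dB for each frequency band
    band_importance -- contribution weight of each band (assumed to sum to 1)
    """
    if len(band_snr_db) != len(band_importance):
        raise ValueError("one importance weight is required per band")
    total = 0.0
    for snr, weight in zip(band_snr_db, band_importance):
        # Assumption: -15 dB maps to 0, +15 dB maps to 1, clipped outside.
        audibility = min(1.0, max(0.0, (snr + 15.0) / 30.0))
        total += weight * audibility
    return total
```

Restricting the calculation to the 1,000 Hz to 3,000 Hz range, as the text suggests, would correspond to passing only the bands in that range with renormalized weights.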

In this case, the level of the voice recognition rate can be evaluated from either the sound pressure Pv or the intelligibility index SII.
The level of the voice recognition rate may also be evaluated from a combination of the sound pressure Pv and the intelligibility index SII. For example, as in the following equation (1), the voice recognition rate can be evaluated by a linear sum of the sound pressure Pv and the intelligibility index SII.
R = K1 * Pv + K2 * SII ... Equation (1)
Here, K1 and K2 are proportionality coefficients.
That is, of the two voices, the one with the larger value R given by equation (1) can be regarded as having the higher voice recognition rate.
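As an illustration of equation (1), the sketch below scores each voice and picks the one with the larger R. The function names, the default values of K1 and K2, the units and scaling of Pv, and the tie-breaking rule (the first voice wins on a tie) are all assumptions made for this example; the source does not specify them.

```python
def recognition_score(pv, sii, k1=1.0, k2=1.0):
    """Equation (1): R = K1 * Pv + K2 * SII (coefficient values are assumed)."""
    return k1 * pv + k2 * sii


def pick_voice(first, second, k1=1.0, k2=1.0):
    """Choose between two voices, each given as a (Pv, SII) pair.

    Returns 'first' for the remote control voice or 'second' for the
    main body voice, whichever has the larger R.
    """
    r_first = recognition_score(first[0], first[1], k1, k2)
    r_second = recognition_score(second[0], second[1], k1, k2)
    return "first" if r_first >= r_second else "second"
```

In practice Pv and SII would need to be brought to comparable scales, which is presumably the role of K1 and K2.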

The IR receiving unit 18 receives, by infrared communication, instructions (operation inputs) from the remote control 20, and inputs to the control module 65 commands corresponding to, for example, channel (broadcasting station) selection, recording start (REC), playback of a recorded program (PLAY), pause (PAUSE), special playback, or menu display.

The BT communication unit 19 performs Bluetooth communication (short-range wireless communication) with the remote control 20. The BT communication unit 19 receives the voice signal transmitted from the remote control 20 and inputs it to the control module 65. The BT communication unit 19 functions as a first voice acquisition unit that acquires, from the remote control 20, the first voice collected by the microphone 24 of the remote control 20.

In addition, a WiFi (Wireless Fidelity) communication unit or the like may be provided to perform wireless communication with short-range wireless communication devices conforming to the WiFi standard or the like. A short-range wireless communication unit conforming to a standard such as NFC (Near Field Communication) may also be provided to communicate with external devices of the same standard.

The USB I/F 76 exchanges data and signals with externally connected devices (input devices and storage devices) that support the USB standard. Input devices include, for example, a keyboard and a mouse. Storage devices include, as in this example, an HDD 102 connected to a USB terminal. The storage areas of the HDDs 101 and 102 can be used in various ways depending on the settings.

The HDD 101 can be set for scheduled or manual recording of programs that the user individually specifies from the electronic program guide (EPG), and the HDD 102 can be set for recording by a time shift machine function (an all-program recording function, also called a "full recording function" or "loop recording function") that records, for a certain period, all programs on specific channels (broadcasters or distributors) and in predetermined time slots specified in advance by the user. The reverse assignment is also possible.

In this example, the HDD 101 is provided inside the device and the HDD 102 is connected externally, but a plurality of externally connected HDDs 102 may be connected.

The communication I/F 73 is controlled by the control module 65 to access the external network NTW and to communicate with various service servers on the external network NTW (servers 200, 201, etc. that provide content search services using voice recognition). Specifically, under the control of the control module 65, the communication I/F 73 sends search requests for acquiring information (transmission of input information), receives search results (acquisition of information), and so on.

The server 200 manages program information used for viewing television programs, making recording reservations, keeping a history of recorded content, and the like, and provides, as an AI assistant function, a service for searching for programs and program-related content by utterance (voice) (hereinafter referred to as "A service", "first search service", etc.).

The server 201 is a computer that provides, as an AI assistant function, a service for searching for content on the Internet by utterance (voice) (hereinafter referred to as "B service", "second search service", etc.). It can search a wide range of content, such as traffic information, weather information, Internet programs, and dictionaries.

The services of these service servers support not only search by voice but also search by text data obtained by converting voice into text. Here, digital voice signals and such text data are collectively referred to as voice data.

The control module 65 includes a ROM (read only memory) 66 that stores the control program governing the operation of the device, a RAM (random access memory) 67 that provides a work area for processing signals and data, a flash memory 68 that stores recording reservation information, various setting information, control information, and the like, a setting unit 69, a recording unit 70, a voice recognition unit 71, a control unit 72, and the like. It comprehensively controls all functions of the recording/playback device main body 100 (broadcast reception, program recording and playback, settings, the voice function, and network communication) and its operation, including the signal processing described above. The voice function is the voice recognition function of the voice recognition unit 71, which includes a speech-to-text conversion function and a syntax analysis function.

The recording/playback device main body 100 thus receives terrestrial digital broadcasts with the broadcast reception function and plays back, with the playback function, the programs (video data including audio) recorded on the HDDs 101 and 102 by the recording function, so that the user can view them. By connecting to a home network, the recording/playback device main body 100 can also play back programs stored (recorded) on other recorders or a home server connected to the home network.

The flash memory 68 stores a recording reservation table for scheduled recording by the scheduled recording function, recording reservation tables for individual programs, recording information (attribute information of recorded programs), setting information for the voice function, and the like. The setting information may be set in advance, or may be set by the user's selection on the setting menu screen displayed by the setting unit 69. The setting information includes selection conditions for choosing one of the search services provided by the one or more service servers (servers 200, 201, etc.).

In other words, the flash memory 68 is a storage unit that stores the conditions for making either of the two microphones 24 and 81 active (operating) or inactive (stopped), or the conditions for choosing which of the two voices acquired through the two microphones 24 and 81 to use.

The setting unit 69 displays a screen for configuring the setting information held in the flash memory 68 and, after the user completes the setting operation, stores the confirmed setting information in the flash memory 68.

The recording unit 70 stores (records), in the flash memory 68, the HDD 101, or the like, the first voice acquired by the BT communication unit 19 (first voice acquisition unit) and the second voice acquired by the input voice processing unit 64 (second voice acquisition unit).

The voice recognition unit 71 reads the voice recorded by the recording unit 70 from the flash memory 68, the HDD 101, or the like and analyzes it, that is, performs voice recognition processing on it.

If the recording/playback device main body 100 has sufficient processing capacity, the voice from the remote controller 20 received by the BT communication unit 26 (first voice) or the voice collected by the main body microphone 81 (second voice) may be analyzed in real time instead of reading out and processing the recorded voice. Analyzing a voice here means voice recognition processing that converts the voice (the user's utterance) into text, parses the resulting text data using a preset analysis dictionary, and extracts words, meaningful characters, or character strings (keywords).
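The analysis step described above can be illustrated with a minimal sketch. It assumes the utterance has already been converted to text by some speech-to-text engine (not shown), and it reduces the parsing step to simple tokenization against a small dictionary. The names `ANALYSIS_DICTIONARY` and `extract_keywords`, and the dictionary contents, are hypothetical and do not come from the embodiment.

```python
import re

# Hypothetical analysis dictionary: surface word -> keyword category.
ANALYSIS_DICTIONARY = {
    "drama": "genre",
    "news": "genre",
    "tonight": "time",
    "record": "command",
    "play": "command",
}


def extract_keywords(transcript):
    """Return (word, category) pairs for dictionary words found in the text.

    `transcript` is the text produced by the speech-to-text step, which is
    assumed to exist elsewhere.
    """
    words = re.findall(r"[a-z0-9']+", transcript.lower())
    return [(w, ANALYSIS_DICTIONARY[w]) for w in words if w in ANALYSIS_DICTIONARY]


print(extract_keywords("Record the drama tonight, please"))
```

Words that are not in the dictionary ("the", "please") are simply dropped, which mirrors the idea of keeping only meaningful words or keywords.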

The control unit 72 inputs, to the voice recognition unit 71, whichever of the first voice from the microphone 24 of the remote controller 20 and the second voice from the main body microphone 81 matches a preset condition, and causes the voice recognition unit 71 to perform voice recognition processing on it.

The conditions include, for example, the following conditions "1." to "3.".
Condition "1.": when a signal is received as a result of operating the voice button 21b of the remote controller 20, stop the operation of the main body microphone 81.
Condition "2.": when a signal is received as a result of operating the voice button 21b of the remote controller 20, have the voice recognition unit 71 recognize the first voice obtained from the remote controller 20.
Condition "3.": use whichever of the two recorded voices has the better sound quality.
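As a rough illustration, the three conditions can be expressed as a small selection function. This is only a sketch under stated assumptions: voices are represented as lists of samples, the enum and function names are hypothetical, and the quality score used for condition "3." is supplied by the caller because the text does not define how it is computed.

```python
from enum import Enum


class Condition(Enum):
    STOP_BODY_MIC = 1      # condition "1."
    USE_REMOTE_VOICE = 2   # condition "2."
    USE_BETTER_VOICE = 3   # condition "3."


def select_voice(condition, voice_button_received, first_voice, second_voice, score=None):
    """Return (label, voice) for the voice to pass to voice recognition.

    first_voice  -- voice from the remote controller microphone, or None
    second_voice -- voice from the main body microphone, or None
    score        -- hypothetical quality function, required for condition "3."
    """
    if condition in (Condition.STOP_BODY_MIC, Condition.USE_REMOTE_VOICE):
        # Under condition "1." the main body microphone is also stopped;
        # in both cases the remote voice is used once the button signal arrives.
        if voice_button_received and first_voice is not None:
            return "first", first_voice
        return "second", second_voice
    if condition is Condition.USE_BETTER_VOICE:
        if score is None:
            raise ValueError('condition "3." needs a quality score function')
        candidates = [(name, v) for name, v in
                      (("first", first_voice), ("second", second_voice))
                      if v is not None]
        return max(candidates, key=lambda item: score(item[1]))
    raise ValueError("unknown condition")
```

For example, `select_voice(Condition.USE_REMOTE_VOICE, True, [0.5], [0.1])` selects the first voice, while the same call without the button signal falls back to the second voice.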

The control unit 72 loads the control program held in the ROM 66 into the work area provided by the RAM 67 and, based on the loaded control program, executes processing corresponding to input signals and control signals.

The control unit 72 controls, for example, the recording/playback function and the voice function, and acquires various kinds of information (attribute information) related to content (programs).

The control unit 72 controls each unit of the device (the setting unit 69, the recording unit 70, the voice recognition unit 71, etc.) based on operation information (control input) from the operation unit 16 and operation information (control input) from the remote controller 20 received by the IR receiving unit 18.

The control unit 72 also writes various setting information, as well as management information on other recorders and television devices connected to the home server on the home network, to the flash memory 68.

The control unit 72 controls the recording/playback function based on, for example, an operation instruction (control input) from the user or recording reservation information for scheduled recording, and records the output video and audio signals on the HDD designated in advance (either HDD 101 or HDD 102).

The control unit 72 causes a service server that provides a search service (either server 200 or 201) to search for content using the character or character string recognized by the voice recognition unit 71 and the acquired voice (first voice or second voice), and receives the search result.

That is, the control unit 72 issues search requests for acquiring content (transmission of input information) to the service server (either server 200 or 201) and receives the search results (acquisition of content).

More specifically, the control unit 72 issues a content search request to the service server (either server 200 or 201) via the communication I/F 73, using the character or character string recognized by the voice recognition unit 71 and at least part of the acquired voice, and outputs the search result received from that server in response to the request to the video display unit 14.

The control unit 72 also exchanges information with the service servers (servers 200, 201, etc.) connected to the external network NTW via the communication I/F 73. Furthermore, the control unit 72 exchanges information with USB-compatible devices via the USB I/F 76.

Furthermore, the control unit 72 displays the content (program) of the channel received and selected by the tuner 51. The control unit 72 also refers to the recording reservation information in the recording reservation list stored in the flash memory 68 and controls the recording of content (programs) based on the signal received by the tuner 51. Recording operations also include manually initiated recording. Content (programs) is recorded to, for example, the HDD 101 inside the device or the HDD 102 connected via the USB I/F 76.

The operations corresponding to conditions "1." to "3." are described below with reference to FIGS. 2 to 4. First, a first operation example of the recording/playback device 1, corresponding to condition "1.", is described with reference to the flowchart of FIG. 2.
In this first operation example, when the recording/playback device main body 100 starts up, the control unit 72 activates the main body microphone 81 and collects sound from around the main body microphone 81 (step S101 in FIG. 2).

While the main body microphone 81 is collecting sound, if the voice button 21b of the remote controller 20 is not operated and no signal is received (No in step S102), the control unit 72 controls the recording unit 70 and the voice recognition unit 71 to record the voice collected by the main body microphone 81 (step S103) and to perform voice recognition processing on the recorded voice (step S104).

The control unit 72 then issues a search request to the service server set in advance as the request destination (either server 200 or 201), based on the result of the voice recognition processing (words (characters), character strings, keywords, etc.) and the voice (step S105). The search request includes at least part of the recorded voice and, where necessary, words or the like from the analysis result.
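A search request of this kind could be assembled as in the following sketch. The JSON layout, the field names `voice` and `keywords`, and the size limit are all assumptions made for illustration; the text only states that the request carries at least part of the recorded voice and, where necessary, words from the analysis result.

```python
import base64
import json


def build_search_request(recorded_voice, keywords=None, max_voice_bytes=32000):
    """Serialize a search request as JSON (hypothetical format).

    recorded_voice  -- raw bytes of the recorded voice
    keywords        -- optional words extracted by voice recognition
    max_voice_bytes -- assumed cap on how much of the voice is sent
    """
    payload = {
        # At least part of the recorded voice is always included.
        "voice": base64.b64encode(recorded_voice[:max_voice_bytes]).decode("ascii"),
    }
    if keywords:
        # Recognition results are attached only when available.
        payload["keywords"] = list(keywords)
    return json.dumps(payload)


print(build_search_request(b"\x00\x01\x02", ["drama", "tonight"]))
```

Truncating the voice to a byte limit is one simple way to send "at least part" of the recording; a real device might instead send the segment that contains the utterance.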

The service server (either server 200 or 201) that receives the search request searches for content based on the received voice and words, and sends the search result (content) to the recording/playback device main body 100.

When the recording/playback device main body 100 receives the search result (content) sent from the server (step S106), it outputs the content to the video display unit 14 for display (step S107).

On the other hand, if the user operates a button 21 of the remote controller 20 while the main body microphone 81 is collecting sound (step S101), the signal processing unit 22 of the remote controller 20 generates a signal corresponding to the button 21, and the generated signal is transmitted from the IR transmission unit 23.

For example, when the voice button 21b, a specific button on the remote controller 20, is pressed, the signal processing unit 22 activates the microphone 24, and the microphone 24 starts collecting sound.

When the user then speaks into the microphone 24 of the remote controller 20, the voice is collected by the microphone 24, subjected to voice processing, and transmitted from the BT communication unit 26.

In the recording/playback device main body 100, when the IR receiving unit 18 receives the IR signal transmitted from the remote controller 20 (Yes in step S102), the control unit 72 determines whether the signal is the voice button 21b signal (step S108).

If the determination shows that it is not the voice button 21b signal (No in step S108), the control unit 72 controls the function corresponding to that signal (step S109).

If, on the other hand, the received signal is the voice button 21b signal (Yes in step S108), the control unit 72 next refers to the condition in the flash memory 68. Condition "1." for this operation is that the operation of the main body microphone 81 is stopped when a signal is received as a result of operating the voice button 21b of the remote controller 20, so the control unit 72 deactivates the main body microphone 81 (step S110) and stops the collection of the second voice by the main body microphone 81.

Then, when the first voice from the remote controller 20 is received (step S111), the control unit 72 controls the recording unit 70 to record the first voice from the remote controller 20 (step S112).

According to this first operation example, the recording/playback device main body 100 is provided with the setting unit 69, the recording unit 70, the voice recognition unit 71, and the control unit 72. When the voice button 21b of the remote controller 20 is pressed and its signal is received, the main body microphone 81 is deactivated and the first voice acquired from the microphone 24 of the remote controller 20 is used for voice recognition processing, which improves the accuracy of voice recognition.

For example, sound collection and voice recognition processing are normally performed with the main body microphone 81. When the voice button 21b of the remote controller 20 is pressed and a trigger signal for starting recording is received, the control unit 72 uses that trigger to deactivate the main body microphone 81 and activate the microphone 24 of the remote controller 20, and uses the first voice collected by the remote controller 20, which is close to the speaker, for voice recognition processing. This makes it possible to acquire a high-quality voice from the speaker (user) operating the remote controller 20 and to perform voice recognition processing with high accuracy.
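The microphone switching in the first operation example can be sketched as a small state holder. The class name, the signal code string, and the use of plain strings for voices are hypothetical; the step labels in the comments follow FIG. 2 as described in the text. The sketch does not model when the main body microphone becomes active again, because the text does not specify it.

```python
class FirstOperationExample:
    """Tracks which microphone feeds voice recognition under condition "1."."""

    VOICE_BUTTON = "voice_button_21b"  # hypothetical IR signal code

    def __init__(self):
        self.body_mic_active = True  # S101: main body microphone active at start-up
        self.log = []

    def on_ir_signal(self, signal):
        """S102/S108: handle an IR signal received from the remote controller."""
        if signal != self.VOICE_BUTTON:
            self.log.append("S109: run function for " + signal)
            return
        self.body_mic_active = False  # S110: stop collecting the second voice
        self.log.append("S110: main body microphone deactivated")

    def on_voice(self, source, voice):
        """Record a voice; return it if it should be recognized, else None."""
        if source == "body" and self.body_mic_active:
            self.log.append("S103: record second voice")
            return voice
        if source == "remote" and not self.body_mic_active:
            self.log.append("S112: record first voice")
            return voice
        return None  # voice from a microphone that is not currently in use


op = FirstOperationExample()
op.on_voice("body", "what is on tonight")      # recognized from the main body microphone
op.on_ir_signal(FirstOperationExample.VOICE_BUTTON)
op.on_voice("remote", "play the drama")        # recognized from the remote controller
print(op.log)
```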

Next, a second operation example of the recording/playback device 1, corresponding to condition "2.", is described with reference to the flowchart of FIG. 3. In this second operation example, operations identical to those of the first operation example are given the same reference numerals, and their description is omitted.

In this second operation example, when the recording/playback device main body 100 starts up, the control unit 72 activates the main body microphone 81 and collects sound from around the main body microphone 81 (step S101 in FIG. 3).

While the main body microphone 81 is collecting sound, if the voice button 21b of the remote controller 20 is not operated and no signal is received (No in step S102), the control unit 72 operates in the same way as in the first operation example (steps S103 to S107).

When the user speaks into the microphone 24 of the remote controller 20, the voice is collected by the microphone 24, subjected to voice processing, and transmitted from the BT communication unit 26.

If, on the other hand, the received signal is the voice button 21b signal (Yes in step S108), the control unit 72 waits to receive a voice from the remote controller 20 and, when the voice from the remote controller 20 is received (step S121), controls the recording unit 70 to record it (step S122). During this time the main body microphone 81 remains active, so recording of the voice collected by the main body microphone 81 also continues (step S103).

Next, the control unit 72 refers to the condition in the flash memory 68. Condition "2." for this operation is that the voice recognition unit 71 is made to recognize the first voice obtained from the remote controller 20 when a signal is received as a result of operating the voice button 21b of the remote controller 20. The control unit 72 therefore inputs, of the two voices recorded by the recording unit 70, the first voice obtained from the remote controller 20 to the voice recognition unit 71 and causes it to perform voice recognition processing (step S123). The subsequent operations that use the recognition result of the voice recognition unit 71 are the same as in the first operation example.

According to this second operation example, when a signal is received as a result of operating the voice button 21b of the remote controller 20, the control unit 72 inputs the recorded first voice from the remote controller 20, out of the two voices recorded by the recording unit 70 (the first voice and the second voice), to the voice recognition unit 71 and causes it to perform voice recognition processing.

For example, when the trigger for starting recording is either the start-up of the recording/playback device main body 100 or the pressing of the voice button 21b of the remote controller 20, that trigger causes the second voice from the main body microphone 81 and the first voice from the microphone 24 of the remote controller 20 to be recorded simultaneously. If the trigger originated from the remote controller 20, which is close to the speaker (user), the voice collected by the microphone 24 of the remote controller 20 is taken and subjected to voice recognition processing. Selecting the high-quality voice from the remote controller 20, which is close to the speaker, out of the voices recorded simultaneously in this way improves the recognition accuracy.

Next, a third operation example of the recording/playback device 1, corresponding to condition "3.", is described with reference to the flowchart of FIG. 4. In this third operation example, operations identical to those of the second operation example are given the same reference numerals, and their description is omitted.

In this third operation example, the operations from the start-up of the recording/playback device main body 100 up to the recording of the voices collected by the respective microphones are the same as in the second operation example, and their description is omitted.

While the two voices are being recorded, the control unit 72 refers to the condition in the flash memory 68. Condition "3." for this operation is that the voice with the better sound quality of the two recorded voices is used. The control unit 72 therefore performs a sound quality check on each of the two voices recorded by the recording unit 70, inputs the one with the higher voice recognition rate to the voice recognition unit 71, and causes it to perform voice recognition processing (steps S131 and S132). The subsequent operations that use the recognition result of the voice recognition unit 71 are the same as in the first and second operation examples.

According to this third operation example, the control unit 72 checks the quality of each of the voices (the first voice and the second voice) acquired from the microphone 24 of the remote controller 20 and the main body microphone 81 and recorded, and uses the highest-quality voice among them for voice recognition processing, which improves the accuracy of voice recognition.
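The text does not say how the sound quality check is carried out. The sketch below swaps in one simple possibility, a rough signal-to-noise estimate that compares the loudest and quietest frames of each recording, purely as an assumption. The function names, the frame length, and the sample representation (a list of floats) are hypothetical.

```python
import math


def estimate_snr_db(samples, frame=160):
    """Rough quality proxy: loudest-frame power over quietest-frame power, in dB."""
    frames = [samples[i:i + frame]
              for i in range(0, len(samples) - frame + 1, frame)]
    if not frames:
        return float("-inf")  # too short to judge
    powers = sorted(sum(x * x for x in f) / frame for f in frames)
    noise = max(powers[0], 1e-12)    # quietest frame approximates background noise
    signal = max(powers[-1], 1e-12)  # loudest frame approximates speech
    return 10.0 * math.log10(signal / noise)


def pick_better_voice(first_voice, second_voice):
    """Return (label, voice) for the recording with the higher estimated quality."""
    s1 = estimate_snr_db(first_voice)
    s2 = estimate_snr_db(second_voice)
    return ("first", first_voice) if s1 >= s2 else ("second", second_voice)


# Synthetic recordings: a quiet frame followed by a speech-like frame.
near = [0.01] * 160 + [0.5] * 160   # close microphone: low noise, strong speech
far = [0.1] * 160 + [0.2] * 160     # distant microphone: more noise, weaker speech
print(pick_better_voice(near, far)[0])
```

A device that can afford it could instead run recognition on both recordings and compare confidence scores, which is closer to the "higher voice recognition rate" wording.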

In this third operation example, the trigger for starting sound collection by the microphone 24 is the same as in the second operation example. Alternatively, each microphone may collect sound at all times, with voice recognition processing triggered when the voice button 21b of the remote controller 20 is pressed, that is, when the voice button 21b signal is received.

As described above, according to the recording/playback device 1 of this embodiment, microphones (sound collecting units) are provided on both the remote controller 20 (external terminal) and the recording/playback device main body 100 (electronic device), and whichever of the collected voices matches conditions "1." to "3." is used for voice recognition processing. This improves the ease with which the operator (speaker) can give instructions, while allowing the microphones 24 and 81 to be used selectively according to the speaker's situation and the voices collected by each of them to be put to use.
Further, in this embodiment, using the microphones 24 and 81 selectively according to the speaker's situation, for example by switching the sound collecting unit to the microphone 24 close to the speaker, makes it possible to acquire high-quality voice data. It also prevents the main body microphone 81 from reacting erroneously while the microphone 24 of the remote controller 20 is collecting sound.

Although the above embodiment shows an example in which the microphones 24 and 81 are provided on the remote controller 20 and the recording/playback device main body 100, respectively, a microphone may instead be provided on each of a plurality of external terminals (a first remote controller and a second remote controller) so that a plurality of voices are transmitted from the remote controllers to the recording/playback device main body 100.

That is, the recording/playback device main body 100 may acquire a first voice collected by the microphone of the first remote controller and a second voice collected by the microphone of the second remote controller, select the voice that matches a preset condition within the recording/playback device main body 100, and use it for voice recognition processing.

Although an embodiment of the present invention has been described, this embodiment is presented as an example and is not intended to limit the scope of the invention. This novel embodiment can be implemented in various other forms, and various omissions, substitutions, and changes can be made without departing from the gist of the invention. The embodiment and its modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents.

The components of the recording/playback device 1 described in the above embodiment may be implemented as a program installed in storage such as a computer's hard disk drive. The program may also be stored on computer-readable electronic media, and the functions of the present invention may be realized by having a computer read the program from the electronic media.

Electronic media include, for example, recording media such as CD-ROMs, flash memory, and removable media. The components may also be distributed and stored across different computers connected via a network, with the functions realized through communication between the computers on which the components run.

1: recording/playback device, 14: video display unit, 15: speaker, 16: operation unit, 18: IR receiving unit, 19: BT communication unit, 20: remote controller (remote control), 21: button, 21a: setting button, 21b: voice button, 22: signal processing unit, 23: IR transmission unit, 24: microphone, 25: voice processing unit, 26: BT communication unit, 50: antenna, 51: tuner, 52: OFDM demodulator, 53: signal processing unit, 58: graphic processing unit, 59: audio processing unit, 61: OSD signal generation unit, 62: video processing unit, 64: input voice processing unit, 65: control module, 68: flash memory, 69: setting unit, 70: recording unit, 71: voice recognition unit, 72: control unit, 73: communication interface (communication I/F), 76: USB interface (USB I/F), 81: main body microphone, 100: recording/playback device main body, 101, 102: hard disk drive (HDD), 200, 201: server, NTW: network.

Claims

An electronic device connected wirelessly or by wire to an external terminal having a first sound collecting unit that collects a first voice from its surroundings, the electronic device comprising:
a first voice acquisition unit that acquires, from the external terminal, the first voice collected by the first sound collecting unit of the external terminal;
a second sound collecting unit that collects a second voice from the surroundings of the electronic device;
a second voice acquisition unit that acquires the second voice collected by the second sound collecting unit;
a voice recognition unit that performs voice recognition processing on an input voice; and
a control unit that inputs, to the voice recognition unit, whichever of the first voice and the second voice matches a preset condition and causes the voice recognition unit to perform voice recognition processing on it.

The electronic device according to claim 1, further comprising a recording unit that records the first voice acquired by the first voice acquisition unit and the second voice acquired by the second voice acquisition unit,
wherein the control unit
causes the voice recognition unit to recognize whichever of the first voice and the second voice recorded by the recording unit matches the condition.

The electronic device according to claim 2, further comprising a receiving unit that receives an instruction signal transmitted from the external terminal,
wherein the condition is reception of a specific instruction signal from the external terminal, and
the control unit,
when the receiving unit receives the specific instruction signal from the external terminal, inputs, of the acquired first voice and second voice, the first voice obtained from the external terminal to the voice recognition unit for recognition.

The electronic device according to claim 1, further comprising a receiving unit that receives an instruction signal transmitted from the external terminal,
wherein the condition is reception of a specific instruction signal from the external terminal, and
the control unit,
when the receiving unit receives the specific instruction signal from the external terminal, stops the operation of the second sound collecting unit (and causes voice recognition processing to be performed on the first voice obtained from the external terminal).

The electronic device according to claim 1 or 2, wherein the condition is that the voice with the higher voice recognition rate is used, and
the control unit
causes the voice recognition unit to perform voice recognition processing on whichever of the first voice and the second voice has the higher voice recognition rate.

A program for operating an electronic device connected wirelessly or by wire to an external terminal having a first sound collecting unit that collects a first voice from its surroundings,
the program causing the electronic device to function as:
a first voice acquisition unit that acquires, from the external terminal, the first voice collected by the first sound collecting unit of the external terminal;
a second voice acquisition unit that acquires a second voice collected, by a second sound collecting unit provided in the electronic device, from the surroundings of the second sound collecting unit;
a voice recognition unit that performs voice recognition processing on an input voice; and
a control unit that inputs, to the voice recognition unit, whichever of the first voice and the second voice matches a preset condition and causes the voice recognition unit to perform voice recognition processing on it.

A voice recognition method in an electronic device connected wirelessly or by wire to an external terminal having a first sound collecting unit that collects a first voice from its surroundings, the method comprising:
acquiring, from the external terminal, the first voice collected by the first sound collecting unit of the external terminal;
acquiring a second voice collected, by a second sound collecting unit provided in the electronic device, from the surroundings of the second sound collecting unit; and
performing voice recognition processing on whichever of the first voice and the second voice matches a preset condition.