JP2007286174A

JP2007286174A - Electronic apparatus

Info

Publication number: JP2007286174A
Application number: JP2006110940A
Authority: JP
Inventors: Mayumi Kaneko; 真由美金子; Shusuke Narita; 修輔成田
Original assignee: Funai Electric Co Ltd
Current assignee: Funai Electric Co Ltd
Priority date: 2006-04-13
Filing date: 2006-04-13
Publication date: 2007-11-01

Abstract

PROBLEM TO BE SOLVED: To provide an electronic apparatus capable of reliably performing voice recognition by selecting the sound model most suitable for each user.

SOLUTION: In a digital television receiver 100, a sound model pattern corresponding to identification information extracted by an identification information extracting means (a CPU 121, an identification information extraction program 123c) is acquired by a sound model pattern acquiring means (the CPU 121, a sound model pattern acquiring program 123d). Voice recognition of the voice information acquired by a voice information acquiring section 11 is performed using the acquired sound model pattern, command information is extracted by a command information extracting means (the CPU 121, a command information extracting program 123e), and control is performed by a control means based on the extracted command information.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to an electronic apparatus that can perform various kinds of control by means of a voice recognition function.

In recent years, as electronic devices such as televisions and video recorders have become more multifunctional and sophisticated, there has been demand for devices that support a wide range of functions while remaining simple to operate. Electronic devices that use voice recognition technology for such operation are now in use.

Electronic devices using voice recognition technology include, for example, a facsimile machine with an answering machine function. When voice is input as an ordinary operation aid, the machine performs voice recognition based on a predetermined number of words; when it performs speaker-specific voice recognition, such as ID verification, voice data of the phrases to be used is registered in advance and then used for recognition (see Patent Document 1).

There is also an apparatus provided with a voice sample-device number correspondence table that stores device numbers and their corresponding phonetic notations in tabular form. The apparatus searches this table using the voice recognition result and detects the corresponding device number, thereby identifying the controlled device (see Patent Document 2).

There is also a system in which an operating device generates a multiplexed signal by multiplexing the input voice with a command for an electronic device and transmits it to the electronic device. The electronic device separates the voice and the command from the received multiplexed signal, recognizes the separated voice, and controls its own operation according to the recognized content and the separated command (see Patent Document 3).

There is also a small portable remote control device with a voice recognition function. When a command is input by voice, the device wirelessly transmits the command corresponding to the voice command to the device to be controlled and thereby controls it (see Patent Document 4).
In general, voice recognition technology can be divided into two types: speaker-dependent (specific-speaker) voice recognition and speaker-independent (unspecified-speaker) voice recognition. Speaker-dependent voice recognition assumes that the user is one particular individual and builds an acoustic model from that individual's voice. Because a precise acoustic model can be created for that user alone, a high recognition rate can be obtained. Speaker-independent voice recognition, by contrast, assumes that the user is unspecified and relies on a general acoustic model prepared in advance. Its recognition performance is therefore lower than that of speaker-dependent recognition, but it can offer voice recognition that anyone can use from the outset.
JP 2001-309080 A; JP 2003-330483 A; JP 2003-250061 A; JP 2003-114694 A

However, acoustic models differ greatly depending on the speaker group, such as children or adults, and on the usage environment, such as over a telephone or inside a car. The electronic devices using the voice recognition functions disclosed in Patent Documents 1 to 4 are not configured to select, for each user, the optimal acoustic model relating to the frequency pattern of the voice signal, so voice recognition was sometimes not performed correctly.

An object of the present invention is to provide an electronic device that can select the optimal acoustic model for each user and thereby perform voice recognition more reliably.

In order to solve the above problem, the invention according to claim 1 is an electronic device that performs voice recognition of voice information input from the outside and is controlled by command information obtained from the voice recognition result, the electronic device comprising:
acoustic model pattern storage means capable of storing identification information, composed of voice information identifying a user, in association with an acoustic model pattern relating to the frequency pattern of a voice signal, and configured so that a plurality of acoustic model patterns can be stored in association with one piece of identification information;
command information storage means for storing command information composed of voice information for controlling the electronic device;
voice information acquisition means for acquiring voice information;
identification information extraction means for performing voice recognition of the voice information acquired by the voice information acquisition means and extracting identification information stored in the acoustic model pattern storage means;
acoustic model pattern acquisition means for acquiring, from the acoustic model pattern storage means, the acoustic model pattern corresponding to the identification information extracted by the identification information extraction means;
command information extraction means for performing voice recognition of the voice information acquired by the voice information acquisition means using the acoustic model pattern acquired by the acoustic model pattern acquisition means, and extracting command information stored in the command information storage means;
control means for performing control based on the command information extracted by the command information extraction means;
calculation means for calculating the voice recognition rate of the command information extraction means for each piece of identification information and for each acoustic model pattern;
voice recognition rate storage means for storing the voice recognition rate calculated by the calculation means for each piece of identification information and for each acoustic model pattern;
display control means for causing display means to display the content of the control that the control means performs in response to the command information extracted by the command information extraction means; and
designation means for designating whether to execute or cancel the control content displayed on the display means by the display control means,
wherein, when a plurality of acoustic model patterns are stored in the acoustic model pattern storage means in association with the identification information extracted by the identification information extraction means, the acoustic model pattern acquisition means acquires the acoustic model patterns associated with that identification information in descending order of the voice recognition rate stored in the voice recognition rate storage means, and
wherein the control means performs the control corresponding to the command information when the designation means designates that the control content is to be executed.

The invention according to claim 2 is an electronic device that performs voice recognition of voice information input from the outside and is controlled by command information obtained from the voice recognition result, the electronic device comprising:
acoustic model pattern storage means for storing identification information, composed of voice information identifying a user, in association with an acoustic model pattern relating to the frequency pattern of a voice signal;
command information storage means for storing command information composed of voice information for controlling the electronic device;
voice information acquisition means for acquiring voice information;
identification information extraction means for performing voice recognition of the voice information acquired by the voice information acquisition means and extracting identification information stored in the acoustic model pattern storage means;
acoustic model pattern acquisition means for acquiring, from the acoustic model pattern storage means, the acoustic model pattern corresponding to the identification information extracted by the identification information extraction means;
command information extraction means for performing voice recognition of the voice information acquired by the voice information acquisition means using the acoustic model pattern acquired by the acoustic model pattern acquisition means, and extracting command information stored in the command information storage means; and
control means for performing control based on the command information extracted by the command information extraction means.

The invention according to claim 3 is the invention according to claim 2, wherein
the acoustic model pattern storage means is configured so that a plurality of acoustic model patterns can be stored in association with one piece of identification information, and
when a plurality of acoustic model patterns are stored in the acoustic model pattern storage means in association with the identification information extracted by the identification information extraction means, the acoustic model pattern acquisition means acquires one acoustic model pattern associated with that identification information and, if the command information extraction means cannot extract command information as a result of performing voice recognition of the voice information, acquires another acoustic model pattern associated with that identification information.
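
The fallback behaviour described for claim 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function `extract_command_with_fallback`, the `recognize` callable, and the toy recognizer are hypothetical names introduced here, and a result of `None` is assumed to mean that no command information could be extracted.

```python
from typing import Callable, Optional, Sequence


def extract_command_with_fallback(
    audio: bytes,
    patterns: Sequence[str],
    recognize: Callable[[bytes, str], Optional[str]],
) -> Optional[str]:
    """Try each acoustic model pattern in turn until a command is extracted.

    `recognize` is a hypothetical stand-in for voice recognition with one
    acoustic model pattern; it returns the command text or None on failure.
    """
    for pattern in patterns:
        command = recognize(audio, pattern)
        if command is not None:
            return command  # first pattern that yields a command wins
    return None  # every pattern associated with this user failed


def toy_recognize(audio: bytes, pattern: str) -> Optional[str]:
    # Toy recognizer (assumption): only pattern "B" recognizes the input.
    return "volume up" if pattern == "B" else None


result = extract_command_with_fallback(b"...", ["A", "B", "C"], toy_recognize)
print(result)
```

Here pattern A fails, so the loop moves on to pattern B, which mirrors the case where a user's voice has changed and the first pattern no longer matches.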

The invention according to claim 4 is the invention according to claim 3, further comprising:
calculation means for calculating the voice recognition rate of the command information extraction means for each piece of identification information and for each acoustic model pattern; and
voice recognition rate storage means for storing the voice recognition rate calculated by the calculation means for each piece of identification information and for each acoustic model pattern,
wherein, when a plurality of acoustic model patterns are stored in the acoustic model pattern storage means in association with the identification information extracted by the identification information extraction means, the acoustic model pattern acquisition means acquires the acoustic model patterns associated with that identification information in descending order of the voice recognition rate stored in the voice recognition rate storage means.
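
The per-user, per-pattern recognition rate and the descending-order acquisition can be sketched as below. The text does not say how the voice recognition rate is computed, so this sketch assumes it is the number of successful command extractions divided by the number of attempts; the class `RecognitionRateStore` and its methods are hypothetical names, and the recorded outcomes are invented for illustration.

```python
from collections import defaultdict
from typing import List, Sequence


class RecognitionRateStore:
    """Keeps a success rate for every (identification, pattern) pair."""

    def __init__(self) -> None:
        # (identification, pattern) -> [successes, attempts]
        self._counts = defaultdict(lambda: [0, 0])

    def record(self, identification: str, pattern: str, success: bool) -> None:
        counts = self._counts[(identification, pattern)]
        counts[0] += int(success)
        counts[1] += 1

    def rate(self, identification: str, pattern: str) -> float:
        # Assumed definition: successes / attempts, 0.0 if never tried.
        successes, attempts = self._counts.get((identification, pattern), (0, 0))
        return successes / attempts if attempts else 0.0

    def ordered_patterns(self, identification: str, patterns: Sequence[str]) -> List[str]:
        # Highest recognition rate first; ties keep their original order.
        return sorted(patterns, key=lambda p: self.rate(identification, p), reverse=True)


store = RecognitionRateStore()
for ok in (True, False, False):
    store.record("Ichiro", "A", ok)   # 1 of 3 succeeded
for ok in (True, True):
    store.record("Ichiro", "B", ok)   # 2 of 2 succeeded
for ok in (True, False):
    store.record("Ichiro", "C", ok)   # 1 of 2 succeeded

ordered = store.ordered_patterns("Ichiro", ["A", "B", "C"])
print(ordered)
```

Trying the patterns in this order means the pattern that has worked best for the user so far is attempted first, and the others serve as fallbacks.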

The invention according to claim 5 is the invention according to any one of claims 2 to 4, further comprising:
display control means for causing display means to display the content of the control that the control means performs in response to the command information extracted by the command information extraction means; and
designation means for designating whether to execute or cancel the control content displayed on the display means by the display control means,
wherein the control means performs the control corresponding to the command information when the designation means designates that the control content is to be executed.
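
The display-then-designate step can be sketched as a small confirmation gate. The function `confirm_and_execute` and its callable parameters are hypothetical names introduced for this illustration, and the message text is an assumption; the source only states that the control content is displayed and that execution happens when the user designates it.

```python
from typing import Callable, List


def confirm_and_execute(
    command: str,
    show: Callable[[str], None],
    ask_execute: Callable[[], bool],
    execute: Callable[[str], None],
) -> bool:
    """Display the control content, then run it only if the user agrees."""
    show(f"About to execute: {command}")  # display control step
    if ask_execute():                     # designation: execute or cancel
        execute(command)                  # control step
        return True
    return False


displayed: List[str] = []
executed: List[str] = []

accepted = confirm_and_execute("channel 5", displayed.append, lambda: True, executed.append)
rejected = confirm_and_execute("power off", displayed.append, lambda: False, executed.append)
print(executed)
```

Both commands are displayed, but only the one the user confirmed reaches `execute`, which is how a misrecognized command is kept from taking effect.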

According to the invention of claim 1, the identification information extraction means can perform voice recognition of the voice information acquired by the voice information acquisition means and extract identification information stored in the acoustic model pattern storage means. The calculation means can calculate the voice recognition rate of the command information extraction means for each piece of identification information and for each acoustic model pattern. When a plurality of acoustic model patterns are stored in the acoustic model pattern storage means in association with the extracted identification information, the acoustic model pattern acquisition means can acquire them in descending order of the voice recognition rate stored in the voice recognition rate storage means. The command information extraction means can then use the acquired acoustic model pattern to perform voice recognition of the acquired voice information and extract command information stored in the command information storage means.
Therefore, when the electronic device acquires voice information, it can identify the user who input that voice information, acquire the acoustic model pattern best suited to that user, and perform voice recognition with it, so voice recognition can be performed more reliably.
Further, the display control means can cause the display means to display the content of the control that the control means performs in response to the extracted command information, the designation means can designate whether the displayed control content is to be executed or cancelled, and the control means can perform the control corresponding to the command information when execution is designated.
Therefore, before control based on the voice recognition result is performed, the control content can be displayed to the user for confirmation, which suitably reduces malfunctions.

According to the invention of claim 2, the identification information extraction means can perform voice recognition of the voice information acquired by the voice information acquisition means and extract identification information stored in the acoustic model pattern storage means. The acoustic model pattern acquisition means can acquire, from the acoustic model pattern storage means, the acoustic model pattern corresponding to the extracted identification information. The command information extraction means can use the acquired acoustic model pattern to perform voice recognition of the acquired voice information and extract command information stored in the command information storage means, and the control means can perform control based on the extracted command information.
Therefore, when the electronic device acquires voice information, it can identify the user who input that voice information, acquire an acoustic model pattern suited to that user, and perform voice recognition with it, so voice recognition can be performed more reliably.

According to the invention of claim 3, the same effect as the invention of claim 2 is of course obtained. In addition, the acoustic model pattern storage means is configured so that a plurality of acoustic model patterns can be stored in association with one piece of identification information. When a plurality of acoustic model patterns are stored in association with the extracted identification information, the acoustic model pattern acquisition means acquires one acoustic model pattern associated with that identification information and, if the command information extraction means cannot extract command information as a result of voice recognition, can acquire another acoustic model pattern associated with that identification information.
Therefore, voice recognition can be performed with any of a user's plurality of acoustic model patterns. For example, when the frequency pattern of the user's voice changes because of a change in physical condition and voice recognition with one acoustic model pattern fails, another acoustic model pattern can be selected, so voice recognition can still be performed suitably.

According to the invention of claim 4, the same effect as the invention of claim 3 is of course obtained. In addition, the calculation means can calculate the voice recognition rate of the command information extraction means for each piece of identification information and for each acoustic model pattern. When a plurality of acoustic model patterns are stored in the acoustic model pattern storage means in association with the extracted identification information, the acoustic model pattern acquisition means can acquire them in descending order of the voice recognition rate stored in the voice recognition rate storage means.
Therefore, the optimal acoustic model pattern for each user can be selected on the basis of the voice recognition rate, and voice recognition can be performed more reliably.

According to the invention of claim 5, the same effect as the invention of any one of claims 2 to 4 is of course obtained. In addition, the display control means can cause the display means to display the content of the control that the control means performs in response to the command information extracted by the command information extraction means, the designation means can designate whether the displayed control content is to be executed or cancelled, and the control means can perform the control corresponding to the command information when execution is designated.
Therefore, before control based on the voice recognition result is performed, the control content can be displayed to the user for confirmation, which suitably reduces malfunctions.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
In this embodiment, a digital television receiver is described as an example of the electronic device. The electronic device is not limited to this, however, and may be another home appliance such as a video recorder or an air conditioner.

First, the configuration of the entire device and of its main parts will be described with reference to FIG. 1.
The digital television receiver 100 of this embodiment includes, for example, a device main body 1, which receives a television broadcast signal (hereinafter referred to as a broadcast signal), converts the received broadcast signal into a predetermined output signal, and outputs video/audio data, and a remote control device with a microphone (hereinafter referred to as a remote controller with a microphone) 2, which serves as voice information output means allowing a user to output voice information wirelessly to the device main body 1.

Next, the configuration of the main parts of the digital television receiver 100 will be described.
The device main body 1 includes: an antenna 3 that receives a broadcast signal; a tuner 4 that selects the broadcast signal of a predetermined broadcast channel from the broadcast signals received by the antenna 3; a demodulator 5 that demodulates the broadcast signal output from the tuner 4 and performs error correction; a descrambler 6 that removes the scramble signal added to the demodulated broadcast signal to prevent unauthorized viewing; a demultiplexer 7 that obtains video/audio data by separating and extracting each kind of data from the descrambled broadcast signal; a decoder 8 that decompresses the video/audio data obtained by the demultiplexer 7; an OSD (On-Screen Character Display) processing unit 9 that adds OSD data to the video data decompressed by the decoder 8; an image receiving unit 10, serving as display means, that outputs the video/audio data processed by the decoder 8 and the OSD processing unit 9; a voice information acquisition unit 11 that acquires voice information; a control unit 12 that performs overall control of the device main body 1; and a control bus 13 that connects these units.

The voice information acquisition unit 11 is, for example, a microphone, and acquires voice directly as voice information.
The voice information acquisition unit 11 is not limited to acquiring voice directly as voice information; it may instead be designed to acquire, as voice information, a voice signal into which the voice has already been converted.

As shown in FIG. 1, for example, the control unit 12 includes a CPU (Central Processing Unit) 121, a RAM (Random Access Memory) 122, a storage unit 123, and the like.

The CPU 121 performs various control operations in accordance with the acoustic model pattern table, the command table, and the various processing programs stored in the storage unit 123.

The RAM 122 includes a program storage area into which processing programs executed by the CPU 121 are loaded, and a data storage area that holds input data and the processing results produced when those programs are executed.

The storage unit 123 stores a system program executable by the device main body 1, various processing programs executable under that system program, data used when these processing programs are executed, data resulting from arithmetic processing by the CPU 121, and the like. The programs are stored in the storage unit 123 in the form of computer-readable program code.
Specifically, as shown in FIG. 1, for example, the storage unit 123 stores an acoustic model pattern table 123a, a command table 123b, an identification information extraction program 123c, an acoustic model pattern acquisition program 123d, a command information extraction program 123e, a control program 123f, a calculation program 123g, a display control program 123h, and the like.

The acoustic model pattern table 123a can store identification information, composed of voice information identifying a user, in association with an acoustic model pattern relating to the frequency pattern of a voice signal, and is configured so that a plurality of acoustic model patterns can be stored in association with one piece of identification information. It also stores the voice recognition rate calculated by the calculation program 123g, described later, for each piece of identification information and for each acoustic model pattern.
Specifically, as shown in FIG. 2, for example, in the acoustic model pattern table 123a the user ID-1 has identification information composed of the voice information "Ichiro", and this identification information is stored in association with a plurality of acoustic model patterns A, B, and C and with the voice recognition rate of each of those acoustic model patterns.
By storing the acoustic model pattern table 123a, the storage unit 123 functions as acoustic model pattern storage means and as voice recognition rate storage means.
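
One way to picture the layout of the acoustic model pattern table 123a is the sketch below. It is an assumption-laden illustration only: the class `TableEntry`, its field names, and the dictionary keyed by the spoken identification are hypothetical, and the recognition rates are placeholder zeros because the values shown in FIG. 2 are not reproduced in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TableEntry:
    """One row of the table: a user, the spoken identification, and rates."""

    user_id: str
    identification: str  # voice information that identifies the user
    # acoustic model pattern name -> voice recognition rate for this user
    recognition_rates: Dict[str, float] = field(default_factory=dict)

    def patterns(self) -> List[str]:
        # All acoustic model patterns associated with this identification.
        return list(self.recognition_rates)


# Placeholder rates (assumption): the actual figures are not given in the text.
acoustic_model_pattern_table: Dict[str, TableEntry] = {
    "Ichiro": TableEntry("ID-1", "Ichiro", {"A": 0.0, "B": 0.0, "C": 0.0}),
}

entry = acoustic_model_pattern_table["Ichiro"]
print(entry.user_id, entry.patterns())
```

Keeping the rates inside the same entry as the patterns reflects the statement that one stored table serves as both the acoustic model pattern storage means and the voice recognition rate storage means.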

The command table 123b is a table that stores command information, consisting of voice information, for controlling the device main body 1.
By storing the command table 123b, the storage unit 123 functions as the command information storage means.

The identification information extraction program 123c is a program that causes the CPU 121 to perform voice recognition on the voice information acquired by the voice information acquisition unit 11 and to extract identification information.
Here, voice recognition is performed using, for example, the speech recognition software Julius, and is realized by accumulating a large body of statistical data on voice patterns and language patterns. The basic principle is as follows: frequency patterns of an acoustic model (phonemes, each roughly equivalent to one Roman letter, or syllables, each equivalent to one kana character) are held and, with reference to a word dictionary, matched against a syllable-string signal obtained by cutting single-syllable segments out of the waveform signal of the input voice, whereby the voice is recognized.
Specifically, by executing the identification information extraction program 123c, the CPU 121 performs voice recognition on the voice information acquired by the voice information acquisition unit 11 using a preset acoustic model pattern, and extracts the matching identification information from the identification information stored in the acoustic model pattern table 123a.
By executing the identification information extraction program 123c, the CPU 121 functions as the identification information extraction means.

The acoustic model pattern acquisition program 123d is a program that causes the CPU 121 to acquire, from the acoustic model pattern table 123a, the acoustic model pattern corresponding to the identification information extracted by executing the identification information extraction program 123c.
Specifically, by executing the acoustic model pattern acquisition program 123d, the CPU 121 acquires from the acoustic model pattern table 123a the acoustic model pattern corresponding to the extracted identification information. When a plurality of acoustic model patterns are stored in the acoustic model pattern table 123a in association with that identification information, the CPU 121 acquires them in descending order of speech recognition rate. For example, as shown in FIG. 2, when the identification information “Ichiro” is extracted, the CPU 121 first acquires acoustic model pattern A, which has the highest speech recognition rate, and then acquires acoustic model pattern B and acoustic model pattern C, in that order.
By executing the acoustic model pattern acquisition program 123d, the CPU 121 functions as the acoustic model pattern acquisition means.
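The descending-order acquisition can be expressed as a sort over the stored rates. This is a minimal sketch, assuming the rates for one user are held in a plain dictionary; the function name and the numeric values are hypothetical.

```python
from typing import Dict, List


def patterns_in_acquisition_order(rates: Dict[str, float]) -> List[str]:
    """Return pattern IDs sorted from highest to lowest recognition rate."""
    return sorted(rates, key=lambda pattern_id: rates[pattern_id], reverse=True)


# Hypothetical rates for the user identified as "Ichiro".
ichiro_rates = {"B": 0.75, "C": 0.60, "A": 0.90}
print(patterns_in_acquisition_order(ichiro_rates))
```

With these placeholder rates the order is A, then B, then C, matching the example in the text.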

The command information extraction program 123e is a program that causes the CPU 121 to perform voice recognition on the voice information acquired by the voice information acquisition unit 11 and to extract command information based on the voice recognition result.
Specifically, by executing the command information extraction program 123e, the CPU 121 performs voice recognition on the voice information using the acoustic model pattern acquired by the acoustic model pattern acquisition program 123d, and extracts the matching command information from the command information stored in the command table 123b.
By executing the command information extraction program 123e, the CPU 121 functions as the command information extraction means.

The control program 123f is a program that causes the CPU 121 to control the device main body 1 in accordance with command information based on the voice recognition result.
Specifically, when the CPU 121 executes the display control program 123h, described later, the control content “channel up” corresponding to the command information extracted by the command information extraction program 123e is displayed on the image receiving unit 10. When execution of the displayed control content is then instructed via the remote controller 2 with a microphone, the CPU 121 executes the control program 123f to perform the control corresponding to the command information.
By executing the control program 123f, the CPU 121 functions as the control means.

The calculation program 123g is a program that causes the CPU 121 to calculate a speech recognition rate based on voice recognition results.
Specifically, by executing the calculation program 123g, the CPU 121 calculates the speech recognition rate achieved by execution of the command information extraction program 123e, for each piece of identification information and each acoustic model pattern stored in the acoustic model pattern table 123a. As the calculation method, for example, the number of times the command information extraction program 123e succeeded in extracting command information is divided by the number of times the program was executed.
By executing the calculation program 123g, the CPU 121 functions as the calculation means.
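The example calculation method (successful extractions divided by executions, kept per identification information and per acoustic model pattern) might be tracked as follows. This is a sketch only: the class and method names are hypothetical, and returning 0.0 before any attempt is an assumption the source does not address.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class RecognitionRateTracker:
    """Counts successes and attempts per (identification, pattern) pair."""

    def __init__(self) -> None:
        # Each value is [successes, attempts].
        self._counts: DefaultDict[Tuple[str, str], List[int]] = defaultdict(
            lambda: [0, 0]
        )

    def record(self, identification: str, pattern_id: str, extracted: bool) -> None:
        counts = self._counts[(identification, pattern_id)]
        counts[1] += 1
        if extracted:
            counts[0] += 1

    def rate(self, identification: str, pattern_id: str) -> float:
        successes, attempts = self._counts.get((identification, pattern_id), (0, 0))
        # Assumption: report 0.0 when the pattern has never been tried.
        return successes / attempts if attempts else 0.0


tracker = RecognitionRateTracker()
for outcome in (True, True, False, True):
    tracker.record("Ichiro", "A", outcome)
print(tracker.rate("Ichiro", "A"))
```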

The display control program 123h is a program that causes the CPU 121 to display, on the image receiving unit 10, the control content corresponding to command information based on the voice recognition result.
Specifically, as shown in FIG. 3, by executing the display control program 123h, the CPU 121 causes the image receiving unit 10, which serves as the display means, to display the control content corresponding to the command information extracted by the command information extraction program 123e.
By executing the display control program 123h, the CPU 121 functions as the display control means.

Next, the operation of the device main body 1 of the present invention will be described with reference to FIG. 4, taking as an example a digital television receiver that is an embodiment of the present invention. The description assumes that the user, Ichiro, wants to move up one channel.
First, in step S1, the voice information acquisition unit 11 acquires the voice information “Ichiro” and “channel up”. Next, in step S2, the CPU 121 executes the identification information extraction program 123c to perform voice recognition on the acquired voice information “Ichiro” and extracts the identification information “Ichiro” stored in the acoustic model pattern table 123a.

Next, in step S3, if the identification information was extracted (step S3; Yes), the process proceeds to step S4. If the identification information could not be extracted (step S3; No), the process returns to step S1.

Next, in step S4, the CPU 121 executes the acoustic model pattern acquisition program 123d to acquire the acoustic model pattern corresponding to the identification information from the acoustic model pattern table 123a. In this case, as shown in FIG. 2, a plurality of acoustic model patterns A, B, and C are stored in association with the identification information “Ichiro”, and the CPU 121 acquires acoustic model pattern A, which has the highest speech recognition rate.

Next, in step S5, the CPU 121 executes the command information extraction program 123e to perform voice recognition on the voice information “channel up” acquired by the voice information acquisition unit 11, using the acquired acoustic model pattern, and extracts the command information stored in the command table 123b.

Next, in step S6, if command information was extracted by executing the command information extraction program 123e (step S6; Yes), the process proceeds to step S7. If command information could not be extracted (step S6; No), the process returns to step S4, and the CPU 121 executes the acoustic model pattern acquisition program 123d to acquire an acoustic model pattern again. At this time, when a plurality of acoustic model patterns are stored in the acoustic model pattern table 123a in association with the identification information, the CPU 121 acquires them in descending order of speech recognition rate. In this case, it acquires acoustic model pattern B, which has the next-highest speech recognition rate after acoustic model pattern A.

Next, in step S7, the CPU 121 executes the display control program 123h to display, on the image receiving unit 10, the control content “channel up” corresponding to the command information extracted by the command information extraction program 123e.

Next, in step S8, if the control content displayed on the image receiving unit 10 is to be executed (step S8; Yes), the user selects the “Yes” indication, as shown in FIG. 3, using the remote controller with a microphone or the like serving as the designating means, and the process proceeds to step S9. If the “No” indication is selected (step S8; No), the process ends.

Next, in step S9, the CPU 121 executes the control program 123f to perform the control corresponding to the command information, and the process ends.
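Steps S2 through S9 can be summarized as a short control flow. The following is a sketch under stated assumptions, not the patented implementation: the recognizers, the confirmation prompt, and the executor are passed in as hypothetical callables, and the returned status strings are invented for illustration. The source does not say what happens when every acoustic model pattern fails, so the sketch simply reports that case.

```python
from typing import Callable, Dict, Optional


def handle_utterance(
    id_audio: str,
    command_audio: str,
    recognize_id: Callable[[str], Optional[str]],
    recognize_command: Callable[[str, str], Optional[str]],
    rates_by_user: Dict[str, Dict[str, float]],
    confirm: Callable[[str], bool],
    execute: Callable[[str], None],
) -> str:
    user = recognize_id(id_audio)                        # S2
    if user is None or user not in rates_by_user:        # S3; No
        return "retry"                                   # caller returns to S1
    rates = rates_by_user[user]
    command = None
    # S4: try patterns from highest to lowest recognition rate.
    for pattern_id in sorted(rates, key=rates.get, reverse=True):
        command = recognize_command(command_audio, pattern_id)  # S5
        if command is not None:                          # S6; Yes
            break
    if command is None:
        return "no_command"   # assumption: not specified in the source
    if not confirm(command):                             # S7 display, S8; No
        return "cancelled"
    execute(command)                                     # S9
    return "executed"


# Usage with stub recognizers: pattern "A" fails, pattern "B" succeeds.
tried, done = [], []

def stub_command(audio: str, pattern_id: str) -> Optional[str]:
    tried.append(pattern_id)
    return "channel up" if pattern_id == "B" else None

status = handle_utterance(
    "Ichiro", "channel up",
    recognize_id=lambda audio: "Ichiro",
    recognize_command=stub_command,
    rates_by_user={"Ichiro": {"A": 0.9, "B": 0.75, "C": 0.6}},
    confirm=lambda command: True,
    execute=done.append,
)
print(status, tried, done)
```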

According to the digital television receiver 100 of the present invention described above, the CPU 121 executes the identification information extraction program 123c to perform voice recognition on the voice information acquired by the voice information acquisition unit 11 and to extract the identification information stored in the acoustic model pattern table 123a. By executing the calculation program 123g, it calculates the speech recognition rate achieved by execution of the command information extraction program 123e for each piece of identification information and each acoustic model pattern. By executing the acoustic model pattern acquisition program 123d, when a plurality of acoustic model patterns are stored in the acoustic model pattern table 123a in association with the extracted identification information, it acquires those acoustic model patterns in descending order of their stored speech recognition rates. By executing the command information extraction program 123e, it performs voice recognition on the acquired voice information using the acquired acoustic model pattern and extracts the command information stored in the command table 123b.
Therefore, when the electronic device acquires voice information, it can identify the user who input that voice information, acquire the acoustic model pattern best suited to that user, and perform voice recognition.
In addition, by executing the display control program 123h, the CPU 121 can display on the image receiving unit 10 the control content corresponding to the command information extracted by the command information extraction program 123e. The remote controller 2 with a microphone can be used to designate whether the displayed control content is to be executed or cancelled, and when execution is designated, the CPU 121 executes the control program 123f to perform the control corresponding to the command information.
Therefore, before control based on the voice recognition result is performed, the control content can be displayed to the user for confirmation, and malfunctions can be suitably reduced.

The present invention is not limited to the above embodiment, and various improvements and design changes may be made without departing from the spirit of the present invention.
For example, the designating means is not limited to the remote controller with a microphone described here; the design may allow designation via operation keys on an operation panel provided on the device main body.
The acoustic model pattern acquisition means may also be designed so that the user can manually select any acoustic model pattern to be acquired.
The calculation of the speech recognition rate need not be based solely on whether command information was extracted by the command information extraction means. The design may also add, as a calculation criterion, whether execution of the control content corresponding to the extracted command information and displayed on the display means was designated via the designating means.
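One way to read the last variation is to count an attempt as successful only when the command was extracted and the user then designated execution. The source does not specify how the two criteria are combined, so the rule below is an assumption, and the function name is hypothetical.

```python
from typing import List, Tuple


def confirmed_recognition_rate(outcomes: List[Tuple[bool, bool]]) -> float:
    """Each outcome is (extracted, execution_designated) for one attempt.

    Assumption: an attempt succeeds only if both flags are True.
    """
    if not outcomes:
        return 0.0
    successes = sum(1 for extracted, designated in outcomes if extracted and designated)
    return successes / len(outcomes)


history = [(True, True), (True, False), (False, False), (True, True)]
print(confirmed_recognition_rate(history))
```

Under this reading, a command that was extracted but rejected at the confirmation screen lowers the rate of the acoustic model pattern that produced it.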

FIG. 1 is a block diagram showing the main configuration of the digital television receiver according to the present invention.
FIG. 2 is a diagram showing an example of the acoustic model pattern table in the present invention.
FIG. 3 is a diagram showing a display example produced by the display control means in the present invention.
FIG. 4 is a flowchart showing the operation processing of the device main body in the present invention.

Explanation of symbols

100 Digital television receiver (electronic device)
1 Device main body
2 Remote controller with microphone (designating means)
10 Image receiving unit (display means)
11 Voice information acquisition unit (voice information acquisition means)
121 CPU (identification information extraction means, acoustic model pattern acquisition means, command information extraction means, control means, calculation means, display control means)
123a Acoustic model pattern table (acoustic model pattern storage means, speech recognition rate storage means)
123b Command table (command information storage means)
123c Identification information extraction program (identification information extraction means)
123d Acoustic model pattern acquisition program (acoustic model pattern acquisition means)
123e Command information extraction program (command information extraction means)
123f Control program (control means)
123g Calculation program (calculation means)
123h Display control program (display control means)

Claims

In an electronic device that performs voice recognition of voice information input from the outside and is controlled by command information obtained based on the voice recognition result,
Identification information composed of audio information for identifying a user and an acoustic model pattern related to the frequency pattern of the audio signal can be stored in association with each other, and a plurality of acoustic model patterns can be associated with one identification information. Acoustic model pattern storage means configured to be capable of storage;
Command information storage means for storing command information including voice information for controlling the electronic device;
Audio information acquisition means for acquiring audio information;
Identification information extraction means for performing voice recognition of the voice information acquired by the voice information acquisition means, and extracting identification information stored in the acoustic model pattern storage means;
Acoustic model pattern acquisition means for acquiring an acoustic model pattern corresponding to the identification information extracted by the identification information extraction means from the acoustic model pattern storage means;
Command information for performing voice recognition of the voice information acquired by the voice information acquisition means using the acoustic model pattern acquired by the acoustic model pattern acquisition means and extracting command information stored in the command information storage means Extraction means;
Control means for performing control based on the command information extracted by the command information extraction means;
Calculating means for calculating a speech recognition rate by the command information extracting means for each identification information and for each acoustic model pattern;
Voice recognition rate storage means for storing the voice recognition rate calculated by the calculation means for each identification information and for each acoustic model pattern;
Display control means for causing the display means to display the control content by the control means corresponding to the command information extracted by the command information extraction means;
Designating means for designating whether to execute or cancel the control content displayed on the display means by the display control means;
wherein, when a plurality of acoustic model patterns are stored in the acoustic model pattern storage means in association with the identification information extracted by the identification information extraction means, the acoustic model pattern acquisition means acquires the acoustic model patterns in descending order of the stored speech recognition rate of the acoustic model patterns associated with that identification information, and
the control means performs control corresponding to the command information when the designating means designates execution of the control content.

In an electronic device that performs voice recognition of voice information input from the outside and is controlled by command information obtained based on the voice recognition result,
Acoustic model pattern storage means for storing identification information composed of voice information for identifying a user and an acoustic model pattern related to a frequency pattern of the voice signal in association with each other;
Command information storage means for storing command information including voice information for controlling the electronic device;
Audio information acquisition means for acquiring audio information;
Identification information extraction means for performing voice recognition of the voice information acquired by the voice information acquisition means, and extracting identification information stored in the acoustic model pattern storage means;
Acoustic model pattern acquisition means for acquiring an acoustic model pattern corresponding to the identification information extracted by the identification information extraction means from the acoustic model pattern storage means;
Command information for performing voice recognition of the voice information acquired by the voice information acquisition means using the acoustic model pattern acquired by the acoustic model pattern acquisition means and extracting command information stored in the command information storage means Extraction means;
Control means for performing control based on the command information extracted by the command information extraction means;
An electronic device comprising:

The electronic device according to claim 2, wherein the acoustic model pattern storage means is configured to be capable of storing a plurality of acoustic model patterns in association with one piece of identification information, and
when a plurality of acoustic model patterns are stored in the acoustic model pattern storage means in association with the identification information extracted by the identification information extraction means, the acoustic model pattern acquisition means acquires one acoustic model pattern associated with that identification information and, if the command information cannot be extracted as a result of voice recognition of the voice information by the command information extraction means, acquires another acoustic model pattern associated with that identification information.

The electronic device according to claim 3, further comprising:
Calculating means for calculating a speech recognition rate by the command information extracting means for each identification information and for each acoustic model pattern; and
Speech recognition rate storage means for storing the speech recognition rate calculated by the calculating means for each identification information and for each acoustic model pattern;
wherein, when a plurality of acoustic model patterns are stored in the acoustic model pattern storage means in association with the identification information extracted by the identification information extraction means, the acoustic model pattern acquisition means acquires the acoustic model patterns in descending order of the stored speech recognition rate of the acoustic model patterns associated with that identification information.

The electronic device according to claim 2, further comprising:
Display control means for causing the display means to display the control content by the control means corresponding to the command information extracted by the command information extraction means; and
Designating means for designating whether to execute or cancel the control content displayed on the display means by the display control means;
wherein the control means performs control corresponding to the command information when the designating means designates execution of the control content.