JP2020135127A - Electronic apparatus

Info

Publication number: JP2020135127A
Application number: JP2019024654A
Authority: JP
Inventors: Kuniaki Yamamoto (山本 国明); Hirotaka Okada (岡田 裕貴)
Original assignee: Onkyo Corp
Current assignee: Onkyo Corp
Priority date: 2019-02-14
Filing date: 2019-02-14
Publication date: 2020-08-31

Abstract

To provide an electronic apparatus that uses a voice recognition function and solves various problems. SOLUTION: A speaker device 1 outputs voice in response to a result of voice recognition. The speaker device 1 has a microcomputer 2. When the level of the input voice is at or below a predetermined level, the microcomputer 2 displays a response to the result of voice recognition instead of outputting voice. In that case the microcomputer 2 also mutes the speaker output. SELECTED DRAWING: Figure 2

Description

The present invention relates to an electronic device that uses a voice recognition function.

An electronic device that uses a voice recognition function recognizes the user's utterance and, if the utterance is a question, for example, answers that question (see, for example, Patent Document 1). For example, the user asks the electronic device "What is the weather today?" and the electronic device answers "The weather is sunny today."

Patent Document 1: JP-A-2019-015950

With a conventional electronic device that uses a voice recognition function as described above, the content of the utterance (input) becomes known to other people who are nearby. In addition, the act of speaking (input) itself spreads noise to the surroundings. Some users also feel embarrassed to speak to an electronic device. Conventional electronic devices that use a voice recognition function thus have various problems.

An object of the present invention is to provide an electronic device that uses a voice recognition function and solves these various problems.

The electronic device of the first invention is an electronic device that outputs voice in response to a result of voice recognition, and is characterized by comprising a control unit that, when the level of the input voice is equal to or lower than a predetermined level, performs display in response to the result of voice recognition instead of voice output.

In the present invention, when the level of the input voice is equal to or lower than the predetermined level, the control unit performs display in response to the result of voice recognition instead of voice output. The user therefore only needs to speak in a quiet voice. As a result, the content of the utterance (input) does not become known to people nearby, and no noise is spread to the surroundings. The user can also obtain the information visually. Thus, according to the present invention, various problems can be solved.

The electronic device of the second invention is the electronic device of the first invention, characterized in that the control unit mutes the speaker output when the level of the input voice is equal to or lower than the predetermined level.

The electronic device of the third invention is the electronic device of the first or second invention, characterized in that the control unit displays the result of voice recognition and the execution content corresponding to it.

The electronic device of the fourth invention is the electronic device of any one of the first to third inventions, further comprising a plurality of microphones and a volume adjustment unit that attenuates the sound collected by one of the plurality of microphones, characterized in that the amount of attenuation by the volume adjustment unit is an amount at which the sound does not clip when the sound is at or below the predetermined level.

The electronic device of the fifth invention is the electronic device of any one of the first to third inventions, further comprising a digital microphone that collects sound and outputs the collected sound as a digital audio signal, and a memory that stores the predetermined level, characterized in that the control unit determines, based on the predetermined level stored in the memory, whether the input voice is equal to or lower than the predetermined level.

According to the present invention, various problems can be solved.

FIG. 1 is a block diagram showing the configuration of a speaker device according to a first embodiment of the present invention.
FIG. 2 is a flowchart showing the processing operation of the speaker device during voice recognition.
FIG. 3 is a block diagram showing the configuration of a speaker device according to a second embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described.

(First Embodiment)
FIG. 1 is a block diagram showing a speaker device 1 according to the first embodiment. As shown in FIG. 1, the speaker device 1 (electronic device) includes a microcomputer 2, a DSP (Digital Signal Processor) 3, a D/A converter (hereinafter, "DAC") 4, an amplifier 5, a speaker 6, a first microphone 71 to an nth microphone 7n, a first preamplifier 81 to an (n-1)th preamplifier 8n-1, a first A/D converter (hereinafter, "ADC") 91 to an nth ADC 9n, a volume IC 10, and a display unit 11.

The microcomputer 2 (control unit) is composed of hardware such as a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and an input/output interface. The CPU controls each unit of the speaker device 1 according to a program stored in the ROM. The DSP 3 performs various kinds of signal processing on digital audio signals. The DAC 4 D/A-converts the digital audio signal supplied from the DSP 3 into an analog audio signal. The amplifier 5 amplifies the analog audio signal supplied from the DAC 4. The speaker 6 outputs sound based on the analog audio signal supplied from the amplifier 5.

The first microphone 71 to the nth microphone 7n (a plurality of microphones; collectively, microphones 7) collect sound. The first microphone 71 to the nth microphone 7n are analog microphones. The first preamplifier 81 to the (n-1)th preamplifier 8n-1 (preamplifiers 8) amplify the analog audio signals supplied from the microphones. The first ADC 91 to the (n-1)th ADC 9n-1 (ADCs 9) A/D-convert the analog audio signals supplied from the preamplifiers into digital audio signals. The digital audio signals are supplied to the DSP 3.

The volume IC 10 (volume adjustment unit) attenuates the sound collected by the nth microphone 7n. The nth ADC 9n A/D-converts the analog audio signal supplied from the volume IC 10 into a digital audio signal. The digital audio signal is supplied to the DSP 3. The display unit 11 is composed of an LED for indicating the status of voice recognition and an LCD for displaying images and the like.

The nth microphone 7n is a microphone for recognizing whispered speech (voice at or below a predetermined level). The amount of attenuation applied by the volume IC 10 is an amount at which the sound collected by the nth microphone 7n does not clip when that sound is at or below the predetermined level. Therefore, if the sound collected by the nth microphone 7n is at or below the predetermined level, it does not clip and can be recognized. On the other hand, if the sound collected by the nth microphone 7n is louder than the predetermined level (for example, a normal speaking level), it clips and cannot be recognized. At whisper level, the output of the volume IC 10 takes a normal value, and voice recognition is possible using the sound collected by the nth microphone 7n. The user can set the amount of attenuation applied by the volume IC 10, and can therefore configure the device so that whispers at the desired voice level are recognized.
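The relationship between the attenuation setting and clipping can be illustrated with a toy numerical model. This is a minimal sketch and not the circuit of the embodiment: it assumes that the stage following the volume IC 10 saturates at a fixed amplitude (`FULL_SCALE`), and the input amplitudes and the gain value are hypothetical numbers chosen only for illustration.

```python
import math

FULL_SCALE = 1.0  # assumed saturation amplitude of the stage after the volume IC


def attenuate_and_clip(samples, gain):
    """Scale samples by a linear gain, then saturate at +/-FULL_SCALE."""
    return [max(-FULL_SCALE, min(FULL_SCALE, s * gain)) for s in samples]


def is_clipped(samples):
    """Report whether any sample has reached the saturation limit."""
    return any(abs(s) >= FULL_SCALE for s in samples)


def sine(amplitude, n=64):
    """Generate one cycle of a sine wave as a stand-in for a voice signal."""
    return [amplitude * math.sin(2 * math.pi * i / n) for i in range(n)]


# Hypothetical input amplitudes before the volume IC (arbitrary units).
whisper = sine(amplitude=1.5)
normal_voice = sine(amplitude=12.0)

# Hypothetical attenuation, chosen so that whisper-level input stays below full scale.
gain = 0.5

whisper_clipped = is_clipped(attenuate_and_clip(whisper, gain))
normal_clipped = is_clipped(attenuate_and_clip(normal_voice, gain))
print(whisper_clipped, normal_clipped)
```

In this model the whisper-level input peaks at 0.75 after attenuation and passes through undistorted, while the normal-level input saturates. Changing `gain` corresponds to the user adjusting the attenuation so that a different whisper level is accepted.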

Normally, the microcomputer 2 transmits the sound collected by the microphones 7 to a server and outputs the answer to the question or other response sent back from the server through the speaker 6. The speaker device 1 thus outputs voice in response to the result of voice recognition. On the other hand, when the level of the voice input from the nth microphone 7n is equal to or lower than the predetermined level, the voice is not clipped and can therefore be recognized by the server, and the microcomputer 2 performs display on the LCD of the display unit 11 in response to the result of voice recognition instead of voice output. For example, if the result of voice recognition is "What is the weather today?" (the command) and the corresponding execution content is "The weather is sunny today." (the response), the microcomputer 2 displays both on the LCD. At this time, the microcomputer 2 controls the DAC 4 and the amplifier 5 to mute the speaker output.
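The switching between voice output and display can be sketched as follows. This is an illustrative sketch only: `OutputState` and `respond` are hypothetical names, the hardware (LCD, DAC 4, amplifier 5) is replaced by a plain record of what would be done, and the 40 dB threshold is borrowed from step S7 of the flowchart purely as an example value.

```python
from dataclasses import dataclass, field


@dataclass
class OutputState:
    """Records what the device would do, in place of real hardware calls."""
    speaker_muted: bool = False
    lcd_lines: list = field(default_factory=list)
    spoken: list = field(default_factory=list)


def respond(recognized_text, answer_text, input_level_db, threshold_db, state):
    """Route the response to the LCD or the speaker according to input level."""
    if input_level_db <= threshold_db:
        # Quiet input: mute the speaker and show both the recognized
        # utterance and the answer on the display.
        state.speaker_muted = True
        state.lcd_lines = [recognized_text, answer_text]
    else:
        # Normal input: answer by voice as usual.
        state.speaker_muted = False
        state.spoken.append(answer_text)
    return state


quiet = respond("What is the weather today?", "The weather is sunny today.",
                input_level_db=35.0, threshold_db=40.0, state=OutputState())
loud = respond("What is the weather today?", "The weather is sunny today.",
               input_level_db=60.0, threshold_db=40.0, state=OutputState())
print(quiet)
print(loud)
```

Displaying the recognized utterance together with the answer lets the user confirm that a quietly spoken request was understood correctly.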

The processing operation of the speaker device 1 during voice recognition is described below with reference to the flowchart shown in FIG. 2. When the microcomputer 2 receives voice (S1), it determines whether the voice could be recognized (S2). When the microcomputer 2 determines that the voice could not be recognized (S2: No), it notifies the user by the LED that the voice could not be received (S3). When the microcomputer 2 determines that the voice could be recognized (S2: Yes), it feeds back the reception status (that the voice was received) to the user by the LED (S4). Next, the microcomputer 2 transmits the received voice to the server (S5). The microcomputer 2 then receives the response result from the server (S6).
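Steps S1 to S6 can be summarized in a short sketch. It is only an illustration under stated assumptions: `is_recognizable`, `notify_led`, and `query_server` are hypothetical callables standing in for the recognition check, the LED of the display unit 11, and the server round trip, none of which the description specifies in detail.

```python
def handle_received_voice(audio, is_recognizable, notify_led, query_server):
    """Run steps S1 to S6 and return the server response, or None on failure."""
    # S1: `audio` is the voice that has just been received.
    # S2: check whether the voice could be recognized.
    if not is_recognizable(audio):
        # S3: tell the user via the LED that the voice was not received.
        notify_led("not received")
        return None
    # S4: feed the reception status back to the user via the LED.
    notify_led("received")
    # S5 and S6: send the voice to the server and receive the response result.
    return query_server(audio)


# Example with stand-in callables (all hypothetical).
led_log = []
response = handle_received_voice(
    audio=b"\x01\x02",
    is_recognizable=lambda a: len(a) > 0,
    notify_led=led_log.append,
    query_server=lambda a: "The weather is sunny today.",
)
print(led_log, response)
```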

Next, the microcomputer 2 determines whether the received sound volume is 40 dB or less (S7). When the microcomputer 2 determines that the received volume is 40 dB or less (S7: Yes), it determines whether the talker is within 2 m (S8). Because the speaker device 1 has a plurality of microphones 7, the position of the talker can be identified to some extent. When the microcomputer 2 determines that the talker is not within 2 m (S8: No), it determines whether the drop in the main formants is small (S9). When the microcomputer 2 determines that the received volume is not 40 dB or less (S7: No), or that the drop in the main formants is not small (S9: No), it returns the response result by voice (S10). When the microcomputer 2 determines that the talker is within 2 m (S8: Yes), or that the drop in the main formants is small (S9: Yes), it displays the response result on the screen (S11).

In S9, when the microcomputer 2 determines that the drop in the main formants is small (S9: Yes), the input is a whisper, so the response result is displayed on the screen (S11). When it determines that the drop in the main formants is not small (S9: No), the input is not a whisper, so the response result is returned by voice (S10). In this way, the choice between a screen response and a voice response is made on the basis of the difference in acoustic characteristics between whispered speech and normal conversation.
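The branching in steps S7 to S11 can be written as a small decision function. This is a sketch under assumptions: the description does not say how the talker distance or the formant characteristic is measured, so both are passed in as already computed values, and the names `choose_output`, `distance_m`, and `formant_drop_is_small` are hypothetical.

```python
def choose_output(level_db, distance_m, formant_drop_is_small,
                  level_threshold_db=40.0, distance_threshold_m=2.0):
    """Return "display" (S11) or "voice" (S10) following steps S7 to S9."""
    # S7: input louder than the threshold -> answer by voice (S10).
    if level_db > level_threshold_db:
        return "voice"
    # S8: quiet input from a nearby talker -> show on screen (S11).
    if distance_m <= distance_threshold_m:
        return "display"
    # S9: quiet input from farther away -> decide by the formant characteristic.
    return "display" if formant_drop_is_small else "voice"


print(choose_output(55.0, 1.0, True))    # S7: No
print(choose_output(35.0, 1.5, False))   # S7: Yes, S8: Yes
print(choose_output(35.0, 3.0, True))    # S7: Yes, S8: No, S9: Yes
print(choose_output(35.0, 3.0, False))   # S7: Yes, S8: No, S9: No
```

The formant check is reached only for quiet input from farther than 2 m, which is the case where a low level alone cannot distinguish a whisper from normal speech heard at a distance.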

As described above, in the present embodiment, when the level of the input voice is equal to or lower than the predetermined level, the microcomputer 2 performs display in response to the result of voice recognition instead of voice output. The user therefore only needs to speak in a quiet voice. As a result, the content of the utterance (input) does not become known to people nearby, and no noise is spread to the surroundings. The user can also obtain the information visually. Thus, according to the present embodiment, various problems can be solved.

(Second Embodiment)
FIG. 3 is a block diagram showing a speaker device 101 according to the second embodiment. As shown in FIG. 3, the speaker device 101 (electronic device) includes a microcomputer 102, a DSP 103, a DAC 104, an amplifier 105, a speaker 106, a microphone 107, a memory 108, and a display unit 109. The speaker device 101 according to the second embodiment differs from the speaker device 1 according to the first embodiment in that it has a single microphone 107, which is a digital microphone, in that it has no preamplifier, ADC, or volume IC, and in that it has the memory 108. The microcomputer 102, DSP 103, DAC 104, amplifier 105, speaker 106, and display unit 109 have the same configurations as the microcomputer 2, DSP 3, DAC 4, amplifier 5, speaker 6, and display unit 11, respectively.

As noted above, the microphone 107 is a digital microphone. The microphone 107 collects sound and outputs the collected sound as a digital audio signal. The digital audio signal is supplied to the DSP 103. The memory 108 is, for example, a flash memory. The memory 108 stores the predetermined level used to identify whispered speech. The microcomputer 102 determines whether the sound collected by the microphone 107 is at or below the predetermined level stored in the memory 108. When the microcomputer 102 determines that the sound collected by the microphone 107 is at or below the predetermined level stored in the memory 108, it performs display on the LCD of the display unit 109 in response to the result of voice recognition instead of voice output.
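The comparison against the stored level can be sketched as follows. This is a minimal sketch under assumptions: the description does not say how the level of the digital audio signal is measured, so an RMS level in dB relative to full scale (dBFS) is used here as one common choice, the memory 108 is represented by a plain dictionary, and the threshold of -40 dBFS is a hypothetical value.

```python
import math


def level_dbfs(samples):
    """RMS level of normalized samples (-1.0 to 1.0) in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def is_at_or_below_stored_level(samples, memory):
    """Compare the measured level with the predetermined level held in memory."""
    return level_dbfs(samples) <= memory["whisper_threshold_dbfs"]


# Stand-in for the memory 108; the key name and value are hypothetical.
memory = {"whisper_threshold_dbfs": -40.0}

quiet_input = [0.001] * 100   # roughly -60 dBFS
normal_input = [0.5] * 100    # roughly -6 dBFS

print(is_at_or_below_stored_level(quiet_input, memory))
print(is_at_or_below_stored_level(normal_input, memory))
```

Keeping the threshold in rewritable memory rather than in the attenuation hardware means the whisper level can be changed by updating a stored value.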

Although embodiments of the present invention have been described above, the forms to which the present invention can be applied are not limited to the embodiments described above, and modifications can be made as appropriate without departing from the spirit of the present invention.

The present invention can be suitably applied to electronic devices that use a voice recognition function.

1, 101 Speaker device (electronic device)
2, 102 Microcomputer (control unit)
3, 103 DSP
4, 104 DAC
5, 105 Amplifier
6, 106 Speaker
7, 107 Microphone
8 Preamplifier
9 ADC
10 Volume IC (volume adjustment unit)
11, 109 Display unit
108 Memory

Claims

1. An electronic device that outputs voice in response to a result of voice recognition, the electronic device comprising a control unit that, when the level of input voice is equal to or lower than a predetermined level, performs display in response to the result of voice recognition instead of voice output.

2. The electronic device according to claim 1, wherein the control unit mutes the speaker output when the level of the input voice is equal to or lower than the predetermined level.

3. The electronic device according to claim 1 or 2, wherein the control unit displays the result of voice recognition and the execution content corresponding to the result of voice recognition.

4. The electronic device according to any one of claims 1 to 3, further comprising:
a plurality of microphones; and
a volume adjustment unit that attenuates the sound collected by one of the plurality of microphones,
wherein the amount of attenuation by the volume adjustment unit is an amount at which the sound does not clip when the sound is at or below the predetermined level.

5. The electronic device according to any one of claims 1 to 3, further comprising:
a digital microphone that collects sound and outputs the collected sound as a digital audio signal; and
a memory that stores the predetermined level,
wherein the control unit determines, based on the predetermined level stored in the memory, whether the input voice is equal to or lower than the predetermined level.