JP2009122210A

JP2009122210A - Speech processing system, microcomputer for speech processing for implementing same, and electronic equipment with speech output function and speech input function

Info

Publication number: JP2009122210A
Application number: JP2007293871A
Authority: JP
Inventors: Taketomi Shimizu; 建臣清水
Original assignee: Panasonic Corp
Current assignee: Panasonic Corp
Priority date: 2007-11-13
Filing date: 2007-11-13
Publication date: 2009-06-04

Abstract

PROBLEM TO BE SOLVED: To perform speech recognition even when speech output overlaps a speech input period.

SOLUTION: This electronic equipment includes: a speech output processing section 3 for generating and outputting a speech output source waveform; a speaker 15 for converting the speech output source waveform into speech and outputting it to the outside; a microphone 17 for taking in external speech; a correction processing section 5 for performing reverse phase conversion of a waveform similar to the speech output source waveform; and a feed-back processing section 7 for removing the speech output source waveform, or the waveform similar to it, from the waveform of the external speech taken in by the microphone 17, by using the waveform on which the correction processing section 5 has performed the reverse phase conversion.

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a voice processing system, a microcomputer that executes this system, and an electronic device equipped with the microcomputer.

As part of universal design (design for all), many products with a guidance voice output function have been proposed and put into practice; these use voice guidance from a speaker to prompt the user to operate the product or to report its current state. There are also products with a user voice input function, which allow required operations to be performed by recognizing, on the product side, the microphone input of various voices such as the user's (user voice). In recent years, many products with a voice input/output function, combining the guidance voice output function and the user voice input function, have also appeared. In this type of product, the output timing of the guidance voice and the input timing of the user voice are kept from overlapping, so that the guidance voice is not picked up by the microphone as user voice. To explain: if voice guidance plays while the microcomputer 1 is in voice input (recognition), the guidance voice is also input through the microphone, even though only the user's input voice is meant to be accepted. To prevent this, voice output and voice input are normally processed alternately: no voice input is accepted during voice output, and no voice is output during voice input.

However, with such a product, the user must wait for guidance voice that he or she already understands to finish before inputting user voice, so operation lacks speed. If the product is instead made to accept user voice even while the guidance voice is being output, the guidance voice enters the microphone together with the user voice and is judged by the product to be noise, and that noise prevents the product from recognizing the user voice correctly. Patent Literature 1, for example, discloses a technique for eliminating such noise; however, even though that technique can eliminate noise, it cannot extract only the user voice, which is a specific sound.

Patent Literature 1: JP-A-6-295188

Accordingly, the problem to be solved by the present invention is to perform voice processing such that voice input such as user voice can be accepted even while voice such as guidance voice is being output, and such that voice such as user voice can be recognized correctly even when voice such as guidance voice enters the microphone at the same time.

The voice processing system according to the first aspect of the present invention comprises: first means for generating and outputting a voice output source waveform; second means for converting the voice output source waveform into voice and emitting it to the outside; third means for capturing external voice; fourth means for anti-phase converting a waveform that approximates the voice output source waveform; and fifth means for removing the voice output source waveform, or a waveform approximating it, from the waveform of the external voice captured by the third means, using the waveform anti-phase converted by the fourth means.

The voice output source waveform is, for example, the waveform of a voice such as guidance voice. The external voice is not limited to voice such as user voice; it also includes voice such as guidance voice emitted to the outside by the second means. The second means is preferably a speaker, for example, and the third means is preferably a microphone, for example.

In the first aspect of the present invention, the waveform of the external voice captured by the third means, for example a microphone, contains both voice such as guidance voice and voice such as user voice. The fifth means removes the waveform approximating the voice output source waveform by using a waveform obtained by anti-phase converting that approximating waveform. The approximating waveform is thus removed from the external voice waveform, and voice such as user voice can be extracted. As a result, the user voice is recognized correctly even when the user, to save time, speaks into the microphone without waiting for already-understood guidance voice to finish.
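The cancellation principle described above can be illustrated with a minimal Python sketch. It assumes that the waveforms are lists of samples that are already time-aligned and that the approximation of the guidance waveform happens to be exact; the function names and sample values are hypothetical and do not come from the patent.

```python
def invert_phase(waveform):
    """Anti-phase conversion: flip the sign of every sample."""
    return [-s for s in waveform]


def remove_guidance(mic_waveform, approx_guidance):
    """Add the anti-phase waveform to the microphone waveform, sample by sample."""
    anti = invert_phase(approx_guidance)
    return [m + a for m, a in zip(mic_waveform, anti)]


# Hypothetical sample values, chosen to be exactly representable in binary.
guidance = [0.5, -0.25, 0.75, 0.0]
user = [0.125, 0.25, -0.5, 0.375]
mic = [g + u for g, u in zip(guidance, user)]  # the microphone hears both

recovered = remove_guidance(mic, guidance)
```

In this idealised case `recovered` equals `user`; in practice the quality of the result depends on how closely the approximated waveform matches what the microphone actually picks up.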

In a preferred embodiment of the first aspect of the present invention, the second means is a speaker.

In a preferred embodiment of the first aspect of the present invention, the third means is a microphone.

In a preferred embodiment of the first aspect of the present invention, the waveform approximating the voice output source waveform is obtained by comparing the voice output source waveform (internal voice output source waveform) with the voice output source waveform emitted to the outside by the second means and captured by the third means (external voice output source waveform) to calculate an approximation coefficient, and then approximating the internal voice output source waveform using that coefficient.
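The text does not state how the comparison yields the approximation coefficient. One simple possibility, shown here purely as an assumption, is a least-squares scalar gain that best maps the internal waveform onto the external one. The sketch also assumes time-aligned sample lists and that no user voice is present during the measurement; all names are hypothetical.

```python
def approximation_coefficient(internal, external):
    """Least-squares scalar gain a minimising sum((external - a * internal) ** 2).

    Assumption: the patent only says the two waveforms are compared; this
    particular formula is an illustrative choice, not the patented method.
    """
    energy = sum(x * x for x in internal)
    if energy == 0.0:
        raise ValueError("internal waveform is silent; cannot estimate a gain")
    return sum(x * y for x, y in zip(internal, external)) / energy


def approximate_external(internal, coefficient):
    """Scale the internal waveform so that it approximates the external one."""
    return [coefficient * x for x in internal]


internal = [1.0, -2.0, 4.0, 0.5]
external = [0.5 * x for x in internal]  # hypothetical path that halves the amplitude

a = approximation_coefficient(internal, external)
approx = approximate_external(internal, a)
```

A single scalar can only model a change in level. A real speaker-to-microphone path also adds delay and reverberation, which this sketch deliberately ignores.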

The voice processing microcomputer according to the second aspect of the present invention is mounted on an electronic device that includes a speaker for converting a voice output source waveform into voice and emitting it to the outside and a microphone for capturing voice from the outside. The microcomputer is capable of executing: a first step of generating the voice output source waveform and outputting it as voice from the speaker; a second step of anti-phase converting a waveform that approximates the voice output source waveform; and a third step of removing the waveform approximating the voice output source waveform from the voice waveform input to the microphone, using the anti-phase converted waveform.

In a preferred embodiment of the second aspect of the present invention, the second step obtains the waveform approximating the voice output source waveform by comparing the voice output source waveform (internal voice output source waveform) with the voice output source waveform emitted from the speaker and captured by the microphone (external voice output source waveform) to calculate an approximation coefficient, and approximating the internal voice output source waveform using that coefficient.

The electronic device according to the third aspect of the present invention comprises: a voice output source that outputs guidance voice; a speaker that converts the voice output waveform from the voice output source into voice and emits it to the outside; a microphone that captures voice from the outside; first means for anti-phase converting a waveform that approximates the voice output source waveform; second means for removing the waveform approximating the voice output source waveform from the waveform of the external voice captured by the microphone, using the waveform anti-phase converted by the first means; and third means for recognizing the waveform resulting from the removal processing as the waveform of user voice.

In a preferred embodiment of the third aspect of the present invention, the waveform approximating the voice output source waveform in the first means is obtained by comparing the voice output source waveform (internal voice output source waveform) with the voice output source waveform emitted from the speaker and captured by the microphone (external voice output source waveform) to calculate an approximation coefficient, and approximating the internal voice output source waveform using that coefficient.

In a preferred embodiment of the third aspect of the present invention, an adjustment mode and a normal mode can be set by user operation; the first means can be executed in the adjustment mode, and the second means in the normal mode.

In a preferred embodiment of the third aspect of the present invention, the mode switching state selected by user operation is set by storing it in a register.

With the voice processing system of the present invention, for example in a product that has a voice guidance function and a user voice input function and uses guidance voice as the voice output source waveform, user voice is recognized correctly and acted on even if it is input while guidance voice is guiding operation of the product. Guidance voice output processing and user voice input processing can thus be performed simultaneously, enabling highly responsive operation.

Hereinafter, a voice processing system according to an embodiment of the present invention will be described in detail with reference to the drawings.

FIG. 1 shows, as a schematic hardware configuration, the voice processing functions of a voice processing microcomputer 1 in an electronic device, such as a microwave oven, that incorporates the microcomputer and implements the voice processing system according to the embodiment of the present invention. The electronic device 23 includes the voice processing microcomputer 1, a voice output interface 19, a speaker 15, a voice input interface 21, and a microphone 17. The voice processing microcomputer 1 can output guidance voice and can also process input user voice.

In the embodiment, if the electronic device is a microwave oven, for example, the voice processing microcomputer 1 may output corresponding guidance voice from the speaker 15 in response to the cook pressing the cooking start button or other buttons, thereby assisting the cook, while the voice input processing unit 9 may process cooking-related voice input from the cook to the microphone 17 to control operation of the microwave oven. Description and illustration of such control, and of the other functional elements it requires, are omitted in the embodiment.

For the voice processing described above, the voice processing microcomputer 1 includes a voice output processing unit 3, a correction processing unit 5, a feedback processing unit 7, a voice input processing unit 9, an adjustment mode setting unit 11, and a switching unit (MUX) 13. The names given to the functional elements in the embodiment are for convenience of explanation only, and the elements are in no way limited to these names.

The voice output processing unit 3 sends a guidance voice waveform, as the voice output source waveform, through the voice output interface 19 to the speaker 15, which converts it into guidance voice and outputs it to the user. If the electronic device is a microwave oven, for example, guidance voice means voice that explains to the user the operating procedure of the microwave oven, the current cooking state, and so on. The voice output source can of course be set as appropriate for devices other than a microwave oven, such as other electronic devices, electronic apparatus, portable terminals, and information equipment, and the voice output source waveform is not limited to guidance voice.

Here, the guidance voice waveform input directly from the voice output processing unit 3 to the feedback processing unit 7 is called the internal guidance voice waveform, and the guidance voice input to the feedback processing unit 7 by way of the speaker 15 and the microphone 17 is called the external guidance voice waveform.

The feedback processing unit 7 captures the external guidance voice waveform from the microphone 17, compares it with the internal guidance voice waveform input from the voice output processing unit 3 via the adjustment mode setting unit 11, and calculates the approximation coefficient between the two waveforms. The feedback processing unit 7 then inputs the calculated waveform approximation coefficient to the correction processing unit 5.

The correction processing unit 5 multiplies the guidance voice waveform from the voice output processing unit 3 by the waveform approximation coefficient from the feedback processing unit 7 to obtain an approximate external guidance voice waveform. It then anti-phase converts this approximate waveform and outputs the result, as the external guidance voice removal waveform, to the feedback processing unit 7 via the switching unit 13.

The feedback processing unit 7 uses the external guidance voice removal waveform input from the correction processing unit 5 to remove the external guidance voice waveform from the composite voice waveform, input through the microphone 17, of the external guidance voice waveform and the user voice.

As a result, even if external guidance voice and user voice enter the voice processing microcomputer 1 through the microphone 17 at the same time, only the user voice is passed to the voice input processing unit 9.

Consequently, user voice input can be accepted even while guidance voice is being output and, conversely, guidance voice can be played even while the user is inputting voice.

The external guidance voice waveform input to the feedback processing unit 7 through the speaker 15 and the microphone 17 differs depending on the surroundings of the electronic device 23, so the feedback processing unit 7 calculates a waveform approximation coefficient suited to those surroundings. The calculated coefficient is stored in the correction processing unit 5 and is used to remove the external guidance voice waveform from the composite of the user voice and external guidance voice waveforms input to the microphone 17, so that only the user voice is input to the voice input processing unit 9.

For this reason, the electronic device 23 allows the user to set an adjustment mode, in which the waveform approximation coefficient is calculated so as to adjust it to the surroundings, and, once that is complete, to cancel the adjustment mode and return to the normal mode.
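A short worked equation may help show why the coefficient has to match the surroundings. As a simplifying assumption not stated in the patent, suppose the path from speaker to microphone acts on the internal guidance waveform $g(t)$ as a single gain $a$, let $u(t)$ be the user voice, and let $\hat{a}$ be the stored waveform approximation coefficient. All symbols are introduced here for illustration only.

```latex
\begin{align}
m(t) &= u(t) + a\,g(t) && \text{(microphone input)}\\
r(t) &= -\hat{a}\,g(t) && \text{(removal waveform: scaled, then anti-phase converted)}\\
m(t) + r(t) &= u(t) + (a - \hat{a})\,g(t) && \text{(passed on for recognition)}
\end{align}
```

Under this model the residual guidance term vanishes only when $\hat{a} = a$. If the surroundings change $a$ after the coefficient was stored, a fraction of the guidance voice remains in the output until the coefficient is measured again.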

Setting and cancelling the adjustment mode will now be described with reference to FIG. 2. The adjustment mode setting unit 11 consists of a register. By user operation, a specific bit of this register can be changed between ON (bit = 1), which sets the adjustment mode, and OFF (bit = 0), which cancels the adjustment mode and selects the normal mode.
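The register-based mode flag can be sketched as follows. The text says only that a specific bit of the register selects the mode, so the bit position, the 8-bit width, and the class and method names below are all assumptions made for illustration.

```python
ADJUST_MODE_BIT = 0x01  # hypothetical bit position; the patent does not specify it


class AdjustModeRegister:
    """Models the adjustment mode setting unit 11 as one 8-bit register."""

    def __init__(self):
        self.value = 0x00  # bit cleared: normal mode

    def set_adjust_mode(self):
        self.value |= ADJUST_MODE_BIT  # ON (bit = 1)

    def clear_adjust_mode(self):
        self.value &= ~ADJUST_MODE_BIT & 0xFF  # OFF (bit = 0), kept within 8 bits

    def is_adjust_mode(self):
        return bool(self.value & ADJUST_MODE_BIT)


reg = AdjustModeRegister()
reg.set_adjust_mode()
mode_after_set = reg.is_adjust_mode()
reg.clear_adjust_mode()
mode_after_clear = reg.is_adjust_mode()
```

Masking a single bit leaves the other bits of the register free for unrelated settings, which is the usual reason for describing a mode as "a specific bit" rather than a whole register value.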

When the register of the adjustment mode setting unit 11 is ON, the adjustment mode is set: the internal guidance voice waveform is input from the switching unit 13 to the feedback processing unit 7, and the external guidance voice waveform is input to the feedback processing unit 7 by way of the speaker 15 and the microphone 17. The feedback processing unit 7 compares the two and calculates a waveform approximation coefficient for approximating the internal guidance voice waveform to the external guidance voice waveform. This coefficient is fed back from the feedback processing unit 7 to the correction processing unit 5.

When the adjustment mode is cancelled, the switching unit 13 connects the correction processing unit 5 to the feedback processing unit 7, and the device enters the normal mode. The guidance voice output by the voice output processing unit 3 is emitted from the speaker 15, and a composite voice waveform combining the user voice waveform and the external guidance voice waveform is input from the microphone 17 to the feedback processing unit 7. The correction processing unit 5 multiplies the internal guidance voice waveform by the waveform approximation coefficient, anti-phase converts the result to generate the external guidance voice removal waveform, and sends it to the feedback processing unit 7.

The feedback processing unit 7 uses the external guidance voice removal waveform to remove the external guidance voice waveform from the composite voice waveform, extracts only the user voice, and outputs it to the voice input processing unit 9. As a result, only the user voice is input to the voice input processing unit 9, whenever the user speaks into the microphone 17.
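The sequence of the two modes can be sketched end to end as below. This is an illustrative model only: the class and attribute names are hypothetical, the waveforms are assumed to be time-aligned sample lists, and the least-squares gain used in the adjustment mode is an assumed way of "comparing" the waveforms, since the patent does not specify one.

```python
class GuidanceCanceller:
    """Sketch of the adjustment-mode / normal-mode flow (names are hypothetical)."""

    def __init__(self):
        self.adjust_mode = False
        self.coefficient = 1.0  # stored waveform approximation coefficient

    def process(self, internal, mic):
        if self.adjust_mode:
            # Adjustment mode: compare the internal and external guidance
            # waveforms and store the coefficient (least squares is an assumption).
            energy = sum(x * x for x in internal)
            if energy > 0.0:
                cross = sum(x * y for x, y in zip(internal, mic))
                self.coefficient = cross / energy
            return None  # nothing is passed on for recognition
        # Normal mode: scale, anti-phase convert, and add to the mic waveform.
        removal = [-(self.coefficient * x) for x in internal]
        return [m + r for m, r in zip(mic, removal)]


canceller = GuidanceCanceller()
guidance = [1.0, -0.5, 0.25, 2.0]

# Adjustment mode: only the guidance is playing; hypothetical room gain of 0.5.
canceller.adjust_mode = True
canceller.process(guidance, [0.5 * g for g in guidance])

# Normal mode: the user speaks while the guidance is still playing.
canceller.adjust_mode = False
user = [0.125, 0.25, -0.375, 0.5]
mic = [u + 0.5 * g for u, g in zip(user, guidance)]
user_only = canceller.process(guidance, mic)
```

Keeping the measurement in a separate mode means the coefficient is estimated while only the guidance is audible, so user voice does not disturb the comparison.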

A voice processing system according to another embodiment of the present invention will be described with reference to FIG. 3. In FIG. 3, parts corresponding to those in FIG. 1 are given the same reference numerals. In this embodiment, the feedback processing unit 7 captures the external guidance voice waveform from the microphone 17, compares it with the internal guidance voice waveform input from the voice output processing unit 3, and calculates the approximation coefficient between the two waveforms.

The feedback processing unit 7 then multiplies the guidance voice waveform from the voice output processing unit 3 by the waveform approximation coefficient to obtain an approximate external guidance voice waveform, and anti-phase converts it to obtain the external guidance voice removal waveform. The feedback processing unit 7 may further use this removal waveform to remove the external guidance voice waveform from the composite voice waveform, input through the microphone 17, of the external guidance voice waveform and the user voice.

The voice processing described above is not limited to microwave ovens and can be applied to any electronic device that has a voice output function and a voice input function.

As described above, in this embodiment the user voice is recognized correctly and the electronic device performs the required operation even when the user, to save time, speaks into the microphone 17 without waiting for already-understood guidance voice to finish.

FIG. 1 is a diagram showing, as a schematic hardware configuration, the voice processing functions of the voice processing microcomputer 1 in an electronic device, such as a microwave oven, that incorporates the microcomputer and implements the voice processing system according to the embodiment of the present invention.
FIG. 2 is a diagram showing the setting flow for the adjustment mode and the normal mode.
FIG. 3 is a diagram, corresponding to FIG. 1, according to another embodiment of the present invention.

Explanation of symbols

1 Voice processing microcomputer
3 Voice output processing unit
5 Correction processing unit
7 Feedback processing unit
9 Voice input processing unit

Claims

1. A voice processing system comprising:
first means for generating and outputting a voice output source waveform;
second means for converting the voice output source waveform into voice and emitting it to the outside;
third means for capturing external voice;
fourth means for anti-phase converting a waveform approximating the voice output source waveform; and
fifth means for removing the voice output source waveform, or a waveform approximating it, from the waveform of the external voice captured by the third means, using the waveform anti-phase converted by the fourth means.

2. The system according to claim 1, wherein the second means is a speaker.

3. The system according to claim 1 or 2, wherein the third means is a microphone.

4. The system according to claim 1, wherein the waveform approximating the voice output source waveform in the fourth means is obtained by comparing the voice output source waveform (internal voice output source waveform) with the voice output source waveform emitted to the outside by the second means and captured by the third means (external voice output source waveform) to calculate an approximation coefficient, and approximating the internal voice output source waveform using the approximation coefficient obtained by the calculation.

5. A voice processing microcomputer mounted on an electronic device including a speaker that converts a voice output source waveform into voice and emits it to the outside, and a microphone that captures voice from the outside, the microcomputer being capable of executing:
a first step of generating the voice output source waveform and outputting it as voice from the speaker;
a second step of anti-phase converting a waveform approximating the voice output source waveform; and
a third step of removing the waveform approximating the voice output source waveform from the voice waveform input to the microphone, using the anti-phase converted waveform.

6. The voice processing microcomputer according to claim 5, wherein the second step obtains the waveform approximating the voice output source waveform by comparing the voice output source waveform (internal voice output source waveform) with the voice output source waveform emitted from the speaker and captured by the microphone (external voice output source waveform) to calculate an approximation coefficient, and approximating the internal voice output source waveform using the approximation coefficient obtained by the calculation.

7. An electronic device comprising:
a voice output source that outputs guidance voice;
a speaker that converts the voice output waveform from the voice output source into voice and emits it to the outside;
a microphone that captures voice from the outside;
first means for anti-phase converting a waveform approximating the voice output source waveform;
second means for removing the waveform approximating the voice output source waveform from the waveform of the external voice captured by the microphone, using the waveform anti-phase converted by the first means; and
third means for recognizing the waveform resulting from the removal processing as the waveform of user voice.

8. The electronic device according to claim 7, wherein, in the first means, the waveform approximating the voice output source waveform is obtained by comparing the voice output source waveform (internal voice output source waveform) with the voice output source waveform emitted from the speaker and captured by the microphone (external voice output source waveform) to calculate an approximation coefficient, and approximating the internal voice output source waveform using the approximation coefficient obtained by the calculation.

9. The electronic device as described, wherein an adjustment mode and a normal mode can be set by user operation, the first means being executable in the adjustment mode and the second means being executable in the normal mode.

10. The electronic device according to claim 9, wherein the mode switching state selected by user operation is set in a register.