CN1194427A - Method and device for voice operating and remote controlling apparatus - Google Patents

Info

Publication number
CN1194427A
CN1194427A CN98105349A CN98105349A CN1194427A CN 1194427 A CN1194427 A CN 1194427A CN 98105349 A CN98105349 A CN 98105349A CN 98105349 A CN98105349 A CN 98105349A CN 1194427 A CN1194427 A CN 1194427A
Authority
CN
China
Prior art keywords
signal
sound
transmitter
voice
signals
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN98105349A
Other languages
Chinese (zh)
Inventor
伊姆雷·瓦尔加
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Deutsche Thomson Brandt GmbH
Original Assignee
Deutsche Thomson Brandt GmbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Deutsche Thomson Brandt GmbH
Priority to CN98105349A
Publication of CN1194427A
Legal status: Pending

Landscapes

  • Circuit For Audible Band Transducer (AREA)

Abstract

Usually, a voice-operated control system for the remote control of electronic entertainment appliances is composed of a microphone, a signal processing means, a voice detector and a system manager. The voice detector works based on pattern detections. If there is acoustic interference, in particular as a result of sound playback of the appliance, the voice detector will not unambiguously detect the patterns and the commands then have to be uttered repeatedly until they are detected. In the case of very severe interference, voice-operated control may then even be completely impossible. According to the invention, the immunity to acoustic interference is improved in that a sound-compensation unit is provided in which the audio signals emitted by the appliance are estimated at the location of the microphone or the microphones by modeling the transmission paths in the space between the loudspeakers and the microphones, and are used to correct the microphone signal.

Description

Method and device for the voice-operated remote control of appliances
The present invention relates to a method and a device for the voice-operated remote control of appliances, in particular electronic entertainment appliances.
In entertainment electronics, voice-operated remote control is becoming increasingly important. On the one hand, it considerably improves ease of use, that is, the quality of the way in which appliances can be operated; on the other hand, it is often what makes it possible for disabled people to use these appliances in the first place.
A voice-operated control system usually comprises a microphone that converts sound into an electrical signal, a signal processing means, a speech detector that converts the electrical signal into words, and a system manager that controls the system. The speech detector itself is based on pattern detection: each utterance is compared with stored information.
The problem here is that if the user's acoustic environment is noisy, that is, if there is acoustic interference, in particular as a result of sound playback by the appliance, the speech detector does not detect the patterns unambiguously and therefore lacks the necessary immunity to errors. The spoken commands then have to be repeated until they are detected, which makes a voice-operated system less attractive. In the case of very severe interference, voice-operated control may even be impossible altogether.
The object of the invention is to specify a system for the voice-operated remote control of appliances, in particular electronic entertainment appliances, that has improved immunity to acoustic interference.
The system according to the invention for the voice-operated remote control of an appliance that emits audio signals via at least one loudspeaker has one or more microphones that convert voice commands into electrical signals, and a speech detection unit that converts these electrical signals into control commands. It is based on the fact that the microphone signal may be composed of voice commands, audio signals and other background noise, and it provides a sound compensation unit in which the audio signals emitted by the appliance are estimated at the location of the microphone or microphones by modeling the transmission paths in the space between the loudspeakers and the microphones, and are used to correct the microphone signal. As a result, the detection of control commands is improved, which in turn makes voice-operated remote control less error-prone or, in some cases, makes it possible in the first place.
Preferably, the microphone signal is first fed to the sound compensation unit; the compensated signal is then fed to a noise suppression unit, in which background noise is eliminated as far as possible, and finally to the speech detection unit, in which the commands are detected by pattern detection.
The microphone can be integrated in a unit provided for this purpose, such as a remote control unit; however, one or more microphones can also be integrated in the housing of the appliance.
It is particularly advantageous to derive a single mono signal from the plurality of audio signals and to feed it to the sound compensation unit, because this reduces the complexity of the sound compensation unit.
The sound compensator is particularly preferably designed as an adaptive NLMS-FIR (normalized least mean squares, finite impulse response) filter. This also makes it possible to compensate the sound when, for example, the person moves while speaking.
Likewise, if there are several loudspeaker signals, a separate adaptive filter is preferably provided for each loudspeaker signal, because this allows better compensation.
Finally, if there are several microphones, they can be arranged in an array in order to obtain a pronounced directional characteristic.
Exemplary embodiments of the invention are described below with reference to the drawings.
Fig. 1 shows a voice-operated control system with a microphone integrated in the remote control unit and with the loudspeaker signals combined into a mono signal;
Fig. 2 is a block diagram of an adaptive sound compensator;
Fig. 3 is a block diagram of a spectral subtractor for noise suppression;
Fig. 4 shows a voice-operated control system with a microphone integrated in the remote control unit but without the loudspeaker signals being combined into a mono signal;
Fig. 5 shows a voice-operated control system with a plurality of microphones integrated in the housing of the television set.
Fig. 1 shows a voice-operated control system according to the invention. In this case, the microphone MIC is integrated in the remote control unit RCU. The microphone can have a specific directional characteristic (omnidirectional, cardioid, supercardioid) so that, as far as possible, only the useful signal, that is, the voice, is picked up. The remote control unit can control the television set, for example, by radio-frequency modulation or via a cable. Integrated in the television set TV are, among other components, two loudspeakers L1 and L2, a mono signal former MON, a sound compensator SCOMP, a noise suppressor NSUP and a speech detection unit SREC.
The microphone signal usually consists of a combination of the useful signal, the television sound component and other noise in the room. The microphone signal is radio-frequency modulated and transmitted to the appliance TV, where it is fed to the first input of the sound compensator SCOMP. The signals supplied to the loudspeakers L1 and L2 are combined into a mono signal (mono formation), which is fed to the other input of the compensator (the reference input, or second input). The sound compensator then models the transmission paths in the space between the loudspeakers and the microphone. Since the statistical properties of the various signal components are not known in advance, an adaptive system should be used for this modeling.
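The mono formation described above can be illustrated with a very small sketch. The text does not say how the two loudspeaker signals are combined, so the sample-by-sample average used here, and the function name `form_mono`, are assumptions made purely for illustration.

```python
def form_mono(left, right):
    """Combine two loudspeaker channels into one reference signal.

    Assumption: a plain average; the source only says the signals are
    combined into a mono signal.
    """
    if len(left) != len(right):
        raise ValueError("both channels must have the same length")
    return [0.5 * (l + r) for l, r in zip(left, right)]


# Hypothetical sample values for the signals fed to L1 and L2.
reference = form_mono([1.0, 0.0, -1.0], [0.0, 0.5, -1.0])
```

The resulting list would then serve as the single reference input of the sound compensator.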
Fig. 2 shows a simple design of an adaptive sound compensator. The microphone signal i is fed to the first input, and the mono loudspeaker signal r is fed via the other input to the adaptive filter AF. The adaptive filter AF estimates the loudspeaker signal at the location of the microphone and outputs the filtered signal y. The signal y is then subtracted from the microphone signal i supplied at the first input, so that the signal e, in which the loudspeaker component is reduced, is available at the output o. This signal e is also fed back to the adaptive filter AF.
One possible way of designing the adaptive sound compensator is to use an adaptive NLMS-FIR filter. This filter uses the NLMS algorithm, a special form of the LMS (least mean squares) algorithm.
The LMS algorithm adapts the coefficients h_1, h_2, ..., h_N of the FIR filter according to the following equation:
$$h_i(n+1) = h_i(n) + a \, e(n) \, x(n-i+1), \quad i = 1, 2, \ldots, N$$
The variables are defined as follows:
n: discrete time index
a: step size
x: sampled value at the reference input
e: error signal, e = d - y
d: signal at the first input
y: signal at the output of the FIR filter
The NLMS (normalized LMS) algorithm extends the LMS algorithm by normalizing the adaptation to the power of the reference input signal:
$$h_i(n+1) = h_i(n) + \frac{a \, e(n) \, x(n-i+1)}{P_x(n)}$$
where P_x denotes the power of the signal x. P_x can be calculated, for example, as follows:
$$P_x(n+1) = q \, P_x(n) + (1-q) \, x^2(n), \quad q < 1$$
Compared with the LMS algorithm, the NLMS algorithm has the advantage that the adaptation behavior is independent of the power of the input signal, which is particularly important for fluctuating signals (for example speech and music).
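The NLMS update and the recursive power estimate can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation of the patent: the filter length, the step size, the forgetting factor q, the small constant `eps` and the simulated three-tap room path are all assumptions. Because the update divides by a per-sample power estimate rather than by the energy of the whole tap vector, the step size presumably has to stay well below 2/N for stable adaptation.

```python
import random


def nlms_compensate(mic, ref, num_taps=8, step=0.05, q=0.9, eps=1e-8):
    """Return (e, h): compensated signal e(n) and final FIR coefficients."""
    h = [0.0] * num_taps      # coefficients h_1 .. h_N
    buf = [0.0] * num_taps    # buf[i-1] holds x(n - i + 1)
    power = 0.0               # recursive power estimate P_x
    e_out = []
    for d, x in zip(mic, ref):
        buf = [x] + buf[:-1]
        # Assumption: P_x is updated before it is used so that the first
        # division is well defined; eps guards against a silent reference.
        power = q * power + (1.0 - q) * x * x
        y = sum(hi * xi for hi, xi in zip(h, buf))   # FIR output y(n)
        e = d - y                                     # error e = d - y
        gain = step * e / (power + eps)
        h = [hi + gain * xi for hi, xi in zip(h, buf)]
        e_out.append(e)
    return e_out, h


# Simulated room: the loudspeaker sound reaches the microphone through a
# made-up 3-tap path; no speech is present in this demo.
rng = random.Random(0)
ref = [rng.gauss(0.0, 1.0) for _ in range(2000)]
path = [0.6, -0.3, 0.1]
mic = [sum(p * ref[n - k] for k, p in enumerate(path) if n - k >= 0)
       for n in range(len(ref))]
residual, coeffs = nlms_compensate(mic, ref)
```

In this demo the first three entries of `coeffs` should approach the simulated path, and `residual` should contain far less loudspeaker energy than `mic` once the filter has adapted.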
The adaptive filter in the sound compensator derives from the reference input an approximation of those components of the microphone signal that are correlated with the reference input. In other words, from the (mono) loudspeaker signal the adaptive filter reproduces the signal components that reach the microphone from the loudspeakers through the acoustic space. The output of the sound compensator is the difference between the microphone signal and the output of the adaptive filter; it therefore contains a reduced loudspeaker signal component and the unchanged useful signal component (voice).
This signal is then fed to the input of the noise suppressor. The function of this processing stage is to reduce noise components that do not originate from the loudspeakers (for example street noise, noise from other household appliances such as vacuum cleaners, background music and so on).
Spectral subtraction can be used here to suppress the noise, as shown in Fig. 3. After the input signal i' has been windowed (W), a speech pause detector SD determines whether the corresponding block contains speech or a pause. The block is subjected to a Fourier transform FFT, and the absolute values are then calculated. If the block does not contain any speech, the spectrum of these absolute values is stored in the memory RAM as the noise spectrum. If, on the other hand, the block contains speech with interference, the noise magnitude spectrum stored during the last pause is subtracted from the magnitude spectrum of the block. After smoothing SM, the resulting spectrum of absolute output values is supplemented with the phase P of the input signal and then subjected to an inverse Fourier transform IFT. Finally, a continuous signal is produced from the individual blocks, for example by summing the signal samples (SS), and is then output as the output signal o'.
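The core of the spectral subtraction stage, storing a noise magnitude spectrum during a pause and subtracting it from a later block while reusing the input phase, can be sketched as follows. The sketch deliberately leaves out the windowing W, the pause detector SD, the smoothing SM and the block summation SS. The function names, the naive DFT, the flooring of negative magnitudes at zero and the two test tones are assumptions for illustration only.

```python
import cmath
import math


def dft(block):
    """Naive discrete Fourier transform (adequate for short demo blocks)."""
    n = len(block)
    return [sum(block[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def noise_magnitude(pause_block):
    """Magnitude spectrum of a block that was classified as a speech pause."""
    return [abs(v) for v in dft(pause_block)]


def spectral_subtract(block, noise_mag):
    """Subtract the stored noise magnitudes and reuse the phase of the input."""
    cleaned = []
    for value, noise in zip(dft(block), noise_mag):
        mag = max(abs(value) - noise, 0.0)   # flooring at zero is an assumption
        cleaned.append(cmath.rect(mag, cmath.phase(value)))
    return idft(cleaned)


n = 32
speech = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]        # stand-in for speech
noise = [0.5 * math.sin(2 * math.pi * 5 * t / n) for t in range(n)]   # stationary noise
stored = noise_magnitude(noise)                                       # learned in a pause
output = spectral_subtract([s + v for s, v in zip(speech, noise)], stored)
```

In this idealized demo the noise occupies a single frequency bin that the speech stand-in does not use, so `output` should closely match `speech`; real noise overlaps the speech spectrum and is only reduced, not removed.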
The processed signal at the output of the noise suppressor therefore has a higher signal-to-noise ratio. It is fed to the input of the speech detector SREC, which consequently operates with greater immunity to errors and achieves a better detection rate for control commands.
Here, control commands are understood to mean a wide variety of utterances addressed by the user to the appliance. In a so-called command and control system, they can be commands such as "picture brighter", "sound off" or "channel 1". In a so-called dialog system, a dialog between the user and the appliance is also possible. A control command can then take the form of, for example, "Is there a tennis program on today?", and the appliance can answer the question, for example: "Yes, on this channel at 6:30 pm".
Before the voice-operated control system can be used with a speaker-dependent detector, speaker training must first be carried out, during which the system learns how the user in question pronounces the control commands. These spoken commands are stored; during later voice-operated control, the stored spoken commands are compared with the commands currently being spoken. Particularly for complex commands consisting of several words, it may be useful to first show the command words on a display of the remote control unit or on the television screen. With a speaker-independent detector, on the other hand, the training has already been carried out by the manufacturer.
A further exemplary embodiment is shown in Fig. 4. Here, the loudspeaker signals of the appliance (television set, stereo system) are not added to form a mono signal, but are each applied separately to an adaptive filter of the sound compensator SCOMP. The adaptive sound compensator thus has a plurality of reference inputs. The error signal, and thus the output signal of this multi-reference sound compensator, is the difference between the microphone signal and the sum of the outputs of all the adaptive filters. At the cost of more adaptive filtering, the television sound components in the microphone signal can be suppressed to a greater extent. This benefit is particularly pronounced when there are several loudspeakers, for example in a surround sound system with five loudspeakers or with Dolby Pro Logic playback.
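A multi-reference compensator of this kind might be sketched as one NLMS-adapted FIR filter per loudspeaker signal, all driven by the same error signal. The per-channel power normalization, the parameter values and the simulated two-loudspeaker path below are assumptions for illustration; the text only specifies that the error is the microphone signal minus the sum of all filter outputs.

```python
import random


def multi_reference_compensate(mic, refs, num_taps=4, step=0.05, q=0.9, eps=1e-8):
    """Return (e, h), where h[c] holds the FIR coefficients for reference c."""
    channels = range(len(refs))
    h = [[0.0] * num_taps for _ in channels]
    buf = [[0.0] * num_taps for _ in channels]
    power = [0.0 for _ in channels]
    e_out = []
    for n, d in enumerate(mic):
        for c in channels:
            x = refs[c][n]
            buf[c] = [x] + buf[c][:-1]
            power[c] = q * power[c] + (1.0 - q) * x * x
        # Sum of the outputs of all adaptive filters
        y = sum(hi * xi for c in channels for hi, xi in zip(h[c], buf[c]))
        e = d - y   # shared error signal and compensator output
        for c in channels:
            gain = step * e / (power[c] + eps)
            h[c] = [hi + gain * xi for hi, xi in zip(h[c], buf[c])]
        e_out.append(e)
    return e_out, h


# Two independent loudspeaker signals and a made-up acoustic path for each.
rng = random.Random(1)
left = [rng.gauss(0.0, 1.0) for _ in range(3000)]
right = [rng.gauss(0.0, 1.0) for _ in range(3000)]
mic = [0.5 * left[n] - 0.4 * right[n] + (0.2 * left[n - 1] if n else 0.0)
       for n in range(3000)]
residual, coeffs = multi_reference_compensate(mic, [left, right])
```

With a mono reference, a single filter cannot represent two different paths for independent left and right signals, which is one way to read the better suppression claimed for this embodiment.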
The exemplary embodiment of Fig. 5 is characterized by the use of a plurality of microphones MIC1, MIC2, MIC3 arranged in an array. This produces a pronounced directional characteristic and therefore allows the speaker to be farther away from the microphones.
The prerequisite here is that the useful signal arrives at the array from a specific direction, in particular from the front. The directional characteristic results from the geometry of the array. In this case, the microphones are integrated in the appliance itself, for example in the housing of the television set, to allow hands-free speaking.
The device can also be used to operate computer games, which usually have sound output as well. The computer game can run on a computer, on a television set or on a combination of these appliances. In a voice-operated control system for a computer, the voice is usually input to the computer via a microphone and a sound card. The microphone need not be integrated in the housing; it can instead be integrated in a control device such as a computer mouse or a joystick, or be designed as a headset microphone positioned directly in front of the user's mouth.
Using the invention increases the immunity of voice-operated control systems to errors and thus, among other advantages, considerably improves the speech detection rate. As a result, the voice-operated remote control of a wide variety of electronic entertainment appliances, such as television sets, video recorders, satellite receivers, audio equipment and complete stereo systems, as well as personal computers and other household appliances, can be improved.

Claims (9)

1. A system for the voice-operated remote control of an appliance (TV) that emits audio signals via at least one loudspeaker (L1, L2), having one microphone (MIC) or a plurality of microphones (MIC1, MIC2, MIC3) for converting voice commands into electrical signals and having a speech detection unit (SREC) for converting these electrical signals into control commands, characterized in that the microphone signal may be composed of the voice commands, the audio signals and other background noise, and in that a sound compensation unit (SCOMP) is provided in which the audio signals emitted by the appliance are estimated at the location of the microphone or microphones by modeling the transmission paths in the space between the loudspeakers and the microphones, and are used to correct the microphone signal.
2. The system according to claim 1, characterized in that the microphone signals are first fed to the sound compensation unit (SCOMP), the compensated signals are then fed to a noise suppression unit (NSUP) in which the background noise is eliminated as far as possible, and the signals are then fed to the speech detection unit (SREC), in which the commands are detected by pattern detection.
3. The system according to claim 1 or 2, characterized in that the sound compensation unit (SCOMP) comprises one or more adaptive filters (AF).
4. The system according to claim 3, characterized in that separate adaptive filters are provided for the audio signals of the different loudspeakers.
5. The system according to claim 3 or 4, characterized in that the adaptive filter or filters are designed as adaptive NLMS-FIR filters.
6. The system according to one of the preceding claims, characterized in that a mono signal (MON) is derived from a plurality of audio signals and fed to the sound compensation unit.
7. The system according to one of the preceding claims, characterized in that the microphone (MIC) is integrated in a unit (RCU), in particular in a remote control unit for operating the appliance.
8. The system according to one of the preceding claims, characterized in that at least one microphone (MIC1, MIC2, MIC3) is integrated in the housing of the appliance (TV).
9. The system according to one of the preceding claims, characterized in that a plurality of microphones are arranged in an array in order to obtain a pronounced directional characteristic.
CN98105349A 1997-03-26 1998-02-27 Method and device for voice operating and remote controlling apparatus Pending CN1194427A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN98105349A CN1194427A (en) 1997-03-26 1998-02-27 Method and device for voice operating and remote controlling apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE19712632.4 1997-03-26
CN98105349A CN1194427A (en) 1997-03-26 1998-02-27 Method and device for voice operating and remote controlling apparatus

Publications (1)

Publication Number Publication Date
CN1194427A true CN1194427A (en) 1998-09-30

Family

ID=5218755

Family Applications (1)

Application Number Title Priority Date Filing Date
CN98105349A Pending CN1194427A (en) 1997-03-26 1998-02-27 Method and device for voice operating and remote controlling apparatus

Country Status (1)

Country Link
CN (1) CN1194427A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100337268C (en) * 2004-02-23 2007-09-12 宏碁股份有限公司 Method and system of voice interaction
CN102915732A (en) * 2012-10-31 2013-02-06 黑龙江省电力有限公司信息通信分公司 Method and device for identifying voice commands restraining background broadcasts
CN104658535A (en) * 2015-02-26 2015-05-27 深圳市中兴移动通信有限公司 Voice control method and device
CN105702263A (en) * 2016-01-06 2016-06-22 清华大学 Voice playback detection method and device
CN105702263B (en) * 2016-01-06 2019-08-30 清华大学 Speech playback detection method and device

Similar Documents

Publication Publication Date Title
EP0867860A2 (en) Method and device for voice-operated remote control with interference compensation of appliances
US11348595B2 (en) Voice interface and vocal entertainment system
US8385557B2 (en) Multichannel acoustic echo reduction
JP4283212B2 (en) Noise removal apparatus, noise removal program, and noise removal method
US9595997B1 (en) Adaption-based reduction of echo and noise
US9992572B2 (en) Dereverberation system for use in a signal processing apparatus
US7970147B2 (en) Video game controller with noise canceling logic
JP4954334B2 (en) Apparatus and method for calculating filter coefficients for echo suppression
EP1743323B1 (en) Adaptive beamformer, sidelobe canceller, handsfree speech communication device
KR101726737B1 (en) Apparatus for separating multi-channel sound source and method the same
US6411927B1 (en) Robust preprocessing signal equalization system and method for normalizing to a target environment
US20140025374A1 (en) Speech enhancement to improve speech intelligibility and automatic speech recognition
US20140153740A1 (en) Beamforming pre-processing for speaker localization
JP2007306553A (en) Multi-channel echo compensation
US5864804A (en) Voice recognition system
CN111798860B (en) Audio signal processing method, device, equipment and storage medium
CN111354368B (en) Method for compensating processed audio signal
WO2022256577A1 (en) A method of speech enhancement and a mobile computing device implementing the method
Maas et al. A two-channel acoustic front-end for robust automatic speech recognition in noisy and reverberant environments
WO2003107327A1 (en) Controlling an apparatus based on speech
EP2490218B1 (en) Method for interference suppression
US20080288253A1 (en) Automatic speech recognition method and apparatus, using non-linear envelope detection of signal power spectra
Compernolle DSP techniques for speech enhancement
CN1194427A (en) Method and device for voice operating and remote controlling apparatus
US20230335149A1 (en) Speech processing device and speech processing method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication