CN101882370A - Voice recognition remote controller

Info

Publication number: CN101882370A
Application number: CN2010102149949A
Authority: CN
Inventors: 罗笑南; 吴其泽; 刘广发
Original assignee: National Sun Yat Sen University
Current assignee: Sun Yat Sen University; National Sun Yat Sen University
Priority date: 2010-06-30
Filing date: 2010-06-30
Publication date: 2010-11-10

Abstract

An embodiment of the invention discloses a voice recognition remote controller comprising the buttons and chip of a conventional remote controller, a sensor group, an analog-to-digital conversion module, a blind source separation module, a voice recognition module, and a control and response module. The sensor group is a group of hole-type microphones arranged on the remote controller for receiving voice signals. The analog-to-digital conversion module receives the voice signals picked up by the sensor group and converts them into digital acquisition signals that a digital chip can process. The blind source separation module receives the digital acquisition signals from the analog-to-digital conversion module and separates the mixed signals by a blind source separation algorithm. The voice recognition module receives the signals separated by the blind source separation module, identifies the useful signal, and sends the corresponding voice instruction code to the control and response module according to the recognized signal. By using blind source separation, the remote controller separates the mixed voice signals and thereby improves the subsequent recognition rate.

Description

Voice recognition remote controller

Technical field

The present invention relates to the technical field of the digital home, and in particular to a voice recognition remote controller.

Background art

At present, mainstream TV remote controllers on the market implement their control functions with simple electronic circuits and buttons. Their greatest advantages are low cost and reliable quality. Their shortcoming is equally obvious: the buttons are numerous and unintuitive, which makes them hard for users to remember and use. A complicated remote controller can be intimidating to the user.

With the continuous progress of science and technology, speech recognition has gradually entered daily life, for example in mobile phones and PCs. An important direction in the development of household appliances is to make user interfaces more user-friendly, convenient and natural, so that the elderly and the disabled can use them without barriers. Using speech recognition to provide voice control is an important way to improve the quality of household appliance user interfaces.

A remote controller with a speech recognition function can greatly improve the usability of household appliances. Take a TV remote controller as an example. If the user wants to watch the "CCTV-1" channel, he must either browse the channels one by one until the desired program appears, or remember the channel number of "CCTV-1"; neither is convenient. With a TV remote controller that has a speech recognition function, the user only needs to say "CCTV-1", and the remote controller recognizes the instruction and automatically sends the channel-change control signal to the TV set.

A remote controller with a speech recognition function also faces a difficulty: recognizing an input signal that comes from multiple sources. Take a TV remote controller as an example. The user issues a voice instruction to the remote controller while watching TV. The sound received by the remote controller is then not the user's voice instruction alone, but a mixture of the TV loudspeaker output and the user's voice instruction. Although the user's voice instruction may be louder than the TV loudspeaker, the mixed signal strongly affects speech recognition and greatly reduces its recognition rate.

Summary of the invention

The invention provides a voice recognition remote controller based on blind source separation, in which the mixed signal is separated before speech recognition, thereby improving the recognition rate.

To achieve the object of the invention, an embodiment of the invention discloses a voice recognition remote controller comprising conventional remote controller buttons and a chip, a sensor group, an analog-to-digital conversion module, a blind source separation module, a voice recognition module, and a control and response module, wherein:

The conventional remote controller buttons and chip give the remote controller the functions of an ordinary remote controller, and include a menu key, volume adjusting keys, +/- keys and a signal emission module;

The sensor group is a group of hole-type microphones arranged on the remote controller for receiving voice signals;

The analog-to-digital conversion module receives the voice signals picked up by the sensor group and converts them into digital acquisition signals that a digital chip can process;

The blind source separation module receives the digital acquisition signals from the analog-to-digital conversion module and separates the mixed signals by a blind source separation algorithm;

The voice recognition module receives the signals separated by the blind source separation module, identifies the useful signal, and sends the corresponding voice instruction code to the control and response module according to the recognized speech;

The control and response module holds preset human-machine interaction rules; it receives information through the voice recognition module, sends information to the user through a loudspeaker, and sends a control instruction to the remote controller chip once the instruction has been confirmed.

The sensor group contains no fewer than two microphones.

After receiving the multichannel mixed signals, the blind source separation module first performs centering and whitening, then iteratively optimizes the separation matrix, obtains the separated signals from the separation matrix after convergence, and finally outputs the separated signals.

The invention has the following advantages: speech recognition makes human-machine interaction more user-friendly, and blind source separation separates the mixed voice signals, which improves the subsequent recognition rate.

Description of drawings

To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed in describing the embodiments or the prior art are briefly introduced below. The drawings described below obviously show only some embodiments of the invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a structural diagram of the voice recognition remote controller in an embodiment of the invention;

Fig. 2 is a workflow diagram of the blind source separation module in Fig. 1;

Fig. 3 is a flowchart of a user using the remote controller of the invention in an embodiment of the invention.

Embodiment

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings of the embodiments. The described embodiments are obviously only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the invention without creative effort fall within the scope of protection of the invention.

The functional structure of the remote controller is shown in Fig. 1. The remote controller consists of a sensor group, an analog-to-digital conversion module, a blind source separation module, a voice recognition module, a control and response module, and a conventional function module. The sensors receive the voice signal, and the analog-to-digital conversion module converts the analog signal into a digital signal. The blind source separation module then separates the mixed signal, and the voice recognition module identifies the useful instruction signal and sends it to the control and response module. The control and response module interacts with the user according to the preset rules and finally sends a control command to the conventional function module, completing the process of controlling the remote controller by voice.

Specifically, the conventional remote controller buttons and chip give the remote controller the functions of an ordinary remote controller, and include a menu key, volume adjusting keys, +/- keys, a signal emission module and so on.

The sensor group is a group of hole-type microphones arranged on the remote controller for receiving voice signals. According to the theory of blind source separation, the number of received signals must not be less than the number of sound sources if the signals are to be separated successfully. The number of microphones is therefore no fewer than two, so that the sound of the TV and the voice of the user can be distinguished.

The analog-to-digital conversion module receives the voice signals picked up by the sensor group and converts them into digital acquisition signals that a digital chip can process.

The blind source separation module receives the digital acquisition signals from the analog-to-digital conversion module and separates the mixed signals by a blind source separation algorithm.

The voice recognition module receives the signals separated by the blind source separation module, identifies the useful signal, and sends the corresponding voice instruction code to the control and response module according to the recognized speech.

The control and response module holds preset human-machine interaction rules. It receives information through the voice recognition module and sends information through a loudspeaker, and in this way interacts with the user. Once the instruction has been confirmed, it sends a control instruction to the remote controller chip.

The workflow of the blind source separation module is shown in Fig. 2. After the multichannel mixed signals are input to this module, it first performs centering and whitening, then iteratively optimizes the separation matrix, obtains the separated signals from the separation matrix after convergence, and finally outputs them.

Here S denotes the source signal matrix, A the mixing matrix, X the observation signal matrix, W the separation matrix, and Y the result signal matrix. Then $X = AS$ is the signal received by the sensor group. The task is to find a separation matrix W such that $Y = WX$ approximates S, which achieves the separation of the mixed signal.
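The mixing model can be illustrated with a minimal NumPy sketch. The two source waveforms, the 2x2 mixing matrix `A` and the name `W_ideal` are hypothetical values chosen for illustration; in the remote controller A is unknown, so W cannot be obtained by direct inversion and has to be estimated from X alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical sources, one per row, 1000 samples each.
n_samples = 1000
t = np.arange(n_samples)
S = np.vstack([
    np.sin(2 * np.pi * 0.01 * t),        # stand-in for the user's voice
    rng.uniform(-1.0, 1.0, n_samples),   # stand-in for the TV loudspeaker
])

# Hypothetical mixing matrix A (assumed known only for this illustration).
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])

X = A @ S                     # observation matrix: what two microphones record
W_ideal = np.linalg.inv(A)    # ideal separation matrix, usable only because A is known here
Y = W_ideal @ X               # result signal matrix

print(np.allclose(Y, S))
```

Because A is square and invertible in this sketch, Y reproduces S exactly. A blind method can only recover the sources up to ordering and scaling.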

The principle of each step is described below.

Centering a signal means making its mean zero. If x is a random variable with non-zero mean, it suffices to replace x with $x_0 = x - E(x)$. In practice, the mathematical expectation is replaced by the arithmetic mean to obtain a zero-mean signal.
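A minimal sketch of the centering step, assuming the observations are stored as a NumPy array with one microphone channel per row. The function name `center` and the sample values are illustrative only.

```python
import numpy as np

def center(X):
    """Subtract each channel's arithmetic mean (rows are channels)."""
    mean = X.mean(axis=1, keepdims=True)
    return X - mean, mean

# Hypothetical two-channel recording with four samples per channel.
X = np.array([[1.0, 2.0, 3.0, 6.0],
              [10.0, 10.0, 14.0, 6.0]])
X0, mean = center(X)
print(X0.mean(axis=1))
```

The mean is returned as well so that it could be added back after separation if the original offset matters.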

Whitening a signal means applying a linear transformation $T$, $x' = Tx$, such that the correlation matrix of the transformed random variable $x'$ satisfies $R_{x'} = E[x' x'^{H}] = I$.

Let the correlation matrix of the mixed signal vector $x$ be $R_x$. By the properties of a correlation matrix, $R_x$ has the eigenvalue decomposition:

$$R_x = Q \Sigma^{2} Q^{T}$$

where $\Sigma^{2}$ is a diagonal matrix.

Let $T = \Sigma^{-1} Q^{T}$ and $x' = Tx$. Then the correlation matrix of the transformed $x'$ is $T R_x T^{T} = \Sigma^{-1} Q^{T} Q \Sigma^{2} Q^{T} Q \Sigma^{-1} = I$, which achieves the whitening of the signal.
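The whitening transform can be sketched as follows, assuming real-valued, already centered data with one channel per row and a sample estimate of the correlation matrix. The function name `whiten`, the mixing matrix and the random test data are illustrative assumptions.

```python
import numpy as np

def whiten(X0):
    """Whiten zero-mean data X0 (rows are channels) by eigendecomposition."""
    n_samples = X0.shape[1]
    R = X0 @ X0.T / n_samples                    # sample correlation matrix R_x
    eigvals, Q = np.linalg.eigh(R)               # R_x = Q diag(eigvals) Q^T, eigvals = Sigma^2
    T = np.diag(1.0 / np.sqrt(eigvals)) @ Q.T    # T = Sigma^{-1} Q^T
    return T @ X0, T

# Hypothetical mixed observations from two independent sources.
rng = np.random.default_rng(1)
S = rng.uniform(-1.0, 1.0, size=(2, 5000))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
X = A @ S
X0 = X - X.mean(axis=1, keepdims=True)

Xw, T = whiten(X0)
R_white = Xw @ Xw.T / Xw.shape[1]
print(np.round(R_white, 6))
```

`np.linalg.eigh` is used because the correlation matrix is symmetric. The sketch assumes all eigenvalues are strictly positive, which holds when the channels are not linearly dependent.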

The iterative optimization of the separation matrix is described below using the maximum entropy method.

Entropy is a concept from information theory. The entropy $H(A)$ of $A$ is defined as the mean self-information of its events. For a discrete random variable it is:

$$H(A) = E(I) = -\sum_{k=1}^{n} p_k \log p_k$$
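As a small numeric illustration of the entropy formula, the sketch below evaluates it for two hypothetical two-outcome distributions. The function name `entropy` and the choice of base 2 (bits) are assumptions; the text does not fix the base of the logarithm.

```python
import numpy as np

def entropy(p, base=2.0):
    """Entropy of a discrete distribution p; zero-probability terms contribute nothing."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)) / np.log(base))

h_fair = entropy([0.5, 0.5])     # two equally likely outcomes
h_biased = entropy([0.9, 0.1])   # one outcome much more likely
print(h_fair, h_biased)
```

The uniform distribution gives the larger value, which is the property the maximum entropy criterion relies on.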

The joint entropy of two random variables x and y is defined as:

$$H(x, y) = -\sum_{i}\sum_{j} p(x = a_i, y = b_j) \log p(x = a_i, y = b_j)$$

The mutual information between random variables x and y is defined as:

$$I(x, y) = H(x) + H(y) - H(x, y)$$

that is, the sum of the marginal entropies minus the joint entropy.
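The relation between marginal entropies, joint entropy and mutual information can be checked numerically. The sketch below assumes a joint distribution given as a probability table; the function names and the two example tables are hypothetical.

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a probability table of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(joint):
    """I(x, y) = H(x) + H(y) - H(x, y) for a joint table p(x = a_i, y = b_j)."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1)   # marginal distribution of x
    py = joint.sum(axis=0)   # marginal distribution of y
    return entropy(px) + entropy(py) - entropy(joint)

independent = np.outer([0.5, 0.5], [0.25, 0.75])   # p(x, y) = p(x) p(y)
dependent = np.array([[0.5, 0.0],
                      [0.0, 0.5]])                 # y always equals x

mi_ind = mutual_information(independent)
mi_dep = mutual_information(dependent)
print(mi_ind, mi_dep)
```

Independent variables give zero mutual information, which is the state the separation aims for between the output channels.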

The maximum entropy method is characterized by applying, component by component to the output u, a nonlinear function $y_i = g_i(u_i)$ in place of an estimate of higher-order statistics. Its criterion is: given suitable $g_i(u_i)$, maximize the total entropy $H(y)$ of the output $y = [y_1, y_2, \ldots, y_N]$. Here $g_i(u_i)$ is an invertible monotonic nonlinear function, and $u = Wx$. The joint entropy of the output signals is

$$H(y_1, \ldots, y_N) = H(y_1) + \cdots + H(y_N) - I(y_1, \ldots, y_N)$$

where $H(y_i)$ is the marginal entropy of each output and $I(y_1, \ldots, y_N)$ is their mutual information. Maximizing the joint entropy means minimizing the mutual information while maximizing the marginal entropies. For bounded random variables $y_1, \ldots, y_N$, $H(y_1, \ldots, y_N)$ reaches its maximum when the mutual information is zero and the marginal distributions are uniform.

Two parameters determine the maximum joint entropy: the nonlinear function $y_i = g_i(u_i)$ and the weight matrix W. Once the nonlinear function has been chosen, the only remaining parameter is W. Differentiating with respect to W gives:

$$\frac{\partial H(y)}{\partial W} = \frac{\partial}{\partial W}\Bigl(-D\bigl(p(s)\,\|\,p(u)\bigr)\Bigr)$$

where $D(\cdot)$ denotes the KL distance (Kullback-Leibler divergence).

Define the nonlinear function, or score function, as

$$\varphi(u) = -\frac{\partial p(u) / \partial u}{p(u)}$$
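As a worked example of the score function, assume the nonlinearity is the logistic function and that the density $p(u)$ is taken as its derivative. This pairing is a common choice in the maximum entropy literature and is an assumption here; the text does not state which $g$ is used.

```latex
\begin{align*}
g(u) &= \frac{1}{1 + e^{-u}}, \qquad p(u) = g'(u) = g(u)\bigl(1 - g(u)\bigr) \\
\frac{\partial p(u)}{\partial u} &= g(u)\bigl(1 - g(u)\bigr)\bigl(1 - 2g(u)\bigr) \\
\varphi(u) &= -\frac{\partial p(u) / \partial u}{p(u)} = 2g(u) - 1 = \tanh\!\left(\frac{u}{2}\right)
\end{align*}
```

Under this assumption the score function is a bounded, odd, monotonic function of u, so large outliers in the separated signal have a limited effect on each update of W.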

The final iteration formula is:

$$W(k+1) = W(k) + \mu_k \left[ W^{-T}(k) - \varphi\bigl(u(k)\bigr)\, x^{T}(k) \right]$$

After repeated iterations, the separation matrix W is obtained upon convergence.

Once the separation matrix has been obtained, the mixed signal is separated by $Y = WX$, and the separated signals are passed to the voice recognition module.
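The whole separation step (centering, whitening, iterating on W, then computing Y) can be sketched in NumPy as below. This is a minimal batch sketch under stated assumptions, not the patent's implementation: the score function is taken as `tanh(u)` (which corresponds to a density proportional to 1/cosh(u) and suits peaked sources such as speech), the expectation in the update is replaced by a sample average, the step size `mu` is constant, and the Laplacian test sources and the mixing matrix `A` are hypothetical.

```python
import numpy as np

def infomax(Xw, mu=0.1, n_iter=2000):
    """Iterate W <- W + mu * (W^{-T} - phi(u) x^T), averaged over all samples."""
    n_channels, n_samples = Xw.shape
    W = np.eye(n_channels)
    for _ in range(n_iter):
        U = W @ Xw                                            # u = W x
        grad = np.linalg.inv(W).T - np.tanh(U) @ Xw.T / n_samples
        W = W + mu * grad
    return W

# Hypothetical sources and mixing; only X would be available in practice.
rng = np.random.default_rng(0)
S = rng.laplace(size=(2, 10000))
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
X = A @ S

# Centering and whitening.
X0 = X - X.mean(axis=1, keepdims=True)
eigvals, Q = np.linalg.eigh(X0 @ X0.T / X0.shape[1])
T = np.diag(1.0 / np.sqrt(eigvals)) @ Q.T
Xw = T @ X0

W = infomax(Xw)
Y = W @ Xw                 # separated signals
P = W @ T @ A              # overall source-to-output map, used only for checking
print(np.round(P, 3))
```

If separation succeeds, each row of `P` has one dominant entry, meaning each output channel carries mostly one source. The order and scale of the outputs are not determined, so a later stage still has to decide which channel holds the voice instruction.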

Fig. 3 shows the flow of a user using the remote controller of the invention. First, the user speaks an instruction, for example "adjust brightness". What the sensor group of the remote controller receives is a mixture of the user's voice instruction and the sound of the TV set. After analog-to-digital conversion, the mixed signal is sent to the blind source separation module, which separates the signals. The separated signals are passed to the voice recognition module, and the recognized information is passed to the control and response module, which responds to it according to the preset rules. If the control and response module can confirm the instruction to be sent, it sends the instruction directly to the conventional function module. If it cannot, it records the current interaction state and continues to interact with the user. For example, on receiving the message "adjust brightness", the remote controller replies "please adjust brightness". The user then speaks a further instruction, "brighter". The control and response module now has an unambiguous instruction and sends it to the conventional function module, which controls the household appliance accordingly.
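The interaction flow above can be sketched as a small state machine. The rule tables, phrase strings, instruction codes and class name below are all hypothetical; the text only specifies that the module either sends a confirmed instruction or records the interaction state and prompts the user.

```python
class ControlAndResponse:
    """Minimal sketch of the control and response module's interaction rules."""

    # Hypothetical rules: phrases that map directly to an instruction code.
    DIRECT = {"CCTV-1": "CHANNEL_1"}
    # Hypothetical rules: phrases that need a follow-up before a code can be sent.
    FOLLOW_UPS = {
        "adjust brightness": {"brighter": "BRIGHTNESS_UP", "darker": "BRIGHTNESS_DOWN"},
    }

    def __init__(self):
        self.pending = None   # recorded interaction state
        self.spoken = []      # prompts sent to the user through the loudspeaker
        self.sent = []        # instruction codes sent to the conventional function module

    def handle(self, phrase):
        """Process one recognized phrase from the voice recognition module."""
        if self.pending is not None and phrase in self.FOLLOW_UPS[self.pending]:
            self.sent.append(self.FOLLOW_UPS[self.pending][phrase])
            self.pending = None
        elif phrase in self.DIRECT:
            self.sent.append(self.DIRECT[phrase])
            self.pending = None
        elif phrase in self.FOLLOW_UPS:
            self.pending = phrase                  # instruction not yet unambiguous
            self.spoken.append("please " + phrase)
        else:
            self.spoken.append("please repeat")    # unrecognized phrase, state unchanged

module = ControlAndResponse()
module.handle("adjust brightness")
module.handle("brighter")
print(module.spoken, module.sent)
```

Keeping the rules in plain tables makes it easy to add further two-step instructions without changing the handling logic.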

In summary, by implementing the embodiments of the invention, speech recognition makes human-machine interaction more user-friendly, and blind source separation separates the mixed voice signals, which improves the subsequent recognition rate.

The voice recognition remote controller based on blind source separation provided by the embodiments of the invention has been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the embodiments is only intended to help in understanding the method of the invention and its core idea. At the same time, those of ordinary skill in the art may, in accordance with the idea of the invention, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims

1. A voice recognition remote controller, characterized in that it comprises conventional remote controller buttons and a chip, a sensor group, an analog-to-digital conversion module, a blind source separation module, a voice recognition module, and a control and response module, wherein:

2. The voice recognition remote controller as claimed in claim 1, characterized in that the sensor group contains no fewer than two microphones.

3. The voice recognition remote controller as claimed in claim 2, characterized in that, after receiving the multichannel mixed signals, the blind source separation module first performs centering and whitening, then iteratively optimizes the separation matrix, obtains the separated signals from the separation matrix after convergence, and finally outputs the separated signals.