CN101516005A

CN101516005A - Speech recognition channel selecting system, method and channel switching device

Info

Publication number: CN101516005A
Application number: CNA2008100654170A
Authority: CN
Inventors: 吴治国; 张勤伟
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2008-02-23
Filing date: 2008-02-23
Publication date: 2009-08-26
Also published as: WO2009103226A1

Abstract

The invention provides a speech recognition channel selection system, a method, and a channel switching device. The method comprises: a controller receives a voice input signal from a user; the channel switching device recognizes a name to be matched according to the input voice signal and a recognition vocabulary; the name to be matched is matched against a matching table to obtain the channel to switch to; and the device switches to that channel. Because speech recognition is not performed on the controller, the system, method, and device avoid the complexity and high cost of controller-side recognition, are convenient for the user to operate, and make full use of the capability of the channel switching device, which reduces control cost. Because the name to be matched is recognized by the channel switching device, no dedicated speech recognition server is needed in the network, which avoids long response times and loss of data in network transmission, and saves the cost of building the network.

Description

Speech recognition channel selection system, method, and channel switching device

Technical field

The present invention relates to the field of communication technology, and in particular to a system, device, and method for selecting channels by speech recognition.

Background

With the development of information technology and broadcast television technology, services such as cable digital TV and IPTV have grown rapidly in recent years. As set-top boxes (STB), such as IP set-top boxes and digital set-top boxes, become increasingly commoditized, the fully featured set-top box has gradually replaced the traditional VCD and DVD player. Meanwhile, advances in automatic speech recognition have made it possible for set-top boxes to select channels by voice, and this technology has become a focus of industry research and development.

Conventional speech-recognition channel selection takes two forms. One adds a speech recognition processor to the remote control; during recognition, the speech data entered by the user is matched against speech data determined by voice templates that the user has downloaded, in order to change channels. The other places a dedicated speech recognition server in the network.

In the course of realizing the present invention, the inventors found that the conventional approaches to speech-recognition channel selection have at least the following shortcomings. When a speech recognition processor is added to the remote control, every update of the voice templates requires the user to download them to the remote control manually, which is complicated and inconvenient, and the added processor also raises the cost of the remote control. When a dedicated speech recognition server is placed in the network, the voice signal must be uploaded to the network for recognition, so the response time is longer; the data must cross the network twice, uplink and downlink, which increases the chance of packet loss; and the dedicated server adds to the cost of building the network.

Summary of the invention

In view of this, it is necessary to provide a speech recognition channel selection method that is easy to operate and saves cost.

A speech recognition channel switching system that is easy to operate and saves cost is also provided.

A channel switching device that is easy to operate and saves cost is also provided.

A speech recognition channel selection method comprises the following steps:

A controller receives a voice input signal from a user;

A channel switching device recognizes a name to be matched according to the input voice signal and a recognition vocabulary;

The name to be matched is matched against a matching table to obtain the channel to switch to;

The channel switching device switches to that channel.
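
The steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions, not the implementation described in this document: `RECOGNITION_VOCABULARY`, `MATCH_TABLE`, `recognize_name`, and `select_channel` are hypothetical names, the table contents are invented, and the recognizer is stubbed with a text lookup where a real device would perform acoustic matching.

```python
from typing import Optional

# Hypothetical recognition vocabulary: the names the device can recognize.
RECOGNITION_VOCABULARY = {"news", "sports", "movies"}

# Hypothetical matching table: recognized name -> channel number to switch to.
MATCH_TABLE = {"news": 1, "sports": 5, "movies": 8}


def recognize_name(voice_signal: str) -> Optional[str]:
    """Stand-in recognizer: treats the 'signal' as text and checks the vocabulary."""
    word = voice_signal.strip().lower()
    return word if word in RECOGNITION_VOCABULARY else None


def select_channel(voice_signal: str, current_channel: int) -> int:
    """Recognize a name, look it up in the matching table, and return the new channel."""
    name = recognize_name(voice_signal)
    if name is None:
        return current_channel  # nothing recognized: stay on the current channel
    return MATCH_TABLE.get(name, current_channel)
```

Keeping the recognition vocabulary and the matching table as separate structures mirrors the text, where recognition produces a name and a second lookup turns that name into a channel.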

A speech recognition channel selection system comprises a controller and a channel switching processing unit that communicate with each other;

The controller is configured to receive a voice input signal from a user;

The channel switching processing unit is configured to recognize a name to be matched according to the input voice signal and a recognition vocabulary, to match the name to be matched against a matching table to obtain the channel to switch to, and to switch to that channel.

A channel switching device comprises:

A receiver module, configured to receive the user's voice input signal sent by a controller;

A recognition processing module, configured to recognize a name to be matched according to the input voice signal and a recognition vocabulary;

A match query module, configured to match the name to be matched against a matching table to obtain the channel to switch to;

A channel switching control module, configured to switch to that channel.

Compared with the prior art, in the embodiments of the invention the controller receives the user's voice input signal, the channel switching device recognizes the name to be matched from that signal, the name is matched against the matching table to obtain the channel to switch to, and the device switches to that channel. This avoids the complexity and high cost of performing speech recognition on the controller, makes operation convenient for the user, and makes full use of the capability of the channel switching device, which reduces control cost. Because the channel switching device recognizes the name to be matched, no dedicated speech recognition server needs to be placed in the network, which avoids long response times and loss of data in network transmission, and saves the cost of building the network.

Brief description of the drawings

Fig. 1 is a schematic structural diagram of the speech recognition channel switching system of an embodiment of the invention.

Fig. 2 is a schematic structural diagram of the controller of an embodiment of the invention.

Fig. 3 is a schematic structural diagram of the channel switching processing unit of an embodiment of the invention.

Fig. 4 is a flow chart of the speech recognition channel selection method of an embodiment of the invention.

Fig. 5 is a flow chart of the channel and program list update method of an embodiment of the invention.

Fig. 6 is a flow chart of the recognition vocabulary and matching table update method of an embodiment of the invention.

Detailed description of the embodiments

Referring to Fig. 1, the speech recognition channel switching system 100 of an embodiment of the invention comprises a controller 102, a channel switching device 104, and an electronic program guide (EPG) server 106. The controller 102 is configured to receive the user's voice input signal. The channel switching device 104 is configured to recognize a name to be matched according to the input voice signal and the recognition vocabulary, to match the name against the matching table to obtain the channel to switch to, and to switch to that channel. The EPG server 106 provides the latest matching table and/or the latest recognition vocabulary; the channel switching device 104 can update its matching table from the latest matching table and/or update its recognition vocabulary from the latest recognition vocabulary. The controller 102 may be a controller external to the system, a handset (HS, mobile phone), or a remote control; this embodiment uses a remote control as the example. The channel switching device 104 may be a personal computer (PC), a set-top box (STB), a notebook computer (NB), a handset (HS), a game player (GP), an optical disc drive (ODD), or the like; this embodiment uses an STB as the example.

Referring also to Fig. 2, in this embodiment the controller 102 comprises a voice signal receiver module 202, a voice signal processing module 204, an input module 210, a controller receiver module 212, and a sending module 216.

The voice signal receiver module 202 is configured to receive the user's voice input signal. In this embodiment, the voice signal receiver module may be a microphone on the remote control.

The voice signal processing module 204 is configured to process the user's voice input signal. It comprises a speech conversion unit 206 and a speech coding unit 208. The speech conversion unit 206 converts the voice signal into a digital signal; in this embodiment it may be an A/D conversion circuit. The speech coding unit 208 encodes the digital signal produced by the speech conversion unit 206; the encoding may be compression coding, either lossy or lossless. Voice capture and processing can follow different schemes; in this embodiment the signal is sampled at 16 kHz and quantized at 16-bit or 8-bit precision. After sampling and processing, the voice signal is encoded in PCM (Pulse Code Modulation) format.
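
The sampling figures above fix the raw data rate that the wireless link must carry. The sketch below works through that arithmetic and shows one common way to quantize samples to 16-bit PCM. It is an illustration only: the function names are hypothetical, mono audio is assumed, and the little-endian byte order is an assumption that the text does not specify.

```python
import struct

SAMPLE_RATE_HZ = 16_000  # sampling rate given in this embodiment


def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Raw (uncompressed) mono PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample


def quantize_16bit(samples):
    """Map floats in [-1.0, 1.0] to signed 16-bit integers, clipping out-of-range values."""
    out = []
    for s in samples:
        s = max(-1.0, min(1.0, s))
        out.append(int(round(s * 32767)))
    return out


def pack_pcm16_le(int_samples) -> bytes:
    """Pack signed 16-bit samples as little-endian bytes (byte order is an assumption)."""
    return struct.pack("<%dh" % len(int_samples), *int_samples)
```

At 16 kHz, 16-bit quantization gives 16,000 × 16 = 256,000 bit/s and 8-bit quantization gives 128,000 bit/s before any compression coding, which is the load the sending module has to sustain for real-time transmission.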

The input module 210 is configured to receive instructions entered by the user, such as the voice activation instruction, which directs the channel switching device to activate voice input. In this embodiment, the input module 210 may be a keyboard or a touch screen.

The controller receiver module 212 is configured to receive signals sent by the channel switching device 104, including returned command signals and notification messages.

The sending module 216 is configured to send the encoded voice signal and the operation signals entered by the user. In this embodiment, the sending module 216 may be a wireless communication device such as an infrared or Bluetooth device; Bluetooth 2.0, ZigBee, a high-speed infrared protocol, or any other high-speed wireless technology that can carry PCM (Pulse Code Modulation) voice data in real time may be used. The sending module 216 comprises an operation signal transmitting unit 218 and a voice signal transmitting unit 214. The operation signal transmitting unit 218 sends the operation signals entered by the user, for example keyboard and touch-screen input. The voice signal transmitting unit 214 sends the voice signal entered by the user, which may be the digital signal after A/D conversion or the signal after compression coding.

Referring also to Fig. 3, in this embodiment the channel switching device 104 (STB) comprises a receiver module 302, a mute control module 308, a speech selection module 310, a recognition processing module 312, a sending module 322, a rejection prompt module 324, a memory module 326, a match query module 336, a channel switching control module 338, and an update module 340.

The receiver module 302 is configured to receive the user's voice input signal and the user's operation control commands sent by the controller. In this embodiment, the user input signal comprises the voice input signal and the operation control commands; if all input is by voice, the operation control commands may be absent. The voice input signal is the digital voice signal after analog-to-digital (A/D) conversion. The receiver module 302 comprises an operation signal receiving unit 304 and a voice signal receiving unit 306. The operation signal receiving unit 304 receives the user's operation control commands, for example the voice activation command. The voice signal receiving unit 306 receives the user's voice input signal.

The mute control module 308 is configured to place the channel switching device in the mute state according to the voice activation instruction entered by the user, and to switch it back to the non-mute state after voice collection.

The speech selection module 310 is configured to select, according to a speech selection signal entered by the user, the acoustic model corresponding to that signal.

The recognition processing module 312 is configured to recognize the name to be matched according to the input voice signal and the recognition vocabulary. It comprises a voice activation detecting unit 314, a speech feature extraction unit 316, a voice recognition unit 318, and a voice judging unit 320.

The voice activation detecting unit 314 is configured to detect the start point and end point of the actual speech segment. In this embodiment, it uses a robust endpoint detection algorithm to find the start and end of the actual speech, so as to separate the actual speech segment from the non-speech segments of the input voice signal.
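
The text does not say which endpoint detection algorithm is used. As one simple possibility, the sketch below marks frames whose short-time energy exceeds a fixed threshold and returns the span from the first to the last such frame. The function names, the 160-sample frame length (10 ms at 16 kHz), and the threshold value are all assumptions chosen for illustration; a robust detector would typically adapt the threshold to the noise floor.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each full, non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def detect_endpoints(samples, frame_len=160, threshold=0.01):
    """Return (start_sample, end_sample) of the speech segment, end exclusive.

    Returns None when no frame has energy above the threshold.
    """
    energies = frame_energies(samples, frame_len)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```

Only the samples between the two returned indices would then be passed on for feature extraction.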

The speech feature extraction unit 316 is configured to extract speech features from the voice signal. In this embodiment, it processes the voice signal passed on by the voice activation detecting unit 314 and extracts speech feature data. The feature type may be MFCC (Mel-Frequency Cepstral Coefficients), PLP (Perceptual Linear Prediction), or LPCC (Linear Prediction Cepstral Coefficients); to improve noise robustness, cepstral mean subtraction may be applied during feature extraction. Because MFCC features exploit the perceptual characteristics of the human ear and are comparatively robust to noise, MFCC is the preferred feature type. A voice signal is short-time stationary, and adjacent speech frames are correlated; for this reason, appending first-order differences, or first- and second-order differences, of the MFCC features can improve recognition accuracy.
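
Cepstral mean subtraction and difference features can be illustrated on a matrix of cepstral coefficients that has already been computed (the MFCC computation itself is not shown). This is a minimal sketch: the function names are hypothetical, and the two-point symmetric difference used here is only one of several delta formulas in common use, not necessarily the one intended by this document.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames from every frame."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    return [[f[d] - means[d] for d in range(dims)] for f in frames]


def first_order_delta(frames):
    """Two-point delta: (next - previous) / 2, with the edge frames repeated."""
    n = len(frames)
    deltas = []
    for t in range(n):
        prev_f = frames[max(t - 1, 0)]
        next_f = frames[min(t + 1, n - 1)]
        deltas.append([(b - a) / 2.0 for a, b in zip(prev_f, next_f)])
    return deltas
```

Second-order differences can be obtained by applying `first_order_delta` to its own output, and the final feature vector for each frame is the concatenation of the static, first-order, and (optionally) second-order values.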

The voice recognition unit 318 is configured to compute, according to the acoustic model and the recognition vocabulary, the acoustic distance of the input speech feature data to each entry. In this embodiment, it uses the acoustic model data and the isolated-word vocabulary data to obtain the shortest accumulated acoustic distance for each isolated word, and then takes the isolated word with the smallest such distance as the first-choice recognition result for the utterance. The acoustic models used for recognition include continuous HMM (Hidden Markov Model) models and discrete HMM models. The voice recognition unit 318 can also offer several candidate results for the user to choose from, ranked by shortest accumulated acoustic distance.
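
The text does not define how the accumulated acoustic distance is computed. One common reading, assumed here, is the negative log-probability of the best state path through a word's HMM, found with the Viterbi algorithm. The sketch below does this for a discrete HMM; the function name and parameter layout are hypothetical.

```python
import math


def viterbi_distance(observations, initial, transitions, emissions):
    """Negative log-probability of the best state path (smaller means a closer match).

    initial[s]: probability of starting in state s
    transitions[s][t]: probability of moving from state s to state t
    emissions[s][o]: probability that state s emits symbol o
    """
    def cost(p):
        return -math.log(p) if p > 0.0 else math.inf

    n_states = len(initial)
    # Accumulated cost of the best path ending in each state after the first symbol.
    best = [cost(initial[s]) + cost(emissions[s][observations[0]])
            for s in range(n_states)]
    for obs in observations[1:]:
        best = [
            min(best[p] + cost(transitions[p][s]) for p in range(n_states))
            + cost(emissions[s][obs])
            for s in range(n_states)
        ]
    return min(best)
```

Running this once per vocabulary entry, each with its own model parameters, yields one distance per isolated word; the word with the smallest distance is the first-choice result.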

The voice judging unit 320 is configured to judge whether the acoustic distance of the speech feature data to an entry is less than a threshold; if it is, the channel name corresponding to the current utterance is determined from the recognition vocabulary and the matching table.
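
The ranking and threshold test can be sketched as follows, assuming the per-entry distances have already been computed. The function names and the example distances are hypothetical; the strict "less than" comparison follows the text.

```python
def rank_candidates(distances, n_best=3):
    """Sort vocabulary entries by accumulated acoustic distance, smallest first."""
    return sorted(distances.items(), key=lambda item: item[1])[:n_best]


def decide(distances, threshold):
    """Return the closest entry if its distance is below the threshold, else None."""
    if not distances:
        return None
    word, dist = min(distances.items(), key=lambda item: item[1])
    return word if dist < threshold else None
```

A result of `None` corresponds to the rejection case, in which the user is prompted to speak again.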

The sending module 322 is configured to send recognition processing signals to the controller 102; once recognition processing is complete, the controller 102 can stop collecting the user's voice input signal. In this embodiment, the sending module 322 may likewise transmit signals wirelessly, for example by Bluetooth or infrared.

The rejection prompt module 324 is configured to prompt the user to speak again when the recognition result is non-speech. The prompt may be a message, an on-screen display, or a sound; in this embodiment, the user is prompted by text displayed on the screen.

The memory module 326 is configured to store data such as the channel and program list, the recognition vocabulary, the acoustic model, and the matching table. In this embodiment, it comprises a channel and program list storage unit 328, a recognition vocabulary storage unit 330, an acoustic model storage unit 332, and a matching table storage unit 334.

The channel and program list storage unit 328 is configured to store the channel-to-program correspondence table. In this embodiment, each entry of the table holds a live TV channel name and the name of the program currently playing on that channel. The table can be updated from the EPG server 106; the update period may be set to one day or one week, and the specific interval can follow the EPG server update interval of the IPTV or cable digital TV system.

The recognition vocabulary storage unit 330 is configured to store the recognition vocabulary. In this embodiment, the recognition vocabulary also includes an isolated-word vocabulary used for isolated-word speech recognition.

The acoustic model storage unit 332 is configured to store the acoustic models to be matched. In this embodiment, it holds the model parameters of an HMM-based acoustic model built by bilingual mixed modeling. The parameters of the bilingual mixed acoustic model are speaker-independent. The model parameters must be trained beforehand by a trainer on annotated corpus data; only then can the trained parameters be fixed in the acoustic model parameter storage for use in isolated-word recognition. The acoustic model parameters comprise the state parameters of the hidden Markov model and the probability distribution functions of the observation feature vectors output by each state.

The matching table storage unit 334 is configured to store the matching table, which records the correspondence between the user's voice input and the channel the user wants to switch to.

The match query module 336 is configured to match the name to be matched against the matching table to obtain the channel to switch to. In this embodiment, the recognized isolated word is used as the query keyword, and the table in the channel and program list is searched for entries that match the keyword.

The channel switching control module 338 is configured to switch to the required channel. If matching entries exist and the query returns a single entry, the set-top box is directed to switch live TV to the channel identified by the channel-name attribute of that entry. If the query returns several records, the TV screen is directed to display the channel-name attribute of each entry and the user is prompted to choose one of the channels with the remote control; once the user has chosen, the TV is switched to the selected channel.
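
The query and the handling of single versus multiple results can be sketched as follows. The entry layout (a dictionary with `channel` and `program` keys), the exact-match rule, the action labels, and the function names are all assumptions made for illustration; the text only states that each entry holds a channel name and the current program name.

```python
def query_channels(keyword, channel_program_list):
    """Return channel names whose channel name or current program equals the keyword."""
    return [
        entry["channel"]
        for entry in channel_program_list
        if keyword in (entry["channel"], entry["program"])
    ]


def handle_query_result(matches):
    """Map the number of matches to an action, following the cases in the text."""
    if not matches:
        return ("no_match", None)
    if len(matches) == 1:
        return ("switch", matches[0])      # single entry: switch directly
    return ("prompt_user", matches)        # several entries: let the user choose
```

For example, a program name that is currently playing on two channels would yield the `prompt_user` action with both channel names, while a channel name would normally yield a direct switch.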

The update module 340 is configured to update the matching table and/or the recognition vocabulary from the EPG server. It comprises an update timing unit 342 and an update control unit 344. The update timing unit 342 records the time of the last update and triggers an update when the update time arrives or has passed; in this embodiment, the channel and program list may be set to update daily, and the recognition vocabulary and matching table may be set to update every minute. The update control unit 344 controls the updating of the matching table and/or the recognition vocabulary when the update time is reached.
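
The timing behaviour described above can be sketched with a small interval timer. `UpdateTimer` and its methods are hypothetical names, and time is passed in explicitly as seconds so that the logic can be checked without a real clock; a device would supply its own time source.

```python
class UpdateTimer:
    """Tracks when an item was last updated and reports when its interval has elapsed."""

    def __init__(self, interval_seconds, now=0.0):
        self.interval_seconds = interval_seconds
        self.last_update = now

    def is_due(self, now):
        """True when the update time has arrived or has already passed."""
        return now - self.last_update >= self.interval_seconds

    def mark_updated(self, now):
        """Record that an update was performed at time `now`."""
        self.last_update = now


DAY = 24 * 60 * 60  # interval for the channel and program list in this embodiment
MINUTE = 60         # interval for the recognition vocabulary and matching table
```

One timer per data set keeps the daily download from the EPG server independent of the per-minute local refresh.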

In the embodiments of the invention, the controller receives the user's voice input signal, the channel switching device recognizes the name to be matched from that signal, the name is matched against the matching table to obtain the channel to switch to, and the device switches to that channel. This avoids the complexity and high cost of performing speech recognition on the controller, makes operation convenient for the user, and makes full use of the capability of the channel switching device, which reduces control cost. Because the channel switching device recognizes the name to be matched, no dedicated speech recognition server needs to be placed in the network, which avoids long response times and loss of data in network transmission, and saves the cost of building the network. Clipping out the actual speech segment improves recognition accuracy. The mute control module mutes the set-top box during voice input, which prevents TV sound from interfering with the user's speech. The update module automatically updates the channel and program list, the recognition vocabulary, and the matching table from the EPG server, which spares the user the inconvenience of manual updating.

Referring also to Fig. 4, the speech recognition channel selection method of an embodiment of the invention comprises the following steps:

Step 402, the controller receives the voice activation instruction entered by the user. In this embodiment, the voice activation instruction may be a key signal; the user can enter the command through an input device such as a keyboard or touch screen.

Step 404, the controller sends a start-speech-recognition control command signal to the channel switching device. In this embodiment, for example, the remote control sends the signal to the set-top box over a wireless link such as Bluetooth, a high-speed infrared protocol, or ZigBee.

Step 406, the channel switching device is placed in the mute state.

Step 408, the channel switching device sends a start-voice-collection control command signal to the controller. If the mute function is not used, the above steps may be omitted; this is not described further.

Step 410, the controller receives the user's voice input signal, and collects and processes it. In this embodiment, an A/D converter converts the analog voice signal into a digital voice signal, which is sent to the channel switching device wirelessly.

Step 412, the channel switching device detects the start point and end point of the actual speech segment, which are used to recognize the name to be matched. In this embodiment, voice activation detection uses a robust endpoint detection algorithm to find the start and end of the actual speech, so as to separate the actual speech segment from the non-speech segments of the input voice signal.

Step 414, the channel switching device sends a stop-voice-collection control signal to the controller. Once recognition processing is complete, the controller can stop collecting the user's voice input signal. In this embodiment, the signal may likewise be sent wirelessly, for example by Bluetooth, a high-speed infrared protocol, or ZigBee.

Step 416, the controller stops collecting and processing the voice signal in response to the stop-voice-collection control signal from the channel switching device.

Step 418, the signal of the actual speech segment between the start point and the end point is sent to the speech feature extraction unit. Steps 418 and 414 need not be performed in any particular order, and step 418 may also be performed before step 416; this is not described further.

Step 420, the speech feature extraction unit extracts speech features from the input voice signal. In this embodiment, if the actual speech segment has already been detected in an earlier step, features need only be extracted from that segment. The feature type may be MFCC, PLP, or LPCC; to improve noise robustness, cepstral mean subtraction may be applied during feature extraction. Because MFCC features exploit the perceptual characteristics of the human ear and are comparatively robust to noise, MFCC is the preferred feature type. A voice signal is short-time stationary, and adjacent speech frames are correlated; for this reason, appending first-order differences, or first- and second-order differences, of the MFCC features can improve recognition accuracy.

Step 422, the acoustic distance of the input speech feature data to each entry is computed according to the acoustic model and the recognition vocabulary. In this embodiment, the recognizer uses the acoustic model data and the isolated-word vocabulary data to obtain the shortest accumulated acoustic distance for each isolated word, and then takes the isolated word with the smallest such distance as the first-choice recognition result for the utterance. The acoustic models used for recognition include continuous HMM models and discrete HMM models. The recognizer can also offer several candidate results for the user to choose from, ranked by shortest accumulated acoustic distance. In this embodiment, the model parameters of an HMM-based acoustic model built by bilingual mixed modeling are used. The parameters of the bilingual mixed acoustic model are speaker-independent. The model parameters must be trained beforehand by a trainer on annotated corpus data; only then can the trained parameters be fixed in the acoustic model parameter storage for use in isolated-word recognition. The acoustic model parameters comprise the state parameters of the HMM and the probability distribution functions of the observation feature vectors output by each state. Before this step, the method may also include a step of selecting, according to a speech selection signal entered by the user, the acoustic model corresponding to that signal.

Step 424, it is judged whether the acoustic distance of the speech feature data to each entry is less than the threshold. If the acoustic distance is not less than the threshold, step 426 is performed; if it is less than the threshold, step 428 is performed.

Step 426, if the acoustic distance of the speech feature data to the entry is greater than or equal to the threshold, the recognition result is non-speech and the user is prompted to speak again. The prompt may be a message, an on-screen display, or a sound; in this embodiment, the user is prompted by text displayed on the screen. After step 426, this recognition process ends.

Step 428, if the acoustic distance of the speech feature data to the entry is less than the threshold, the channel name corresponding to the current utterance is determined from the recognition vocabulary and the matching table. In this embodiment, the shortest accumulated acoustic distance of each isolated word is obtained from the acoustic model data and the isolated-word vocabulary data, and the isolated word with the smallest such distance is taken as the first-choice recognition result for the utterance. The acoustic models used for recognition include continuous HMM models and discrete HMM models. Several candidate results can also be offered for the user to choose from, ranked by shortest accumulated acoustic distance.

Step 430, the device switches to the required channel according to the recognized channel name. If matching entries exist and the query returns a single entry, the set-top box is directed to switch live TV to the channel identified by the channel-name attribute of that entry. If the query returns several records, the TV screen is directed to display the channel-name attribute of each entry and the user is prompted to choose one of the channels with the remote control; once the user has chosen, the TV is switched to the selected channel.
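
The control messages exchanged in steps 404 to 416 can be viewed from the device side as a small state machine. This is a sketch under stated assumptions: the class, state, and message names (`START_COLLECTION`, `STOP_COLLECTION`) are hypothetical, and unmuting at the moment collection stops is one reading of the statement that the device returns to the non-mute state after voice collection.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    COLLECTING = auto()   # muted, controller is streaming voice data
    RECOGNIZING = auto()  # endpoints found, controller told to stop


class ChannelSwitchingDevice:
    """Device-side view of the control messages exchanged with the controller."""

    def __init__(self):
        self.state = State.IDLE
        self.muted = False
        self.sent = []  # messages sent back to the controller, in order

    def on_start_recognition(self):           # step 404 received
        self.muted = True                      # step 406
        self.sent.append("START_COLLECTION")   # step 408
        self.state = State.COLLECTING

    def on_endpoints_detected(self):           # step 412 completed
        self.sent.append("STOP_COLLECTION")    # step 414
        self.muted = False                     # unmute once collection is over
        self.state = State.RECOGNIZING

    def on_recognition_done(self):             # steps 418 to 430 finished
        self.state = State.IDLE
```

Feature extraction, distance computation, the threshold test, and the channel switch all happen while the device is in the `RECOGNIZING` state.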

Referring also to Fig. 5, the channel and program list update method of an embodiment of the invention comprises the following steps:

Step 502, check whether the channel and program list meets the configured update condition. The update condition can be set according to the user's needs; the update interval of the channel and program list may be set to one day. If the update condition is met, perform step 504; otherwise return to step 502.

Step 504, the channel switching device downloads the latest channel and program list data from the EPG server and updates the channel and program list.

The source of this update may be the EPG server, or alternatively a local network, an optical disc, or the like.

Referring also to Fig. 6, the recognition vocabulary and matching table update method of an embodiment of the invention comprises the following steps:

Step 602, check whether the recognition vocabulary and matching table meet the configured update condition. The update condition can be set according to the user's needs; the update interval of the recognition vocabulary and matching table may be set to one minute. If the update condition is met, perform step 604; otherwise return to step 602.

Step 604, update the local recognition vocabulary and matching table according to the channel and program list.
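
How the vocabulary and matching table are derived from the channel and program list is not spelled out. One plausible derivation, assumed here, treats every channel name and every currently playing program name as a speakable entry that maps to the channel or channels carrying it. The function name and the entry layout are hypothetical.

```python
def rebuild_tables(channel_program_list):
    """Derive the recognition vocabulary and matching table from the channel and program list."""
    matching_table = {}
    for entry in channel_program_list:
        # Assumption: both the channel name and its current program name can be spoken.
        for spoken_name in (entry["channel"], entry["program"]):
            channels = matching_table.setdefault(spoken_name, [])
            if entry["channel"] not in channels:
                channels.append(entry["channel"])
    vocabulary = sorted(matching_table)
    return vocabulary, matching_table
```

Because the currently playing program changes far more often than the channel line-up, rebuilding these tables on a short interval while downloading the program list less often is consistent with the one-minute and one-day settings given above.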

A person of ordinary skill in the art will appreciate that all or some of the steps of the above methods can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, such as RAM, ROM, or an optical disc.

In the embodiments of the invention, the controller receives the user's voice input signal, the channel switching device recognizes the name to be matched from that signal, the name is matched against the matching table to obtain the channel to switch to, and the device switches to that channel. This avoids the complexity and high cost of performing speech recognition on the controller, makes operation convenient for the user, and makes full use of the capability of the channel switching device, which reduces control cost. Because the channel switching device recognizes the name to be matched, no dedicated speech recognition server needs to be placed in the network, which avoids long response times and loss of data in network transmission, and saves the cost of building the network. Clipping out the actual speech segment improves recognition accuracy and removes noise interference. The mute control module mutes the set-top box during voice input, which prevents TV sound from interfering with the user's speech. The update module automatically updates the channel and program list, the recognition vocabulary, and the matching table from the EPG server, which spares the user the inconvenience of manual updating.

In summary, the above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A speech recognition channel selection method, characterized in that the method comprises:

a controller receiving a voice input signal from a user;

switching to the channel to be switched to.

2. The speech recognition channel selection method as claimed in claim 1, characterized in that the method further comprises: receiving a voice activation instruction input by the user, the instruction being used to control the channel switching device to activate voice input; and setting the channel switching device to a mute state.

3. The speech recognition channel selection method as claimed in claim 1, characterized in that the channel switching device recognizing the name to be matched according to the input voice signal comprises: collecting and processing the voice signal input by the user, detecting the start point and end point of the actual speech segment, and recognizing the name to be matched according to the start point and end point of the actual speech segment.

4. The speech recognition channel selection method as claimed in claim 1, characterized in that the channel switching device recognizing the name to be matched according to the input voice signal comprises: performing speech feature extraction on the voice signal; calculating, according to an acoustic model and the recognition vocabulary, the acoustic distance of the speech feature data relative to the entries in the recognition vocabulary; and, if the acoustic distance of the speech feature data relative to an entry is less than a threshold, determining the channel name corresponding to the current speech according to the recognition vocabulary and the matching list.

5. The speech recognition channel selection method as claimed in claim 4, characterized in that the method further comprises: if the acoustic distance of the speech feature data relative to the entry is greater than or equal to the threshold, prompting the user to re-enter the speech.

6. The speech recognition channel selection method as claimed in claim 5, characterized in that the user is prompted to re-enter the speech by displaying on the television screen that the user's current speech input cannot be recognized and prompting the user to enter it again.

7. The speech recognition channel selection method as claimed in claim 1, characterized in that the method further comprises: the channel switching device sending a stop-voice-collection control signal to the controller, and the controller stopping the collection and processing of the voice signal under the control of the stop-voice-collection control signal.

8. The speech recognition channel selection method as claimed in claim 1, characterized in that the method further comprises: the channel switching device updating the matching list and/or the recognition vocabulary according to an electronic program guide (EPG) server.

9. The speech recognition channel selection method as claimed in claim 1, characterized in that the method further comprises: selecting, according to a speech selection signal input by the user, an acoustic model corresponding to the speech selection signal.

10. The speech recognition channel selection method as claimed in claim 1, characterized in that the controller and the channel switching device communicate via a wireless transmission protocol.

11. The speech recognition channel selection method as claimed in claim 10, characterized in that the wireless transmission protocol comprises one or more of a high-speed infrared protocol, a Bluetooth transmission protocol, and a ZigBee transmission protocol.

12. A speech recognition channel selection system, characterized in that the system comprises: a controller, configured to communicate with the channel switching processing unit;

the controller is configured to receive the user's voice input signal;

13. The speech recognition channel selection system as claimed in claim 12, characterized in that the system further comprises: an electronic program guide (EPG) server, configured to provide the matching list to be updated and/or the latest recognition vocabulary, wherein the channel switching device updates the matching list according to the matching list to be updated, and/or updates the recognition vocabulary according to the latest recognition vocabulary.

14. A channel switching device, characterized in that the device comprises:

15. The channel switching device as claimed in claim 14, characterized in that the device further comprises:

a mute control module, configured to set the channel switching device to a mute state according to the voice activation instruction input by the user.

16. The channel switching device as claimed in claim 14, characterized in that the recognition processing module further comprises:

a voice activity detection unit, configured to detect the start point and end point of the actual speech segment.

17. The channel switching device as claimed in claim 14, characterized in that the recognition processing module further comprises:

a speech feature extraction unit, configured to perform speech feature extraction on the voice signal;

a speech recognition unit, configured to calculate, according to an acoustic model and the recognition vocabulary, the acoustic distance of the input speech feature data relative to the entries in the recognition vocabulary;

a speech judging unit, configured to judge whether the acoustic distance of the speech feature data relative to an entry is less than a threshold and, if it is less than the threshold, to determine the channel name corresponding to the current speech according to the recognition vocabulary and the matching list.

18. The channel switching device as claimed in claim 17, characterized in that the device further comprises:

a rejection prompt module, configured to prompt the user to re-enter the speech when the recognition result is non-speech.

19. The channel switching device as claimed in claim 14, characterized in that the device further comprises:

an update module, configured to update the matching list and/or the recognition vocabulary according to an electronic program guide (EPG) server.

20. The channel switching device as claimed in claim 14, characterized in that the device further comprises:

a speech selection module, configured to select, according to a speech selection signal input by the user, an acoustic model corresponding to the speech selection signal.