Background Art
Television control has gradually evolved from the knobs and buttons of early sets to remote control, and the remote controller, being convenient and able to operate the set from a distance, has become an indispensable television accessory. Current television remote controllers are mainly button-based and fall into two types. The first is the fixed-code type, in which each key corresponds to one or more code patterns that are preset by the manufacturer and cannot be changed by the user. The second is the learning type, which can learn remote-control codes by itself: the user can define the code pattern assigned to each key, so that several remote controllers are combined into one, a single remote controller can operate several other household appliances, and it can also serve as a backup for the original television remote controller. The commercial prospects of television sets have improved as remote-control functions have grown richer. However, because the functions of advanced television sets keep increasing, both types of remote controller suffer from problems such as having too many buttons, whose meanings users find hard to remember.
The substantial development of speech recognition technology makes it possible to solve the above problems. Applying speech recognition to the remote controller and replacing key presses with voice commands yields a voice-controlled remote controller, which is easier for the user to remember and use, eliminates a large number of buttons, and reduces the size of the remote controller. Several existing speech recognition techniques can be used to recognize a user's spoken commands: one is a continuous speech recognition system based on large-vocabulary speech retrieval; another is a single-word recognition mode based on sub-word (Sub Word) indexing units or on keyword spotting (Spotting). Voice recognition remote controllers, such as those in the utility model patents of Kelong Electrical Equipment Co., Ltd., Guangdong, "Voice-controllable intelligent refrigerator" (2429801), published 2001.05.09, and "Voice-controllable air conditioner", published 2001.01.24, mainly adopt the single-word recognition mode.
However, the number of program channels a television can receive keeps increasing, the content and quality of the programs broadcast by different broadcasters vary, there is a great deal of content that users do not want to watch, such as advertisements, and users have a wider choice. Most manufacturers still do not provide voice control for the remote controller supplied with the television set, and people often switch manually back and forth between channels, which makes it somewhat difficult to watch a desired program in its entirety. The key measure of speech recognition is how reliably it produces valid voice control commands from an analysis of the acoustic and semantic features of the keywords that identify a content title. Because the titles of broadcast television programs are rich and varied, change quickly and keep expanding, the correct recognition rate falls sharply when speech recognition is used to generate control commands, and misrecognition may even occur.
Summary of the Utility Model
In view of the above defects of the prior art, the utility model provides a TV voice on-demand device that can retrieve program content and quickly switch to the corresponding television channel by voice command.
The utility model also provides an intelligent voice remote controller for selecting the program content played by audio-video media equipment such as televisions or multimedia devices.
The technical solution of the utility model is as follows:
A TV voice on-demand device comprises a voice sensing element, a filter, an analog-to-digital converter, a dynamic memory, a loudspeaker, a digital-to-analog converter, a controller, a power manager, a control keypad, a digital signal processor and a static memory, characterized in that an audio/visual information retrieval unit and a voice sample card containing a voice command recognition table are electrically connected in the static memory.
The controller is further connected to an infrared transmitting device and an infrared receiving device.
The voice sensing element is equipped with a microphone array device for voice noise reduction and directional reception, and with an infrared tracking system.
The voice command recognition table in the voice sample card can be updated or replaced, and can be set up for a dialect or a foreign language.
There are two or more voice sample cards, each with a removable plug-in interface, and the voice command recognition table of each card corresponds to the audio/visual information retrieval data in the static memory.
An intelligent voice remote controller comprises a voice sensing element, a filter, an analog-to-digital converter, a dynamic memory, a loudspeaker, a digital-to-analog converter, a controller, a power manager, a control keypad, a digital signal processor and a static memory, characterized in that an audio/visual information retrieval unit and a voice sample card containing a voice command recognition table are electrically connected in the static memory.
The technical effect of the TV voice on-demand device of the utility model is that the program the user wants to watch can be quickly retrieved and locked by the user's voice command. Operation is simple, and the original broadcast television playback equipment is left intact: for a television set, only an accessory such as the remote controller is improved, so that more existing users can accept and use the device.
Because the TV voice on-demand device of the utility model is independent of the language the user speaks, its operation can be unlocked and locked by a spoken password uttered by the user.
The TV voice on-demand device of the utility model can be shared by several people; different users can each set up their own voice command recognition table and a voice sample card containing it.
Because the intelligent voice remote controller of the utility model is provided with the audio/visual information retrieval unit, the user's voice input can be used to set a list of requested programs for a given period, and this list controls automatic switching and playback on audio-video media playback devices such as television sets and multimedia devices. This reduces the user's operation of the remote controller keys and spares the user the annoyance of tedious operation.
Embodiment
The utility model is described in further detail below with reference to the accompanying drawing.
Referring now to Fig. 1, the TV voice on-demand device is powered on by pressing the power key on the control keypad 40. The device is held in front of the user's mouth, close enough that the voice sensing element 12 picks up the spoken commands with which the user controls the device. The voice sensing element 12 converts the user's speech into an analog signal. Connected to the voice sensing element 12 is the filter 14, which removes from that analog signal the noise lying outside the speech frequency range. Connected to the filter 14 is the analog-to-digital converter 16, which converts the filtered analog signal into a digital signal. The analog-to-digital converter 16 sends the digital signal to the digital signal processor 24, which stores it in the dynamic memory 18 connected to it. Then, in the preferred embodiment of the utility model, the digital signal processor 24 calls the audio/visual information retrieval data of the audio/visual information retrieval unit 26 stored in the static memory 30 in order to apply a series of frequency-domain transforms to the digital signal held in the dynamic memory 18. The audio/visual information retrieval unit 26 produces a recognition model, which is a spectral transform, and compares it with the recognition models (also spectral transforms) of the commands in the voice sample card 28 containing the voice command recognition table, which is stored in the static memory 30. Those skilled in the art will appreciate that any other suitable method of recognizing speech patterns can be used in the utility model in place of the spectral transform. If there is a match, the digital signal processor 24 accesses the instruction set that is linked to the matching command recognition model in the voice sample card 28. The associated instruction set is then executed through the controller 32, the infrared transmitting device 38 and the infrared receiving device 36.
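The comparison of spectral recognition models described above can be illustrated with a minimal sketch. The description does not specify the transform or the comparison rule, so the plain discrete Fourier transform, the cosine-similarity score, the 0.9 threshold and every function name below are assumptions made purely for illustration.

```python
import cmath
import math

# All names here are hypothetical. The source only says that a spectral
# transform of the input is compared with stored spectral templates.

def magnitude_spectrum(samples):
    """Magnitudes of DFT bins 0..n/2 of a real-valued sample list."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def match_command(samples, templates, threshold=0.9):
    """Return the best-matching command name, or None when nothing is close enough."""
    probe = magnitude_spectrum(samples)
    best_name, best_score = None, -1.0
    for name, template in templates.items():
        score = cosine_similarity(probe, template)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

def tone(freq_bin, n=64):
    """Synthetic stand-in for a digitized utterance: one sine at a given DFT bin."""
    return [math.sin(2 * math.pi * freq_bin * t / n) for t in range(n)]

# Stand-in for the recognition table held on the voice sample card.
templates = {
    "channel one": magnitude_spectrum(tone(4)),
    "channel two": magnitude_spectrum(tone(9)),
}
```

With these synthetic templates, `match_command(tone(4), templates)` selects "channel one", while an input that resembles no template yields `None`, which corresponds to the no-match prompt.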
To initiate an action of the TV voice on-demand device, such as the channel switching normally done with a remote controller, the user powers on the device and then speaks the voice control input command "in", which is picked up by the voice sensing element of the device. After recognizing the command "in", the device prepares to switch the television to the channel carrying channel one of China Central Television (CCTV), and prompts the user by repeating "in" in an audible standard voice. Once the device has recognized a command, it carries out the required operation. If the device processes a voice input command and finds no match, it outputs an audible no-match prompt to the user, for example in the standard voice. The device then waits for the next voice control input command.
When two or more selected programs are broadcast at the same time, or when one program has not yet finished while a subsequent preset program has already begun, the TV voice on-demand device can switch automatically between these programs, playing each for a period of time, for example one minute. The user can lock the program he or she wants to watch with a confirmation voice command. The lock is released automatically once that program has finished, after which the device switches automatically, according to the preset program list, to the next preset program.
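The switching behaviour for overlapping programs can be sketched as a small scheduling function. This is only an illustration under stated assumptions: times are whole minutes, the `Program` record and `channel_at` function are hypothetical names, and rotation is a simple round-robin with a configurable dwell time.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Program:
    """Hypothetical entry of the preset program list (times in minutes)."""
    title: str
    channel: int
    start: int
    end: int

def channel_at(programs, minute, dwell=1, locked=None):
    """Channel to show at a given minute, or None when nothing is scheduled.

    Overlapping programs are shown in rotation, each for `dwell` minutes.
    A locked title wins while it is still on air; once it has ended the
    lock no longer applies and rotation over the remaining programs resumes.
    """
    active = [p for p in programs if p.start <= minute < p.end]
    if not active:
        return None
    if locked is not None:
        for p in active:
            if p.title == locked:
                return p.channel
    slot = (minute // dwell) % len(active)
    return active[slot].channel

programs = [Program("news", 1, 0, 30), Program("drama", 5, 20, 60)]
```

Between minutes 20 and 30 both programs are on air, so the channel alternates every minute unless the user has locked one of them.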
By using a spoken password as voice control input, the TV voice on-demand device can be enabled for a specific user only. Once powered on, the device will not operate until it has received and processed the correct password. As long as the user's password is not overheard, people other than the user are prevented from using the device.
For users whose speech is non-standard, the TV voice on-demand device of the utility model must first be trained before use so that the digital signal processor 24 can recognize the voice commands the user gives. The television set is controlled by speaking voice operation commands into the device, and in this case the device must be held in front of the user's mouth. The digital signal processor 24 stores the user's voice signal in the static memory 30 as the reference against which voice commands received later are compared. If there are several users, the next user then carries out the same training and inputs his or her voice signal.
In this embodiment, the TV voice on-demand device of the utility model is trained to recognize the user's spoken voice control input commands by means of the audio/visual information retrieval data in the audio/visual information retrieval unit 26. When the audio/visual information retrieval unit 26 and its data are used in training mode, the device presents to the user the list of scheduled programs that can be controlled by voice input commands. For example, command 1 may denote the instruction set that switches the television to the first channel. When command 1 is selected for training and analysis, the audio/visual information retrieval unit 26 prompts the user for the station name that will invoke the instruction set for switching to the first channel. The user is then prompted to say the selected station name. The logical choice is the command "station name", but any switching word the user chooses is feasible. Each repetition of "station name" is picked up by the device and subjected to data network analysis by the audio/visual information retrieval unit 26 in order to produce a recognition model that covers the variations in voice and intonation with which the user utters the "station name" command. The recognition models of the voice commands the user has chosen to invoke the various functions are all stored in the voice sample card 28 containing the voice command recognition table in the static memory 30 of the device. Each recognition model in the voice sample card 28 is linked to a predetermined instruction set for the corresponding function, which is also stored in the static memory 30. Therefore, when a spoken voice input command is received and recognized by the device, the instruction set associated with that command keyword is executed. Because the instruction set of a function depends on the keyword chosen for the program and on the user's subsequent training and speech analysis of that keyword, this embodiment is independent of language, so that foreign-language words can be used as voice control input command keywords.
In this example, the spoken command is first detected as a voice signal, usually picked up by one or more voice sensing elements. The user's voice signal is then stored in the dynamic memory 18 and fed to the audio/visual information retrieval unit 26, which analyzes and recognizes the acoustic and semantic features of the voice command in the dynamic memory 18 on the basis of an acoustic model and a language model. The acoustic model uses a large number of speech patterns and a mathematical algorithm to indicate the words that acoustically best match the spoken command. The language model is based on an analysis that uses multiple keywords. In this way the audio/visual information retrieval unit 26, with its audio/visual information retrieval data, can recognize not only single words but also continuously spoken sentences with a high recognition rate.
The recognition models in the voice sample card 28 containing the voice command recognition table are predetermined and tied to the particular keywords the user must use. For example, the user can adjust his or her pronunciation of the command keyword "TV play" until the device recognizes the command as spoken. In this embodiment, therefore, the device is first targeted at one specific language, in which the command keyword denotes the action to be performed. For users whose keywords for those actions are in a foreign language, a foreign-language version of the device can be made.
The TV voice on-demand device has voice audio input/output controlled by the digital signal processor 24. When a voice control input command is received, the digital signal processor 24 stores the digitized voice input in the dynamic memory 18. The digital signal processor 24 then processes the command and compares its recognition model with the recognition models stored in the voice sample card 28 containing the voice command recognition table in the static memory 30. Once a match is found, execution of the instruction set associated with the recognition model begins. The instruction set of a particular command may include audible playback of the command name to confirm the command to the user. A particular command may also have a delay embedded in its instruction set to give the user time to cancel it. If the user has a change of mind about a command just issued, or if the device has not understood the command correctly, the user can cancel it before it is executed, either with the control keypad 40 on the remote controller or with a voice control input command that cancels the previously received command. If no cancellation is received, the instruction set of the command is executed.
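The embedded delay that lets the user cancel a command before it runs can be sketched as a small state machine. The class name `PendingCommand`, its methods and the explicit `now` timestamps are hypothetical; time is passed in as a parameter so that the behaviour is deterministic rather than tied to a real clock.

```python
class PendingCommand:
    """Hypothetical recognized command that runs only after a cancellation window."""

    def __init__(self, name, action, issued_at, delay):
        self.name = name
        self.action = action              # callable standing in for the instruction set
        self.deadline = issued_at + delay
        self.state = "pending"

    def cancel(self, now):
        """Cancel via keypad or voice; succeeds only before the deadline."""
        if self.state == "pending" and now < self.deadline:
            self.state = "cancelled"
            return True
        return False

    def tick(self, now):
        """Run the instruction set once the window has passed without cancellation."""
        if self.state == "pending" and now >= self.deadline:
            self.action()
            self.state = "executed"
        return self.state
```

A caller would create the object when a match is found, announce the command name, and call `tick` periodically; a cancel request arriving within the window leaves the instruction set unexecuted.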
When the same TV voice on-demand device is shared by several people, the voice sample card 28 containing the voice command recognition table stored in the static memory 30 can be replaced, the voice command recognition table in the voice sample card 28 can also be replaced or updated, and the voice sample card is made in a removable plug-in form. If several possible users are detected, various ways of handling the situation are conceivable. In one embodiment, the device always adapts to the user detected first and then follows that user. Likewise, the device may follow the user who first completes a voice input. It is also possible to switch frequently between the possible users. A further advantage here is that the television does not change channel merely because someone else has mentioned a command keyword in conversation.
The procedure for inputting and updating the scheduled-program voice command table is as follows. The user first reads out the password to start the TV voice on-demand device, which then enters the standby state. The speech recognition software then allows the user to perform voice input in order to enter or update the voice command recognition table in the scheduled-program voice sample card 28 in the static memory 30. Speech detection determines whether there is voice input. Once voice input has been detected, the speech is processed in two ways: its features are extracted, that is, its MFCC parameters are calculated, and the speech data is compression-coded. If the user keys in information indicating dissatisfaction with the quality of the voice command, the above steps are repeated. If the user keys in information indicating satisfaction with the voice quality, the user is prompted to key in the code of the voice command, and the characteristic parameters of the input voice command (the template), the compressed speech and the code are then stored in the program schedule held in the static memory 30. This completes one training operation.
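The training loop above (detect speech, extract features, compress, ask the user, store under a keyed-in code) might be sketched as follows. This is a simplified illustration, not the device's algorithm: `frame_energies` is a deliberately crude stand-in for MFCC extraction, `zlib` stands in for the unspecified speech codec, samples are assumed to be 8-bit values, and all function names are hypothetical.

```python
import zlib

def frame_energies(samples, frame=2):
    """Crude stand-in for MFCC extraction: energy of each fixed-size frame."""
    return [sum(x * x for x in samples[i:i + frame])
            for i in range(0, len(samples), frame)]

def enroll_command(takes, accept, code, store, extract_features=frame_energies):
    """Try recorded takes in order and store the first one the user accepts.

    takes   -- recordings, each a list of 8-bit sample values (0..255)
    accept  -- callable modelling the user's keyed-in quality judgement
    code    -- command code keyed in by the user
    store   -- dict standing in for the table kept in static memory
    Returns True if a template was stored, False if every take was rejected.
    """
    for samples in takes:
        if not samples:                       # speech detection: nothing was said
            continue
        template = extract_features(samples)  # feature template of the command
        audio = zlib.compress(bytes(samples)) # compressed copy of the utterance
        if accept(samples):
            store[code] = {"template": template, "audio": audio}
            return True
    return False
```

Passing the feature extractor as a parameter keeps the sketch honest about the substitution: a real MFCC routine could be supplied in its place without changing the enrollment flow.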
Because the arrangement of program slots and the program schedules of each broadcaster are relatively stable over a period of time, the scheduled-program voice command table also has a relatively stable form and set of templates. The scheduled-program voice command table in the device can be established in several ways: centrally by the broadcasters; by the user downloading it from the corresponding website; by the user recording it into the device from the television program guide; or by the user taking the entries from the broadcast program schedule for the next period and entering them by voice. For users who speak non-standard Mandarin or a foreign language, the table can only be established by the user taking the entries from the program schedule for the next period and entering them by voice. The device holds one scheduled-program voice command table in standard Mandarin and, in addition, the scheduled-program voice command tables of one or more users.
Because the program content broadcast by each broadcaster differs from one period to the next, only the entries of the scheduled-program voice command table that have changed need to be revised and updated in time. Two cases are distinguished. For users who speak standard Mandarin, the update can be carried out centrally by the broadcasters; downloaded by the user from the corresponding website; recorded into the device by the user from the television program guide; or taken by the user from the broadcasting and television news for the next period and entered by voice. For users who speak non-standard Mandarin or a foreign language, the update can only be made by the user taking the entries from the program schedule for the next period and entering them by voice.
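Updating only the changed part of the table amounts to computing a difference between the current table and the schedule for the next period. The sketch below assumes, purely for illustration, that the table maps a command keyword to a (channel, start time) pair; the function names are hypothetical.

```python
def changed_entries(current, upcoming):
    """Return (entries to add or overwrite, keywords to remove)."""
    to_set = {kw: entry for kw, entry in upcoming.items()
              if current.get(kw) != entry}
    to_remove = [kw for kw in current if kw not in upcoming]
    return to_set, to_remove

def apply_update(current, upcoming):
    """Bring the current table in line with the upcoming schedule in place."""
    to_set, to_remove = changed_entries(current, upcoming)
    for kw in to_remove:
        del current[kw]
    current.update(to_set)
    return len(to_set) + len(to_remove)   # number of entries touched

table = {"news": (1, "19:00"), "drama": (5, "20:00"), "sports": (8, "21:00")}
next_week = {"news": (1, "19:00"), "drama": (5, "20:30"), "film": (6, "22:00")}
```

In this example only three entries are touched (one changed, one added, one removed), and for a non-standard speaker only the added keyword would need a new spoken template.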
The voice sensing element 12 converts the user's voice signal into an electrical signal; the digital signal processor 24 converts this electrical signal into an operation command, which is stored in the dynamic memory 18; and the voice command is fed to the speech recognition unit, which converts the electrical signal into keywords. The speech recognition software analyzes and recognizes the acoustic and semantic features of the spoken keywords and matches them, keyword by keyword, against the voice command recognition table in the voice sample card 28 in the static memory 30. Once a keyword has been selected and recognized without error, the digital signal processor 24 controls the television set or computer connected to the device so that the program associated with that keyword is played. A set of voice commands can ensure that a preset series of programs is watched at the scheduled times within a given period.
It can thus be seen that the speech recognition process first performs speech detection to determine whether there is voice input. If there is, features are extracted from the speech, that is, the MFCC parameters of the input speech are extracted. After parameter extraction the parameters are compared: the characteristic parameters of the input speech are compared with the characteristic parameters (the templates) of the voice commands stored in the voice sample card 28 in the static memory 30 to determine whether the input matches one of the templates. There are two cases. In the first case the match is complete: the matched template is the input voice command, the code corresponding to the matched template is taken as the code of the voice input, and it is passed over the data line to the combinational logic, which then controls the television set. In the second case the match is incomplete: the three closest voice command templates are found and their pictures are played on the television set in turn for the user to judge. If one of them is the input voice command, the television set is controlled after the user confirms it; if none of the three is the input voice command, the user is prompted to input the voice command again and the above recognition process is repeated.
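The two matching cases can be sketched as a nearest-template search. The description does not say how closeness is measured or what counts as a complete match, so the Euclidean distance, the tolerance value and the name `recognize` are assumptions; the feature vectors are short made-up lists rather than real MFCC parameters.

```python
import heapq
import math

def recognize(features, templates, tolerance=1e-6):
    """Compare input features with stored templates.

    templates maps a command code to its feature template.
    Returns ("match", code) when the nearest template lies within the
    tolerance, otherwise ("candidates", codes) with up to three nearest
    codes, closest first, for the user to confirm.
    """
    if not templates:
        return ("candidates", [])
    scored = [(math.dist(features, tpl), code) for code, tpl in templates.items()]
    best_distance, best_code = min(scored)
    if best_distance <= tolerance:
        return ("match", best_code)
    nearest = heapq.nsmallest(3, scored)
    return ("candidates", [code for _, code in nearest])

# Hypothetical table: command code -> feature template.
templates = {101: [0.0, 0.0], 102: [1.0, 0.0], 103: [0.0, 2.0], 104: [5.0, 5.0]}
```

A "match" result would be forwarded directly as the command code, while a "candidates" result would drive the preview of the three nearest programs and, if the user rejects all of them, a prompt to speak the command again.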
The user's voice input is used to update the voice command recognition table in the scheduled-program voice sample card 28 regularly, so that it stays consistent with the program schedule currently broadcast to the user's broadcast playback equipment, such as a television set or computer, and the table is stored in the static memory 30. The voice sample card 28 containing the voice command recognition table is loaded and stored in the static memory 30. Users who speak the standard language need no training and can directly use the scheduled-program voice command recognition table to control, by voice, broadcast playback devices such as television sets or computer media.
The TV voice on-demand device of the utility model can also be connected to the user's broadcast playback equipment, such as a television set or computer, in order to train the device to recognize the user's voice control commands and to input, correct and update the voice command recognition table in the scheduled-program voice sample card 28, which is stored in the static memory 30.
The above is only a preferred embodiment of the utility model. It should be understood that those skilled in the art can, on the basis of the same inventive principle, make various modifications and improvements and apply the technical solution in other similar fields, and that all of these fall within the scope of protection of the utility model.