CN102013254A

CN102013254A - Man-machine interactive system and method for digital television voice recognition

Info

Publication number: CN102013254A
Application number: CN 201010549953
Authority: CN
Inventors: 罗笑南; 刘宁; 苏嘉伟; 薛凯军; 陈健民
Original assignee: DONGGUAN JVCHUAN ELECTRONIC TECHNOLOGY Co Ltd; GUANGDONG ZSU TELECOMMUNICATION INFORMATION CO Ltd
Current assignee: DONGGUAN JVCHUAN ELECTRONIC TECHNOLOGY Co Ltd; GUANGDONG ZSU TELECOMMUNICATION INFORMATION CO Ltd
Priority date: 2010-11-17
Filing date: 2010-11-17
Publication date: 2011-04-13

Abstract

The invention discloses a man-machine interactive system and method for digital television voice recognition. The system comprises a target voice acquisition module, a voice analysis module, a semantic computation module and an intelligent control module, wherein the target voice acquisition module comprises a signal amplification module, a forward filtering module, a signal sampling module and a data compression and coding module; and the voice analysis module comprises a noise removal module, a feature extraction module and a decoding device. The method comprises the processes of target voice acquisition, voice noise removal, voice recognition processing, command recognition conversion and intelligent control processing. In the invention, through the cooperative work of the modules, a digital TV man-machine interactive technology for anti-interference voice intelligent identification and voice analysis and interaction under the digital TV reverberation acoustic environment of digital home life is achieved, and an advanced digital TV voice language interaction mode is provided.

Description

A man-machine interactive system and method for digital television speech recognition

Technical field

The present invention relates to the field of speech processing and semantic recognition, and to techniques for the computer-based intelligent analysis, processing and acquisition of speech. It relates in particular to a man-machine interactive system and method for digital television speech recognition.

Background technology

Speech recognition is the technology by which a machine converts a speech signal into the corresponding text or command through recognition and understanding. It captures speech input, extracts features from the speech, and pattern-matches them against the speech feature information held in a model database, so that the information contained in the speech is converted into text or commands.

Depending on the recognition target, speech recognition is broadly divided in the field into three classes: isolated word recognition, keyword recognition and continuous speech recognition. Isolated word recognition recognizes vocabulary known in advance. Keyword recognition operates on continuous speech but does not recognize every word; it only detects occurrences of a number of known keywords. Continuous speech recognition recognizes a complete continuous sentence or passage.

In the reverberant acoustic environment around a digital television in real family life, noise has a considerable effect on speech recognition. In this setting, speech recognition is limited mainly by noise and by the non-standard, arbitrary nature of spoken interaction. In brief, because noise affects the sampling and input of the user's speech, utterances may be misinterpreted or lost during recognition. In addition, the non-standard and arbitrary nature of the user's speech introduces uncertainty into matching: matching errors may occur, causing the meaning of the speech to be interpreted wrongly.

The solution rests on the observation that, in the reverberant acoustic environment of a digital television in family life, and with user speech that is non-standard and arbitrary, keyword recognition is better suited than full continuous speech recognition. Within the user's continuous spoken command, keyword recognition locates the positions of known keywords and, from the positions and combination of those keywords, interprets the command to be executed.

Therefore, the present invention proposes a man-machine interactive system and method for digital television speech recognition, with the aim of providing an advanced spoken-language interaction mode for digital television.

Summary of the invention

The object of the invention is to solve the problem of non-standard and arbitrary spoken interaction in the reverberant acoustic environment of a digital television in real family life, by providing a man-machine interactive system and method for digital television speech recognition.

The man-machine interactive system for digital television speech recognition of the present invention consists of a target voice acquisition module, a speech analysis module, a semantic computing module and an intelligent control module.

The target voice acquisition module consists of one or more microphones, or other input systems for collecting voice information. It collects voice information automatically and converts the analog voice information into digital voice information. It comprises a signal amplification module, a forward filtering module, a signal sampling module and a data compression and coding module.

The speech analysis module processes the voice information. In the reverberant acoustic environment of a digital television in real family life, it extracts the useful voice information, removes noise, derives the voice data and converts it into text information. It comprises a noise removal module, a feature extraction module and a decoding module.

The semantic computing module interprets the meaning of the text information produced by the speech analysis module. Using fuzzy information retrieval and spoken Chinese understanding, it extracts features and interprets the voice information as an executable command. It first searches the text information, according to the command information library, for all words related to commands. It then performs semantic computation based on the positions of the command words and on the context of the commands and command words, and so determines the command to be executed. For the reverberant acoustic environment of a digital television in real family life, the semantic computing module defines correspondences between speech and commands, so that the recognized keyword text is interpreted and converted into commands.

The intelligent control module receives commands from the semantic computing module. When a command can be executed correctly, it executes the command, gives the user prompts and feedback through sound, image and video, and then returns to the target voice acquisition module to continue interacting with the user. When a command is invalid, it informs the user that the command is invalid and then returns to the target voice acquisition module to wait for the user's next spoken input.

In the above technical solution, the target voice acquisition module also includes a data compression and coding module, which speeds up transmission and reduces system latency.
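The text does not say which compression or coding scheme the data compression and coding module uses. As one possible illustration only, the sketch below applies the continuous mu-law companding formula, a common way to squeeze speech samples into 8 bits. The choice of mu-law, the constant `MU = 255` and the function names are assumptions for this sketch, not part of the described system, and the code is not a bit-exact G.711 encoder.

```python
import math

MU = 255  # companding constant (assumed; the source names no codec)


def mu_law_encode(sample: float) -> int:
    """Compress one sample in [-1.0, 1.0] to an 8-bit code in 0..255."""
    sample = max(-1.0, min(1.0, sample))
    # Logarithmic companding: small amplitudes get finer resolution.
    compressed = math.copysign(
        math.log1p(MU * abs(sample)) / math.log1p(MU), sample
    )
    return int(round((compressed + 1.0) / 2.0 * 255))


def mu_law_decode(code: int) -> float:
    """Expand an 8-bit code back to an approximate sample in [-1.0, 1.0]."""
    compressed = code / 255 * 2.0 - 1.0
    return math.copysign(
        math.expm1(abs(compressed) * math.log1p(MU)) / MU, compressed
    )
```

Each sample then occupies one byte instead of the two or more bytes of a typical linear A/D reading, which is the kind of saving that shortens transmission time.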

In the above technical solution, the signal sampling module in the target voice acquisition module uses a single-chip microcomputer for both control and data processing: the CPU controls the reading of sampled data and then performs the data compression itself. This meets the speed requirement at relatively low cost.

The speech analysis module of the present invention is provided with a database module that stores spoken Chinese information. Keyword models are built by syllable-based modeling with a hidden Markov model (HMM) topology, on the basis of an acoustic model and a language model. The speech is first segmented, and each segment is then decoded.

The semantic computing module is provided with a database module that stores the executable commands and the information extraction strategy. This database module has an artificial-intelligence self-learning mechanism and a manual control interface. During semantic analysis, ambiguous information can be resolved by manual selection, and the information extraction strategy of the database is refined through artificial-intelligence learning, improving the accuracy of semantic recognition.

In the above solution, the semantic computing module combines Chinese fuzzy information retrieval with spoken Chinese understanding. Chinese fuzzy information retrieval finds the key words that contain a command, and spoken Chinese understanding then interprets those key words, yielding the command to be executed.
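The two stages of the semantic computing module can be sketched roughly as follows: a fuzzy search of the recognized text against a command library, followed by a simple rule that uses keyword position and context. The command library, the English keywords, the similarity cutoff of 0.8 and the "last keyword wins" rule are all hypothetical choices made for illustration; the source does not specify them, and a real system would work on Chinese text.

```python
import difflib

# Hypothetical command library: keyword -> command name (not from the source).
COMMAND_LIBRARY = {
    "channel": "SET_CHANNEL",
    "volume": "SET_VOLUME",
    "mute": "MUTE",
}


def find_keywords(words):
    """Return (position, keyword) pairs, tolerating small recognition errors."""
    hits = []
    for pos, word in enumerate(words):
        match = difflib.get_close_matches(
            word, list(COMMAND_LIBRARY), n=1, cutoff=0.8
        )
        if match:
            hits.append((pos, match[0]))
    return hits


def interpret(text):
    """Map recognized text to (command, argument), or None if no keyword is found."""
    words = text.lower().split()
    hits = find_keywords(words)
    if not hits:
        return None  # no command keyword: treated as an invalid command
    pos, keyword = hits[-1]  # assumption: the last keyword in the sentence wins
    # Context rule (assumed): the first number after the keyword is its argument.
    argument = next((int(w) for w in words[pos + 1:] if w.isdigit()), None)
    return COMMAND_LIBRARY[keyword], argument
```

For example, `interpret("please switch to chanel 5")` tolerates the misrecognized word "chanel" and resolves it to the channel command with argument 5, while a sentence with no command keyword yields `None`.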

The intelligent control module can control the digital television directly according to the command, or operate the set-top box according to the command, thereby achieving interaction between the user and the digital television.

In addition, a man-machine interaction method for digital television speech recognition comprises the following steps:

1) Initialization: start the speech recognition man-machine interactive system;

2) Voice acquisition: in the reverberant acoustic environment of a digital television in real family life, if the user wants to interact with the digital television by voice, the target voice acquisition module collects the user's voice information. The voice signal is first amplified by an instrumentation amplifier, then pre-filtered by a cascade of a 5th-order Butterworth low-pass filter and a 5th-order Butterworth high-pass filter, and then sampled by an A/D converter at sampling rates of 4 kHz and 8 kHz, chosen according to the Nyquist criterion. Finally, data compression and coding is applied, producing digital voice information;

3) Voice conversion: the voice information collected by the target voice acquisition module contains noise. The speech analysis module extracts the user's voice information and converts it into text information. Keywords related to commands are defined with reference to the full set of executable commands of the digital television; the speech analysis module matches and identifies the positions of these keywords in the user's continuous speech input and maps the keywords to text information;

4) Semantic understanding: from the text information obtained, the semantic computing module derives the command to be executed. It searches the text information, according to the command information library, for all words related to commands, then performs semantic computation based on the positions of the command words and on the context of the commands and command words, and so determines the command to be executed;

5) Intelligent control: when the command derived by the semantic computing module can be executed correctly, the intelligent control module executes it, interacts with the user through sound, image and video, and returns to the target voice acquisition module for the next interaction with the user. When the command is invalid, the intelligent control module informs the user that the command is invalid and then returns to the target voice acquisition module to wait for the user's next spoken input.
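The control behavior in step 5 (execute a valid command and give feedback, or report an invalid command, and in both cases return to listening) can be sketched as below. The `DigitalTV` class, the command names and the prompt strings are hypothetical stand-ins for illustration; the source does not define a concrete command set or device interface.

```python
class DigitalTV:
    """Toy stand-in for the television or set-top box (hypothetical interface)."""

    def __init__(self):
        self.channel = 1
        self.muted = False


INVALID_PROMPT = "Command invalid, please speak again."


def execute(tv, command):
    """Run one (name, argument) command and return the prompt shown to the user."""
    if command is None:
        return INVALID_PROMPT
    name, argument = command
    if name == "SET_CHANNEL" and isinstance(argument, int) and argument > 0:
        tv.channel = argument
        return f"Switched to channel {argument}."
    if name == "MUTE":
        tv.muted = True
        return "Sound muted."
    return INVALID_PROMPT


def interaction_loop(tv, commands):
    """Handle each command in turn; after each one the system waits for new speech."""
    return [execute(tv, command) for command in commands]
```

In this sketch an invalid command leaves the television state untouched and only produces a prompt, which mirrors the "prompt, then return to acquisition" behavior described above.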

The beneficial effects of the present invention are as follows:

1. The man-machine interactive system and method for digital television speech recognition proposed by the invention realizes spoken-language interaction with a digital television. In the reverberant acoustic environment of a digital television in real family life, it provides the user with advanced spoken-language interaction with the digital television, as an application for the digital home.

2. In the system and method proposed by the invention, keyword models are built by syllable-based modeling with a hidden Markov model (HMM) topology on the basis of an acoustic model and a language model. The speech is first segmented and each segment is then decoded, which makes speech recognition more accurate.

3. In the system and method proposed by the invention, semantic understanding uses interactive operation and an artificial-intelligence learning method. All words related to commands are searched for in the text information according to the command information library, and semantic computation is then performed based on the positions of the command words and on the context of the commands and command words, making semantic judgment faster and more accurate.

4. In the system and method proposed by the invention, correspondences between speech and commands are defined for the reverberant acoustic environment of a digital television in real family life, so that the system adapts better to non-standard and arbitrary speech.

Description of drawings

Fig. 1 is the overall module block diagram of the system of the present invention;

Fig. 2 is the operational flowchart of the method of the present invention;

Fig. 3 is the voice acquisition flowchart of the present invention;

Fig. 4 is the speech analysis flowchart of the present invention.

Embodiment

The present invention is described below with reference to the accompanying drawings.

As shown in Fig. 1, the man-machine interactive system for digital television speech recognition comprises a target voice acquisition module, a speech analysis module, a semantic computing module and an intelligent control module. The target voice acquisition module comprises a signal amplification module, a forward filtering module, a signal sampling module and a data compression and coding module. The speech analysis module comprises a noise removal module, a feature extraction module and a decoding module.

The functions of the modules are as follows:

1. Target voice acquisition module: one or more microphones or other input systems for collecting voice information. It collects voice information automatically and converts analog voice information into digital voice information. After conversion, the data is transmitted to the speech analysis module for speech recognition processing.

1) Signal amplification module: in the reverberant acoustic environment of a digital television in real family life, the voice signal picked up by the microphone is relatively weak, so the small signal must be isolated and amplified to strengthen the voice signal.

2) Forward filtering module: filters the sound to remove noise and emphasize the voice signal.

3) Signal sampling module: samples and converts the analog voice signal, using a single-chip microcomputer for the computation, so that analog voice information is converted to digital voice information.

4) Data compression and coding module: compresses and encodes the sampled digital voice information, which facilitates storage and transmission and increases transmission speed.

2. Speech analysis module: in the reverberant acoustic environment of a digital television in real family life, it extracts the useful voice information, removes noise, derives the voice data and converts it into text information. Keywords related to commands are defined with reference to the full set of executable commands of the digital television. The speech analysis module matches and identifies the positions of these keywords in the user's continuous speech input, maps the keywords to text information and passes the text to the semantic computing module.

1) Noise removal module: applies Wiener filtering to the digital voice information to remove noise, so that the digital voice information is less affected by noise and represents the speech more accurately.

2) Feature extraction module: extracts speech features from the digital voice information and segments the speech accordingly, based on the different combinations of speech sounds.

3) Decoding module: performs speech recognition decoding on the segmented voice information. Once decoding is complete, the voice information is converted into text information.
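The Wiener filtering used by the noise removal module can be illustrated in its common frequency-domain form, where each frequency bin is scaled by a gain derived from the estimated noise power. The sketch below assumes that per-bin power spectra of the noisy signal and of the noise are already available (for example from a Fourier transform and from speech-free frames); the source does not describe how the filter is implemented, and the function names are illustrative.

```python
def wiener_gain(noisy_power, noise_power):
    """Per-bin Wiener gain G = max(0, (Py - Pn) / Py).

    Py is the power of the noisy signal in a bin and Pn the estimated
    noise power, assuming speech and noise are uncorrelated.
    """
    gains = []
    for p_noisy, p_noise in zip(noisy_power, noise_power):
        if p_noisy <= 0.0:
            gains.append(0.0)  # empty bin: nothing to keep
        else:
            gains.append(max(0.0, (p_noisy - p_noise) / p_noisy))
    return gains


def apply_gain(spectrum, gains):
    """Scale each spectral value by its gain to attenuate noisy bins."""
    return [g * x for g, x in zip(gains, spectrum)]
```

Bins dominated by speech keep a gain close to 1, while bins where the noise estimate meets or exceeds the observed power are suppressed to 0.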

3. Semantic computing module: interprets the meaning of the text information produced by the speech analysis module. Using fuzzy information retrieval and spoken Chinese understanding, it extracts features and interprets the voice information as an executable command. The interpreted command is transmitted to the intelligent control module for execution.

4. Intelligent control module: receives commands from the semantic computing module. When a command can be executed correctly, it executes the command, gives the user prompts and feedback through sound, image and video, and then returns to the target voice acquisition module to continue interacting with the user. When a command is invalid, it informs the user that the command is invalid and then returns to the target voice acquisition module to wait for the user's next spoken input.

Fig. 2 shows the operational flowchart of the man-machine interactive system for digital television speech recognition.

The operating process comprises the following steps:

2) Voice acquisition: in the reverberant acoustic environment of a digital television in real family life, if the user wants to interact with the digital television by voice, the target voice acquisition module collects the user's voice information. The signal amplification module first amplifies the voice signal with an instrumentation amplifier. The forward filtering module then pre-filters it with a cascade of a 5th-order Butterworth low-pass filter and a 5th-order Butterworth high-pass filter. The A/D converter in the signal sampling module then samples the signal at sampling rates of 4 kHz and 8 kHz, chosen according to the Nyquist criterion. Finally, the data compression and coding module compresses and encodes the data, producing digital voice information;

3) Voice conversion: the voice information collected by the target voice acquisition module contains noise. The speech analysis module extracts the user's voice information and converts it into text information. The noise removal module first removes noise from the digital voice signal. Keywords related to commands are defined with reference to the full set of executable commands of the digital television. The feature extraction module matches and identifies the positions of these keywords in the user's continuous speech input, and the decoding module maps the keywords to text information;

Fig. 3 shows the voice acquisition flowchart of the man-machine interactive system for digital television speech recognition. During voice acquisition, the input analog voice signal is first amplified by an instrumentation amplifier, then pre-filtered by a cascade of a 5th-order Butterworth low-pass filter and a 5th-order Butterworth high-pass filter, and then sampled by an A/D converter at sampling rates of 4 kHz and 8 kHz, chosen according to the Nyquist criterion. Finally, data compression and coding is applied, producing digital voice information.
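The effect of cascading a 5th-order Butterworth high-pass with a 5th-order Butterworth low-pass, and the link to the Nyquist criterion, can be checked numerically from the standard Butterworth magnitude response. The source gives no cutoff frequencies, so the 300 Hz and 3400 Hz values below (a conventional telephone speech band) are assumptions for illustration, as are the function names.

```python
import math


def butterworth_lowpass_gain(f, cutoff, order=5):
    """Magnitude response |H(f)| of an ideal Butterworth low-pass filter."""
    return 1.0 / math.sqrt(1.0 + (f / cutoff) ** (2 * order))


def butterworth_highpass_gain(f, cutoff, order=5):
    """Magnitude response |H(f)| of an ideal Butterworth high-pass filter."""
    if f == 0:
        return 0.0  # a high-pass filter blocks DC completely
    return 1.0 / math.sqrt(1.0 + (cutoff / f) ** (2 * order))


def cascade_gain(f, low_cut=300.0, high_cut=3400.0):
    """Gain of the cascade: the two magnitude responses multiply."""
    return butterworth_highpass_gain(f, low_cut) * butterworth_lowpass_gain(f, high_cut)


def satisfies_nyquist(sample_rate, highest_freq):
    """True if the sampling rate exceeds twice the highest signal frequency."""
    return sample_rate > 2 * highest_freq
```

With these assumed cutoffs the cascade passes mid-band speech almost unchanged, is about 3 dB down at 3400 Hz, and leaves a band that an 8 kHz sampling rate can represent. A 4 kHz rate would need the low-pass cutoff to sit below 2 kHz.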

Fig. 4 shows the speech analysis flowchart of the man-machine interactive system for digital television speech recognition. During speech analysis, the input speech data is first denoised with Wiener filtering to obtain accurate user voice information. Acoustic features are then extracted using the characteristics of spoken Chinese, and the Viterbi algorithm decodes these features against a pre-trained set of speech models. Finally, the decoded information is matched to text, generating the text information.
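The Viterbi decoding step can be illustrated on a toy hidden Markov model. The sketch below finds the most probable state sequence for a short sequence of quantized acoustic observations. The two states, the three observation symbols and all probabilities are invented for illustration; the syllable models actually used by the system are not given in the source.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_probability, best_state_path) for an observation sequence."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path probability into s.
            prob, prev = max((best[-1][p][0] * trans_p[p][s], p) for p in states)
            column[s] = (prob * emit_p[s][obs], prev)
        best.append(column)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    probability = best[-1][state][0]
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return probability, path


# Toy model (all values assumed): "sil" = silence, "speech" = a spoken syllable.
STATES = ["sil", "speech"]
START = {"sil": 0.6, "speech": 0.4}
TRANS = {
    "sil": {"sil": 0.7, "speech": 0.3},
    "speech": {"sil": 0.4, "speech": 0.6},
}
EMIT = {
    "sil": {"low": 0.5, "mid": 0.4, "high": 0.1},
    "speech": {"low": 0.1, "mid": 0.3, "high": 0.6},
}

probability, path = viterbi(["low", "mid", "high"], STATES, START, TRANS, EMIT)
```

For the observation sequence low, mid, high, the decoder selects the path sil, sil, speech. In a real recognizer the states would belong to trained syllable HMMs and log probabilities would be used to avoid numerical underflow on long utterances.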

Claims

1. A man-machine interactive system for digital television speech recognition, characterized by comprising:

a target voice acquisition module, which collects voice information automatically and converts analog voice information into digital voice information;

a speech analysis module, which processes the voice information and, in the reverberant acoustic environment of a digital television in real family life, extracts the useful voice information, removes noise, derives the voice data and converts it into text information;

a semantic computing module, which interprets the meaning of the text information produced by the speech analysis module and interprets the voice information as an executable command;

an intelligent control module, which receives commands from the semantic computing module and executes them.

2. The man-machine interactive system for digital television speech recognition according to claim 1, characterized in that the target voice acquisition module further comprises a signal amplification module, a forward filtering module, a signal sampling module and a data compression and coding module.

3. The man-machine interactive system for digital television speech recognition according to claim 2, characterized in that the signal sampling module uses a single-chip microcomputer for both control and data processing.

4. The man-machine interactive system for digital television speech recognition according to claim 1, characterized in that the speech analysis module further comprises a noise removal module, a feature extraction module and a decoding module.

5. The man-machine interactive system for digital television speech recognition according to claim 1, characterized in that the speech analysis module is provided with a database module that stores spoken Chinese information.

6. The man-machine interactive system for digital television speech recognition according to claim 1, characterized in that the semantic computing module is provided with a database module that stores the executable commands and the information extraction strategy, and that this database module has an artificial-intelligence self-learning mechanism and a manual control interface.

7. The man-machine interactive system for digital television speech recognition according to claim 1 or 5, characterized in that the semantic computing module combines Chinese fuzzy information retrieval with spoken Chinese understanding.

8. The man-machine interactive system for digital television speech recognition according to claim 1, characterized in that the intelligent control module can control the digital television directly according to the command.

9. A man-machine interaction method for digital television speech recognition, characterized by comprising the following steps:

2) voice acquisition: in the reverberant acoustic environment of a digital television in real family life, if the user wants to interact with the digital television by voice, the target voice acquisition module collects the user's voice information;

3) voice conversion: the voice information collected by the target voice acquisition module contains noise, and the speech analysis module extracts the user's voice information and converts it into text information;

4) semantic understanding: from the text information obtained, the semantic computing module derives the command to be executed;