CN103024530A

CN103024530A - Intelligent television voice response system and method

Info

Publication number: CN103024530A
Application number: CN2012105532157A
Authority: CN
Inventors: 常连城
Original assignee: Tianjin Samsung Electronics Co Ltd; Samsung Electronics Co Ltd
Current assignee: Tianjin Samsung Electronics Co Ltd; Samsung Electronics Co Ltd
Priority date: 2012-12-18
Filing date: 2012-12-18
Publication date: 2013-04-03

Abstract

The invention discloses an intelligent television voice response system comprising a user identity feature identification module, a voice command recognition module and an intelligent response module. It further discloses an intelligent television voice response method comprising the following steps: first, obtaining the user's identity features; second, performing speech recognition and, based on the identity features, deriving a command that matches them; third, matching the command against a knowledge base to obtain response information for the user and feeding that response back. The system and method identify the user's identity features and at the same time interpret the user's intent in context, so that an accurate response can be given to the user's voice input. This increases the accuracy of the voice response, spares the user tedious repeated selection steps, saves time, improves satisfaction with the intelligent television, and makes its operation more user-friendly.

Description

Intelligent television voice response system and method

Technical field

The present invention relates to the field of intelligent television, and in particular to an intelligent television voice response method and system.

Background technology

As televisions become more intelligent and intelligent televisions become more widespread, broadband cable television networks integrate Internet, multimedia and communication technologies and provide home users with a variety of interactive services, including digital television. Intelligent television adapts well to the rapid development of networks and makes full, effective use of Internet resources. As televisions become more intelligent, however, operating them with buttons becomes more difficult and tedious. Introducing speech recognition into the operating system of the intelligent television, so that control commands can be issued by voice, frees the user's hands and lets operations be completed simply by speaking. This is the direction in which the intelligent television field is developing.

Speech recognition has been one of the ten major technological developments in information technology since 2000. It is an interdisciplinary field and is gradually becoming a key technology for human-machine interfaces. Speech recognition has now matured to the point where small- and medium-vocabulary speaker-independent recognition systems, in particular, reach a recognition accuracy above 98%, which satisfies the requirements of common applications. Thanks to progress in large-scale integrated circuits, these complex recognition systems can also be built as dedicated chips and mass-produced. In economically developed Western countries, a large number of speech recognition products have entered the market and the service sector. Some private branch exchanges, telephones and mobile phones include voice dialing, and products such as voice notebooks and talking smart toys include speech recognition and speech synthesis. People can query flight, travel and banking information over the telephone network through spoken dialogue systems, with good results. Surveys indicate that more than 85% of people are satisfied with the performance of speech-recognition information inquiry services. The application of speech recognition systems can be expected to spread further over the next five to ten years, and a variety of speech recognition products will appear on the market.

As a further example, invention patent No. 201010255337.9 discloses a voice-command-based audio and video playback method and system in the field of media playback. The user only needs to press a start key, after which all operations of the terminal equipment can be completed by voice command. When the user presses the start key of a one-key control device mounted on a fixed part of the vehicle, the terminal equipment establishes a voice session with a VSP server and the system enters the automatic answering state. The VSP server parses the user's voice command using speaker-independent speech recognition and sends the parsing result to the terminal equipment, which starts its audio and video playback module according to that result and obtains the audio/video stream from the playback address. That technical solution is mainly applied to entertainment equipment, especially in-vehicle entertainment devices.

When speech recognition is applied to intelligent television, existing voice response systems and methods take one of two forms. The first is the server mode: the speech recognition module resides on the server, so every voice control command the user issues must be sent through the set-top box to the server for recognition, and the server then returns the recognized operation command to the set-top box for execution. This inevitably occupies signal transmission bandwidth, lengthens operation time and reduces the responsiveness of voice operation. The second is the embedded speech recognition mode: the speech recognition software and models are written into the memory of the intelligent television, and recognition is completed on the terminal. In this mode the operation commands of the television are relatively fixed and the vocabulary to be recognized is small, so little memory is needed. Compared with the server mode, the embedded mode therefore needs less recognition time and lets the user complete operations faster.

As society develops, the amount of information keeps growing. People are surrounded by a mass of information every day and cannot keep track of all of it, so information inquiry becomes increasingly necessary. For example, a user may not want to wait for a fixed broadcast time to see the weather on television and would rather be able to query it at any time; searching on a PC or mobile phone strikes some people as troublesome, since it still requires typing even at home. Users therefore want the inquiry process to be as simple as possible, and an intelligent response system is the best choice: the user only has to say what information is wanted, and the system replies with the answer. However, the meaning that language can express is open-ended. The same command means different things in different contexts, and users differ in age and gender, which also strongly affects what they intend by the same command. For example, when a user issues the command "select film", the system presents a film list for further selection, but it cannot accurately predict which type of film the user may want to watch. Films of that type therefore do not appear first in the list, and the user has to keep searching through a long list.

Summary of the invention

The object of the present invention is to overcome the defects of the prior art and to provide an intelligent television voice response method and system that responds specifically to the user's voice commands.

To solve the above problems, an intelligent television voice response system of the present invention comprises:

A user identity feature identification module, connected to the voice command recognition module, which obtains the user's identity features and sends the obtained identity feature information to the voice command recognition module;

A voice command recognition module, connected to the intelligent response module, which receives the user's speech, recognizes it, performs semantic recognition according to the identity feature information sent by the user identity feature identification module, derives a command that matches the user's identity features, and sends this command to the intelligent response module;

An intelligent response module, which receives the command matching the user's identity features from the voice command recognition module, matches it against a knowledge base to obtain response information for the user, and feeds the response information back.

The user identity feature identification module comprises:

An image acquisition unit, connected to the image analyzing unit, which captures user images and sends them to the image analyzing unit;

An image analyzing unit, connected to the voice command recognition module, which receives the user images sent by the image acquisition unit, performs recognition analysis on them, and sends the resulting user identity feature information to the voice command recognition module.

The image analyzing unit comprises:

A face-recognition-based age estimation unit, which performs recognition analysis on the user's facial image to obtain the user's age information.

The image analyzing unit comprises:

A face-recognition-based gender determination unit, which performs recognition analysis on the user's facial image to obtain the user's gender information.

The voice command recognition module comprises:

A voice collecting unit, connected to the speech analysis unit, which collects the user's voice information and sends it to the speech analysis unit;

A speech analysis unit, connected to the semantic analysis unit, which receives the voice information sent by the voice collecting unit, analyzes it to obtain the corresponding text, and sends this text to the semantic analysis unit;

A semantic analysis unit, connected to both the user identity feature identification module and the intelligent response module, which receives the text sent by the speech analysis unit and the identity feature information sent by the user identity feature identification module, retrieves from the command library corresponding to the text the command that matches the identity feature information, and sends it to the intelligent response module.

The intelligent television voice response system further comprises:

A TTS module, connected to the intelligent response module, which converts the response information that the intelligent response module produces for the user from text format into audio format and outputs it.

An intelligent television voice response method comprises the following steps:

1) user identity feature identification: the user's identity features are obtained, and the obtained identity feature information is sent to the voice command recognition module;

2) the voice command recognition module receives the user's speech, recognizes it, performs semantic recognition according to the identity feature information sent by the user identity feature identification module, derives a command that matches the user's identity features, and sends this command to the intelligent response module;

3) the intelligent response module receives the command matching the user's identity features from the voice command recognition module, matches it against a knowledge base to obtain response information for the user, and feeds the response information back.

Step 1) comprises:

11) the image acquisition unit captures user images and sends them to the image analyzing unit;

12) the image analyzing unit receives the user images sent by the image acquisition unit, performs recognition analysis on them, and sends the resulting user identity feature information to the voice command recognition module.

Step 12) comprises the following steps:

121) picture preprocessing: the position, size and image of the face in the user image are normalized;

122) picture region division and training: the face in the user image is divided into a plurality of recognition regions, and each recognition region outputs its own grayscale image and binary image;

123) region-wise face feature value template matching: each recognition region is separated into different template matching units, the feature value of each template matching unit is computed against the matching template, the feature values are fed as input neurons into the input layer of a neural network algorithm, and the picture result with the best matching degree is obtained through neural network processing;

124) face weighted calculation is performed on the picture result;

125) the picture recognition result is fed back to the knowledge base.

Step 2) comprises:

21) the voice collecting unit collects the user's voice information and sends it to the speech analysis unit;

22) the speech analysis unit receives the voice information sent by the voice collecting unit, analyzes it to obtain the corresponding text, and sends this text to the semantic analysis unit;

23) the semantic analysis unit receives the text sent by the speech analysis unit and the identity feature information sent by the user identity feature identification module, retrieves from the command library corresponding to the text the command that matches the identity feature information, and sends it to the intelligent response module.

The user identity features comprise the user's age and/or gender.

With the intelligent television voice response system and method of the present invention, the user identity feature identification module identifies basic information about the user, such as age and gender. This information gives the voice command recognition module a basis for its decision. For example, for the spoken command "select film", the films fed back differ with the result of identity feature identification: if the user is a child, the system returns Disney films; if the user is a young person, it returns romance films; if the user is middle-aged, it returns feature films. This greatly simplifies program selection for the user. The invention identifies the user's identity features and at the same time interprets the user's meaning in context, so it gives accurate feedback to the user's voice input. This increases the accuracy of the voice response, spares the user tedious repeated selection steps, saves time, improves user satisfaction, and makes operating the intelligent television more user-friendly.
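The "select film" example can be sketched as a lookup in a command library keyed first by the recognized text and then by an age group. This is only an illustration: the `Identity` type, the age thresholds, the command strings and the `resolve_command` function are all hypothetical, since the patent names the three user groups but gives no boundaries or data formats.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Identity:
    """Identity features as produced by image analysis (hypothetical container)."""
    age: Optional[int] = None      # estimated age in years, if available
    gender: Optional[str] = None   # estimated gender, if available


def age_group(age: Optional[int]) -> str:
    """Map an estimated age to a group. Thresholds are assumptions, not from the patent."""
    if age is None:
        return "unknown"
    if age < 13:
        return "child"
    if age < 40:
        return "young"
    return "middle_aged"


# Hypothetical command library: recognized text -> age group -> command.
COMMAND_LIBRARY: Dict[str, Dict[str, str]] = {
    "select film": {
        "child": "list_films:disney",
        "young": "list_films:romance",
        "middle_aged": "list_films:feature",
        "unknown": "list_films:all",
    },
}


def resolve_command(text: str, identity: Identity) -> Optional[str]:
    """Pick the command for this text that matches the user's identity features."""
    variants = COMMAND_LIBRARY.get(text.strip().lower())
    if variants is None:
        return None  # the text is not a known command
    return variants.get(age_group(identity.age), variants.get("unknown"))
```

Keeping the identity-specific variants inside the entry for each command text mirrors the description of retrieving, from the command library corresponding to the text, the command that matches the identity features.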

Description of drawings

Fig. 1 is a structural block diagram of the intelligent television voice response system of the present invention.

Fig. 2 is a flow block diagram of face recognition in the intelligent television voice response system of the present invention.

Fig. 3 is a schematic diagram of the division of the main recognition regions of the face.

Embodiment

To help those skilled in the art better understand the technical solution of the present invention, the invention is described in further detail below with reference to the drawings and embodiments.

As shown in Fig. 1, an intelligent television voice response system of the present invention comprises a user identity feature identification module, a voice command recognition module and an intelligent response module.

The user identity feature identification module is connected to the voice command recognition module; it obtains the user's identity features and sends the obtained identity feature information to the voice command recognition module.

The user identity feature identification module comprises an image acquisition unit and an image analyzing unit.

The image acquisition unit is connected to the image analyzing unit; it captures user images and sends them to the image analyzing unit. The image acquisition unit comprises three cameras on the top edge of the television, located at the upper left corner, the upper right corner and the top center respectively, so that frontal and side images of the user's head can be captured. Compared with a single camera, three cameras capture a more complete image.

The image analyzing unit comprises a face-recognition-based age estimation unit, gender determination unit or expression determination unit. It performs recognition analysis on the user's facial image and, using image processing algorithms and fuzzy matching algorithms, derives basic information about the user from the facial image, such as age, gender or expression.

As shown in Fig. 2, the recognition analysis that the image analyzing unit performs on the user's facial image passes through the following stages:

1) picture preprocessing stage;

In practice, the image acquisition unit is subject to constraints and interference from the external environment, so captured images may contain considerable noise and interference. For face pattern recognition, which places high demands on picture quality, this has a large impact and degrades classification ability. Image preprocessing before feature extraction is therefore very important: it normalizes the position, size and image of the face in the facial image and overcomes the influence of factors such as occlusion, pose, illumination and accessories.
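A minimal sketch of such normalization, assuming a grayscale image stored as a list of rows and a face box that has already been located by some detector (the patent does not say how). The function names, the nearest-neighbour resize and the min-max contrast stretch are illustrative choices, not the patent's stated algorithm.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows, pixel values 0..255


def crop(img: Image, box: Tuple[int, int, int, int]) -> Image:
    """Cut out the face box (top, left, height, width) to normalize position."""
    top, left, height, width = box
    return [row[left:left + width] for row in img[top:top + height]]


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbour resize to normalize face size."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def stretch_contrast(img: Image) -> Image:
    """Min-max stretch to 0..255 to reduce the effect of illumination."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [[0 for _ in row] for row in img]
    return [[(v - lo) * 255 // (hi - lo) for v in row] for row in img]


def preprocess(img: Image, box: Tuple[int, int, int, int],
               size: Tuple[int, int] = (64, 64)) -> Image:
    """Position, size and intensity normalization in sequence."""
    return stretch_contrast(resize_nearest(crop(img, box), size[0], size[1]))
```

Occlusion, pose and accessories are not handled here; they would need dedicated techniques beyond this sketch.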

2) picture region division and training stage;

Fig. 3 is a schematic diagram of the division of the main recognition regions of the face. As shown in Fig. 3, before face template matching the face must be divided into regions, and template matching is carried out on the picture region by region. The facial regions comprise the hair and hairstyle region, the forehead region, the eyebrow and eye region, the nose region, the cheekbone and cheek region, the mouth region and the chin region. Each region outputs its own grayscale image and binarized image.
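The region division can be sketched as cutting fixed boxes out of the normalized face image and emitting a grayscale and a binarized copy of each. The seven region names follow the text; the percentage boxes and the fixed threshold of 128 are assumptions made only for illustration, since Fig. 3 is not reproduced here.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # grayscale rows, pixel values 0..255

# Hypothetical boxes as integer percentages of the face image:
# (top, bottom, left, right). The real layout is defined by Fig. 3.
REGIONS: Dict[str, Tuple[int, int, int, int]] = {
    "hair":          (0, 15, 0, 100),
    "forehead":      (15, 30, 0, 100),
    "eyebrows_eyes": (30, 50, 0, 100),
    "nose":          (50, 70, 30, 70),
    "cheeks":        (50, 70, 0, 100),
    "mouth":         (70, 85, 0, 100),
    "chin":          (85, 100, 0, 100),
}


def binarize(img: Image, threshold: int = 128) -> Image:
    """Turn a grayscale image into a 0/1 image with a fixed threshold."""
    return [[1 if v >= threshold else 0 for v in row] for row in img]


def split_regions(img: Image) -> Dict[str, Tuple[Image, Image]]:
    """Return, for each region, its grayscale crop and its binarized crop."""
    h, w = len(img), len(img[0])
    out: Dict[str, Tuple[Image, Image]] = {}
    for name, (top, bottom, left, right) in REGIONS.items():
        rows = img[top * h // 100:bottom * h // 100]
        gray = [row[left * w // 100:right * w // 100] for row in rows]
        out[name] = (gray, binarize(gray))
    return out
```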

3) region-wise face feature value template matching stage;

The face feature matching algorithm combines a neural network method with template matching. Each recognition region is further refined into separate template matching units. General assessment items for the face as a whole should also be included, and the different assessment items are assigned different weights. The current breakdown of assessment items for whole-face recognition is shown in Table 1.

The division of face regions and template matching units may be adjusted as research deepens and recognition technology develops.

For each template matching item, the feature value is computed against the matching template. Suppose there is a training set {I_i}, where each I_i is a facial image of size m × n (i = 1, 2, ..., N). The rows of each image I_i are first concatenated into a column vector of dimension d = m × n. This yields {X_i} (i = 1, 2, ..., N), where X_i is the face vector formed from the i-th facial image. The feature value of the matching template is then computed as:

S = \sum_{i=1}^{N} (X_i - \bar{X}) (X_i - \bar{X})^{T}
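The matrix S above can be computed directly from the flattened face vectors. The sketch below assumes that the barred X is the mean of the N face vectors, which the text does not state explicitly, and it stops at S: how the feature values are then derived from S (for instance as its eigenvalues, as in eigenface-style methods) is not spelled out in the source and is not implemented here. The function names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def flatten_rows(img: List[List[float]]) -> Vector:
    """Concatenate the rows of an m x n image into one vector of length d = m * n."""
    return [v for row in img for v in row]


def scatter_matrix(vectors: List[Vector]) -> Matrix:
    """Compute S = sum_i (X_i - mean)(X_i - mean)^T as a d x d matrix."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(d)]
    s = [[0.0] * d for _ in range(d)]
    for v in vectors:
        diff = [v[k] - mean[k] for k in range(d)]
        for a in range(d):
            for b in range(d):
                s[a][b] += diff[a] * diff[b]  # outer product, accumulated
    return s
```

For realistic image sizes d is large, so a numerical library would normally be used instead of nested Python loops; the plain-list version is kept here only to make each term of the formula visible.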

The feature values of the matching templates in the target region of the image are fed into the input layer of the neural network algorithm, one per input neuron. Neural network processing then begins, and the result with the best matching degree is obtained through repeated competition and clustering among the neurons.
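The patent does not specify the network architecture. One common way to realize "competition among neurons" is a winner-take-all layer in which each neuron holds a prototype vector and the neuron closest to the input wins; the sketch below shows that idea under this assumption. The labels, the learning rate and both function names are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def winner(features: Vector, prototypes: Dict[str, Vector]) -> str:
    """Return the label of the neuron whose prototype is closest to the input."""
    return min(prototypes, key=lambda label: squared_distance(features, prototypes[label]))


def competitive_update(features: Vector, prototypes: Dict[str, Vector],
                       rate: float = 0.5) -> str:
    """One competitive-learning step: only the winning neuron moves toward the input."""
    label = winner(features, prototypes)
    prototypes[label] = [w + rate * (x - w)
                         for w, x in zip(prototypes[label], features)]
    return label
```

Repeating `competitive_update` over the training vectors gradually clusters the prototypes around groups of similar inputs, after which `winner` gives the best-matching group for a new feature vector.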

Table 1: Distribution of face recognition template matching units

4) face weighted calculation stage based on picture background and picture quality:

After the picture result of each region is obtained, a weighted calculation is performed. Let x be the matching result of each region and p its weight; the result c is then the sum of the products of the weights and the matching results.

The formula is as follows:

c(i) = \sum_{j=0}^{i} p_j x_j
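As a small worked illustration of the weighted sum, the function below multiplies each region's matching result by its weight and adds the products. The region names and numeric values in the usage note are made up for the example; the patent's actual weights belong to its Table 1, which is not available here.

```python
from typing import Dict


def weighted_score(results: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum of weight * matching result over all regions."""
    if results.keys() != weights.keys():
        raise ValueError("results and weights must cover the same regions")
    return sum(weights[name] * results[name] for name in results)
```

With hypothetical results `{"eyes": 0.5, "mouth": 1.0}` and weights `{"eyes": 2.0, "mouth": 1.0}`, the score is 2.0 × 0.5 + 1.0 × 1.0 = 2.0.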

The division of face regions and template matching units may be adjusted as research deepens and recognition technology develops. The weights of the different assessment items are corrected as the sample set and the training data grow.

5) picture recognition result knowledge base feedback stage.

The picture recognition result is fed back to the face feature value template library. This enriches the training samples behind the template values and improves the accuracy of face recognition.
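One simple way to picture this feedback loop is a template library that stores every confirmed feature vector under its label and recomputes the template as the mean of the stored samples. The `TemplateLibrary` class and its methods are hypothetical, and averaging is only one possible way of turning samples into a template; the patent does not describe the storage format or update rule.

```python
from typing import Dict, List

Vector = List[float]


class TemplateLibrary:
    """Keeps training samples per label and derives a mean template from them."""

    def __init__(self) -> None:
        self._samples: Dict[str, List[Vector]] = {}

    def feed_back(self, label: str, features: Vector) -> None:
        """Store a recognition result as a new training sample for its label."""
        self._samples.setdefault(label, []).append(list(features))

    def count(self, label: str) -> int:
        """Number of samples collected so far for this label."""
        return len(self._samples.get(label, []))

    def template(self, label: str) -> Vector:
        """Mean of all samples stored for the label; KeyError if none exist."""
        samples = self._samples[label]
        return [sum(column) / len(samples) for column in zip(*samples)]
```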

Of course, other alternative methods of estimating age and gender based on face recognition can also be used, all of which fall within the protection scope of the present invention, for example the age estimation method based on face recognition technology disclosed in invention patent No. 200910032756.3 and the face gender recognition method based on a fuzzy support vector machine disclosed in invention patent No. 200810226414.0.

The voice command recognition module of the intelligent television voice response system of the present invention is connected to the intelligent response module. It receives the user's speech, recognizes it, performs semantic recognition according to the identity feature information sent by the user identity feature identification module, derives a command that matches the user's identity features, and sends this command to the intelligent response module.

The voice command recognition module comprises a voice collecting unit, a speech analysis unit and a semantic analysis unit.

The voice collecting unit, which can be an external or built-in microphone of the television, is connected to the speech analysis unit; it collects the user's voice information and sends it to the speech analysis unit. The speech analysis unit is connected to the semantic analysis unit; it receives the voice information sent by the voice collecting unit, analyzes it to obtain the corresponding text, and sends this text to the semantic analysis unit. The semantic analysis unit is connected to both the user identity feature identification module and the intelligent response module; it receives the text sent by the speech analysis unit and the identity feature information sent by the user identity feature identification module, retrieves from the command library corresponding to the text the command that matches the identity feature information, and sends it to the intelligent response module.

The intelligent response module receives the command matching the user's identity features from the voice command recognition module, matches it against a knowledge base to obtain response information for the user, and feeds the response information back.

The intelligent television voice response system further comprises a TTS (Text To Speech) module. The TTS module is connected to the intelligent response module and converts the response information that the intelligent response module produces for the user from text format into audio format, which is output through the loudspeaker of the television.

An intelligent television voice response method of the present invention is implemented by the above intelligent television voice response system and comprises the following steps:

1) the image acquisition unit captures user images and sends them to the image analyzing unit;

2) the image analyzing unit receives the user images sent by the image acquisition unit, performs recognition analysis on them to obtain the user identity feature information, including the user's age, gender or expression, and sends it to the voice command recognition module;

3) the voice collecting unit collects the user's voice information and sends it to the speech analysis unit;

4) the speech analysis unit receives the voice information sent by the voice collecting unit, analyzes it to obtain the corresponding text, and sends this text to the semantic analysis unit;

5) the semantic analysis unit receives the text sent by the speech analysis unit and the identity feature information sent by the user identity feature identification module, retrieves from the command library corresponding to the text the command that matches the identity feature information, and sends it to the intelligent response module;

6) the intelligent response module receives the command matching the user's identity features from the voice command recognition module, matches it against a knowledge base to obtain response information for the user, and sends the response information to the TTS module;

7) the TTS module receives the response information sent by the intelligent response module, converts it from text format into audio format, and outputs it through the loudspeaker of the television.
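The seven steps above can be sketched as one function that chains the units together. Every unit is passed in as a plain callable or dictionary so the flow can be read without any real camera, recognizer or speech synthesizer; the `answer` function, its parameter names, the string-valued identity and the fallback reply are all hypothetical stand-ins.

```python
from typing import Callable, Dict, Tuple

FALLBACK_REPLY = "Sorry, I did not understand that."  # hypothetical fallback text


def answer(
    image: bytes,
    audio: bytes,
    analyze_image: Callable[[bytes], str],        # image analyzing unit
    speech_to_text: Callable[[bytes], str],       # speech analysis unit
    command_library: Dict[Tuple[str, str], str],  # (text, identity) -> command
    knowledge_base: Dict[str, str],               # command -> reply text
    text_to_speech: Callable[[str], bytes],       # TTS module
) -> bytes:
    identity = analyze_image(image)                  # steps 1-2: identity features
    text = speech_to_text(audio)                     # steps 3-4: speech to text
    command = command_library.get((text, identity))  # step 5: identity-matched command
    if command is None:
        reply = FALLBACK_REPLY
    else:
        reply = knowledge_base.get(command, FALLBACK_REPLY)  # step 6: knowledge base match
    return text_to_speech(reply)                     # step 7: text to audio
```

Passing the units in as parameters keeps the sketch independent of whether recognition runs on a server or on the television itself.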

The semantic analysis unit can operate in server mode or in embedded speech recognition mode. When the semantic analysis unit operates in server mode, information must be sent to the cloud, so the data transmitted to the server is encrypted to protect the user's privacy.

The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make improvements and modifications without departing from the principle of the invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims

1. An intelligent television voice response system, characterized in that it comprises:

A voice command recognition module, connected to the intelligent response module, which receives the user's speech, recognizes it, performs semantic recognition according to the identity feature information sent by the user identity feature identification module, derives a command that matches the user's identity features, and sends this command to the intelligent response module;

2. The intelligent television voice response system as claimed in claim 1, characterized in that the user identity feature identification module comprises:

3. The intelligent television voice response system as claimed in claim 2, characterized in that the image analyzing unit comprises:

4. The intelligent television voice response system as claimed in claim 2, characterized in that the image analyzing unit comprises:

5. The intelligent television voice response system as claimed in claim 1, characterized in that the voice command recognition module comprises:

A semantic analysis unit, connected to both the user identity feature identification module and the intelligent response module, which receives the text sent by the speech analysis unit and the identity feature information sent by the user identity feature identification module, retrieves from the command library corresponding to the text the command that matches the identity feature information, and sends it to the intelligent response module.

6. The intelligent television voice response system as claimed in any one of claims 1 to 5, characterized in that the intelligent television voice response system further comprises:

7. An intelligent television voice response method, comprising the following steps:

2) the voice command recognition module receives the user's speech, recognizes it, performs semantic recognition according to the identity feature information sent by the user identity feature identification module, derives a command that matches the user's identity features, and sends this command to the intelligent response module;

8. The intelligent television voice response method as claimed in claim 7, characterized in that step 1) comprises:

9. The intelligent television voice response method as claimed in claim 8, characterized in that step 12) comprises the following steps:

124) face weighted calculation is performed on the picture result to obtain the picture recognition result;

125) the picture recognition result is fed back to the knowledge base.

10. The intelligent television voice response method as claimed in claim 7, characterized in that step 2) comprises:

23) the semantic analysis unit receives the text sent by the speech analysis unit and the identity feature information sent by the user identity feature identification module, retrieves from the command library corresponding to the text the command that matches the identity feature information, and sends it to the intelligent response module.