CN105654950A - Self-adaptive voice feedback method and device - Google Patents


Info

Publication number
CN105654950A
CN105654950A (application CN201610060206.2A)
Authority
CN
China
Prior art keywords
information
voice feedback
voice
mentioned
speech rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610060206.2A
Other languages
Chinese (zh)
Other versions
CN105654950B (en)
Inventor
李丰 (Li Feng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201610060206.2A priority Critical patent/CN105654950B/en
Publication of CN105654950A publication Critical patent/CN105654950A/en
Application granted granted Critical
Publication of CN105654950B publication Critical patent/CN105654950B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225: Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a self-adaptive voice feedback method and device. One implementation of the method comprises the following steps: acquiring input information; recognizing scene information of the input information; parsing the input information to obtain at least one of user emotion information, communication-style information, and subject content information, wherein the communication-style information comprises language category information; generating a user attribute label according to at least one of the user emotion information, the language category information, and the subject content information; matching the user attribute label against the applicability labels of pre-trained voice feedback modes to obtain matching degrees; and performing voice feedback using the voice feedback mode whose matching degree with the user attributes is highest. This implementation realizes self-adaptive voice feedback and improves the pertinence and effectiveness of the voice feedback.

Description

Adaptive voice feedback method and device
Technical field
The present application relates to the field of computer technology, specifically to the field of Internet technology, and in particular to an adaptive voice feedback method and device.
Background art
With the development of computer technology, and Internet technology in particular, the functions of client applications on terminal devices have become increasingly diversified. A voice assistant is a class of mobile-phone application that can, through voice interaction, replace part of a user's manual queries and operations; applications that include a voice assistant function, and websites with a voice assistant function, also belong to this class. Such applications and websites can greatly improve the convenience of operating a terminal device. However, the interaction between existing applications or websites of this kind and their users is limited to correctly understanding the user's speech input and then answering questions or performing operations (such as querying, displaying, or operating an application) as quickly as possible. Their functionality is comparatively limited and is not well targeted to the individual user.
Summary of the invention
The purpose of the present application is to propose an improved adaptive voice feedback method and device that solve the technical problems mentioned in the background section above.
In a first aspect, the present application provides an adaptive voice feedback method, the method comprising: acquiring input information; identifying scene information of the input information; parsing the input information to obtain at least one of user emotion information, communication-style information, and subject content information, wherein the communication-style information includes language category information; generating a user attribute label according to at least one of the user emotion information, the language category information, the subject content information, and the scene information; matching the user attribute label against the applicability labels of pre-trained voice feedback modes to obtain matching degrees; and performing voice feedback using the voice feedback mode whose matching degree with the user attributes is highest.
In some embodiments, the communication-style information further includes speech rate information, sentence-construction information, or near-synonym category information; and performing voice feedback using the voice feedback mode with the highest matching degree includes: adjusting the voice feedback mode according to the speech rate information, the sentence-construction information, or the near-synonym category information; and performing feedback using the adjusted voice feedback mode.
In some embodiments, adjusting the voice feedback mode according to the speech rate information, the sentence-construction information, or the near-synonym category information includes: adjusting the speech rate of the voice feedback mode to the rate corresponding to the speech rate information; adjusting the sentence construction of the voice feedback mode to be consistent with the sentence-construction information; obtaining a pre-built near-synonym set consistent with the near-synonym category information, wherein the near-synonym set includes standard expressions and the near-synonyms corresponding to those expressions; comparing the words and phrases in the voice feedback mode with the standard expressions in the near-synonym set; and, if a word or phrase is identical to a standard expression, replacing it with the near-synonym corresponding to that expression.
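The substitution step above reduces, at its core, to a dictionary lookup over the feedback text. The following sketch is illustrative only: the function name, mapping structure, and example words are assumptions, not the patent's implementation.

```python
# Illustrative near-synonym substitution: each word of the feedback text that
# matches a standard expression is replaced by the corresponding near-synonym
# from the pre-built set; all other words pass through unchanged.

def apply_synonyms(feedback_words, synonym_set):
    """synonym_set maps a standard expression to its near-synonym."""
    return [synonym_set.get(word, word) for word in feedback_words]

# A toy synonym set for one hypothetical language category.
synonyms = {"hello": "howdy", "goodbye": "see ya"}
print(apply_synonyms(["hello", "friend"], synonyms))  # ['howdy', 'friend']
```

In practice the comparison would run over the synthesized phrases rather than single tokens, and the synonym set would first be selected by the near-synonym category information.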
In some embodiments, the input information includes voice information and/or video information; and parsing the input information to obtain user emotion information includes: parsing the voice information to obtain at least one of speech rate information, intonation information, or spectrum information; comparing the speech rate information and intonation information with a speech rate threshold and an intonation threshold to obtain a voice emotion result; parsing the video information to obtain a video emotion result; and deriving the user emotion information from the voice emotion result and the video emotion result.
In some embodiments, the method further includes: selecting, from a pre-established set of recommended content, recommended content associated with the scene information, the user emotion information, and the subject content information; generating a recommended-content execution request; sending the recommended-content execution request to a client, so that the client can choose whether to permit execution of the recommended content; and, if the client grants the execution request, executing the recommended content.
In some embodiments, the scene information includes at least one of: time information, location information, or terminal application category information.
In a second aspect, the present application provides an adaptive voice feedback device, the device comprising: an acquisition module configured to acquire input information; an identification module configured to identify scene information of the input information; a parsing module configured to parse the input information to obtain at least one of user emotion information, communication-style information, and subject content information, wherein the communication-style information includes language category information; a generation module configured to generate a user attribute label according to at least one of the user emotion information, the language category information, the subject content information, and the scene information; a matching module configured to match the user attribute label against the applicability labels of pre-trained voice feedback modes and obtain matching degrees; and a feedback module configured to perform voice feedback using the voice feedback mode whose matching degree with the user attributes is highest.
In some embodiments, the communication-style information further includes speech rate information, sentence-construction information, or near-synonym category information; and the feedback module includes an adjustment submodule configured to: adjust the voice feedback mode according to the speech rate information, the sentence-construction information, or the near-synonym category information; and perform feedback using the adjusted voice feedback mode.
In some embodiments, adjusting the voice feedback mode according to the speech rate information, the sentence-construction information, or the near-synonym category information includes: adjusting the speech rate of the voice feedback mode to the rate corresponding to the speech rate information; adjusting the sentence construction of the voice feedback mode to be consistent with the sentence-construction information; obtaining a pre-built near-synonym set consistent with the near-synonym category information, wherein the near-synonym set includes standard expressions and the near-synonyms corresponding to those expressions; comparing the words and phrases in the voice feedback mode with the standard expressions in the near-synonym set; and, if a word or phrase is identical to a standard expression, replacing it with the corresponding near-synonym.
In some embodiments, the input information includes voice information and/or video information; and parsing the input information to obtain user emotion information includes: parsing the voice information to obtain at least one of speech rate information, intonation information, or spectrum information; comparing the speech rate information and intonation information with a speech rate threshold and an intonation threshold to obtain a voice emotion result; parsing the video information to obtain a video emotion result; and deriving the user emotion information from the voice emotion result and the video emotion result.
In some embodiments, the device further includes a recommendation module configured to: select, from a pre-established set of recommended content, recommended content associated with the scene information, the user emotion information, and the subject content information; generate a recommended-content execution request; send the recommended-content execution request to a client, so that the client can choose whether to permit execution of the recommended content; and, if the client grants the execution request, execute the recommended content.
In some embodiments, the scene information includes at least one of: time information, location information, or terminal application category information.
The adaptive voice feedback method and device provided by the present application acquire input information, identify its scene information, and parse it to obtain at least one of user emotion information, communication-style information, and subject content information; generate a user attribute label according to at least one of the user emotion information, the language category information, the subject content information, and the scene information; match the user attribute label against the applicability labels of pre-trained voice feedback modes to obtain matching degrees; and finally perform voice feedback using the voice feedback mode with the highest matching degree. This achieves self-adaptive voice feedback and improves the pertinence and effectiveness of the voice feedback.
Brief description of the drawings
Other features, purposes, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is an exemplary system architecture to which embodiments of the adaptive voice feedback method or adaptive voice feedback device of the present application may be applied;
Fig. 2 is a flow chart of one embodiment of the adaptive voice feedback method according to the present application;
Fig. 3 is a flow chart of another embodiment of the adaptive voice feedback method according to the present application;
Fig. 4 is a schematic data-flow diagram of one application scenario of the embodiment shown in Fig. 3;
Fig. 5 is a schematic structural diagram of one embodiment of the adaptive voice feedback device according to the present application;
Fig. 6 is a schematic structural diagram of a computer system suitable for implementing a terminal device or server of the embodiments of the present application.
Detailed description of the invention
The present application is described in further detail below in conjunction with the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the related invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention.
It should be noted that, where there is no conflict, the embodiments of the present application and the features of those embodiments may be combined with one another. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the adaptive voice feedback method or adaptive voice feedback device of the present application may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 provides the medium for communication links between the terminal devices 101, 102, 103 and the server 105, and may include various connection types, such as wired links, wireless communication links, or fiber-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 in order to receive or send messages (such as voice information). Various communication client applications may be installed on the terminal devices 101, 102, 103, such as voice assistant applications, document management applications, search applications, mail clients, and social platform software.
The terminal devices 101, 102, 103 may be various electronic devices, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, and desktop computers.
The server 105 may be a server providing various services, for example a background speech-processing server that supports the voice assistant applications on the terminal devices 101, 102, 103. The background speech-processing server may store and analyze the voice received from a terminal device, feed the processing result back to the terminal device, and perform the corresponding operation.
As shown in Fig. 1, by installing a voice assistant application on the terminal devices 101, 102, 103, by using a communication application with a voice assistant function on them, or by browsing a communication website with a voice assistant function on them, the terminal devices can send requests to the server 105 in the form of speech messages, after which the server 105 can perform the adaptive voice feedback method described above. Accordingly, the adaptive voice feedback device may be arranged in the server 105.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative; there may be any number of each, according to implementation needs.
With continued reference to Fig. 2, a flow 200 of one embodiment of the adaptive voice feedback method according to the present application is shown. The adaptive voice feedback method comprises the following steps:
Step 201: acquire input information.
In the present embodiment, the electronic device on which the adaptive voice feedback method runs (for example the server shown in Fig. 1) may acquire the user's input information locally or remotely. When the input information is saved in the memory of the electronic device, the device can read it directly from local memory. Alternatively, when the electronic device is a background server supporting a voice assistant application on a terminal device, it can acquire the input information from that terminal device over a wired or wireless connection. Wireless connections include, but are not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, ZigBee, UWB (ultra-wideband), and other wireless connections now known or developed in the future.
In the present embodiment, the input information includes, but is not limited to, voice information, video information, image information, or text information.
Step 202: identify the scene information of the input information.
In the present embodiment, the scene information of the input information refers to the scene the user is in when sending the input information, and may include, but is not limited to, time information, location information, or terminal application category information. Here, time information refers to the time at which the user sends the input information. Location information refers to the place from which the user sends it; the place may be a concrete address, such as a street in a particular district and city, or a type of place, such as home, the office, or a hospital. Terminal application category information may be information indicating which kind of terminal application the user wants to operate: for example, if the user asks by voice to open a map application, the terminal application category information is that map application. Alternatively, it may be information about which kind of terminal application the user used to enter and send the input information: for example, if the user sends one or more of text, video, or image information through a communication application, the terminal application category information is that communication application.
In some optional implementations of the present embodiment, the location information can be identified through a location-based service (LBS, also known as a positioning service).
In some optional implementations of the present embodiment, several scene information models can be built from historical statistical data, for example "evening-home-takeout", "morning-meeting-at-the-office", "Sunday-outdoors-map", and "default". The scene information of the acquired input information is matched against the scene information models to find the model with the highest matching degree; if that model meets a predetermined matching-degree threshold, it is used as the scene information. If no scene information model reaches the threshold, the recognized scene information itself is used as the scene information of the input information; alternatively, the "default" scene information model is used.
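The matching logic described above might be sketched as follows, treating each scene and scene model as a set of attributes and using set overlap as the matching degree. The overlap measure, the threshold value, and the attribute names are all assumptions for illustration; the patent does not specify them.

```python
# Match a recognized scene against pre-built scene models; fall back to the
# recognized scene itself when no model reaches the matching-degree threshold.

def match_scene(scene, models, threshold=0.5):
    def degree(model):
        # Jaccard overlap as an illustrative matching degree.
        return len(scene & model) / len(scene | model)
    best = max(models, key=degree)
    return best if degree(best) >= threshold else scene

models = [
    {"evening", "home", "takeout"},
    {"sunday", "outdoors", "map"},
]
print(match_scene({"evening", "home", "food"}, models))
```

A "default" model could be represented as an empty set that is returned explicitly on fallback instead of the recognized scene.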
Step 203: parse the input information to obtain at least one of user emotion information, communication-style information, and subject content information.
In the present embodiment, the user emotion information refers to the user's emotional state when entering the input information, for example cheerful, sad, tired, or energetic.
In some optional implementations of the present embodiment, parsing the voice information to obtain user emotion information can be achieved as follows: parse the voice information to obtain at least one of a speech rate value, an intonation value, or spectrum information; compare the speech rate value and intonation value with a speech rate threshold and an intonation threshold to obtain a voice emotion result; parse the video information to obtain a video emotion result; and derive the user emotion information from the voice emotion result and the video emotion result.
In the present embodiment, the communication-style information includes, but is not limited to, language category information. The language category information may identify the language, for example Chinese, English, or Japanese; it may also identify different varieties of the same language, for example Cantonese, the Ningbo dialect, or Mandarin, or again American versus British English.
In the present embodiment, the subject content information refers to the semantic content conveyed by the input information. For example, if the input information is "I want to order takeout", then in some optional implementations, after word segmentation and semantic recognition, the subject content information may be "order" and "takeout".
In some optional implementations of the present embodiment, speech recognition technology is mainly used to identify the subject content information of the voice information. For example, if the user's voice information is recognized as shopping, query, XX Plaza, Uniqlo, the subject content information is represented as the series of subject keywords {shopping, query, XX Plaza, Uniqlo}.
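As a minimal illustration of the keyword representation above, the recognized tokens can be filtered down to subject keywords. A real system would use word segmentation and semantic recognition; the stopword list here is a placeholder assumption.

```python
# Reduce recognized tokens to subject keywords by dropping function words.

STOPWORDS = {"i", "want", "to", "the", "a"}

def subject_keywords(tokens):
    return [t for t in tokens if t.lower() not in STOPWORDS]

print(subject_keywords(["I", "want", "to", "order", "takeout"]))
# -> ['order', 'takeout']
```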
In some optional implementations of the present embodiment, a corresponding operation is performed according to the subject content information. For example, if the input information is "I want to order takeout", the operation of opening a terminal application with a takeout-ordering function is performed, and takeout ordering information is then recommended to the user.
Step 204: generate a user attribute label according to at least one of the user emotion information, the language category information, the subject content information, and the scene information.
In the present embodiment, the user attribute label characterizes the user's emotion, the language used, the current scene, and other relevant circumstances. For example, if a female user orders takeout in dialect at 9 p.m. at home and her emotion is dejected, the user attribute label may be {evening, home, female voice, dejected, Shanghai dialect}.
Step 205: match the user attribute label against the applicability labels of the pre-trained voice feedback modes and obtain the matching degrees.
In the present embodiment, several voice feedback modes are trained in advance. Each voice feedback mode includes a feature label and an applicability label. The feature label characterizes the features of the voice feedback mode; for example, the feature label of a voice feedback mode may be {female voice, Lin Zhiling, Mandarin, post-80s}. The applicability label characterizes the situations to which the voice feedback mode is suited; for example, the applicability label of that mode may be {scene 1, scene 2, emotion 1, emotion 2, dialect 1, near-synonym 1, near-synonym 2, subject 1, subject 2}.
Step 206: perform voice feedback using the voice feedback mode with the highest matching degree to the user attributes.
In the present embodiment, based on step 205, the user attribute label is matched against the applicability labels, the voice feedback mode with the highest matching degree is obtained, and feedback is performed with it. For example, when the user speaks Mandarin in a good mood while driving, the system will obtain the voice feedback mode {female voice, Lin Zhiling, Mandarin, post-80s} and use it for voice feedback.
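Steps 205 and 206 can be sketched as a label-overlap scoring over the pre-trained modes. The scoring function, the mode names, and all label values below are illustrative assumptions; the patent does not specify how the matching degree is computed.

```python
# Score each pre-trained voice feedback mode by the overlap between its
# applicability labels and the user attribute label; pick the best mode.

def best_feedback_mode(user_labels, modes):
    """modes maps a mode name to its set of applicability labels."""
    return max(modes, key=lambda name: len(user_labels & modes[name]))

modes = {
    "female-mandarin": {"driving", "happy", "mandarin"},
    "dialect-evening": {"evening", "home", "shanghai-dialect"},
}
print(best_feedback_mode({"driving", "happy", "mandarin"}, modes))
# -> female-mandarin
```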
The method provided by the above embodiment of the present application acquires input information, identifies its scene information, parses it to obtain at least one of user emotion information, communication-style information, and subject content information, generates a user attribute label according to at least one of the user emotion information, the language category information, the subject content information, and the scene information, matches the user attribute label against the applicability labels of pre-trained voice feedback modes to obtain matching degrees, and finally performs voice feedback using the voice feedback mode with the highest matching degree, thereby achieving self-adaptive voice feedback and improving its pertinence and effectiveness.
With further reference to Fig. 3, a flow 300 of another embodiment of the adaptive voice feedback method is shown. The flow 300 of this adaptive voice feedback method comprises the following steps:
Step 301: acquire input information.
In the present embodiment, the electronic device on which the adaptive voice feedback method runs (for example the server shown in Fig. 1) may acquire the user's input information locally or remotely.
In the present embodiment, the input information may include, but is not limited to, voice information, video information, image information, or text information.
Step 302: identify the scene information of the input information.
In the present embodiment, the scene information of the input information refers to the scene the user is in when sending the input information, and may include, but is not limited to, time information, location information, or terminal application category information.
In some optional implementations of the present embodiment, several scene information models can be built from historical statistical data, for example "evening-home-takeout", "morning-meeting-at-the-office", "Sunday-outdoors-map", and "default". The scene information of the acquired input information is matched against the scene information models to find the model with the highest matching degree; if that model meets a predetermined matching-degree threshold, it is used as the scene information. If no scene information model reaches the threshold, the recognized scene information itself is used as the scene information of the input information; alternatively, the "default" scene information model is used.
Step 303: parse the input information to obtain user emotion information, communication-style information, and subject content information.
In the present embodiment, the user emotion information refers to the user's emotional state when entering the input information, for example cheerful, sad, tired, or energetic.
In the present embodiment, parsing the voice information to obtain user emotion information can be achieved as follows: parse the voice information to obtain a speech rate value and an intonation value; compare them with a speech rate threshold and an intonation threshold to obtain a voice emotion result; parse the video information to obtain a video emotion result; and derive the user emotion information from the voice emotion result and the video emotion result. It should be understood that the speech rate and intonation thresholds differ from user to user.
In some optional implementations of the present embodiment, if the speech rate value is below the speech rate threshold and the intonation value is below the intonation threshold, the voice emotion result is judged to be negative. If the speech rate value is above the speech rate threshold and the intonation value is above the intonation threshold, the voice emotion result is judged to be positive. If only one of the two values is above its threshold, the voice emotion result is judged to be neutral.
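The three-way rule above can be written down directly. The threshold numbers below are placeholders; as the text notes, the real thresholds are per-user.

```python
# Classify voice emotion from speech rate and intonation against thresholds:
# both below -> negative, both above -> positive, otherwise neutral.

def voice_emotion(rate, pitch, rate_thresh=4.0, pitch_thresh=200.0):
    if rate < rate_thresh and pitch < pitch_thresh:
        return "negative"
    if rate > rate_thresh and pitch > pitch_thresh:
        return "positive"
    return "neutral"

print(voice_emotion(3.0, 150.0))  # negative
print(voice_emotion(5.0, 250.0))  # positive
print(voice_emotion(5.0, 150.0))  # neutral
```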
In some optional implementations of the present embodiment, parsing the video information to obtain the video emotion result may be realized through dynamic video recognition or sampled-image recognition.
In some optional implementations of the present embodiment, when the voice emotion result and the video emotion result are both positive or both negative, the emotion result information is accordingly positive or negative; if the voice emotion result and the video emotion result are inconsistent, the emotion result information is determined to be neutral.
In some optional implementations of the present embodiment, different weights may be established for the voice emotion result and the video emotion result; for instance, when the voice input quality is good but the video input quality is poor, a higher weight is established for the voice emotion result. Positive, negative, and neutral are assigned the numerical values 1, -1, and 0 respectively; the voice emotion result and the video emotion result are combined with their weights through the corresponding numerical operation to obtain an emotion result value, and the numerical interval in which the emotion result value falls determines the emotion result information.
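The weighted combination can be sketched as below. The weight values 0.7/0.3 and the +/-0.5 interval boundaries are assumptions; the application specifies only the 1, -1, 0 score assignment and that intervals map the value back to an emotion.

```python
SCORES = {"positive": 1, "neutral": 0, "negative": -1}


def fuse_emotion(voice_result, video_result, w_voice=0.7, w_video=0.3):
    """Weighted combination of the voice and video emotion results."""
    value = w_voice * SCORES[voice_result] + w_video * SCORES[video_result]
    # Assumed numerical intervals for mapping the value back to an emotion.
    if value > 0.5:
        return "positive"
    if value < -0.5:
        return "negative"
    return "neutral"
```

With these assumed weights, agreeing results dominate (both positive yields "positive"), while conflicting results land in the neutral interval.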
In some optional implementations of the present embodiment, the exchange way information includes: language classification information, a word speed value, group sentence mode information, or close language classification information.
The subject content information refers to the semantic information conveyed by the input information. For example, if the input information is "I want to order takeout", in some optional implementations the subject content information obtained after word segmentation and semantic recognition may be "order" and "takeout".
In some optional implementations of the present embodiment, speech recognition technology is mainly adopted to identify the subject content information of the voice information. For example, if the user's voice information is recognized as shopping, inquiry, XX Square, Uniqlo, then the subject content information is represented as a series of subject keywords {shopping, inquiry, XX Square, Uniqlo}.
In some optional implementations of the present embodiment, a corresponding operation is executed according to the subject content information. For instance, if the input information is "I want to order takeout", the operation of opening a terminal application with a takeout-ordering function is performed, and takeout ordering information is then recommended to the user.
Step 304: generate a user attribute label according to the above user emotion information, language classification information, subject content information, and scene information.
In the present embodiment, the user attribute label characterizes the user's emotion, the language used, the subject content expressed by the user, the scene the user is currently in, and other relevant circumstances. For example, when a female user orders takeout by dialect at 9 o'clock in the evening at home and her emotion is dejected, the user attribute label may be {evening, home, female voice, dejected, Shanghai dialect, order takeout}.
Step 305: match the above user attribute label with the applicable labels of the pre-trained voice feedback patterns, and obtain matching degrees.
In the present embodiment, several voice feedback patterns are pre-trained. Each voice feedback pattern includes a feature tag and an applicable label, wherein the feature tag characterizes the features of the voice feedback pattern; for instance, the feature tag of a voice feedback pattern may be {female voice, Lin Zhiling, Mandarin, post-80s}. The applicable label characterizes the situations to which the voice feedback pattern is suited; for instance, the applicable label of a voice feedback pattern may be {scene 1, scene 2, emotion 1, emotion 2, dialect 1, close language 1, close language 2, theme 1, theme 2}.
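The label matching of step 305 can be sketched as below. The fraction-of-labels-covered score, the pattern names, and the example label sets are assumptions; the application only states that the pattern with the highest matching degree is used.

```python
def matching_degree(user_labels, applicable_labels):
    """Fraction of the user attribute labels covered by a pattern's applicable label."""
    if not user_labels:
        return 0.0
    return len(user_labels & applicable_labels) / len(user_labels)


def select_pattern(user_labels, patterns):
    """Return the pre-trained voice feedback pattern with the highest matching degree."""
    return max(patterns, key=lambda p: matching_degree(user_labels, p["applicable"]))
```

For the dejected evening-takeout example above, a soothing dialect pattern whose applicable label covers most of the user attribute labels would be selected.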
Step 306: adjust the above voice feedback pattern according to the word speed information, group sentence mode information, or close language classification information.
In some optional implementations of the present embodiment, adjusting the voice feedback pattern according to the word speed information may be accomplished as follows: the word speed of the voice feedback pattern is adjusted to the word speed corresponding to the word speed information, where the word speed information is a value range near the word speed value of the user's input. For example, if the user uses a slower, soothing voice, the word speed of the voice feedback pattern is likewise adjusted to a slower, soothing voice; that is, the word speed of the voice feedback pattern is brought into this value range. Alternatively, the word speed of the voice feedback pattern may be adjusted to a range different from the word speed information; for instance, if the user speaks rather fast, the user may be in an irritable state, so the word speed of the voice feedback pattern is set to a slower, soothing voice.
In some optional implementations of the present embodiment, the word speed of the voice feedback pattern may be adjusted by adjusting the intervals between words and between sentences of the voice in the voice feedback pattern.
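The two word-speed strategies above (mirroring the user's range, or deliberately slowing down) can be sketched as follows. The function names, the 10% band, and the 0.6 to 0.8 "soothing" factors are illustrative assumptions.

```python
def target_rate(user_rate, mirror=True, band=0.1):
    """Word-speed value range: near the user's own rate when mirroring,
    or a deliberately slower, soothing range otherwise."""
    if mirror:
        return (user_rate * (1 - band), user_rate * (1 + band))
    return (user_rate * 0.6, user_rate * 0.8)  # assumed 'soothing' range


def adjust_rate(pattern_rate, rate_range):
    """Clamp the feedback pattern's word speed into the chosen range."""
    lo, hi = rate_range
    return min(max(pattern_rate, lo), hi)
```

In practice the clamped rate would then be realized by scaling the inter-word and inter-sentence intervals of the synthesized voice, as described above.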
In some optional implementations of the present embodiment, adjusting the voice feedback pattern according to the group sentence mode information may be accomplished as follows: the group sentence mode of the voice feedback pattern is adjusted to be consistent with the group sentence mode information, where the group sentence mode information is obtained by speech recognition technology. For instance, if the user's habitual group sentence mode is recognized as {predicate, subject, object}, the group sentence mode of the voice feedback pattern is adjusted to {predicate, subject, object}. As another example, if the user's habitual group sentence mode is {eat, first}, a normal feedback might be "first, eat", which is adjusted here to "eat first".
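The group-sentence-mode adjustment can be illustrated as reordering labeled constituents of the feedback sentence to match the user's habitual order. This is a simplified sketch; the role names and the whitespace joining are assumptions.

```python
def regroup(parts, order):
    """Reassemble a sentence following the user's habitual constituent order."""
    return " ".join(parts[role] for role in order if role in parts)
```

For instance, `regroup({"predicate": "eat", "adverb": "first"}, ["adverb", "predicate"])` yields "first eat", while the order `["predicate", "adverb"]` yields "eat first".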
In some optional implementations of the present embodiment, adjusting the voice feedback pattern according to the close language classification information may be accomplished as follows: obtain the close language set consistent with the close language classification information, wherein the close language set includes standard common words and the close-language words corresponding to those common words; compare the words and phrases in the voice feedback pattern with the standard common words in the close language set; and if a word or phrase is identical to a standard common word, replace it with the corresponding close-language word.
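The close-language substitution reduces to a dictionary lookup over the feedback words. The example mapping below is hypothetical; a real close language set would be pre-established per close language classification.

```python
# Hypothetical close language set: standard common word -> corresponding
# close-language (colloquial/intimate) variant.
CLOSE_LANGUAGE = {"hello": "hiya", "goodbye": "see ya"}


def apply_close_language(words):
    """Replace any word identical to a standard common word with its close-language counterpart."""
    return [CLOSE_LANGUAGE.get(w, w) for w in words]
```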
Step 307: use the adjusted voice feedback pattern to perform feedback.
In the present embodiment, based on the above step 306, the adjusted voice feedback pattern is used to perform voice feedback.
In some optional implementations of the present embodiment, the flow 300 of the above adaptive voice feedback method may further include step 308: after the feedback voice, execute a recommendation. Executing the recommendation may be realized by the following steps: choose, from a pre-established recommendation content set, the recommendation content associated with the above scene information, user emotion information, and subject content information; generate a recommendation execution request; send the recommendation execution request to the client, so that the client can choose whether to permit executing the recommendation content; and, if the client returns a license for the recommendation execution request, execute the recommendation content. For example, if the recognition result is exercising (subject content information) outdoors (scene information) happily (user emotion information) in the morning, some fresh songs may be recommended, and if the user permits, the operation of playing the recommended songs is performed. As another example, when a female user orders takeout by dialect at 9 o'clock in the evening at home, and voice and video recognition during the voice input judges that the user is very tired, very hungry, and very dejected, then while the takeout is quickly ordered, a soothing same-sex dialect voice may ask the user whether she needs her favorite music played or the phone of a frequently-called close friend dialed; if permitted, the favorite music is played or the call is dialed.
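The choose/request/permit/execute sequence of step 308 can be sketched as below. The catalog structure, the all-three-keys association rule, and the return strings are assumptions for illustration.

```python
def choose_recommendation(scene, emotion, topic, catalog):
    """Pick the first pre-established item associated with all three pieces of information."""
    for item in catalog:
        if {scene, emotion, topic} <= item["tags"]:
            return item["action"]
    return None


def run_recommendation(action, client_permits):
    """Send an execution request to the client; execute only if the client permits."""
    if action is not None and client_permits(action):
        return "executed: " + action
    return "skipped"
```

Here `client_permits` stands in for the round trip of sending the recommendation execution request to the client and receiving its license.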
In the present embodiment, the above steps 301, 302, 303, 304 and 305 of the realized flow are essentially identical with steps 201, 202, 203, 204 and 205 in the previous embodiment, respectively, and are not repeated here.
As can be seen from Fig. 3, the main difference from the embodiment corresponding to Fig. 2 is that the flow 300 of the adaptive voice feedback method in the present embodiment adds step 306 of adjusting the voice feedback pattern according to the word speed information, group sentence mode information, or close language classification information, and step 308 of executing a recommendation after the feedback voice. By adding steps 306 and 308, the scheme described in the present embodiment can feed back information more effectively and improve the pertinence of the feedback.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an adaptive voice feedback device. This device embodiment corresponds to the method embodiment shown in Fig. 2, and the device may specifically be applied in various electronic equipment.
As shown in Fig. 5, the adaptive voice feedback device 500 of the present embodiment includes: an acquisition module 501 configured to obtain input information; an identification module 502 configured to identify the scene information of the input information; a parsing module 503 configured to parse the input information to obtain at least one of user emotion information, exchange way information, and subject content information, wherein the exchange way information includes language classification information; a generation module 504 configured to generate a user attribute label according to at least one of the user emotion information, language classification information, subject content information, and scene information; a matching module 505 configured to match the user attribute label with the applicable labels of pre-trained voice feedback patterns and obtain matching degrees; and a feedback module 506 configured to use the voice feedback pattern with the highest matching degree with the user attribute label to perform voice feedback.
In an optional embodiment of the present embodiment, the exchange way information further includes: word speed information, group sentence mode information, or close language classification information; and the feedback module includes an adjustment submodule configured to: adjust the voice feedback pattern according to the word speed information, group sentence mode information, or close language classification information; and use the adjusted voice feedback pattern to perform feedback.
In an optional embodiment of the present embodiment, adjusting the voice feedback pattern according to the word speed information, group sentence mode information, or close language classification information includes: adjusting the word speed of the voice feedback pattern to the word speed corresponding to the word speed information; adjusting the group sentence mode of the voice feedback pattern to be consistent with the group sentence mode information; obtaining the pre-established close language set consistent with the close language classification information, wherein the close language set includes standard common words and the close-language words corresponding to the standard common words; comparing the words and phrases in the voice feedback pattern with the standard common words in the close language set; and, if a word or phrase is identical to a standard common word, replacing it with the corresponding close-language word.
In an optional embodiment of the present embodiment, the input information includes voice information and/or video information; and parsing the input information to obtain user emotion information includes: parsing the voice information to obtain at least one of word speed information, intonation information, or spectrum information; comparing the word speed information and intonation information with a word speed threshold and an intonation threshold to obtain a voice emotion result; parsing the video information to obtain a video emotion result; and deriving the user emotion information based on the voice emotion result and the video emotion result.
In an optional embodiment of the present embodiment, the device 500 further includes a recommending module 507 configured to: choose, from a pre-established recommendation content set, the recommendation content associated with the scene information, user emotion information, and subject content information; generate a recommendation execution request; send the recommendation execution request to the client, so that the client can choose whether to permit executing the recommendation content; and, if the client returns a license for the recommendation execution request, execute the recommendation content.
In an optional embodiment of the present embodiment, the scene information includes at least one of the following: time information, location information, or terminal application classification information.
It will be understood by those skilled in the art that the adaptive voice feedback device 500 also includes some other known structures, for instance a processor, a memory, etc.; in order not to unnecessarily obscure the embodiments of the disclosure, these known structures are not shown in Fig. 5.
Referring now to Fig. 6, it illustrates a structural schematic diagram of a computer system 600 suitable for realizing the terminal equipment or server of the embodiments of the present application.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, ROM 602 and RAM 603 are connected with each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, etc.; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc. and a speaker; a storage portion 608 including a hard disk, etc.; and a communication portion 609 including a network interface card such as a LAN card, a modem, etc. The communication portion 609 performs communication processes via a network such as the Internet. A driver 610 is also connected to the I/O interface 605 as needed. A detachable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is mounted on the driver 610 as required, so that a computer program read therefrom is installed into the storage portion 608 as needed.
In particular, according to the embodiments of the disclosure, the process described above with reference to the flow chart may be implemented as a computer software program. For example, the embodiments of the disclosure include a computer program product, which includes a computer program tangibly embodied on a machine-readable medium; the computer program contains program code for performing the method shown in the flow chart. In such embodiments, the computer program can be downloaded and installed from a network through the communication portion 609, and/or installed from the detachable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above-mentioned functions defined in the method of the present application are performed.
The flow charts and block diagrams in the accompanying drawings illustrate the possible architecture, functions and operations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in a flow chart or block diagram can represent a module, a program segment, or a part of code, and the module, program segment, or part of code contains one or more executable instructions for realizing the specified logic functions. It should also be noted that in some alternative implementations, the functions marked in the blocks can also occur in an order different from that marked in the accompanying drawings. For example, two blocks shown in succession can actually be executed substantially in parallel, and they can sometimes also be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flow charts, and combinations of blocks in the block diagrams and/or flow charts, can be realized by a dedicated hardware-based system performing the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules involved in the embodiments of the present application can be realized by means of software, and can also be realized by means of hardware. The described modules can also be arranged in a processor; for instance, it can be described as: a processor includes an acquisition module. The names of these modules do not in some cases constitute a limitation on the modules themselves; for instance, the acquisition module can also be described as "a module for obtaining input information".
As another aspect, the present application also provides a non-volatile computer storage medium, which can be the non-volatile computer storage medium included in the device of the above embodiments, or a non-volatile computer storage medium that exists separately and is not assembled into a terminal. The non-volatile computer storage medium stores one or more programs; when the one or more programs are executed by an equipment, the equipment: obtains input information; identifies the scene information of the input information; parses the input information to obtain at least one of user emotion information, exchange way information, and subject content information, wherein the exchange way information includes language classification information; generates a user attribute label according to at least one of the user emotion information, language classification information, subject content information, and scene information; matches the user attribute label with the applicable labels of pre-trained voice feedback patterns and obtains matching degrees; and uses the voice feedback pattern with the highest matching degree with the user attribute label to perform voice feedback.
The above description is only the preferred embodiments of the present application and an explanation of the applied technical principles. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical schemes formed by the particular combination of the above technical features, and should also cover other technical schemes formed by any combination of the above technical features or their equivalent features without departing from the above inventive concept, for example technical schemes formed by mutually replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.

Claims (12)

1. An adaptive voice feedback method, characterized in that the method includes:
obtaining input information;
identifying the scene information of the input information;
parsing the input information to obtain at least one of user emotion information, exchange way information, and subject content information, wherein the exchange way information includes language classification information;
generating a user attribute label according to at least one of the user emotion information, the language classification information, the subject content information, and the scene information;
matching the user attribute label with the applicable labels of pre-trained voice feedback patterns, and obtaining matching degrees;
using the voice feedback pattern with the highest matching degree with the user attribute label to perform voice feedback.
2. The method according to claim 1, characterized in that the exchange way information further includes: word speed information, group sentence mode information, or close language classification information; and
using the voice feedback pattern with the highest matching degree with the user attribute label to perform voice feedback includes:
adjusting the voice feedback pattern according to the word speed information, the group sentence mode information, or the close language classification information;
using the adjusted voice feedback pattern to perform feedback.
3. The method according to claim 2, characterized in that adjusting the voice feedback pattern according to the word speed information, the group sentence mode information, or the close language classification information includes:
adjusting the word speed of the voice feedback pattern to the word speed corresponding to the word speed information;
adjusting the group sentence mode of the voice feedback pattern to be consistent with the group sentence mode information;
obtaining the pre-established close language set consistent with the close language classification information, wherein the close language set includes standard common words and the close-language words corresponding to the standard common words; comparing the words and phrases in the voice feedback pattern with the standard common words in the close language set; and, if a word or phrase is identical to a standard common word, replacing it with the corresponding close-language word.
4. The method according to any one of claims 1-3, characterized in that the input information includes voice information and/or video information; and
parsing the input information to obtain user emotion information includes:
parsing the voice information to obtain at least one of word speed information, intonation information, or spectrum information;
comparing the word speed information and the intonation information with a word speed threshold and an intonation threshold to obtain a voice emotion result;
parsing the video information to obtain a video emotion result;
deriving the user emotion information based on the voice emotion result and the video emotion result.
5. The method according to claim 4, characterized in that the method further includes:
choosing, from a pre-established recommendation content set, the recommendation content associated with the scene information, the user emotion information, and the subject content information;
generating a recommendation execution request;
sending the recommendation execution request to a client, for the client to choose whether to permit executing the recommendation content;
if the client returns a license for the recommendation execution request, executing the recommendation content.
6. The method according to claim 5, characterized in that the scene information includes at least one of the following: time information, location information, or terminal application classification information.
7. An adaptive voice feedback device, characterized in that the device includes:
an acquisition module configured to obtain input information;
an identification module configured to identify the scene information of the input information;
a parsing module configured to parse the input information to obtain at least one of user emotion information, exchange way information, and subject content information, wherein the exchange way information includes language classification information;
a generation module configured to generate a user attribute label according to at least one of the user emotion information, the language classification information, the subject content information, and the scene information;
a matching module configured to match the user attribute label with the applicable labels of pre-trained voice feedback patterns and obtain matching degrees;
a feedback module configured to use the voice feedback pattern with the highest matching degree with the user attribute label to perform voice feedback.
8. The device according to claim 7, characterized in that the exchange way information further includes: word speed information, group sentence mode information, or close language classification information; and
the feedback module includes an adjustment submodule configured to:
adjust the voice feedback pattern according to the word speed information, the group sentence mode information, or the close language classification information;
use the adjusted voice feedback pattern to perform feedback.
9. The device according to claim 8, characterized in that adjusting the voice feedback pattern according to the word speed information, the group sentence mode information, or the close language classification information includes:
adjusting the word speed of the voice feedback pattern to the word speed corresponding to the word speed information;
adjusting the group sentence mode of the voice feedback pattern to be consistent with the group sentence mode information;
obtaining the pre-established close language set consistent with the close language classification information, wherein the close language set includes standard common words and the close-language words corresponding to the standard common words; comparing the words and phrases in the voice feedback pattern with the standard common words in the close language set; and, if a word or phrase is identical to a standard common word, replacing it with the corresponding close-language word.
10. The device according to any one of claims 7-9, characterized in that the input information includes voice information and/or video information; and
parsing the input information to obtain user emotion information includes:
parsing the voice information to obtain at least one of word speed information, intonation information, or spectrum information;
comparing the word speed information and the intonation information with a word speed threshold and an intonation threshold to obtain a voice emotion result;
parsing the video information to obtain a video emotion result;
deriving the user emotion information based on the voice emotion result and the video emotion result.
11. The device according to claim 10, characterized in that the device further includes a recommending module configured to:
choose, from a pre-established recommendation content set, the recommendation content associated with the scene information, the user emotion information, and the subject content information;
generate a recommendation execution request;
send the recommendation execution request to a client, for the client to choose whether to permit executing the recommendation content;
if the client returns a license for the recommendation execution request, execute the recommendation content.
12. The device according to claim 11, characterized in that the scene information includes at least one of the following: time information, location information, or terminal application classification information.
CN201610060206.2A 2016-01-28 2016-01-28 Adaptive voice feedback method and device Active CN105654950B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610060206.2A CN105654950B (en) 2016-01-28 2016-01-28 Adaptive voice feedback method and device


Publications (2)

Publication Number Publication Date
CN105654950A true CN105654950A (en) 2016-06-08
CN105654950B CN105654950B (en) 2019-07-16

Family

ID=56488908

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610060206.2A Active CN105654950B (en) 2016-01-28 2016-01-28 Adaptive voice feedback method and device

Country Status (1)

Country Link
CN (1) CN105654950B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090234655A1 (en) * 2008-03-13 2009-09-17 Jason Kwon Mobile electronic device with active speech recognition
CN103577544A (en) * 2013-10-11 2014-02-12 北京百度网讯科技有限公司 Method and device for providing information to be sent
CN104038836A (en) * 2014-06-03 2014-09-10 四川长虹电器股份有限公司 Television program intelligent pushing method
CN104836720A (en) * 2014-02-12 2015-08-12 北京三星通信技术研究有限公司 Method for performing information recommendation in interactive communication, and device
CN104881108A (en) * 2014-02-27 2015-09-02 青岛海尔机器人有限公司 Intelligent man-machine interaction method and device
CN105206269A (en) * 2015-08-14 2015-12-30 百度在线网络技术(北京)有限公司 Voice processing method and device

Cited By (81)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10956485B2 (en) 2011-08-31 2021-03-23 Google Llc Retargeting in a search environment
US11120195B2 (en) 2013-09-30 2021-09-14 Google Llc Resource size-based content item selection
US11586801B2 (en) 2013-09-30 2023-02-21 Google Llc Automatically determining a size for a content item for a web page
US11610045B2 (en) 2013-09-30 2023-03-21 Google Llc Resource size-based content item selection
US11120194B2 (en) 2013-09-30 2021-09-14 Google Llc Automatically determining a size for a content item for a web page
US11093686B2 (en) 2013-09-30 2021-08-17 Google Llc Resource size-based content item selection
CN106297789A (en) * 2016-08-19 2017-01-04 北京光年无限科技有限公司 The personalized interaction method of intelligent robot and interactive system
CN106297789B (en) * 2016-08-19 2020-01-14 北京光年无限科技有限公司 Personalized interaction method and system for intelligent robot
CN106486111A (en) * 2016-10-14 2017-03-08 北京光年无限科技有限公司 Many tts engines output word speed control method and system based on intelligent robot
US10157607B2 (en) 2016-10-20 2018-12-18 International Business Machines Corporation Real time speech output speed adjustment
CN106531162A (en) * 2016-10-28 2017-03-22 北京光年无限科技有限公司 Man-machine interaction method and device used for intelligent robot
CN106657543A (en) * 2016-10-31 2017-05-10 北京小米移动软件有限公司 Voice information processing method and device
CN106504743A (en) * 2016-11-14 2017-03-15 北京光年无限科技有限公司 A kind of interactive voice output intent and robot for intelligent robot
CN108121721A (en) * 2016-11-28 2018-06-05 渡鸦科技(北京)有限责任公司 Intension recognizing method and device
CN108231070B (en) * 2016-12-14 2023-04-18 松下知识产权经营株式会社 Voice conversation device, voice conversation method, recording medium, and robot
CN108231070A (en) * 2016-12-14 2018-06-29 松下知识产权经营株式会社 Voice dialogue device, speech dialog method, voice dialogue program and robot
CN108205526A (en) * 2016-12-20 2018-06-26 百度在线网络技术(北京)有限公司 A kind of method and apparatus of determining Technique Using Both Text information
CN106815321A (en) * 2016-12-27 2017-06-09 深圳前海勇艺达机器人有限公司 Chat method and device based on intelligent chat robots
US10893088B2 (en) 2016-12-30 2021-01-12 Google Llc Sequence dependent data message consolidation in a voice activated computer network environment
CN108605076A (en) * 2016-12-30 2018-09-28 谷歌有限责任公司 feedback controller for data transmission
CN112967716A (en) * 2016-12-30 2021-06-15 谷歌有限责任公司 Feedback controller for data transmission
US11475886B2 (en) 2016-12-30 2022-10-18 Google Llc Feedback controller for data transmissions
CN106782540A (en) * 2017-01-17 2017-05-31 联想(北京)有限公司 Speech ciphering equipment and the voice interactive system including the speech ciphering equipment
CN106782521A (en) * 2017-03-22 2017-05-31 海南职业技术学院 A kind of speech recognition system
US10796689B2 (en) 2017-03-24 2020-10-06 Lenovo (Beijing) Co., Ltd. Voice processing methods and electronic devices
CN106992012A (en) * 2017-03-24 2017-07-28 联想(北京)有限公司 Method of speech processing and electronic equipment
CN106782544A (en) * 2017-03-29 2017-05-31 联想(北京)有限公司 Interactive voice equipment and its output intent
CN107146610A (en) * 2017-04-10 2017-09-08 北京猎户星空科技有限公司 A kind of determination method and device of user view
CN107331388A (en) * 2017-06-15 2017-11-07 重庆柚瓣科技有限公司 A kind of dialect collection system based on endowment robot
CN107291900B (en) * 2017-06-22 2020-06-05 美味不用等(上海)信息科技股份有限公司 Information feedback and tracking system
CN107291900A (en) * 2017-06-22 2017-10-24 美味不用等(上海)信息科技股份有限公司 Feedback of the information and tracking system
CN107316641A (en) * 2017-06-30 2017-11-03 联想(北京)有限公司 A kind of sound control method and electronic equipment
CN107545029A (en) * 2017-07-17 2018-01-05 百度在线网络技术(北京)有限公司 Voice feedback method, equipment and the computer-readable recording medium of smart machine
CN107393530A (en) * 2017-07-18 2017-11-24 国网山东省电力公司青岛市黄岛区供电公司 Guide service method and device
CN107274900A (en) * 2017-08-10 2017-10-20 北京灵隆科技有限公司 Information processing method and its system for control terminal
CN107274900B (en) * 2017-08-10 2020-09-18 北京京东尚科信息技术有限公司 Information processing method for control terminal and system thereof
CN107657017A (en) * 2017-09-26 2018-02-02 百度在线网络技术(北京)有限公司 Method and apparatus for providing voice service
CN107767869A (en) * 2017-09-26 2018-03-06 百度在线网络技术(北京)有限公司 Method and apparatus for providing voice service
CN107657017B (en) * 2017-09-26 2020-11-13 百度在线网络技术(北京)有限公司 Method and apparatus for providing voice service
CN107818787A (en) * 2017-10-31 2018-03-20 努比亚技术有限公司 A kind of processing method of voice messaging, terminal and computer-readable recording medium
CN107871500A (en) * 2017-11-16 2018-04-03 百度在线网络技术(北京)有限公司 One kind plays multimedia method and apparatus
CN107871500B (en) * 2017-11-16 2021-07-20 百度在线网络技术(北京)有限公司 Method and device for playing multimedia
US10964338B2 (en) 2017-12-22 2021-03-30 Beijing Baidu Netcom Science And Technology Co., Ltd. Mood recognition method, electronic device and computer-readable storage medium
CN108281141A (en) * 2017-12-22 2018-07-13 北京小蓦机器人技术有限公司 A kind of method and apparatus for providing voice-response information
CN108091324A (en) * 2017-12-22 2018-05-29 北京百度网讯科技有限公司 Tone recognition methods, device, electronic equipment and computer readable storage medium
CN108257596A (en) * 2017-12-22 2018-07-06 北京小蓦机器人技术有限公司 It is a kind of to be used to provide the method and apparatus that information is presented in target
CN108091324B (en) * 2017-12-22 2021-08-17 北京百度网讯科技有限公司 Tone recognition method and device, electronic equipment and computer-readable storage medium
CN108257596B (en) * 2017-12-22 2021-07-23 北京小蓦机器人技术有限公司 Method and equipment for providing target presentation information
CN108257037A (en) * 2018-01-18 2018-07-06 封玉涛 It is a kind of with social scene turn to point of penetration apply fusion method and device
CN108319485A (en) * 2018-01-29 2018-07-24 出门问问信息科技有限公司 Information interacting method, device, equipment and storage medium
CN108335700B (en) * 2018-01-30 2021-07-06 重庆与展微电子有限公司 Voice adjusting method and device, voice interaction equipment and storage medium
CN109240488A (en) * 2018-07-27 2019-01-18 重庆柚瓣家科技有限公司 A kind of implementation method of AI scene engine of positioning
CN109036405A (en) * 2018-07-27 2018-12-18 百度在线网络技术(北京)有限公司 Voice interactive method, device, equipment and storage medium
CN110930999A (en) * 2018-09-19 2020-03-27 上海博泰悦臻电子设备制造有限公司 Voice interaction method and device and vehicle
CN110930998A (en) * 2018-09-19 2020-03-27 上海博泰悦臻电子设备制造有限公司 Voice interaction method and device and vehicle
CN109346076A (en) * 2018-10-25 2019-02-15 三星电子(中国)研发中心 Interactive voice, method of speech processing, device and system
CN109672724A (en) * 2018-11-01 2019-04-23 百度在线网络技术(北京)有限公司 Information-pushing method, device and equipment
CN109672724B (en) * 2018-11-01 2022-07-29 百度在线网络技术(北京)有限公司 Information pushing method, device and equipment
CN109377979A (en) * 2018-12-07 2019-02-22 苏州思必驰信息科技有限公司 Update the method and system of welcome words
CN111292737A (en) * 2018-12-07 2020-06-16 阿里巴巴集团控股有限公司 Voice interaction and voice awakening detection method, device, equipment and storage medium
CN109377979B (en) * 2018-12-07 2021-09-24 思必驰科技股份有限公司 Method and system for updating welcome language
CN109697290A (en) * 2018-12-29 2019-04-30 咪咕数字传媒有限公司 Information processing method, information processing equipment and computer storage medium
CN109686362A (en) * 2019-01-02 2019-04-26 百度在线网络技术(北京)有限公司 Voice broadcast method, device and computer readable storage medium
CN109741744B (en) * 2019-01-14 2021-03-09 博拉网络股份有限公司 AI robot conversation control method and system based on big data search
CN109741744A (en) * 2019-01-14 2019-05-10 博拉网络股份有限公司 AI robot dialog control method and system based on big data search
CN111724774A (en) * 2019-03-22 2020-09-29 阿里巴巴集团控股有限公司 Voice interaction method, voice interaction device, vehicle-mounted voice interaction device, equipment and storage medium
CN111724774B (en) * 2019-03-22 2024-05-17 斑马智行网络(香港)有限公司 Voice interaction and vehicle-mounted voice interaction method, device, equipment and storage medium
CN110347817A (en) * 2019-07-15 2019-10-18 网易(杭州)网络有限公司 Intelligent response method and device, storage medium, electronic equipment
CN110347817B (en) * 2019-07-15 2022-03-18 网易(杭州)网络有限公司 Intelligent response method and device, storage medium and electronic equipment
CN110827797B (en) * 2019-11-06 2022-04-12 北京沃东天骏信息技术有限公司 Voice response event classification processing method and device
CN110827797A (en) * 2019-11-06 2020-02-21 北京沃东天骏信息技术有限公司 Voice response event classification processing method and device
CN111354350B (en) * 2019-12-26 2024-04-05 阿里巴巴集团控股有限公司 Voice processing method and device, voice processing equipment and electronic equipment
CN111354350A (en) * 2019-12-26 2020-06-30 阿里巴巴集团控股有限公司 Voice processing method and device, voice processing equipment and electronic equipment
CN111179903A (en) * 2019-12-30 2020-05-19 珠海格力电器股份有限公司 Voice recognition method and device, storage medium and electric appliance
CN111241822A (en) * 2020-01-03 2020-06-05 北京搜狗科技发展有限公司 Emotion discovery and dispersion method and device under input scene
CN111310009A (en) * 2020-01-16 2020-06-19 珠海格力电器股份有限公司 User classification method and device, storage medium and computer equipment
CN111724173A (en) * 2020-06-18 2020-09-29 中国银行股份有限公司 Robot self-adjusting method, device, equipment and computer storage medium
CN112017646A (en) * 2020-08-21 2020-12-01 博泰车联网(南京)有限公司 Voice processing method and device and computer storage medium
WO2022222841A1 (en) * 2021-04-20 2022-10-27 北京沃东天骏信息技术有限公司 Information display method and apparatus, electronic device, and computer-readable medium
CN113435962A (en) * 2021-06-07 2021-09-24 布瑞克农业大数据科技集团有限公司 Agricultural product online collecting and purchasing method, system and storage medium thereof
CN114356276A (en) * 2021-12-22 2022-04-15 科大讯飞股份有限公司 Voice interaction method and related device

Also Published As

Publication number Publication date
CN105654950B (en) 2019-07-16

Similar Documents

Publication Publication Date Title
CN105654950A (en) Self-adaptive voice feedback method and device
US20210232761A1 (en) Methods and systems for improving machine learning performance
US20210081611A1 (en) Methods and systems for language-agnostic machine learning in natural language processing using feature extraction
KR101909807B1 (en) Method and apparatus for inputting information
CN109829039B (en) Intelligent chat method, intelligent chat device, computer equipment and storage medium
CN109308357B (en) Method, device and equipment for obtaining answer information
CN111930940B (en) Text emotion classification method and device, electronic equipment and storage medium
CN107491534A (en) Information processing method and device
CN105701088A (en) Method and device for switching machine conversation to artificial conversation
CN105810189A (en) Equipment voice control method, device and system
US20200143017A1 (en) Electronic device and control method therefor
CN106227786A (en) Method and apparatus for pushed information
WO2017186050A1 (en) Segmented sentence recognition method and device for human-machine intelligent question-answer system
CN106383875A (en) Artificial intelligence-based man-machine interaction method and device
KR102089804B1 (en) A digital signage system in IoT environment using personality analysis
CN104462064A (en) Method and system for prompting content input in information communication of mobile terminals
CN108268450B (en) Method and apparatus for generating information
CN105786969A (en) Information display method and apparatus
CN110554782A (en) Expression input image synthesis method and system
CN111538818B (en) Data query method, device, electronic equipment and storage medium
CN111767431A (en) Method and device for video dubbing
CN117332072B (en) Dialogue processing, voice abstract extraction and target dialogue model training method
WO2023005968A1 (en) Text category recognition method and apparatus, and electronic device and storage medium
US10217455B2 (en) Linguistic model database for linguistic recognition, linguistic recognition device and linguistic recognition method, and linguistic recognition system
CN112052316A (en) Model evaluation method, model evaluation device, storage medium and electronic equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant