CN105654950B - Adaptive voice feedback method and device - Google Patents
- Publication number
- CN105654950B (application CN201610060206.2A)
- Authority
- CN
- China
- Prior art keywords
- information
- voice
- voice feedback
- mentioned
- speech rate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
This application discloses an adaptive voice feedback method and device. One specific embodiment of the method includes: obtaining input information; identifying the scene information of the input information; parsing the input information to obtain at least one of user emotion information, communication-mode information, and subject content information, where the communication-mode information includes language category information; generating a user attribute label according to the scene information and at least one of the user emotion information, the language category information, and the subject content information; matching the user attribute label against the applicable labels of pre-trained voice feedback patterns to obtain matching degrees; and performing voice feedback using the voice feedback pattern with the highest matching degree to the user attribute label. The embodiment realizes adaptive feedback voice and improves the pertinence and effectiveness of voice feedback.
Description
Technical field
This application relates to the field of computer technology, in particular to the field of Internet technology, and more particularly to an adaptive voice feedback method and device.
Background art
With the development of computer technology, and of Internet technology in particular, the client applications on terminal devices have become increasingly diversified. A voice assistant is an application that can carry out, or partly substitute for, our queries and operations on a mobile phone through voice interaction; voice assistant functionality is also realized by applications or websites that embed such a function. Through such applications or websites, the convenience of operating a terminal device can be greatly improved. However, the interaction between existing applications or websites and people still goes no further than correctly understanding a person's voice input, answering questions as quickly as possible, and executing certain operations (such as querying, displaying, or operating an application); the functionality is rather monolithic and not well targeted to the user.
Summary of the invention
The purpose of this application is to propose an improved adaptive voice feedback method and device to solve the technical problems mentioned in the background section above.
In a first aspect, this application provides an adaptive voice feedback method, the method comprising: obtaining input information; identifying the scene information of the input information; parsing the input information to obtain at least one of user emotion information, communication-mode information, and subject content information, where the communication-mode information includes language category information; generating a user attribute label according to the scene information and at least one of the user emotion information, the language category information, and the subject content information; matching the user attribute label against the applicable labels of pre-trained voice feedback patterns to obtain matching degrees; and performing voice feedback using the voice feedback pattern with the highest matching degree to the user attribute label.
In some embodiments, the communication-mode information further includes speech rate information, sentence-construction information, or near-synonym category information; and performing voice feedback using the voice feedback pattern with the highest matching degree includes: adjusting the voice feedback pattern according to the speech rate information, the sentence-construction information, or the near-synonym category information, and performing feedback using the adjusted voice feedback pattern.
In some embodiments, adjusting the voice feedback pattern according to the speech rate information, the sentence-construction information, or the near-synonym category information includes: adjusting the speech rate of the voice feedback pattern to a rate corresponding to the speech rate information; adjusting the sentence construction of the voice feedback pattern to be consistent with the sentence-construction information; obtaining a pre-established near-synonym set consistent with the near-synonym category information, where the near-synonym set includes standard common words and the near-synonyms corresponding to them; comparing the words in the voice feedback pattern with the standard common words in the near-synonym set; and, if a word is identical to a standard common word, replacing it with the corresponding near-synonym.
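The word-replacement step just described can be sketched as a dictionary lookup over the words of the feedback pattern. This is an illustrative sketch only; the function name `apply_near_synonyms` and the sample near-synonym set are invented for illustration and are not part of the patent.

```python
def apply_near_synonyms(words, near_synonyms):
    """Compare each word of the feedback pattern with the standard common
    words of the near-synonym set; on an exact match, substitute the
    corresponding near-synonym, otherwise keep the word unchanged."""
    return [near_synonyms.get(word, word) for word in words]

# Hypothetical near-synonym set: standard common word -> near-synonym.
colloquial_set = {"hello": "hiya", "goodbye": "see ya"}

print(apply_near_synonyms(["hello", "dear", "user"], colloquial_set))
# prints ['hiya', 'dear', 'user']
```

Only exact matches against the standard common words are replaced, mirroring the comparison step described above.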
In some embodiments, the input information includes voice information and/or video information; and parsing the input information to obtain user emotion information includes: parsing the voice information to obtain at least one of speech rate information, intonation information, and spectrum information; comparing the speech rate information and intonation information with a speech rate threshold and an intonation threshold to obtain a speech emotion result; parsing the video information to obtain a video emotion result; and deriving the user emotion information from the speech emotion result and the video emotion result.
In some embodiments, the method further includes: choosing, from a pre-established set of recommended content, recommended content associated with the scene information, the user emotion information, and the subject content information; generating a request to execute the recommended content; sending the request to the client, so that the client can choose whether to permit executing the recommended content; and, if the client grants the request, executing the recommended content.
In some embodiments, the scene information includes at least one of the following: time information, location information, or terminal application category information.
In a second aspect, this application provides an adaptive voice feedback device, the device comprising: an acquisition module configured to obtain input information; an identification module configured to identify the scene information of the input information; a parsing module configured to parse the input information to obtain at least one of user emotion information, communication-mode information, and subject content information, where the communication-mode information includes language category information; a generation module configured to generate a user attribute label according to the scene information and at least one of the user emotion information, the language category information, and the subject content information; a matching module configured to match the user attribute label against the applicable labels of pre-trained voice feedback patterns and obtain matching degrees; and a feedback module configured to perform voice feedback using the voice feedback pattern with the highest matching degree to the user attribute label.
In some embodiments, the communication-mode information further includes speech rate information, sentence-construction information, or near-synonym category information; and the feedback module includes an adjusting submodule configured to: adjust the voice feedback pattern according to the speech rate information, the sentence-construction information, or the near-synonym category information, and perform feedback using the adjusted voice feedback pattern.
In some embodiments, adjusting the voice feedback pattern according to the speech rate information, the sentence-construction information, or the near-synonym category information includes: adjusting the speech rate of the voice feedback pattern to a rate corresponding to the speech rate information; adjusting the sentence construction of the voice feedback pattern to be consistent with the sentence-construction information; obtaining a pre-established near-synonym set consistent with the near-synonym category information, where the near-synonym set includes standard common words and the near-synonyms corresponding to them; comparing the words in the voice feedback pattern with the standard common words in the near-synonym set; and, if a word is identical to a standard common word, replacing it with the corresponding near-synonym.
In some embodiments, the input information includes voice information and/or video information; and parsing the input information to obtain user emotion information includes: parsing the voice information to obtain at least one of speech rate information, intonation information, and spectrum information; comparing the speech rate information and intonation information with a speech rate threshold and an intonation threshold to obtain a speech emotion result; parsing the video information to obtain a video emotion result; and deriving the user emotion information from the speech emotion result and the video emotion result.
In some embodiments, the device further includes a recommending module configured to: choose, from a pre-established set of recommended content, recommended content associated with the scene information, the user emotion information, and the subject content information; generate a request to execute the recommended content; send the request to the client, so that the client can choose whether to permit executing the recommended content; and, if the client grants the request, execute the recommended content.
In some embodiments, the scene information includes at least one of the following: time information, location information, or terminal application category information.
The adaptive voice feedback method and device provided by this application obtain input information; identify the scene information of the input information; parse the input information to obtain at least one of user emotion information, communication-mode information, and subject content information; generate a user attribute label according to the scene information and at least one of the user emotion information, the language category information, and the subject content information; then match the user attribute label against the applicable labels of pre-trained voice feedback patterns to obtain matching degrees; and finally perform voice feedback using the voice feedback pattern with the highest matching degree. This realizes adaptive feedback voice and improves the pertinence and effectiveness of voice feedback.
Brief description of the drawings
Other features, objects, and advantages of this application will become more apparent from the following detailed description of non-restrictive embodiments, read in light of the accompanying drawings:
Fig. 1 is an exemplary system architecture to which embodiments of the adaptive voice feedback method or adaptive voice feedback device of this application can be applied;
Fig. 2 is a flowchart of one embodiment of the adaptive voice feedback method according to this application;
Fig. 3 is a flowchart of another embodiment of the adaptive voice feedback method according to this application;
Fig. 4 is a data flow diagram of an application scenario of the embodiment shown in Fig. 3;
Fig. 5 is a structural diagram of one embodiment of the adaptive voice feedback device according to this application;
Fig. 6 is a structural diagram of a computer system suitable for implementing the terminal device or server of the embodiments of this application.
Detailed description
The application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are used only to explain the related invention, not to restrict it. It should also be noted that, for convenience of description, only the parts relevant to the related invention are shown in the drawings. The embodiments of this application, and the features in the embodiments, can be combined with each other as long as they do not conflict.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the adaptive voice feedback method or adaptive voice feedback device of this application can be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 serves as the medium providing communication links between the terminal devices 101, 102, 103 and the server 105, and may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
Users can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104, so as to receive or send messages (such as voice information). Various communication client applications can be installed on the terminal devices 101, 102, 103, such as voice assistant applications, document management applications, search applications, mailbox clients, and social platform software.
The terminal devices 101, 102, 103 can be various electronic devices, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, and desktop computers.
The server 105 can be a server providing various services, for example a background speech processing server supporting the voice assistant applications on the terminal devices 101, 102, 103. The background speech processing server can store, analyze, and otherwise process the voice received from the terminal devices, feed the processing result back to the terminal devices, and execute the corresponding operations.
As shown in Fig. 1, by installing a voice assistant application on the terminal devices 101, 102, 103, by using a communication application with voice assistant functionality on them, or by browsing a communication website with voice assistant functionality on them, these terminal devices can issue requests to the server 105 in the form of voice messages, after which the server 105 can execute the adaptive voice feedback method described above. Accordingly, the adaptive voice feedback device can be set in the server 105.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative; there can be any number of terminal devices, networks, and servers, as required by the implementation.
With continued reference to Fig. 2, a process 200 of one embodiment of the adaptive voice feedback method according to this application is shown. The adaptive voice feedback method comprises the following steps:
Step 201: obtain input information.
In this embodiment, the electronic device on which the adaptive voice feedback method runs (such as the server shown in Fig. 1) can obtain the user's input information locally or remotely. When the input information is already stored in the memory of the electronic device, the electronic device can obtain it directly from local memory. Alternatively, when the electronic device is a background server supporting a voice assistant application on a terminal device, it can obtain the input information from the terminal device through a wired or wireless connection. The wireless connection types include, but are not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, ZigBee, UWB (ultra wideband), and other wireless connection types now known or developed in the future.
In this embodiment, the input information includes, but is not limited to, voice information, video information, image information, or text information.
Step 202: identify the scene information of the input information.
In this embodiment, the scene information of the input information refers to the scene the user is in when sending the input information; it can include, but is not limited to, time information, location information, or terminal application category information. Here, the time information refers to the time at which the user sends the input information. The location information refers to the place from which the user sends it; the place can be a specific address, such as street x, district x, city x, province x, or a type of place, such as home, the office, or a hospital. The terminal application category information can indicate which kind of terminal application the user intends to operate; for example, if the user opens a map application by voice instruction, the terminal application category information is that map application. Optionally, the terminal application category information can also indicate which kind of terminal application the user uses to enter and send the input information; for example, if the user uses a communication application to send one or more of text information, video information, and image information, the terminal application category information is that communication application.
In some optional implementations of this embodiment, the location information can be identified through location-based services (LBS).
In some optional implementations of this embodiment, several scene information models can be established from historical statistics, such as "evening-home-takeaway", "morning-company-meeting", "Sunday-outdoor-map", and "default". The scene information of the obtained input information is matched against the scene information models, and the scene information model with the highest matching degree to the current input information is determined. If this model meets a predetermined matching degree threshold, it is used as the scene information. If none of the scene information models reaches the predetermined matching degree threshold against the identified scene information, the identified scene information itself is used as the scene information of the input information; alternatively, the "default" model among the scene information models is used as the scene information of the input information.
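This model matching can be realized, for example, as a simple attribute-overlap score with a threshold fallback. The sketch below is illustrative only; the model names, the attribute-set representation, and the 0.5 threshold are assumptions, not values from the patent.

```python
def match_scene(observed_attrs, scene_models, threshold=0.5):
    """Return the name of the scene information model whose attributes
    best overlap the identified scene information; fall back to the
    "default" model when no model reaches the matching-degree threshold."""
    def degree(attrs):
        return len(observed_attrs & attrs) / len(attrs) if attrs else 0.0

    best = max(scene_models, key=lambda name: degree(scene_models[name]))
    return best if degree(scene_models[best]) >= threshold else "default"

# Hypothetical scene information models built from historical statistics.
scene_models = {
    "evening-home-takeaway": {"evening", "home", "takeaway"},
    "sunday-outdoor-map": {"sunday", "outdoor", "map"},
}

print(match_scene({"evening", "home", "meal"}, scene_models))
# prints evening-home-takeaway
```

Here two of three model attributes match (degree 2/3), which exceeds the threshold, so the model rather than the raw scene information is selected.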
Step 203: parse the input information to obtain at least one of user emotion information, communication-mode information, and subject content information.
In this embodiment, the user emotion information refers to the user's emotional state when entering the input information, such as pleased, sad, tired, or energetic.
In some optional implementations of this embodiment, parsing the voice information to obtain user emotion information can be realized as follows: parse the voice information to obtain at least one of a speech rate value, an intonation value, and spectrum information; compare the speech rate value and intonation value with a speech rate threshold and an intonation threshold to obtain a speech emotion result; parse the video information to obtain a video emotion result; and derive the user emotion information from the speech emotion result and the video emotion result.
In this embodiment, the communication-mode information includes, but is not limited to, language category information. The language category information can be a language category, such as Chinese, English, or Japanese; it can also be a dialect of one language, such as Cantonese, Ningbo dialect, or Mandarin, or likewise American English or British English.
In this embodiment, the subject content information refers to the semantic information conveyed by the input information. For example, if the input information is "I want to order takeaway", then in some optional implementations, after word segmentation and semantic recognition, the obtained subject content information can be "order" and "takeaway".
In some optional implementations of this embodiment, speech recognition technology is used to identify the main subject content information of the voice information. For example, the user's voice information is recognized as shopping, query, XX Plaza, UNIQLO; the subject content information is then expressed as a series of subject keywords {shopping, query, XX Plaza, UNIQLO}.
In some optional implementations of this embodiment, a corresponding operation is executed according to the subject content information. For example, if the input information is "I want to order takeaway", the operation of opening a terminal application with a takeaway-ordering function is executed, and takeaway ordering information is then recommended to the user.
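The keyword-to-operation dispatch described above can be sketched as a lookup table; the mapping and the operation names below are hypothetical, invented purely for illustration.

```python
# Hypothetical mapping from subject keywords to terminal operations.
OPERATIONS = {
    "takeaway": "open_takeaway_app",
    "map": "open_map_app",
    "query": "run_search",
}

def dispatch(subject_keywords):
    """Collect the operations whose trigger keyword appears in the
    parsed subject content information."""
    return [OPERATIONS[k] for k in subject_keywords if k in OPERATIONS]

print(dispatch(["order", "takeaway"]))  # prints ['open_takeaway_app']
```

Keywords without a registered operation ("order" here) are simply ignored, so unrecognized subject content falls through harmlessly.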
Step 204: generate a user attribute label according to the scene information and at least one of the user emotion information, the language category information, and the subject content information.
In this embodiment, the user attribute label characterizes related conditions such as the user's mood, the language used, and the scene the user is currently in. For example, when a female user orders takeaway in dialect at home at 9 p.m. in a dejected mood, the user attribute label can be {evening, home, female voice, dejected, Shanghai dialect}.
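Assembling the user attribute label from whichever attributes were recognized can be sketched as a simple set union; the function name and keyword arguments below are illustrative assumptions, not terminology from the patent.

```python
def build_user_labels(scene=(), emotion=None, language=None, topics=()):
    """Merge the scene information with any recognized emotion, language
    category, and subject content into one user attribute label set.
    Attributes that were not recognized are simply omitted."""
    labels = set(scene) | set(topics)
    if emotion is not None:
        labels.add(emotion)
    if language is not None:
        labels.add(language)
    return labels

# The example from the text: a dejected female user ordering takeaway
# in Shanghai dialect at home in the evening.
labels = build_user_labels(scene={"evening", "home", "female voice"},
                           emotion="dejected",
                           language="Shanghai dialect",
                           topics={"takeaway"})
print(sorted(labels))
```

Because the method only requires at least one of the parsed attributes, any argument can be left at its default and the label set still forms correctly.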
Step 205: match the user attribute label against the applicable labels of pre-trained voice feedback patterns and obtain matching degrees.
In this embodiment, several voice feedback patterns are trained in advance. A voice feedback pattern includes a feature label and an applicable label. The feature label characterizes the features of the pattern, for example {female voice, Lin Zhiling, Mandarin, post-80s}; the applicable label characterizes the situations the pattern is suitable for, for example {scene 1, scene 2, mood 1, mood 2, dialect 1, near-synonym 1, near-synonym 2, theme 1, theme 2}.
Step 206: perform voice feedback using the voice feedback pattern with the highest matching degree to the user attribute label.
In this embodiment, based on step 205, the user attribute label is matched against the applicable labels, and the voice feedback pattern with the highest matching degree is obtained and used for feedback. For example, for Mandarin input from a user who is driving in a good mood, the system will obtain the voice feedback pattern {female voice, Lin Zhiling, Mandarin, post-80s} and use it for voice feedback.
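The patent does not define how the matching degree is computed. One plausible reading, sketched below under that assumption, is the fraction of the user's attribute labels covered by a pattern's applicable label; the pattern names and label sets are invented for illustration.

```python
def matching_degree(user_labels, applicable_labels):
    """Fraction of the user attribute labels that also appear in the
    voice feedback pattern's applicable label (an assumed measure)."""
    if not user_labels:
        return 0.0
    return len(user_labels & applicable_labels) / len(user_labels)

def best_pattern(user_labels, patterns):
    """Return the name of the pattern with the highest matching degree."""
    return max(patterns, key=lambda name: matching_degree(user_labels, patterns[name]))

# Hypothetical patterns, keyed by name, valued by applicable label set.
patterns = {
    "gentle-female-mandarin": {"driving", "good mood", "Mandarin"},
    "brisk-male-cantonese": {"office", "neutral", "Cantonese"},
}

print(best_pattern({"driving", "good mood", "Mandarin"}, patterns))
# prints gentle-female-mandarin
```

A set-overlap score keeps the matching cheap and order-independent, which fits the label-set representation used in steps 204 and 205.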
The method provided by the above embodiment of this application obtains input information; identifies the scene information of the input information; parses the input information to obtain at least one of user emotion information, communication-mode information, and subject content information; generates a user attribute label according to the scene information and at least one of the user emotion information, the language category information, and the subject content information; then matches the user attribute label against the applicable labels of pre-trained voice feedback patterns to obtain matching degrees; and finally performs voice feedback using the pattern with the highest matching degree. It thereby realizes adaptive feedback voice and improves the pertinence and effectiveness of voice feedback.
With further reference to Fig. 3, a process 300 of another embodiment of the adaptive voice feedback method is illustrated. The process 300 comprises the following steps:
Step 301: obtain input information.
In this embodiment, the electronic device on which the adaptive voice feedback method runs (such as the server shown in Fig. 1) can obtain the user's input information locally or remotely.
In this embodiment, the input information can include, but is not limited to, voice information, video information, image information, or text information.
Step 302: identify the scene information of the input information.
In this embodiment, the scene information of the input information refers to the scene the user is in when sending the input information; it can include, but is not limited to, time information, location information, or terminal application category information.
In some optional implementations of this embodiment, several scene information models can be established from historical statistics, such as "evening-home-takeaway", "morning-company-meeting", "Sunday-outdoor-map", and "default". The scene information of the obtained input information is matched against the scene information models, and the scene information model with the highest matching degree to the current input information is determined. If this model meets a predetermined matching degree threshold, it is used as the scene information. If none of the scene information models reaches the predetermined matching degree threshold against the identified scene information, the identified scene information itself is used as the scene information of the input information; alternatively, the "default" model among the scene information models is used as the scene information of the input information.
Step 303: parse the input information to obtain user emotion information, communication-mode information, and subject content information.
In this embodiment, the user emotion information refers to the user's emotional state when entering the input information, such as pleased, sad, tired, or energetic.
In this embodiment, parsing the voice information to obtain user emotion information can be realized as follows: parse the voice information to obtain a speech rate value and an intonation value; compare the speech rate value and intonation value with a speech rate threshold and an intonation threshold to obtain a speech emotion result; parse the video information to obtain a video emotion result; and derive the user emotion information from the speech emotion result and the video emotion result. It should be understood that the speech rate threshold and the intonation threshold can differ from user to user.
In some optional implementations of the present embodiment, if the speech-rate value is below the speech-rate threshold and the intonation value is below the intonation threshold, the voice emotion result is determined to be negative. If the speech-rate value is above the speech-rate threshold and the intonation value is above the intonation threshold, the voice emotion result is determined to be positive. If only one of the speech-rate value and the intonation value exceeds its threshold, the voice emotion result is determined to be neutral.
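The threshold rule above fits in a few lines. This is an illustrative sketch, not code from the patent; the concrete threshold values (syllables per second, Hz) are hypothetical placeholders for the per-user thresholds the embodiment describes.

```python
def voice_emotion(rate, pitch, rate_threshold=4.0, pitch_threshold=200.0):
    """Classify voice emotion from a speech-rate value and an intonation value."""
    if rate < rate_threshold and pitch < pitch_threshold:
        return "negative"   # both values below their thresholds
    if rate > rate_threshold and pitch > pitch_threshold:
        return "positive"   # both values above their thresholds
    return "neutral"        # only one value exceeds its threshold

print(voice_emotion(3.0, 150.0))  # negative
print(voice_emotion(5.0, 250.0))  # positive
print(voice_emotion(5.0, 150.0))  # neutral
```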
In some optional implementations of the present embodiment, parsing the above video information to obtain a video emotion result may be implemented through dynamic video recognition or sampled-image recognition.
In some optional implementations of the present embodiment, when the voice emotion result and the video emotion result are both positive or both negative, the emotion result information is positive or negative accordingly; if the voice emotion result and the video emotion result are inconsistent, the emotion result information is determined to be neutral.
In some optional implementations of the present embodiment, different weights may be set for the voice emotion result and the video emotion result; for example, when the voice input quality is good and the video input quality is poor, a higher weight is set for the voice emotion result. Positive, negative, and neutral results are assigned the values 1, -1, and 0 respectively; the values of the voice emotion result and the video emotion result are combined according to their weights to obtain a final emotion value, and the emotion result information is obtained by determining which emotion's numerical interval the final emotion value falls into.
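The weighted combination can be sketched as below. This is an illustration under assumed parameters: the weights (0.7/0.3) and the interval boundaries (±0.5) are hypothetical, since the patent does not fix concrete values.

```python
SCORES = {"positive": 1, "negative": -1, "neutral": 0}

def fuse_emotion(voice_result, video_result, voice_weight=0.7, video_weight=0.3):
    """Combine voice and video emotion results by weight into one emotion result."""
    value = voice_weight * SCORES[voice_result] + video_weight * SCORES[video_result]
    # map the final emotion value back to an interval
    if value > 0.5:
        return "positive"
    if value < -0.5:
        return "negative"
    return "neutral"

print(fuse_emotion("positive", "positive"))  # positive (value 1.0)
print(fuse_emotion("positive", "negative"))  # neutral  (value 0.4)
```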
In some optional implementations of the present embodiment, the above communication-style information includes: language category information, a speech-rate value, sentence-construction pattern information, or intimate-language category information.
The above subject content information refers to the semantic information conveyed by the input information. For example, if the input information is "I want to order takeaway", then in some optional implementations, after word segmentation and semantic recognition, the subject content information may be "order" and "takeaway".
In some optional implementations of the present embodiment, speech recognition technology is used to identify the subject content information of the voice information. For example, if the user's voice information is recognized as shopping, inquiry, XX Plaza, and Uniqlo, the subject content information is expressed as a series of subject keywords: {shopping, inquiry, XX Plaza, Uniqlo}.
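The segmentation-plus-filtering step can be illustrated with a toy extractor. Real systems would use a proper segmenter and semantic recognizer (for Chinese, e.g. a tool such as jieba); the whitespace split and stop-word list here are simplifying assumptions, not the patent's method.

```python
STOPWORDS = {"i", "want", "to", "please", "a"}

def subject_keywords(utterance):
    """Toy subject-content extraction: segment the utterance and drop function words."""
    return [w for w in utterance.lower().split() if w not in STOPWORDS]

print(subject_keywords("I want to order takeaway"))  # ['order', 'takeaway']
```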
In some optional implementations of the present embodiment, a corresponding operation is executed according to the subject content information. For example, if the input information is "I want to order takeaway", an operation of opening a terminal application with a takeaway-ordering function is executed, and takeaway ordering information is then pushed to the user.
Step 304: generate a user attribute label according to the above user emotion information, language category information, and subject content information, together with the above scene information.
In the present embodiment, the user attribute label is used to characterize the user's emotion, the language used, the subject content expressed by the user, the current scene, and so on. For example, if a female user orders takeaway in dialect at 9 p.m. while at home, in a dejected mood, the user attribute label may be {night, home, female voice, dejected, Shanghainese, order takeaway}.
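Assembling the label is essentially a set union over the parsed components, as in this minimal sketch (the tag strings mirror the example above; the function and its signature are illustrative, not from the patent):

```python
def build_user_label(scene_tags, emotion, language, subject_tags):
    """Combine scene, emotion, language, and subject tags into one attribute label."""
    return set(scene_tags) | {emotion, language} | set(subject_tags)

label = build_user_label(["night", "home"], "dejected", "Shanghainese",
                         ["order takeaway"])
print(sorted(label))
```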
Step 305: match the above user attribute label against the applicable labels of pre-trained voice feedback patterns, and obtain matching degrees.
In the present embodiment, several voice feedback patterns are trained in advance. Each voice feedback pattern includes a feature label and an applicable label. The feature label characterizes the features of the voice feedback pattern; for example, the feature label of a voice feedback pattern may be {female voice, Lin Zhiling, Mandarin, post-80s}. The applicable label characterizes the situations to which the voice feedback pattern is suited; for example, the applicable label of that voice feedback pattern may be {scene 1, scene 2, emotion 1, emotion 2, dialect 1, intimate language 1, intimate language 2, subject 1, subject 2}.
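One plausible reading of the matching degree is the fraction of the user's attribute tags covered by a pattern's applicable label; the patent does not define the metric, so the following is an assumption-laden sketch with hypothetical pattern names:

```python
def matching_degree(user_label, applicable_label):
    """Fraction of the user's attribute tags covered by the applicable label."""
    if not user_label:
        return 0.0
    return len(user_label & applicable_label) / len(user_label)

def best_pattern(user_label, patterns):
    """patterns maps pattern name -> applicable label (a set of tags)."""
    return max(patterns, key=lambda name: matching_degree(user_label, patterns[name]))

patterns = {
    "soothing-female": {"night", "home", "dejected", "order takeaway", "Shanghainese"},
    "energetic-male": {"morning", "outdoors", "exercise"},
}
user_label = {"night", "home", "female voice", "dejected", "Shanghainese",
              "order takeaway"}
print(best_pattern(user_label, patterns))  # soothing-female
```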
Step 306: adjust the above voice feedback pattern according to the speech-rate information, the sentence-construction pattern information, or the intimate-language category information.
In some optional implementations of the present embodiment, adjusting the voice feedback pattern according to the speech-rate information may be implemented as follows: adjust the speech rate of the voice feedback pattern to a speech rate corresponding to the speech-rate information. Here, the speech-rate information is a value range around the speech-rate value of the user's input. For example, if the user speaks in a slower, soothing voice, the speech rate of the voice feedback pattern is likewise adjusted to a slower, soothing voice, i.e., the speech rate of the voice feedback pattern is brought into this value range. Alternatively, the speech rate of the voice feedback pattern may be adjusted into a range different from the speech-rate information; for example, if the user speaks quickly, the user may be in an agitated state, and the speech rate of the voice feedback pattern is then set to a slower, soothing voice.
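Both strategies (mirror the user's pace, or deliberately slow down for a fast speaker) can be sketched in one small function; the cutoff and calm-rate values here are hypothetical, not specified by the patent:

```python
def adjust_feedback_rate(user_rate, fast_cutoff=5.0, calm_rate=3.5):
    """Pick the feedback speech rate (e.g. syllables/second).

    Mirror a calm user's pace; if the user speaks fast (possibly agitated),
    answer in a slower, soothing rate instead.
    """
    if user_rate > fast_cutoff:
        return calm_rate   # counter-adjust: soothe an agitated user
    return user_rate       # mirror the user's own pace

print(adjust_feedback_rate(3.0))  # 3.0
print(adjust_feedback_rate(6.2))  # 3.5
```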
In some optional implementations of the present embodiment, the speech rate of the voice feedback pattern may be adjusted by adjusting the intervals between words and between sentences in the voice of the voice feedback pattern.
In some optional implementations of the present embodiment, adjusting the voice feedback pattern according to the sentence-construction pattern information may be implemented as follows: adjust the sentence-construction pattern of the voice feedback pattern to be consistent with the sentence-construction pattern information. The sentence-construction pattern information is obtained through speech recognition technology; for example, if the user's habitual sentence-construction pattern is recognized as {predicate, subject, object}, the sentence-construction pattern of the voice feedback pattern is adjusted to {predicate, subject, object}. As another example, if the user habitually postposes adverbs, saying "eat, first" rather than "first eat", then a normal feedback of "go first" may accordingly be adjusted to "go, first".
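Reordering a short reply to the user's habitual pattern can be illustrated as a template reorder over labeled roles; the role names and sentences are hypothetical examples, not the patent's implementation:

```python
def reorder(words, order):
    """Rebuild a short reply with its roles in the user's habitual order.

    words: {role: word}; order: list of roles in the desired sequence.
    """
    return " ".join(words[role] for role in order)

words = {"subject": "I", "predicate": "go", "adverb": "first"}
print(reorder(words, ["subject", "predicate", "adverb"]))  # I go first
print(reorder(words, ["subject", "adverb", "predicate"]))  # I first go
```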
In some optional implementations of the present embodiment, adjusting the voice feedback pattern according to the intimate-language category information may be implemented as follows: obtain an intimate-language set consistent with the intimate-language category information, where the intimate-language set includes standard terms and the intimate terms corresponding to those standard terms; compare the words and phrases in the voice feedback pattern with the standard terms in the intimate-language set; and if a word or phrase is identical to a standard term, replace it with the intimate term corresponding to that standard term.
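The compare-and-replace step maps directly onto a dictionary lookup, as in this minimal sketch (the mapping contents are invented examples; only the standard-term-to-intimate-term substitution mirrors the text above):

```python
# hypothetical intimate-language set: standard term -> intimate term
INTIMATE = {"hello": "hi there", "goodbye": "see ya"}

def apply_intimate_terms(feedback_words, mapping):
    """Replace each standard term in the feedback with its intimate equivalent."""
    return [mapping.get(word, word) for word in feedback_words]

print(apply_intimate_terms(["hello", "friend"], INTIMATE))  # ['hi there', 'friend']
```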
Step 307: perform feedback using the adjusted voice feedback pattern.
In the present embodiment, based on the above step 306, voice feedback is performed using the adjusted voice feedback pattern.
In some optional implementations of the present embodiment, the process 300 of the adaptive voice feedback method may further include step 308: after the voice is fed back, execute recommendation content. Executing recommendation content may be implemented by the following steps: choose, from a pre-established recommendation content set, recommendation content associated with the above scene information, user emotion information, and subject content information; generate a recommendation execution request; send the recommendation execution request to the client, so that the client chooses whether to permit execution of the recommendation content; and if the client grants the recommendation execution request, execute the recommendation content. For example, if the recognition result is outdoors in the morning (scene information), happy (user emotion information), exercising (subject content information), the method may recommend playing a light, refreshing song, and if the user grants permission, the operation of playing the recommended song is executed. As another example, if a female user orders takeaway in dialect at 9 p.m. while at home, and voice and video recognition performed during the voice input determines that the user is very tired, hungry, and dejected, then while quickly executing the takeaway order, a soothing same-sex dialect voice may be used to ask whether the user wants to play her favorite music or call a close friend; if permitted, her favorite music is played or the frequently called close friend is dialed.
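The choose/request/permit/execute flow of step 308 can be sketched as follows. The catalog structure, field names, and callback are illustrative assumptions; the patent only specifies the control flow.

```python
def pick_recommendation(scene, emotion, subject, catalog):
    """Choose recommendation content whose conditions match the current context."""
    for item in catalog:
        if (item["scene"] == scene and item["emotion"] == emotion
                and item["subject"] == subject):
            return item["action"]
    return None

def run_recommendation(action, ask_permission):
    """Execute the action only if the client grants the execution request."""
    if action is not None and ask_permission(action):
        return f"executed: {action}"
    return "skipped"

catalog = [{"scene": "morning-outdoors", "emotion": "positive",
            "subject": "exercise", "action": "play an upbeat song"}]
action = pick_recommendation("morning-outdoors", "positive", "exercise", catalog)
print(run_recommendation(action, ask_permission=lambda a: True))
```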
In the present embodiment, steps 301, 302, 303, 304, and 305 of the above implementation process are essentially identical to steps 201, 202, 203, 204, and 205 of the previous embodiment, respectively, and are not repeated here.
As can be seen from Fig. 3, the main difference from the embodiment corresponding to Fig. 2 is that the process 300 of the adaptive voice feedback method in the present embodiment adds step 306, adjusting the voice feedback pattern according to the speech-rate information, the sentence-construction pattern information, or the intimate-language category information, and step 308, executing recommendation content after the voice is fed back. Through the added steps 306 and 308, the scheme described in the present embodiment can feed back information more effectively and improves the pertinence of the feedback.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an adaptive voice feedback device. This device embodiment corresponds to the method embodiment shown in Fig. 2, and the device may specifically be applied in various electronic devices.
As shown in Fig. 5, the adaptive voice feedback device 500 of the present embodiment includes: an acquisition module 501, configured to obtain input information; an identification module 502, configured to identify the scene information of the input information; a parsing module 503, configured to parse the input information to obtain at least one of user emotion information, communication-style information, and subject content information, where the communication-style information includes language category information; a generation module 504, configured to generate a user attribute label according to at least one of the user emotion information, the language category information, and the subject content information, together with the scene information; a matching module 505, configured to match the user attribute label against the applicable labels of pre-trained voice feedback patterns and obtain matching degrees; and a feedback module 506, configured to perform voice feedback using the voice feedback pattern with the highest matching degree to the user attribute label.
In an optional embodiment of the present embodiment, the communication-style information further includes: speech-rate information, sentence-construction pattern information, or intimate-language category information; and the feedback module includes an adjusting submodule, configured to: adjust the voice feedback pattern according to the speech-rate information, the sentence-construction pattern information, or the intimate-language category information; and perform feedback using the adjusted voice feedback pattern.
In an optional embodiment of the present embodiment, adjusting the voice feedback pattern according to the speech-rate information, the sentence-construction pattern information, or the intimate-language category information includes: adjusting the speech rate of the voice feedback pattern to a speech rate corresponding to the speech-rate information; adjusting the sentence-construction pattern of the voice feedback pattern to be consistent with the sentence-construction pattern information; obtaining a pre-established intimate-language set consistent with the intimate-language category information, where the intimate-language set includes standard terms and the intimate terms corresponding to those standard terms; comparing the words and phrases in the voice feedback pattern with the standard terms in the intimate-language set; and if a word or phrase is identical to a standard term, replacing it with the intimate term corresponding to that standard term.
In an optional embodiment of the present embodiment, the input information includes: voice information and/or video information; and parsing the input information to obtain user emotion information includes: parsing the voice information to obtain at least one of speech-rate information, intonation information, and spectrum information; comparing the speech-rate information and the intonation information with a speech-rate threshold and an intonation threshold to obtain a voice emotion result; parsing the video information to obtain a video emotion result; and obtaining the user emotion information based on the voice emotion result and the video emotion result.
In an optional embodiment of the present embodiment, the device 500 further includes a recommending module 507, configured to: choose, from a pre-established recommendation content set, recommendation content associated with the scene information, the user emotion information, and the subject content information; generate a recommendation execution request; send the recommendation execution request to the client, so that the client chooses whether to permit execution of the recommendation content; and if the client grants the recommendation execution request, execute the recommendation content.
In an optional embodiment of the present embodiment, the scene information includes at least one of the following: temporal information, location information, or terminal application category information.
It will be understood by those skilled in the art that the adaptive voice feedback device 500 further includes some other well-known structures, such as a processor and a memory; in order not to unnecessarily obscure the embodiments of the disclosure, these well-known structures are not shown in Fig. 5.
Referring now to Fig. 6, it illustrates a structural schematic diagram of a computer system 600 suitable for implementing a terminal device or server of the embodiments of the present application.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, ROM 602, and RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read therefrom is installed into the storage section 608 as needed.
In particular, according to an embodiment of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program including program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above functions defined in the method of the present application are executed.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each box in a flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions marked in the boxes may occur in an order different from that shown in the drawings. For example, two successive boxes may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and combinations of boxes in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware. The described units may also be provided in a processor; for example, a processor may be described as including an acquisition module. The names of these units do not, under certain conditions, constitute a limitation on the units themselves; for example, the acquisition module may also be described as "a unit for obtaining input information".
As another aspect, the present application also provides a non-volatile computer storage medium, which may be the non-volatile computer storage medium included in the device of the above embodiments, or may exist independently without being assembled into a terminal. The non-volatile computer storage medium stores one or more programs which, when executed by a device, cause the device to: adjust the speech rate of the voice feedback pattern to a speech rate corresponding to the speech-rate information; adjust the sentence-construction pattern of the voice feedback pattern to be consistent with the sentence-construction pattern information; obtain a pre-established intimate-language set consistent with the intimate-language category information, where the intimate-language set includes standard terms and the intimate terms corresponding to those standard terms; compare the words and phrases in the voice feedback pattern with the standard terms in the intimate-language set; and if a word or phrase is identical to a standard term, replace it with the intimate term corresponding to that standard term.
The above description is only a preferred embodiment of the present application and an explanation of the applied technical principles. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features with similar functions disclosed in the present application.
Claims (12)
1. An adaptive voice feedback method, characterized in that the method comprises:
obtaining input information;
identifying scene information of the input information;
parsing the input information to obtain at least one of user emotion information, communication-style information, and subject content information, wherein the communication-style information comprises language category information;
generating a user attribute label according to at least one of the user emotion information, the language category information, and the subject content information, together with the scene information;
matching the user attribute label against applicable labels of pre-trained voice feedback patterns, and obtaining matching degrees;
performing voice feedback using the voice feedback pattern with the highest matching degree to the user attribute label.
2. The method according to claim 1, characterized in that the communication-style information further comprises: speech-rate information, sentence-construction pattern information, or intimate-language category information; and
the performing voice feedback using the voice feedback pattern with the highest matching degree to the user attribute label comprises:
adjusting the voice feedback pattern according to the speech-rate information, the sentence-construction pattern information, or the intimate-language category information;
performing feedback using the adjusted voice feedback pattern.
3. The method according to claim 2, characterized in that the adjusting the voice feedback pattern according to the speech-rate information, the sentence-construction pattern information, or the intimate-language category information comprises:
adjusting a speech rate of the voice feedback pattern to a speech rate corresponding to the speech-rate information;
adjusting a sentence-construction pattern of the voice feedback pattern to be consistent with the sentence-construction pattern information;
obtaining a pre-established intimate-language set consistent with the intimate-language category information, wherein the intimate-language set comprises standard terms and intimate terms corresponding to the standard terms; comparing words and phrases in the voice feedback pattern with the standard terms in the intimate-language set; and if a word or phrase is identical to a standard term, replacing the word or phrase with the intimate term corresponding to that standard term.
4. The method according to any one of claims 1-3, characterized in that the input information comprises: voice information and/or video information; and
the parsing the input information to obtain user emotion information comprises:
parsing the voice information to obtain at least one of speech-rate information, intonation information, and spectrum information;
comparing the speech-rate information and the intonation information with a speech-rate threshold and an intonation threshold to obtain a voice emotion result;
parsing the video information to obtain a video emotion result;
obtaining the user emotion information based on the voice emotion result and the video emotion result.
5. The method according to claim 4, characterized in that the method further comprises:
choosing, from a pre-established recommendation content set, recommendation content associated with the scene information, the user emotion information, and the subject content information;
generating a recommendation execution request;
sending the recommendation execution request to a client, so that the client chooses whether to permit execution of the recommendation content;
if the client grants the recommendation execution request, executing the recommendation content.
6. The method according to claim 5, characterized in that the scene information comprises at least one of the following: temporal information, location information, or terminal application category information.
7. An adaptive voice feedback device, characterized in that the device comprises:
an acquisition module, configured to obtain input information;
an identification module, configured to identify scene information of the input information;
a parsing module, configured to parse the input information to obtain at least one of user emotion information, communication-style information, and subject content information, wherein the communication-style information comprises language category information;
a generation module, configured to generate a user attribute label according to at least one of the user emotion information, the language category information, and the subject content information, together with the scene information;
a matching module, configured to match the user attribute label against applicable labels of pre-trained voice feedback patterns, and obtain matching degrees;
a feedback module, configured to perform voice feedback using the voice feedback pattern with the highest matching degree to the user attribute label.
8. The device according to claim 7, characterized in that the communication-style information further comprises: speech-rate information, sentence-construction pattern information, or intimate-language category information; and
the feedback module comprises an adjusting submodule, configured to:
adjust the voice feedback pattern according to the speech-rate information, the sentence-construction pattern information, or the intimate-language category information;
perform feedback using the adjusted voice feedback pattern.
9. The device according to claim 8, characterized in that the adjusting the voice feedback pattern according to the speech-rate information, the sentence-construction pattern information, or the intimate-language category information comprises:
adjusting a speech rate of the voice feedback pattern to a speech rate corresponding to the speech-rate information;
adjusting a sentence-construction pattern of the voice feedback pattern to be consistent with the sentence-construction pattern information;
obtaining a pre-established intimate-language set consistent with the intimate-language category information, wherein the intimate-language set comprises standard terms and intimate terms corresponding to the standard terms; comparing words and phrases in the voice feedback pattern with the standard terms in the intimate-language set; and if a word or phrase is identical to a standard term, replacing the word or phrase with the intimate term corresponding to that standard term.
10. The device according to any one of claims 7-9, characterized in that the input information comprises: voice information and/or video information; and
the parsing the input information to obtain user emotion information comprises:
parsing the voice information to obtain at least one of speech-rate information, intonation information, and spectrum information;
comparing the speech-rate information and the intonation information with a speech-rate threshold and an intonation threshold to obtain a voice emotion result;
parsing the video information to obtain a video emotion result;
obtaining the user emotion information based on the voice emotion result and the video emotion result.
11. The device according to claim 10, characterized in that the device further comprises a recommending module, configured to:
choose, from a pre-established recommendation content set, recommendation content associated with the scene information, the user emotion information, and the subject content information;
generate a recommendation execution request;
send the recommendation execution request to a client, so that the client chooses whether to permit execution of the recommendation content;
if the client grants the recommendation execution request, execute the recommendation content.
12. The device according to claim 11, characterized in that the scene information comprises at least one of the following: temporal information, location information, or terminal application category information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610060206.2A CN105654950B (en) | 2016-01-28 | 2016-01-28 | Adaptive voice feedback method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105654950A CN105654950A (en) | 2016-06-08 |
CN105654950B true CN105654950B (en) | 2019-07-16 |
Family
ID=56488908
CN107331388A (en) * | 2017-06-15 | 2017-11-07 | 重庆柚瓣科技有限公司 | A kind of dialect collection system based on endowment robot |
CN107291900B (en) * | 2017-06-22 | 2020-06-05 | 美味不用等(上海)信息科技股份有限公司 | Information feedback and tracking system |
CN107316641B (en) * | 2017-06-30 | 2021-06-15 | 联想(北京)有限公司 | Voice control method and electronic equipment |
CN107545029A (en) * | 2017-07-17 | 2018-01-05 | 百度在线网络技术(北京)有限公司 | Voice feedback method, equipment and the computer-readable recording medium of smart machine |
CN107393530B (en) * | 2017-07-18 | 2020-08-25 | 国网山东省电力公司青岛市黄岛区供电公司 | Service guiding method and device |
CN107274900B (en) * | 2017-08-10 | 2020-09-18 | 北京京东尚科信息技术有限公司 | Information processing method for control terminal and system thereof |
CN107657017B (en) * | 2017-09-26 | 2020-11-13 | 百度在线网络技术(北京)有限公司 | Method and apparatus for providing voice service |
CN107767869B (en) * | 2017-09-26 | 2021-03-12 | 百度在线网络技术(北京)有限公司 | Method and apparatus for providing voice service |
CN107818787B (en) * | 2017-10-31 | 2021-02-05 | 努比亚技术有限公司 | Voice information processing method, terminal and computer readable storage medium |
CN107871500B (en) * | 2017-11-16 | 2021-07-20 | 百度在线网络技术(北京)有限公司 | Method and device for playing multimedia |
CN108257596B (en) * | 2017-12-22 | 2021-07-23 | 北京小蓦机器人技术有限公司 | Method and equipment for providing target presentation information |
CN108091324B (en) | 2017-12-22 | 2021-08-17 | 北京百度网讯科技有限公司 | Tone recognition method and device, electronic equipment and computer-readable storage medium |
CN108281141B (en) * | 2017-12-22 | 2019-10-18 | 北京小蓦机器人技术有限公司 | It is a kind of for providing the method and apparatus of voice-response information |
CN108257037A (en) * | 2018-01-18 | 2018-07-06 | 封玉涛 | It is a kind of with social scene turn to point of penetration apply fusion method and device |
CN108319485A (en) * | 2018-01-29 | 2018-07-24 | 出门问问信息科技有限公司 | Information interacting method, device, equipment and storage medium |
CN108335700B (en) * | 2018-01-30 | 2021-07-06 | 重庆与展微电子有限公司 | Voice adjusting method and device, voice interaction equipment and storage medium |
CN109036405A (en) * | 2018-07-27 | 2018-12-18 | 百度在线网络技术(北京)有限公司 | Voice interactive method, device, equipment and storage medium |
CN109240488A (en) * | 2018-07-27 | 2019-01-18 | 重庆柚瓣家科技有限公司 | A kind of implementation method of AI scene engine of positioning |
CN110930998A (en) * | 2018-09-19 | 2020-03-27 | 上海博泰悦臻电子设备制造有限公司 | Voice interaction method and device and vehicle |
CN110930999A (en) * | 2018-09-19 | 2020-03-27 | 上海博泰悦臻电子设备制造有限公司 | Voice interaction method and device and vehicle |
CN109346076A (en) * | 2018-10-25 | 2019-02-15 | 三星电子(中国)研发中心 | Interactive voice, method of speech processing, device and system |
CN109672724B (en) * | 2018-11-01 | 2022-07-29 | 百度在线网络技术(北京)有限公司 | Information pushing method, device and equipment |
CN109377979B (en) * | 2018-12-07 | 2021-09-24 | 思必驰科技股份有限公司 | Method and system for updating welcome language |
CN111292737A (en) * | 2018-12-07 | 2020-06-16 | 阿里巴巴集团控股有限公司 | Voice interaction and voice awakening detection method, device, equipment and storage medium |
CN109697290B (en) * | 2018-12-29 | 2023-07-25 | 咪咕数字传媒有限公司 | Information processing method, equipment and computer storage medium |
CN109686362B (en) * | 2019-01-02 | 2021-04-02 | 百度在线网络技术(北京)有限公司 | Voice broadcasting method and device and computer readable storage medium |
CN109741744B (en) * | 2019-01-14 | 2021-03-09 | 博拉网络股份有限公司 | AI robot conversation control method and system based on big data search |
CN111724774B (en) * | 2019-03-22 | 2024-05-17 | 斑马智行网络(香港)有限公司 | Voice interaction and vehicle-mounted voice interaction method, device, equipment and storage medium |
CN110347817B (en) * | 2019-07-15 | 2022-03-18 | 网易(杭州)网络有限公司 | Intelligent response method and device, storage medium and electronic equipment |
CN110827797B (en) * | 2019-11-06 | 2022-04-12 | 北京沃东天骏信息技术有限公司 | Voice response event classification processing method and device |
CN111354350B (en) * | 2019-12-26 | 2024-04-05 | 阿里巴巴集团控股有限公司 | Voice processing method and device, voice processing equipment and electronic equipment |
CN111179903A (en) * | 2019-12-30 | 2020-05-19 | 珠海格力电器股份有限公司 | Voice recognition method and device, storage medium and electric appliance |
CN111241822A (en) * | 2020-01-03 | 2020-06-05 | 北京搜狗科技发展有限公司 | Emotion discovery and dispersion method and device under input scene |
CN111310009A (en) * | 2020-01-16 | 2020-06-19 | 珠海格力电器股份有限公司 | User classification method and device, storage medium and computer equipment |
CN111724173A (en) * | 2020-06-18 | 2020-09-29 | 中国银行股份有限公司 | Robot self-adjusting method, device, equipment and computer storage medium |
CN112017646A (en) * | 2020-08-21 | 2020-12-01 | 博泰车联网(南京)有限公司 | Voice processing method and device and computer storage medium |
CN115312079A (en) * | 2021-04-20 | 2022-11-08 | 北京沃东天骏信息技术有限公司 | Information display method and device, electronic equipment and computer readable medium |
CN113435962A (en) * | 2021-06-07 | 2021-09-24 | 布瑞克农业大数据科技集团有限公司 | Agricultural product online collecting and purchasing method, system and storage medium thereof |
CN114356276A (en) * | 2021-12-22 | 2022-04-15 | 科大讯飞股份有限公司 | Voice interaction method and related device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103577544A (en) * | 2013-10-11 | 2014-02-12 | 北京百度网讯科技有限公司 | Method and device for providing information to be sent |
CN104038836A (en) * | 2014-06-03 | 2014-09-10 | 四川长虹电器股份有限公司 | Television program intelligent pushing method |
CN104836720A (en) * | 2014-02-12 | 2015-08-12 | 北京三星通信技术研究有限公司 | Method for performing information recommendation in interactive communication, and device |
CN104881108A (en) * | 2014-02-27 | 2015-09-02 | 青岛海尔机器人有限公司 | Intelligent man-machine interaction method and device |
CN105206269A (en) * | 2015-08-14 | 2015-12-30 | 百度在线网络技术(北京)有限公司 | Voice processing method and device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090234655A1 (en) * | 2008-03-13 | 2009-09-17 | Jason Kwon | Mobile electronic device with active speech recognition |
- 2016-01-28: CN application CN201610060206.2A filed, granted as CN105654950B, status Active
Also Published As
Publication number | Publication date |
---|---|
CN105654950A (en) | 2016-06-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105654950B (en) | Adaptive voice feedback method and device | |
US20210232761A1 (en) | Methods and systems for improving machine learning performance | |
US11315546B2 (en) | Computerized system and method for formatted transcription of multimedia content | |
CN105701088B (en) | The method and apparatus manually talked with are switched to from machine dialogue | |
CN107491534A (en) | Information processing method and device | |
CN104735468B (en) | A kind of method and system that image is synthesized to new video based on semantic analysis | |
CN105786793B (en) | Parse the semantic method and apparatus of spoken language text information | |
CN106383875B (en) | Man-machine interaction method and device based on artificial intelligence | |
CN108986805B (en) | Method and apparatus for sending information | |
KR20160055930A (en) | Systems and methods for actively composing content for use in continuous social communication | |
CN106227792B (en) | Method and apparatus for pushed information | |
CN112883731B (en) | Content classification method and device | |
CN104462064A (en) | Method and system for prompting content input in information communication of mobile terminals | |
CN108268450B (en) | Method and apparatus for generating information | |
CN106407361A (en) | Method and device for pushing information based on artificial intelligence | |
CN111506794A (en) | Rumor management method and device based on machine learning | |
CN110554782A (en) | Expression input image synthesis method and system | |
CN108900612A (en) | Method and apparatus for pushed information | |
CN110084658A (en) | The matched method and apparatus of article | |
US11361759B2 (en) | Methods and systems for automatic generation and convergence of keywords and/or keyphrases from a media | |
CN106201010B (en) | Method for adding word bank and device | |
CN111354350B (en) | Voice processing method and device, voice processing equipment and electronic equipment | |
CN110990632A (en) | Video processing method and device | |
CN112632962B (en) | Method and device for realizing natural language understanding in man-machine interaction system | |
CN110399494A (en) | Method and apparatus for generating information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||