CN109710799A

CN109710799A - Voice interaction method, medium, apparatus and computing device

Info

Publication number: CN109710799A
Application number: CN201910005993.4A
Authority: CN
Inventors: 肖军军; 张敏; 张汉雁; 魏永振
Original assignee: Hangzhou Netease Cloud Music Technology Co Ltd
Current assignee: Hangzhou Netease Cloud Music Technology Co Ltd
Priority date: 2019-01-03
Filing date: 2019-01-03
Publication date: 2019-05-03
Anticipated expiration: 2039-01-03
Also published as: CN109710799B

Abstract

Embodiments of the present invention provide a voice interaction method, comprising: receiving voice information input by a user and converting the voice information into statement text; obtaining, from a preset music commentary library, comment information that matches the statement text; and outputting the comment information as a response to the voice information. The embodiments of the present disclosure make full use of existing music comment information as responses, which greatly reduces the manpower spent on writing response content and can evoke emotional resonance in the user who input the voice information, meeting the user's emotional needs. Embodiments of the present invention further provide a voice interaction apparatus, a medium, and a computing device.

Description

Voice interaction method, medium, apparatus and computing device

Technical field

Embodiments of the present invention relate to the field of computer technology and, more specifically, to a voice interaction method, medium, apparatus, and computing device.

Background

This section is intended to provide background or context for the embodiments of the present invention set forth in the claims. The description herein is not admitted to be prior art merely because it is included in this section.

The essence of voice interaction is human-computer interaction: the user interacts, communicates, and exchanges information with a machine using voice as the carrier, producing a series of inputs and outputs, and ultimately completes a corresponding task or achieves a corresponding purpose.

Existing voice interaction schemes require developers to write the machine's response content in advance. When a user inputs voice information, the voice information is converted into text, and the response content that matches the text is selected as output. On the one hand, writing the response content requires a large investment of manpower and is inefficient; on the other hand, response content written in advance is mechanical and stiff and cannot meet the user's emotional needs.

Summary of the invention

For the above reasons, existing voice interaction schemes require a large amount of manpower to write response content, and the response content is mechanical and stiff and cannot meet the user's emotional needs.

An improved voice interaction method is therefore needed to achieve human-computer interaction that is more efficient and evokes more emotional resonance.

In this context, embodiments of the present invention are intended to provide a voice interaction method and apparatus.

In a first aspect of the embodiments of the present invention, a voice interaction method is provided, comprising: receiving voice information input by a user and converting the voice information into statement text; obtaining, from a preset music commentary library, comment information that matches the statement text; and outputting the comment information as a response to the voice information.

In one embodiment of the present invention, after the comment information is output as the response to the voice information, the method further comprises: playing the music corresponding to the comment information.

In another embodiment of the present invention, before the comment information that matches the statement text is obtained from the preset music commentary library, the method further comprises: obtaining a plurality of pieces of comment information about music that meet a preset condition, and building the preset music commentary library from the obtained comment information; and identifying the focus information and intent information of each piece of comment information in the preset music commentary library. Obtaining, from the preset music commentary library, the comment information that matches the statement text comprises: obtaining the comment information that matches the statement text based on the focus information and intent information of each piece of comment information in the preset music commentary library.

In yet another embodiment of the present invention, obtaining the plurality of pieces of comment information about music that meet the preset condition comprises: obtaining, according to the user's historical music interaction behavior data, the comment information corresponding to the user's personalized music, wherein the user's personalized music includes at least one of the following: music the user has collected, music the user has created, music the user likes, or music the user has played; and/or obtaining the comment information corresponding to currently promoted music; and/or obtaining comment information whose number of likes exceeds a first threshold.

In yet another embodiment of the present invention, obtaining the comment information that matches the statement text based on the focus information and intent information of each piece of comment information in the preset music commentary library comprises: identifying the focus information and intent information of the statement text; matching the focus information of the statement text against the focus information of each piece of comment information in the preset music commentary library, and filtering out the focus-matched comment information; and matching the intent information of the statement text against the intent information of the focus-matched comment information, and filtering out the comment information that is both focus-matched and intent-matched.

In yet another embodiment of the present invention, identifying the focus information and intent information of each piece of comment information in the preset music commentary library comprises: extracting, based on a tag library, labels that characterize the corresponding focus information from each piece of comment information, and extracting, based on an intent classification library, intention clauses that characterize the corresponding intent information from each piece of comment information. Identifying the focus information and intent information of the statement text comprises: extracting, based on the tag library, labels that characterize the corresponding focus information from the statement text, and extracting, based on the intent classification library, intention clauses that characterize the corresponding intent information from the statement text. Matching the focus information of the statement text against the focus information of each piece of comment information in the music commentary library comprises: matching the labels of the statement text against the labels of each piece of comment information, and determining a piece of comment information to be focus-matched when the matching degree exceeds a second threshold. Matching the intent information of the statement text against the intent information of the focus-matched comment information comprises: matching the intention clauses of the statement text against the intention clauses of the focus-matched comment information, and determining a piece of comment information to be both focus-matched and intent-matched when the matching degree exceeds a third threshold.

In yet another embodiment of the present invention, obtaining from the music commentary library the comment information that matches the statement text further comprises: when a plurality of pieces of focus-matched and intent-matched comment information are filtered out, obtaining the priority of the music corresponding to each piece of comment information; sorting the comment information based on the priority of the music; and selecting one piece of comment information based on the sorting result.

In yet another embodiment of the present invention, obtaining the priority of the music corresponding to each piece of comment information comprises: determining, according to the user's historical music interaction behavior data, a comprehensive score for the music corresponding to each piece of comment information, wherein the user's historical music interaction behavior data includes at least one of the following: behavior data of the user collecting music, behavior data of the user liking music, behavior data of the user playing music, behavior data of the user commenting on music, behavior data of the user sharing music, or behavior data of the user creating music.
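The priority-based selection described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how the comprehensive score is computed, so the weighted sum, the `WEIGHTS` values, and the names `Comment`, `composite_score`, and `pick_comment` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical weight per interaction type; the source specifies no weights.
WEIGHTS = {
    "collect": 3.0,
    "like": 2.0,
    "play": 1.0,
    "comment": 2.0,
    "share": 2.5,
    "create": 4.0,
}


@dataclass
class Comment:
    text: str
    music_id: str


def composite_score(music_id, history):
    """Weighted sum of the user's recorded interactions with one piece of music."""
    counts = history.get(music_id, {})
    return sum(WEIGHTS.get(kind, 0.0) * n for kind, n in counts.items())


def pick_comment(candidates, history):
    """Sort candidates by the score of their music (highest first), return the top one."""
    if not candidates:
        return None
    ranked = sorted(
        candidates,
        key=lambda c: composite_score(c.music_id, history),
        reverse=True,
    )
    return ranked[0]


# Illustrative data only: interaction counts per piece of music for one user.
history = {
    "song_a": {"play": 5, "like": 1},
    "song_b": {"collect": 1, "share": 1},
}
candidates = [Comment("comment on A", "song_a"), Comment("comment on B", "song_b")]
best = pick_comment(candidates, history)
```

A weighted sum is just one plausible way to fold several behavior types into a single score; any monotone scoring function could be substituted without changing the sort-and-select step.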

In a second aspect of the embodiments of the present invention, a voice interaction apparatus is provided, comprising: a receiving module, a matching module, and an output module. The receiving module is configured to receive voice information input by a user and convert the voice information into statement text. The matching module is configured to obtain, from a preset music commentary library, comment information that matches the statement text. The output module is configured to output the comment information as a response to the voice information.

In one embodiment of the present invention, the apparatus further comprises a playing module configured to play the music corresponding to the comment information after the output module outputs the comment information as the response to the voice information.

In another embodiment of the present invention, the apparatus further comprises a first preprocessing module and a second preprocessing module. The first preprocessing module is configured to, before the matching module obtains from the preset music commentary library the comment information that matches the statement text, obtain a plurality of pieces of comment information about music that meet a preset condition and build the preset music commentary library from the obtained comment information. The second preprocessing module is configured to identify the focus information and intent information of each piece of comment information in the preset music commentary library. The matching module is then configured to obtain, from the preset music commentary library, the comment information that matches the statement text based on the focus information and intent information of each piece of comment information in the preset music commentary library.

In yet another embodiment of the present invention, the first preprocessing module is specifically configured to: obtain, according to the user's historical music interaction behavior data, the comment information corresponding to the user's personalized music, wherein the user's personalized music includes at least one of the following: music the user has collected, music the user has created, music the user likes, or music the user has played; and/or obtain the comment information corresponding to currently promoted music; and/or obtain comment information whose number of likes exceeds a first threshold.

In yet another embodiment of the present invention, the matching module comprises: an identification submodule, a first matching submodule, and a second matching submodule. The identification submodule is configured to identify the focus information and intent information of the statement text. The first matching submodule is configured to match the focus information of the statement text against the focus information of each piece of comment information in the preset music commentary library and filter out the focus-matched comment information. The second matching submodule is configured to match the intent information of the statement text against the intent information of the focus-matched comment information and filter out the comment information that is both focus-matched and intent-matched.

In yet another embodiment of the present invention, the second preprocessing module is specifically configured to extract, based on a tag library, labels that characterize the corresponding focus information from each piece of comment information, and to extract, based on an intent classification library, intention clauses that characterize the corresponding intent information from each piece of comment information. The identification submodule is specifically configured to extract, based on the tag library, labels that characterize the corresponding focus information from the statement text, and to extract, based on the intent classification library, intention clauses that characterize the corresponding intent information from the statement text. The first matching submodule is specifically configured to match the labels of the statement text against the labels of each piece of comment information, and to determine a piece of comment information to be focus-matched when the matching degree exceeds a second threshold. The second matching submodule is specifically configured to match the intention clauses of the statement text against the intention clauses of the focus-matched comment information, and to determine a piece of comment information to be both focus-matched and intent-matched when the matching degree exceeds a third threshold.

In yet another embodiment of the present invention, the matching module further comprises an acquisition submodule and a sorting submodule. The acquisition submodule is configured to obtain, when a plurality of pieces of focus-matched and intent-matched comment information are filtered out, the priority of the category to which the music corresponding to each piece of comment information belongs. The sorting submodule is configured to sort the comment information based on the priority of the music and select one piece of comment information based on the sorting result.

In yet another embodiment of the present invention, the acquisition submodule is specifically configured to determine, according to the user's historical music interaction behavior data, a comprehensive score for the music corresponding to each piece of comment information, wherein the user's historical music interaction behavior data includes at least one of the following: behavior data of the user collecting music, behavior data of the user liking music, behavior data of the user playing music, behavior data of the user commenting on music, behavior data of the user sharing music, or behavior data of the user creating music.

In a third aspect of the embodiments of the present invention, a medium is provided that stores computer-executable instructions which, when executed by a processor, implement the voice interaction method described in any one of the above embodiments.

In a fourth aspect of the embodiments of the present invention, a computing device is provided, comprising: a memory, a processor, and executable instructions stored in the memory and runnable on the processor, wherein the processor, when executing the instructions, implements the voice interaction method described in any one of the above embodiments.

The voice interaction method and apparatus according to embodiments of the present invention select, from a large body of existing comment information about music, the comment information that matches the voice information input by the current user and use it as the response. Developers do not need to write response content in advance, which greatly reduces the manpower spent on writing response content. Moreover, because comment information is written by real users to express their genuine feelings about the corresponding music, using matching comment information as the response to the voice information can evoke emotional resonance in the user who input the voice information and meet the user's emotional needs.

Brief description of the drawings

The above and other objects, features, and advantages of exemplary embodiments of the present invention will become easier to understand by reading the following detailed description with reference to the accompanying drawings. In the drawings, several embodiments of the present invention are shown by way of example and not limitation, in which:

Fig. 1 schematically shows an application scenario of the voice interaction method and apparatus according to an embodiment of the present invention;

Fig. 2 schematically shows a flow chart of a voice interaction method according to an embodiment of the present invention;

Fig. 3 schematically shows a flow chart of a voice interaction method according to another embodiment of the present invention;

Fig. 4A schematically shows a schematic diagram of a preset music commentary library according to an embodiment of the present invention;

Fig. 4B schematically shows a schematic diagram of a voice interaction process according to an embodiment of the present invention;

Fig. 5A schematically shows a block diagram of a voice interaction apparatus according to an embodiment of the present invention;

Fig. 5B schematically shows a block diagram of a voice interaction apparatus according to another embodiment of the present invention;

Fig. 6 schematically shows a block diagram of a matching module according to an embodiment of the present invention;

Fig. 7 schematically shows a schematic diagram of a computer-readable storage medium product according to an embodiment of the present invention; and

Fig. 8 schematically shows a block diagram of a computing device according to an embodiment of the present invention.

In the accompanying drawings, identical or corresponding reference numerals denote identical or corresponding parts.

Detailed description of the embodiments

The principle and spirit of the present invention are described below with reference to several illustrative embodiments. It should be understood that these embodiments are provided only so that those skilled in the art can better understand and then implement the present invention, and not to limit the scope of the present invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the disclosure to those skilled in the art.

Those skilled in the art will appreciate that embodiments of the present invention may be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the following forms: entirely hardware, entirely software (including firmware, resident software, microcode, and the like), or a combination of hardware and software.

According to embodiments of the present invention, a voice interaction method, medium, apparatus, and computing device are proposed.

Herein, it is to be understood that the terms involved include voice information, statement text, preset music commentary library, comment information, and so on. Voice information is audio data obtained by sound recording; converting the content of the voice information into corresponding text yields the statement text. Comment information refers to music comment information: any user may comment on any piece of music, which yields corresponding comment information about that music. The preset music commentary library is built from a plurality of pieces of comment information about music. In addition, any number of elements in the drawings is for example and not limitation, and any naming is used only for distinction and carries no limiting meaning.

Below with reference to several representative embodiments of the invention, the principle and spirit of the present invention are explained in detail.

Overview of the invention

In the course of arriving at the concept of the present disclosure, the inventors found that existing voice interaction schemes require developers to write the machine's response content in advance: when a user inputs voice information, the voice information is converted into text, and the response content that matches the text is selected as output. Such schemes have the following problems: on the one hand, writing the response content requires a large investment of manpower and is inefficient; on the other hand, response content written in advance is mechanical and stiff and cannot meet the user's emotional needs.

To this end, embodiments of the present invention provide a voice interaction method and apparatus. The method comprises: receiving voice information input by a user and converting the voice information into statement text; obtaining, from a preset music commentary library, comment information that matches the statement text; and outputting the comment information as a response to the voice information. The embodiments of the present disclosure select, from a large body of existing comment information about music, the comment information that matches the voice information input by the current user and use it as the response. Developers do not need to write response content in advance, which greatly reduces the manpower spent on writing response content. Moreover, because comment information is written by real users to express their genuine feelings about the corresponding music, using matching comment information as the response to the voice information can evoke emotional resonance in the user who input the voice information and meet the user's emotional needs.

Having introduced the basic principles of the present invention, various non-limiting embodiments of the present invention are described in detail below.

Application scenarios overview

The application scenario of the voice interaction method and apparatus of the embodiments of the present invention is first described in detail with reference to Fig. 1.

Fig. 1 schematically shows an application scenario of the voice interaction method and apparatus according to an embodiment of the present invention. As shown in Fig. 1, the application scenario includes an electronic device 110 and a user 120. The electronic device 110 has a voice interaction function and carries out voice interaction with the user 120. In this embodiment the electronic device 110 is a smart speaker; in other embodiments, the electronic device 110 may be any device with a voice interaction function, such as a smartphone, a computer, a smartwatch, or various smart home appliances, which is not limited herein.

The electronic device 110 captures the voice information input by the user 120 through a microphone, and makes a corresponding response or performs a corresponding task according to the voice information. For example, the user 120 inputs the voice information "How is the weather today", and the electronic device 110 queries the weather and responds according to the query result: "Minimum temperature -4 degrees Celsius, maximum temperature 6 degrees Celsius, clear to cloudy". Or the user 120 inputs the voice information "What time is it now", and the electronic device 110 queries the current time and responds according to the query result: "9:05". In these two examples, the voice information input by the user 120 has a definite answer, and the electronic device 110 can directly look up that answer and use it as the response. In daily life, however, the voice information input by the user 120 in most cases has no definite answer, for example when the user 120 inputs the voice information "I feel so sad today". In that case the electronic device 110 should respond with the content that best fits the current psychological needs of the user 120.

Exemplary method

A voice interaction method according to exemplary embodiments of the present invention is described below with reference to Figs. 2 to 4B in conjunction with the application scenario of Fig. 1. It should be noted that the above application scenario is shown only to facilitate understanding of the spirit and principle of the present invention, and embodiments of the present invention are not limited in this respect. On the contrary, embodiments of the present invention can be applied to any applicable scenario.

Fig. 2 schematically shows a flow chart of a voice interaction method according to an embodiment of the present invention. As shown in Fig. 2, the method includes the following operations:

Operation S201: receive voice information input by a user, and convert the voice information into statement text.

Operation S202: obtain, from a preset music commentary library, comment information that matches the statement text.

Operation S203: output the comment information as a response to the voice information.

It can be seen that, for the voice information input by the user, the method shown in Fig. 2 selects from a large body of existing comment information about music the comment information that matches that voice information and uses it as the response. Developers do not need to write response content in advance, which greatly reduces the manpower spent on writing response content. Moreover, because comment information is written by real users to express their genuine feelings about the corresponding music, using matching comment information as the response to the voice information can evoke emotional resonance in the user who input the voice information and meet the user's emotional needs.
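Operations S201 to S203 can be sketched as a small pipeline. This is a minimal sketch, not the patented implementation: speech recognition, library lookup, and output are passed in as callables, and the stand-ins `fake_stt` and `fake_find`, the one-entry `library`, and its comment text are made up purely for illustration.

```python
from typing import Callable, Optional


def voice_interaction(
    audio: bytes,
    speech_to_text: Callable[[bytes], str],
    find_comment: Callable[[str], Optional[str]],
    output: Callable[[str], None],
) -> Optional[str]:
    """Run S201 to S203 once for a single piece of voice information."""
    statement_text = speech_to_text(audio)      # S201: voice information -> statement text
    comment = find_comment(statement_text)      # S202: look up matching comment information
    if comment is not None:
        output(comment)                         # S203: output the comment as the response
    return comment


# Hypothetical stand-ins so the sketch runs without a real recognizer or library.
library = {"sad": "Some days are heavy; this song got me through one of them."}


def fake_stt(audio: bytes) -> str:
    # Pretend the audio bytes are already the transcript.
    return audio.decode("utf-8")


def fake_find(text: str) -> Optional[str]:
    # Naive keyword lookup standing in for the focus/intent matching.
    for keyword, comment in library.items():
        if keyword in text:
            return comment
    return None


spoken = []
result = voice_interaction(b"so sad today", fake_stt, fake_find, spoken.append)
```

Injecting the three stages as callables keeps the control flow of the method separate from any particular speech recognizer or matching strategy.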

Fig. 3 schematically shows a flow chart of a voice interaction method according to another embodiment of the present invention. As shown in Fig. 3, the method includes operations S201 to S204, wherein operations S201 to S203 are identical to the corresponding operations shown in Fig. 2 and are not described again here.

Operation S204: play the music corresponding to the comment information.

In a specific example, the music corresponding to the comment information may be played immediately after the comment information is output as the response, or it may be played when a preset trigger condition is met within a predetermined time after the comment information is output as the response. The music corresponding to the comment information may be any kind of audio file, such as a song, instrumental music, crosstalk, a speech, or a broadcast, which is not limited herein.
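The two playback options in this example can be expressed as a single decision function. The sketch below rests on assumptions: the source does not define the trigger condition or the length of the predetermined time, so `triggered` is an abstract flag, and `window_seconds` with its default of 30 seconds is a hypothetical value.

```python
def should_play(
    seconds_since_response: float,
    triggered: bool,
    immediate: bool = False,
    window_seconds: float = 30.0,
) -> bool:
    """Decide whether to play the music that corresponds to the output comment.

    immediate=True models playing directly after the response; otherwise the
    music is played only if the trigger condition holds inside the time window.
    """
    if immediate:
        return True
    in_window = 0 <= seconds_since_response <= window_seconds
    return triggered and in_window
```

For instance, `should_play(10, True)` returns `True`, while `should_play(45, True)` returns `False` because the assumed 30-second window has passed.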

In the embodiments of the present disclosure, a user can comment on any piece of music, which yields comment information. Any piece of music therefore corresponds to comment information from one or more users about that music, and the preset music commentary library includes the comment information of one or more pieces of music. After the voice information input by the user is received, it is converted into statement text, and the comment information that matches the statement text is obtained from the preset music commentary library. The obtained comment information can express a mood similar to that of the input voice information, so outputting it as the response can naturally evoke emotional resonance in the user who input the voice information. Further, the music corresponding to the comment information is played after the comment information is output. Because the mood expressed by the comment information was evoked by the corresponding music, that music suits the mood; playing it to the user who input the voice information creates an atmosphere that suits the user's mood, so that the voice interaction is more natural and emotionally rich rather than the stiff, mechanical human-computer interaction of the prior art.

In an embodiment of the present disclosure, before operation S202 obtains from the preset music commentary library the comment information that matches the statement text, the method shown in Fig. 2 or Fig. 3 may further include a preprocessing procedure: obtaining a plurality of pieces of comment information about music that meet a preset condition, and building the preset music commentary library from the obtained comment information; and identifying the focus information and intent information of each piece of comment information in the preset music commentary library. This preprocessing procedure builds the preset music commentary library, which includes a plurality of pieces of comment information about music that meet the preset condition, and also performs recognition on the comment information in the library to identify the focus information and intent information of each piece. The focus information of a piece of comment information refers to the most important information it expresses, that is, the part the author of the comment wants readers to notice when they see the comment. Each piece of comment information may include one or more pieces of focus information, and the focus information may be characterized by one or more labels. The intent information of a piece of comment information refers to the operation or purpose that the author of the comment wants to achieve, as expressed by the comment.

On this basis, obtaining in operation S202 the comment information that matches the statement text from the preset music commentary library comprises: obtaining the comment information that matches the statement text based on the focus information and intent information of each piece of comment information in the preset music commentary library. According to this embodiment, comment information is matched against the statement text of the input voice information on the basis of focus information and intent information. Because focus information and intent information reflect subjective factors such as a real user's mood, ideas, and viewpoints, these two kinds of information make it possible to effectively obtain comment information that expresses a mood, idea, or viewpoint similar to that of the input voice information, and thus to fit the user who input the voice information as closely as possible at the psychological level.

Specifically, as an optional embodiment, obtaining the plurality of pieces of comment information about music that meet the preset condition comprises: obtaining, according to the user's historical music interaction behavior data, the comment information corresponding to the user's personalized music, wherein the user's personalized music includes at least one of the following: music the user has collected, music the user has created, music the user likes, or music the user has played; and/or obtaining the comment information corresponding to currently promoted music; and/or obtaining comment information whose number of likes exceeds a first threshold.

According to this embodiment, the preset music commentary library may include the comment information of the personalized music of the user currently engaged in voice interaction. The user's personalized music reflects the user's music preferences; obtaining the comment information that matches the user's voice information from the comments on music the user prefers, using it as the response, and further playing the corresponding music is more likely to resonate with the user. The preset music commentary library may also include the comment information corresponding to currently promoted music. Obtaining the comment information that matches the user's voice information from the comments on currently promoted music, using it as the response, and further playing the corresponding promoted music not only satisfies the user's voice interaction needs but also recommends the promoted music to the user. The preset music commentary library may also include comment information whose number of likes exceeds the first threshold, that is, popular comment information. Popular comment information is representative and can resonate with most people; obtaining the comment information that matches the user's voice information from popular comments, using it as the response, and further playing the corresponding music is more likely to resonate with the user.
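The three sources of comment information can be combined when building the library. The sketch below assumes the "and/or" conditions are applied as a union, which is only one of the combinations the text allows; `RawComment`, `build_library`, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawComment:
    text: str
    music_id: str
    likes: int


def build_library(comments, personalized_ids, promoted_ids, first_threshold):
    """Keep a comment if its music is personalized or promoted, or if it is popular."""
    return [
        c
        for c in comments
        if c.music_id in personalized_ids      # music the user collected, created, liked, or played
        or c.music_id in promoted_ids          # currently promoted music
        or c.likes > first_threshold           # number of likes exceeds the first threshold
    ]


# Illustrative data only.
comments = [
    RawComment("c1", "m1", 3),
    RawComment("c2", "m2", 0),
    RawComment("c3", "m3", 500),
    RawComment("c4", "m3", 100),
]
library = build_library(
    comments,
    personalized_ids={"m1"},
    promoted_ids={"m2"},
    first_threshold=100,
)
```

Here `c4` is dropped: its music is neither personalized nor promoted, and 100 likes does not exceed the threshold of 100.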

In one embodiment of the present disclosure, obtaining the comment information that matches the statement text based on the focus information and intent information of each piece of comment information in the preset music commentary library includes: identifying the focus information and intent information of the statement text; matching the focus information of the statement text against the focus information of each piece of comment information in the preset music commentary library to select focus-matched comment information; and matching the intent information of the statement text against the intent information of the focus-matched comment information to select comment information that is both focus-matched and intent-matched.

Here, the focus information of the statement text refers to the most important information expressed by the statement text, that is, the part that the initiator of the corresponding voice information wants the recipient to pay attention to. Each statement text may contain one or more pieces of focus information, and the focus information may be characterized by one or more labels. The intent information of the statement text refers to the operation or purpose that the initiator of the corresponding voice information expresses a wish to achieve. The above process performs focus matching first, selecting the comment information whose focus matches that of the statement text and thereby discarding a large amount of irrelevant comment information, and then performs intent matching to select the comment information that matches the statement text in both focus and intent, which improves matching efficiency.

Specifically, as an optional embodiment, identifying the focus information and intent information of each piece of comment information in the preset music commentary library includes: extracting, based on a tag library, labels for characterizing the corresponding focus information from each piece of comment information, and extracting, based on an intent classifier library, intention clauses for characterizing the corresponding intent information from each piece of comment information. Identifying the focus information and intent information of the statement text includes: extracting, based on the tag library, labels for characterizing the corresponding focus information from the statement text, and extracting, based on the intent classifier library, intention clauses for characterizing the corresponding intent information from the statement text. Matching the focus information of the statement text against the focus information of each piece of comment information in the music commentary library includes: matching the labels of the statement text against the labels of each piece of comment information, and determining a piece of comment information to be focus-matched when the matching degree exceeds a second threshold. Matching the intent information of the statement text against the intent information of the focus-matched comment information includes: matching the intention clauses of the statement text against the intention clauses of the focus-matched comment information, and determining a piece of comment information to be focus-matched and intent-matched when the matching degree exceeds a third threshold.

The tag library and the intent classifier library used to identify focus information and intent information may be preset, and may be continuously updated and expanded during use. The process of identifying the focus information of the comment information and the process of identifying the focus information of the statement text use the same tag library, so that the extraction standard for focus information is consistent, which ensures the accuracy of the subsequent focus matching. Likewise, the process of identifying the intent information of the comment information and the process of identifying the intent information of the statement text use the same intent classifier library, so that the extraction standard for intent information is consistent, which ensures the accuracy of the subsequent intent matching.
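The point about sharing one tag library and one intent classifier library between comments and statement text can be illustrated with a small sketch. It assumes, purely for illustration, that each library maps a label or intent name to trigger phrases and that extraction is a substring lookup; the specification does not say how extraction is performed, and the names `TAG_LIBRARY`, `INTENT_LIBRARY`, `extract`, and `identify` are hypothetical.

```python
# Hypothetical libraries: name -> trigger phrases (an assumption, not from the source).
TAG_LIBRARY = {
    "insomnia": ["can't fall asleep", "cannot sleep", "insomnia"],
    "lonely": ["lonely", "alone"],
}
INTENT_LIBRARY = {
    "chat": ["talk", "chat"],
    "comfort": ["comfort", "it will be fine"],
}


def extract(text, library):
    """Return the names in `library` whose trigger phrases occur in `text`."""
    lowered = text.lower()
    return {name for name, phrases in library.items()
            if any(phrase in lowered for phrase in phrases)}


def identify(text):
    """Apply the same two libraries to any text, comment or statement alike."""
    return extract(text, TAG_LIBRARY), extract(text, INTENT_LIBRARY)


statement_labels, statement_intents = identify("I can't fall asleep and feel alone")
comment_labels, comment_intents = identify("Let's talk, it will be fine, I cannot sleep either")
print(statement_labels, comment_labels, comment_intents)
```

Since both calls go through `identify`, a label such as "insomnia" means the same thing on both sides of the later comparison, which is the consistency the paragraph above asks for.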

In another embodiment of the present disclosure, the focus information and intent information of the comment information in the preset music commentary library need not be identified in advance. After the voice information input by the user is converted into the statement text, operation S202 of obtaining from the preset music commentary library the comment information that matches the statement text may include: identifying the focus information and intent information of the statement text; matching the focus information of the statement text against each piece of comment information in the preset music commentary library to select focus-matched comment information; and matching the intent information of the statement text against the intent information of the focus-matched comment information to select comment information that is both focus-matched and intent-matched.

When exactly one piece of focus-matched and intent-matched comment information is selected, that comment information is output directly as the response. When a plurality of such pieces are selected, as an optional embodiment, operation S202 of obtaining from the preset music commentary library the comment information that matches the statement text further includes: when a plurality of pieces of focus-matched and intent-matched comment information are selected, obtaining the priority of the music corresponding to each piece of comment information; and sorting the comments based on the priority of the music and choosing one piece of comment information based on the sorting result.

Optionally, obtaining the priority of the music corresponding to each piece of comment information includes: determining, according to the historical music interaction behavior data of the user, the comprehensive score of the music corresponding to each piece of comment information, where the historical music interaction behavior data of the user includes at least one of the following: behavior data of the user collecting music, behavior data of the user liking music, behavior data of the user playing music, behavior data of the user commenting on music, behavior data of the user sharing music, or behavior data of the user creating music.
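One way to turn the listed behavior data into a comprehensive score is a weighted sum over behavior events. This is only a sketch under that assumption: the specification does not give a scoring formula, and the weights, `comprehensive_score`, and `pick_comment` below are hypothetical.

```python
# Hypothetical weights per behavior type; the source does not specify any.
BEHAVIOR_WEIGHTS = {
    "collect": 3.0,
    "like": 2.0,
    "play": 1.0,
    "comment": 2.0,
    "share": 2.5,
    "create": 4.0,
}


def comprehensive_score(events):
    """Sum the weights of the behavior events recorded for one piece of music."""
    return sum(BEHAVIOR_WEIGHTS.get(event, 0.0) for event in events)


def pick_comment(candidates, history):
    """candidates: (comment_id, song_id) pairs; history: song_id -> list of events."""
    return max(candidates, key=lambda c: comprehensive_score(history.get(c[1], [])))


history = {"song_x": ["play"], "song_y": ["like", "play", "play"]}
candidates = [("comment 2", "song_x"), ("comment 5", "song_y")]
chosen = pick_comment(candidates, history)
print(chosen)
```

Music with no recorded behavior scores zero, so comments on music the user has never interacted with are chosen only when no candidate has any history.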

The method shown in Fig. 2 and Fig. 3 is described below with reference to Fig. 4A and Fig. 4B in conjunction with a specific embodiment:

In this embodiment, user A carries out voice interaction with a smart speaker, and the preset music commentary library is constructed before the voice interaction begins.

Fig. 4A schematically shows a preset music commentary library according to an embodiment of the present invention.

As shown in Fig. 4A, the preset music commentary library includes: the comment information corresponding to user A's personalized music, the comment information on promoted music within the current predetermined time, and popular comment information. Here, user A's personalized music refers to music on which user A has performed positive music interaction behaviors such as collecting, creating, liking, sharing, or playing; the promoted music within the current predetermined time includes one or more of hot music within the current predetermined time, music that needs to be promoted under agreements with partners, and the like; and popular comment information refers to comment information with 500 or more likes.

According to the historical voice interaction content of user A, preliminary basic labels are selected, for example "lonely", "insomnia", "sad", "memory", "sorrow", and "anxiety", and a basic tag library is built from these basic labels. Clauses are formed based on the basic labels in the basic tag library, and semantic clause extraction is performed on the candidate comment information in the preset music commentary library, so that each piece of comment information is classified by basic label and summarized into intent classification clauses, forming the intent classifier library; the tag library is then expanded according to the extracted clauses. Based on the expanded tag library, clause extraction is performed on each piece of comment information again, the label classification of each piece of comment information is updated and summarized into the intent classifier library, and the tag library is expanded again according to the extracted clauses. This cycle is repeated until the final tag library and intent classifier library are obtained. One or more labels characterizing the focus information of each piece of comment information are then obtained based on the final tag library, and the intention clause characterizing the intent information of each piece of comment information is obtained based on the final intent classifier library.

Fig. 4B schematically shows a voice interaction process according to an embodiment of the present invention.

When the smart speaker receives the voice information input by user A, it converts the voice information into statement text through automatic speech recognition (ASR). In this example, the statement text corresponding to the voice information input by user A is "I can't fall asleep". Semantic analysis is performed on the statement text based on natural language understanding (NLU) to obtain the intention clause characterizing the intent information of the statement text, {I, chat, comfort}, and the labels characterizing the focus information of the statement text, {insomnia, lonely}.

The similarity between the focus information of the statement text and the comment information in the preset music commentary library shown in Fig. 4A is computed with a preset algorithm, and the comment information whose similarity is higher than the second threshold is selected. In this example, the recommender-system algorithm item_cf is used to compute the similarity between the labels {insomnia, lonely} of the statement text "I can't fall asleep" and the labels of each piece of comment information, and five pieces of comment information are selected from the preset music commentary library shown in Fig. 4B: comment 1, comment 2, comment 3, comment 4, and comment 5. The intention clause corresponding to comment 1 is {chat}, the intention clause corresponding to comment 2 is {chat, comfort}, the intention clause corresponding to comment 3 is {treatment}, the intention clause corresponding to comment 4 is {video display}, and the intention clause corresponding to comment 5 is {chat, comfort}.
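The focus-matching step compares label sets and keeps comments whose similarity exceeds the second threshold. The sketch below swaps in Jaccard similarity as a simple stand-in; it is not the item_cf algorithm named in the example, and the comment labels, the threshold value 0.4, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two label collections: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def focus_match(statement_labels, comment_labels, second_threshold):
    """Return ids of comments whose label similarity exceeds the second threshold."""
    return {
        comment_id
        for comment_id, labels in comment_labels.items()
        if jaccard(statement_labels, labels) > second_threshold
    }


# Hypothetical labels; the source does not list the labels of individual comments.
comment_labels = {
    "c1": {"insomnia"},                      # similarity 1/2
    "c2": {"insomnia", "lonely", "memory"},  # similarity 2/3
    "c3": {"sad"},                           # similarity 0
}
matched = focus_match({"insomnia", "lonely"}, comment_labels, second_threshold=0.4)
print(sorted(matched))
```

Any set-similarity measure with the same signature could replace `jaccard` without changing the thresholding logic.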

The intent information of the statement text is then matched against the intention clauses of the selected comment information, and the comment information whose matching degree exceeds the third threshold is further selected. In this example, the intention clause {I, chat, comfort} of the statement text "I can't fall asleep" is matched against the intention clauses of the five pieces of comment information selected above, and the comment information whose intention clause includes {chat, comfort} is selected, namely comment 2 and comment 5.
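The intent-matching step of this example can be reproduced with a small calculation. The sketch assumes the matching degree is the fraction of the statement's intention terms that also appear in the comment's intention clause, and uses 0.5 as the third threshold; both choices are assumptions made for illustration, since the example does not define the matching degree or the threshold value.

```python
def intent_matching_degree(statement_intents, comment_intents):
    """Fraction of the statement's intention terms found in the comment's clause."""
    statement_intents = set(statement_intents)
    if not statement_intents:
        return 0.0
    return len(statement_intents & set(comment_intents)) / len(statement_intents)


statement_intents = {"I", "chat", "comfort"}
focus_matched = {
    1: {"chat"},
    2: {"chat", "comfort"},
    3: {"treatment"},
    4: {"video display"},
    5: {"chat", "comfort"},
}
third_threshold = 0.5  # assumed value
selected = sorted(
    comment_id
    for comment_id, intents in focus_matched.items()
    if intent_matching_degree(statement_intents, intents) > third_threshold
)
print(selected)  # [2, 5]
```

Under this definition comment 1 scores 1/3, comments 2 and 5 score 2/3, and comments 3 and 4 score 0, so only comments 2 and 5 pass, in line with the example.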

According to the historical music interaction behavior data of user A, the comprehensive score of the music corresponding to each selected piece of comment information is determined, and the comment information corresponding to the music with the highest comprehensive score is chosen. Further, characteristics of the corresponding music other than user preference may also be considered. For example, the selected pieces of comment information may be sorted according to the priority order: promoted music within the current predetermined time > music the user likes > music the user has played more than a fourth threshold number of times > music the user has collected, and the comment with the highest priority is chosen. In this example, the selected comment 2 and comment 5 are sorted according to user A's music preferences; the music corresponding to comment 5 is music that user A likes and therefore has the higher priority, so comment 5 is finally selected, and its corresponding music is the song "late into the night dining room".
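The priority order described here amounts to ranking each candidate's music by the first category it falls into. The following sketch assumes a simple per-song record with boolean flags and a play count; the record layout, the fourth-threshold value, and the attributes given to the music of comment 2 are hypothetical (the example only states that the music of comment 5 is liked by user A).

```python
def category_rank(song, fourth_threshold):
    """Lower rank means higher priority, following the order in the text."""
    if song["promoted"]:
        return 0  # promoted music within the current predetermined time
    if song["liked"]:
        return 1  # music the user likes
    if song["play_count"] > fourth_threshold:
        return 2  # music played more than the fourth threshold
    if song["collected"]:
        return 3  # music the user has collected
    return 4      # none of the listed categories


def choose_comment(candidates, songs, fourth_threshold):
    """candidates: comment id -> song id; returns the id with the best rank."""
    return min(candidates, key=lambda cid: category_rank(songs[candidates[cid]], fourth_threshold))


songs = {
    "song of comment 2": {"promoted": False, "liked": False, "play_count": 3, "collected": True},
    "late into the night dining room": {"promoted": False, "liked": True, "play_count": 3, "collected": False},
}
candidates = {2: "song of comment 2", 5: "late into the night dining room"}
best = choose_comment(candidates, songs, fourth_threshold=10)
print(best)
```

Ties keep the first candidate encountered; a fuller version could break ties with the comprehensive score.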

The smart speaker converts comment 5 into the voice information "It is said that people who cannot sleep at night are awake in someone else's dream" through text-to-speech (TTS) synthesis and outputs it as the response to "I can't fall asleep" input by user A. The music "late into the night dining room" corresponding to comment 5 may then be played directly, or played after user A confirms. Because this music is a song user A likes, and comment 5 was written by another real user in a mood similar to that behind user A's "I can't fall asleep", outputting comment 5 as the response and playing the corresponding song moves user A and evokes emotional resonance, making the human-computer interaction warmer.

The embodiments of the present disclosure can reduce the manpower cost required for voice interaction by making full use of existing comment information written by real users. By choosing the comment information that matches the input statement text as the response, they can enhance the emotional interaction in voice interaction. Unlike a mechanical question-and-answer dialogue, the content of the response meets the user's psychological needs, and the accompanying music also helps create atmosphere and lets the user feel emotional resonance.

Exemplary device

Having described the method of the exemplary embodiments of the present invention, the voice interaction device of the exemplary embodiments of the present invention is next described in detail with reference to Fig. 5A to Fig. 6.

Fig. 5A schematically shows a block diagram of a voice interaction device according to an embodiment of the present invention. As shown in Fig. 5A, the voice interaction device includes:

The receiving module 510 is configured to receive the voice information input by the user and convert the voice information into statement text.

The matching module 520 is configured to obtain, from the preset music commentary library, the comment information that matches the statement text.

The output module 530 is configured to output the comment information as the response to the voice information.

It can be seen that, for the voice information input by the user, the device shown in Fig. 5A selects the comment information that matches the voice information input by the current user from a large body of existing comment information about music and uses it as the response, so that developers do not need to write response content in advance, which greatly reduces the manpower invested in writing response content. Moreover, because comment information is written by real users to express their genuine feelings about the corresponding music, using the matching comment information as the response to the voice information can evoke emotional resonance in the user who input the voice information and meet the user's emotional needs.
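The three modules of Fig. 5A can be sketched as callables wired into one object. This is an illustrative arrangement only: the class name `VoiceInteractionDevice`, the stand-in lambdas, and the exact-lookup matcher are hypothetical, and a real device would call speech recognition and speech synthesis engines in place of the stubs.

```python
class VoiceInteractionDevice:
    """Wires the three modules of Fig. 5A together as plain callables."""

    def __init__(self, receive, match, output):
        self.receive = receive  # receiving module 510: voice information -> statement text
        self.match = match      # matching module 520: statement text -> comment information
        self.output = output    # output module 530: comment information -> response

    def respond(self, voice_information):
        statement_text = self.receive(voice_information)
        comment = self.match(statement_text)
        return self.output(comment)


# Stand-in callables used only to exercise the wiring.
library = {"I can't fall asleep": "comment 5"}
device = VoiceInteractionDevice(
    receive=lambda voice: voice.decode("utf-8"),           # pretends the audio is already text
    match=lambda text: library.get(text, ""),              # exact lookup instead of real matching
    output=lambda comment: f"<speech>{comment}</speech>",  # pretends to synthesize speech
)
response = device.respond(b"I can't fall asleep")
print(response)
```

Keeping each module behind a callable mirrors the later remark in the specification that the division into modules is exemplary and that modules may be merged or split.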

Fig. 5B schematically shows a block diagram of a voice interaction device according to another embodiment of the present invention. As shown in Fig. 5B, the voice interaction device includes: a receiving module 510, a matching module 520, an output module 530, a playing module 540, a first preprocessing module 550, and a second preprocessing module 560. The receiving module 510, the matching module 520, and the output module 530 have been described above, and the overlapping parts are not repeated here.

The playing module 540 is configured to play the music corresponding to the comment information after the output module outputs the comment information as the response to the voice information.

In one embodiment of the present disclosure, before the matching module 520 obtains from the preset music commentary library the comment information that matches the statement text, the first preprocessing module 550 is configured to obtain a plurality of pieces of comment information about music that meet the preset condition, and to construct the preset music commentary library from the obtained pieces of comment information. The second preprocessing module 560 is configured to identify the focus information and intent information of each piece of comment information in the preset music commentary library.

On this basis, the matching module 520 obtains from the preset music commentary library the comment information that matches the statement text specifically by: obtaining, from the preset music commentary library, the comment information that matches the statement text based on the focus information and intent information of each piece of comment information in the preset music commentary library.

Specifically, as an optional embodiment, the first preprocessing module 550 is configured to: obtain, according to the historical music interaction behavior data of the user, the comment information corresponding to the user's personalized music, where the user's personalized music includes at least one of the following: music collected by the user, music created by the user, music liked by the user, or music played by the user; and/or obtain the comment information corresponding to currently promoted music; and/or obtain comment information whose number of likes exceeds a first threshold.

Fig. 6 schematically shows a block diagram of a matching module according to an embodiment of the present invention. As shown in Fig. 6, the matching module 520 includes: an identification submodule 521, a first matching submodule 522, a second matching submodule 523, an acquisition submodule 524, and a sorting submodule 525.

In one embodiment of the present disclosure, the identification submodule 521 is configured to identify the focus information and intent information of the statement text. The first matching submodule 522 is configured to match the focus information of the statement text against the focus information of each piece of comment information in the preset music commentary library to select focus-matched comment information. The second matching submodule 523 is configured to match the intent information of the statement text against the intent information of the focus-matched comment information to select comment information that is both focus-matched and intent-matched.

As an optional embodiment, the second preprocessing module 560 is specifically configured to extract, based on a tag library, labels for characterizing the corresponding focus information from each piece of comment information, and to extract, based on an intent classifier library, intention clauses for characterizing the corresponding intent information from each piece of comment information. The identification submodule 521 is specifically configured to extract, based on the tag library, labels for characterizing the corresponding focus information from the statement text, and to extract, based on the intent classifier library, intention clauses for characterizing the corresponding intent information from the statement text. The first matching submodule 522 is specifically configured to match the labels of the statement text against the labels of each piece of comment information, and to determine a piece of comment information to be focus-matched when the matching degree exceeds a second threshold. The second matching submodule 523 is specifically configured to match the intention clauses of the statement text against the intention clauses of the focus-matched comment information, and to determine a piece of comment information to be focus-matched and intent-matched when the matching degree exceeds a third threshold.

In one embodiment of the present disclosure, the acquisition submodule 524 is configured to obtain, when a plurality of pieces of focus-matched and intent-matched comment information are selected, the priority of the music corresponding to each piece of comment information. The sorting submodule 525 is configured to sort the comments based on the priority of the music and to choose one piece of comment information based on the sorting result.

Optionally, the acquisition submodule 524 is specifically configured to determine, according to the historical music interaction behavior data of the user, the comprehensive score of the music corresponding to each piece of comment information, where the historical music interaction behavior data of the user includes at least one of the following: behavior data of the user collecting music, behavior data of the user liking music, behavior data of the user playing music, behavior data of the user commenting on music, behavior data of the user sharing music, or behavior data of the user creating music.

It should be noted that the implementation, the technical problems solved, the functions realized, and the technical effects achieved by each module, unit, and subunit in the device embodiments are the same as or similar to those of the corresponding steps in the method embodiments, and are not described again here.

Exemplary medium

Having described the method and device of the exemplary embodiments of the present invention, a medium for implementing the voice interaction method of the exemplary embodiments of the present invention is introduced next.

An embodiment of the present invention provides a medium storing computer-executable instructions which, when executed by a processor, implement the voice interaction method described in any one of the above method embodiments.

In some possible embodiments, various aspects of the present invention may also be implemented in the form of a program product that includes program code. When the program product is run on a computing device, the program code causes the computing device to execute the steps of the voice interaction method according to the various exemplary embodiments of the present invention described in the "Exemplary methods" section above of this specification. For example, the computing device may execute the operation steps shown in Fig. 2, and may also execute the operation steps shown in Fig. 3.

The program product may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.

Fig. 7 schematically shows a computer-readable storage medium product according to an embodiment of the present invention. As shown in Fig. 7, a program product 70 for implementing the voice interaction method according to an embodiment of the present invention is described; it may employ a portable compact disc read-only memory (CD-ROM), include program code, and run on a computing device such as a personal computer. However, the program product of the present invention is not limited thereto. In this document, a readable storage medium may be any tangible medium that contains or stores a program, where the program can be used by or in connection with an instruction execution system, apparatus, or device.

A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which readable program code is carried. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.

The program code contained on a readable medium may be transmitted using any suitable medium, including but not limited to wireless, wired, optical cable, RF, and the like, or any suitable combination of the above.

Program code for carrying out the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user computing device, partly on the user device, as a stand-alone software package, partly on the user computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).

Exemplary computing device

Having described the method, medium, and device of the exemplary embodiments of the present invention, a computing device for implementing the voice interaction method according to another exemplary embodiment of the present invention is introduced next.

An embodiment of the present invention also provides a computing device, including: a memory, a processor, and executable instructions stored in the memory and runnable on the processor, where the processor, when executing the instructions, implements the voice interaction method described in any one of the above method embodiments.

Those of ordinary skill in the art will understand that various aspects of the present invention may be implemented as a system, a method, or a program product. Accordingly, various aspects of the present invention may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, and the like), or an embodiment combining hardware and software aspects, which may be collectively referred to herein as a "circuit", "module", or "system".

In some possible embodiments, the computing device for implementing the voice interaction method according to the present invention may include at least one processing unit and at least one storage unit. The storage unit stores program code which, when executed by the processing unit, causes the processing unit to execute the operation steps of the data processing method according to the various exemplary embodiments of the present invention described in the "Exemplary methods" section above of this specification. For example, the processing unit may execute the operation steps shown in Fig. 2, and may also execute the operation steps shown in Fig. 3.

A computing device 80 for implementing the voice interaction method according to this embodiment of the present invention is described below with reference to Fig. 8. The computing device 80 shown in Fig. 8 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present invention.

As shown in Fig. 8, the computing device 80 takes the form of a general-purpose computing device. The components of the computing device 80 may include but are not limited to: the at least one processing unit 801, the at least one storage unit 802, and a bus 803 connecting different system components (including the storage unit 802 and the processing unit 801).

The bus 803 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus structures.

The storage unit 802 may include readable media in the form of volatile memory, such as random access memory (RAM) 8021 and/or cache memory 8022, and may further include read-only memory (ROM) 8023.

The storage unit 802 may also include a program/utility 8025 having a set of (at least one) program modules 8024. Such program modules 8024 include but are not limited to: an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment.

The computing device 80 may also communicate with one or more external devices 804 (such as a keyboard, a pointing device, or a Bluetooth device), with one or more devices that enable a user to interact with the computing device 80, and/or with any device (such as a router or a modem) that enables the computing device 80 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 805. In addition, the computing device 80 may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 806. As shown, the network adapter 806 communicates with the other modules of the computing device 80 through the bus 803. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the computing device 80, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

It should be noted that although several units/modules or subunits/submodules of the data processing equipment are mentioned in the detailed description above, this division is merely exemplary and not mandatory. In fact, according to embodiments of the present invention, the features and functions of two or more units/modules described above may be embodied in one unit/module. Conversely, the features and functions of one unit/module described above may be further divided and embodied by multiple units/modules.

In addition, although the operations of the method of the present invention are described in the accompanying drawings in a particular order, this does not require or imply that these operations must be executed in that particular order, or that all of the operations shown must be executed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be merged into one step for execution, and/or one step may be decomposed into multiple steps for execution.

Although the spirit and principles of the present invention have been described with reference to several specific embodiments, it should be understood that the present invention is not limited to the specific embodiments disclosed, and the division into aspects does not mean that features in these aspects cannot be combined to advantage; this division is merely for convenience of presentation. The present invention is intended to cover the various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims

1. A voice interaction method, comprising:

receiving voice information input by a user, and converting the voice information into a statement text;

obtaining, from a preset music commentary library, comment information that matches the statement text; and

outputting the comment information as a response to the voice information.

2. The method according to claim 1, wherein, after the comment information is output as the response to the voice information, the method further comprises: playing the music corresponding to the comment information.

3. The method according to claim 1, wherein:

before obtaining, from the preset music commentary library, the comment information that matches the statement text, the method further comprises:

obtaining a plurality of pieces of comment information about music that satisfy a preset condition, and constructing the preset music commentary library from the obtained pieces of comment information; and

identifying focus information and intent information of each piece of comment information in the preset music commentary library;

and obtaining, from the preset music commentary library, the comment information that matches the statement text comprises: obtaining the comment information that matches the statement text based on the focus information and intent information of each piece of comment information in the preset music commentary library.

4. The method according to claim 3, wherein obtaining the plurality of pieces of comment information about music that satisfy the preset condition comprises:

obtaining, according to historical music interaction behavior data of the user, comment information corresponding to personalized music of the user, wherein the personalized music of the user includes at least one of the following: music collected by the user, music created by the user, music liked by the user, or music played by the user; and/or

obtaining comment information corresponding to currently popular music; and/or

obtaining comment information whose number of likes exceeds a first threshold.
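The three "and/or" conditions of claim 4 can be illustrated with a minimal sketch that keeps a comment when any enabled condition holds. The `Comment` record, the function name `build_comment_library`, and its parameters are hypothetical names chosen for illustration; the claims do not prescribe any data layout, and treating "and/or" as a union of the enabled conditions is one possible reading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    """Hypothetical record for one piece of comment information."""
    comment_id: int
    song_id: int
    text: str
    likes: int


def build_comment_library(comments, personalized_song_ids=frozenset(),
                          popular_song_ids=frozenset(), like_threshold=None):
    """Keep every comment that satisfies at least one enabled condition.

    personalized_song_ids: songs the user collected, created, liked or played
    popular_song_ids:      currently popular songs
    like_threshold:        the "first threshold" on the number of likes
                           (None disables this condition)
    """
    library = []
    for comment in comments:
        keep = (
            comment.song_id in personalized_song_ids
            or comment.song_id in popular_song_ids
            or (like_threshold is not None and comment.likes > like_threshold)
        )
        if keep:
            library.append(comment)
    return library
```

A condition that is not supplied simply never matches, so each of the three conditions can be used alone or together with the others.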

5. The method according to claim 3, wherein obtaining the comment information that matches the statement text based on the focus information and intent information of each piece of comment information in the preset music commentary library comprises:

identifying focus information and intent information of the statement text;

matching the focus information of the statement text against the focus information of each piece of comment information in the preset music commentary library, and filtering out focus-matched comment information; and

matching the intent information of the statement text against the intent information of the focus-matched comment information, and filtering out comment information that is both focus-matched and intent-matched.

6. The method according to claim 5, wherein:

identifying the focus information and intent information of each piece of comment information in the preset music commentary library comprises: extracting, based on a tag library, labels from each piece of comment information for characterizing the corresponding focus information, and extracting, based on an intent classification library, intent clauses from each piece of comment information for characterizing the corresponding intent information;

identifying the focus information and intent information of the statement text comprises: extracting, based on the tag library, labels from the statement text for characterizing the corresponding focus information, and extracting, based on the intent classification library, intent clauses from the statement text for characterizing the corresponding intent information;

matching the focus information of the statement text against the focus information of each piece of comment information in the preset music commentary library comprises: matching the labels of the statement text against the labels of each piece of comment information, and determining a piece of comment information to be focus-matched comment information when the matching degree exceeds a second threshold; and

matching the intent information of the statement text against the intent information of the focus-matched comment information comprises: matching the intent clauses of the statement text against the intent clauses of the focus-matched comment information, and determining a piece of comment information to be both focus-matched and intent-matched when the matching degree exceeds a third threshold.
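The two-stage screening of claims 5 and 6 (focus first, then intent) can be sketched as follows. This is only an illustration under stated assumptions: the claims do not define how the "matching degree" is computed, so Jaccard overlap is swapped in here; labels are extracted by simple keyword lookup; and each intent clause is reduced to the name of the intent category whose cue phrase appears in the text. All function names, the example libraries, and the default threshold values are hypothetical.

```python
def extract_labels(text, tag_library):
    """Return the tags from the tag library that occur in the text."""
    lowered = text.lower()
    return {tag for tag in tag_library if tag in lowered}


def extract_intents(text, intent_library):
    """Return the intent categories whose cue phrases occur in the text.

    intent_library maps an intent name to a list of cue phrases
    (a simplified stand-in for the intent classification library).
    """
    lowered = text.lower()
    return {intent for intent, cues in intent_library.items()
            if any(cue in lowered for cue in cues)}


def jaccard(a, b):
    """Assumed matching degree: |a & b| / |a | b|, or 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_comments(statement, comments, tag_library, intent_library,
                   focus_threshold=0.3, intent_threshold=0.3):
    """Screen comments by focus (second threshold), then by intent (third threshold)."""
    s_labels = extract_labels(statement, tag_library)
    s_intents = extract_intents(statement, intent_library)

    # Stage 1: keep comments whose labels match the statement's labels.
    focus_matched = [
        c for c in comments
        if jaccard(s_labels, extract_labels(c, tag_library)) > focus_threshold
    ]
    # Stage 2: among those, keep comments whose intent also matches.
    return [
        c for c in focus_matched
        if jaccard(s_intents, extract_intents(c, intent_library)) > intent_threshold
    ]
```

Running the intent comparison only on the focus-matched subset mirrors the order of the claim and avoids intent extraction for comments that have already been ruled out.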

7. The method according to claim 5, wherein obtaining, from the preset music commentary library, the comment information that matches the statement text further comprises:

when a plurality of pieces of comment information that are both focus-matched and intent-matched are filtered out, obtaining the priority of the music corresponding to each piece of comment information; and

sorting the pieces of comment information based on the priority of the music, and selecting one piece of comment information based on the sorting result.
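The selection step of claim 7 can be sketched as a sort on the priority of each comment's music. The dictionary layout, the name `choose_comment`, and the conventions that a larger number means a higher priority and that music with no known priority counts as 0 are assumptions made for illustration only.

```python
def choose_comment(matched_comments, song_priority):
    """Pick one comment from those that passed focus and intent matching.

    matched_comments: list of dicts with at least a "song_id" key
    song_priority:    dict mapping song_id to a numeric priority
                      (assumed: larger means higher priority)
    """
    if not matched_comments:
        return None
    ranked = sorted(
        matched_comments,
        key=lambda c: song_priority.get(c["song_id"], 0),
        reverse=True,
    )
    # sorted() is stable, so comments with equal priority keep their input order.
    return ranked[0]
```

Taking the first element of the ranking is only one way of choosing "based on the sorting result"; the claim leaves the exact selection rule open.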

8. A voice interaction apparatus, comprising:

a receiving module, configured to receive voice information input by a user and convert the voice information into a statement text;

a matching module, configured to obtain, from a preset music commentary library, comment information that matches the statement text; and

an output module, configured to output the comment information as a response to the voice information.
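The three-module structure of claim 8 can be sketched as one class that wires together a receiving step, a matching step and an output step. This is a rough illustration, not the apparatus itself: the class name `VoiceInteractionDevice` and the three injected callables are hypothetical, and real speech recognition is deliberately left to whatever function the caller supplies as `transcribe`.

```python
from typing import Callable, Optional


class VoiceInteractionDevice:
    """Composes receiving, matching and output steps into one pipeline."""

    def __init__(self,
                 transcribe: Callable[[bytes], str],
                 match: Callable[[str], Optional[str]],
                 output: Callable[[str], None]) -> None:
        self.transcribe = transcribe  # receiving module: voice -> statement text
        self.match = match            # matching module: statement text -> comment
        self.output = output          # output module: emit the comment as response

    def handle(self, voice: bytes) -> Optional[str]:
        """Process one voice input and return the comment used as the response."""
        statement = self.transcribe(voice)
        comment = self.match(statement)
        if comment is not None:
            self.output(comment)
        return comment
```

Passing the three steps in as callables keeps the sketch independent of any particular speech recognizer, comment library or playback channel.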

9. A medium storing computer-executable instructions which, when executed by a processor, implement:

the voice interaction method according to any one of claims 1 to 7.

10. A computing device, comprising: a memory, a processor, and executable instructions stored in the memory and executable on the processor, wherein the processor, when executing the instructions, implements:

the voice interaction method according to any one of claims 1 to 7.