CN108711423A - Intelligent voice interaction implementation method, apparatus, computer device and storage medium - Google Patents
Intelligent voice interaction implementation method, apparatus, computer device and storage medium
- Publication number
- CN108711423A (application CN201810294041.4A)
- Authority
- CN
- China
- Prior art keywords
- user
- query
- intelligent sound
- style
- response
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 40
- 230000004044 response Effects 0.000 claims abstract description 96
- 230000002452 interceptive effect Effects 0.000 claims abstract description 36
- 238000012545 processing Methods 0.000 claims description 45
- 230000008451 emotion Effects 0.000 claims description 19
- 230000033764 rhythmic process Effects 0.000 claims description 18
- 238000004590 computer program Methods 0.000 claims description 5
- 230000003993 interaction Effects 0.000 abstract description 12
- 238000010586 diagram Methods 0.000 description 8
- 230000006870 function Effects 0.000 description 5
- 238000004891 communication Methods 0.000 description 4
- 230000003287 optical effect Effects 0.000 description 4
- 230000009471 action Effects 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 238000003058 natural language processing Methods 0.000 description 3
- 230000015572 biosynthetic process Effects 0.000 description 2
- 230000002093 peripheral effect Effects 0.000 description 2
- 238000003786 synthesis reaction Methods 0.000 description 2
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/22—Interactive procedures; Man-machine interfaces
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Psychiatry (AREA)
- Hospice & Palliative Care (AREA)
- General Health & Medical Sciences (AREA)
- Child & Adolescent Psychology (AREA)
- Computer Networks & Wireless Communication (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses an intelligent voice interaction implementation method, apparatus, computer device and storage medium. The method includes: obtaining a user query from an intelligent voice device, the query being input by the user during voice interaction with the intelligent voice device; generating a response voice corresponding to the query according to the acquired conversational style of the user, and returning the response voice to the intelligent voice device for playback. The scheme of the present invention can generate response voices based on the user's conversational style, thereby realizing personalized responses for different users, making voice interaction more perceptive, anthropomorphic and intelligent, and bringing the user an interactive experience that better conforms to human conversation habits.
Description
【Technical field】
The present invention relates to computer application technologies, and particularly to an intelligent voice interaction implementation method, apparatus, computer device and storage medium.
【Background technology】
Intelligent voice interaction is a new-generation interaction mode based on voice input: the user speaks and obtains a feedback result. With the development and refinement of the technology, intelligent voice devices have become increasingly popular and widely used.
Although current voice-interaction dialogue schemes improve timbre and the like by manually editing response formats in advance, which superficially makes the dialogue closer to human conversation and gives it a certain sense of affinity, the dialogue remains rigid and mechanical, lacking human interest and "intelligence". The device can only answer the user according to fixed policies preset in the cloud, with an obvious gap from human conversation habits; the user feels no sense of involvement, and only simple question-and-answer exchanges are possible, which cannot satisfy the requirements of more advanced human-machine intelligent voice dialogue.
【Invention content】
In view of this, the present invention provides an intelligent voice interaction implementation method, apparatus, computer device and storage medium. The specific technical solution is as follows:
An intelligent voice interaction implementation method, including:
obtaining a user query from an intelligent voice device, the query being input by the user during voice interaction with the intelligent voice device;
generating a response voice corresponding to the query according to the acquired conversational style of the user, and returning the response voice to the intelligent voice device for playback.
According to a preferred embodiment of the present invention, the conversational style includes one or both of the following: speaking style and emotional style.
According to a preferred embodiment of the present invention, obtaining the speaking style of the user includes: determining the speaking style of the user according to the history of interactions between the user and the intelligent voice device.
According to a preferred embodiment of the present invention, the speaking style includes one or any combination of the following: accent, pet phrases, speaking-format habits, speaking rhythm, and use of popular vocabulary.
According to a preferred embodiment of the present invention, obtaining the emotional style of the user includes: determining the emotional style of the user according to one or both of the following: the history of interactions between the user and the intelligent voice device, and the real-time interaction content.
According to a preferred embodiment of the present invention, determining the emotional style of the user includes: determining the emotional style of the user through one or any combination of the following sentiment-analysis modes: vocabulary sentiment analysis, sentence-meaning sentiment analysis, and sound-rhythm sentiment analysis.
According to a preferred embodiment of the present invention, generating the response voice corresponding to the query according to the acquired conversational style of the user includes: obtaining the response content corresponding to the query; and generating the response voice corresponding to the query by combining the conversational style of the user with the response content.
An intelligent voice interaction implementation method, including:
obtaining a query input by the user during voice interaction, and sending the query to a cloud server, so that the cloud server generates a response voice corresponding to the query according to the acquired conversational style of the user;
obtaining the response voice from the cloud server and playing it.
According to a preferred embodiment of the present invention, the conversational style includes one or both of the following: speaking style and emotional style.
An intelligent voice interaction implementation apparatus, including a first processing unit and a second processing unit;
the first processing unit is used for obtaining a user query from an intelligent voice device, the query being input by the user during voice interaction with the intelligent voice device;
the second processing unit is used for generating a response voice corresponding to the query according to the acquired conversational style of the user, and returning the response voice to the intelligent voice device for playback.
According to a preferred embodiment of the present invention, the conversational style includes one or both of the following: speaking style and emotional style.
According to a preferred embodiment of the present invention, the second processing unit is further used for determining the speaking style of the user according to the acquired history of interactions between the user and the intelligent voice device.
According to a preferred embodiment of the present invention, the speaking style includes one or any combination of the following: accent, pet phrases, speaking-format habits, speaking rhythm, and use of popular vocabulary.
According to a preferred embodiment of the present invention, the second processing unit is further used for determining the emotional style of the user according to one or both of the following acquired information: the history of interactions between the user and the intelligent voice device, and the real-time interaction content.
According to a preferred embodiment of the present invention, the second processing unit determines the emotional style of the user through one or any combination of the following sentiment-analysis modes: vocabulary sentiment analysis, sentence-meaning sentiment analysis, and sound-rhythm sentiment analysis.
According to a preferred embodiment of the present invention, the second processing unit obtains the response content corresponding to the query, and generates the response voice corresponding to the query by combining the conversational style of the user with the response content.
An intelligent voice interaction implementation apparatus, including a third processing unit and a fourth processing unit;
the third processing unit is used for obtaining a query input by the user during voice interaction and sending the query to a cloud server, so that the cloud server generates a response voice corresponding to the query according to the acquired conversational style of the user;
the fourth processing unit is used for obtaining the response voice from the cloud server and playing it.
According to a preferred embodiment of the present invention, the conversational style includes one or both of the following: speaking style and emotional style.
A computer device, including a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor implements the method described above when executing the program.
A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method described above.
It can be seen from the above introduction that, with the scheme of the present invention, after a user query from an intelligent voice device is obtained, a response voice corresponding to the query can be generated according to the acquired conversational style of the user, and the response voice is then returned to the intelligent voice device for playback. Compared with the prior art, the scheme of the present invention can generate response voices based on the user's conversational style, thereby realizing personalized responses for different users, making voice interaction more perceptive, anthropomorphic and intelligent, and bringing the user an interactive experience that better conforms to human conversation habits.
【Description of the drawings】
Fig. 1 is a flowchart of a first embodiment of the intelligent voice interaction implementation method of the present invention.
Fig. 2 is a flowchart of a second embodiment of the intelligent voice interaction implementation method of the present invention.
Fig. 3 is a schematic diagram of the interaction among the user, the intelligent voice device and the cloud server of the present invention.
Fig. 4 is a schematic structural diagram of a first embodiment of the intelligent voice interaction implementation apparatus of the present invention.
Fig. 5 is a schematic structural diagram of a second embodiment of the intelligent voice interaction implementation apparatus of the present invention.
Fig. 6 is a block diagram of an exemplary computer system/server 12 suitable for implementing embodiments of the present invention.
【Specific implementation mode】
Users communicate with intelligent voice devices by voice, and users are perceptive and emotional. Users hope the device can be "intelligent": that it can have a human language style, understand human emotions, and conduct dialogue in a perceptive, intelligent and reasonable way.
To this end, the present invention proposes an intelligent voice interaction implementation scheme that intelligently learns the user's conversational style through dialogue with the user and applies it, so as to make voice interaction more personalized, perceptive, and so on.
In order to make the technical solution of the present invention clearer, the scheme of the present invention is further described below with reference to the drawings and embodiments.
Obviously, the described embodiments are some rather than all of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a first embodiment of the intelligent voice interaction implementation method of the present invention. As shown in Fig. 1, it includes the following implementation.
In 101, a user query from an intelligent voice device is obtained; the query is input by the user during voice interaction with the intelligent voice device.
In 102, a response voice corresponding to the query is generated according to the acquired conversational style of the user, and the response voice is returned to the intelligent voice device for playback.
In practical applications, the executing body of the flow shown in Fig. 1 may be a cloud server.
After the intelligent voice device obtains a query input by the user during voice interaction, it can send the query to the cloud server. The cloud server can generate a response voice corresponding to the query according to the acquired conversational style of the user, and then return the response voice to the intelligent voice device, which plays it to the user.
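As a rough illustration of steps 101-102, the following Python sketch models the cloud-server side: a query arrives, the user's learned conversational style is looked up, and a styled response is produced. All class and function names here are hypothetical; the patent does not prescribe any concrete implementation.

```python
from dataclasses import dataclass, field

@dataclass
class ConversationalStyle:
    """Learned per-user style traits (illustrative fields only)."""
    pet_phrases: list = field(default_factory=list)
    speech_rate: float = 1.0          # playback-speed factor for TTS
    likes_trendy_words: bool = False

def generate_response_content(query: str) -> str:
    # Stand-in for the ordinary NLP-based response generation the patent
    # refers to as "the existing manner".
    return "welcome back, how can I help?" if "home" in query else "OK."

def apply_style(content: str, style: ConversationalStyle) -> str:
    # Minimal styling: prepend one of the user's pet phrases, if any.
    if style.pet_phrases:
        content = style.pet_phrases[0] + ", " + content
    return content

def handle_query(query: str, style: ConversationalStyle) -> str:
    """Step 101: the query has been obtained; step 102: generate the
    styled response to return to the device."""
    return apply_style(generate_response_content(query), style)

style = ConversationalStyle(pet_phrases=["Well"])
print(handle_query("I am home", style))  # Well, welcome back, how can I help?
```

In a real deployment the two functions would be backed by the recognition, NLP and speech-synthesis stages described later in the embodiment.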
The conversational style of the user may include a speaking style, an emotional style, or both, which are described below respectively.
One) Speaking style
Preferably, the speaking style of the user can be determined according to the history of interactions between the user and the intelligent voice device. The intelligent voice device can send its interaction history with the user to the cloud server, and the cloud server learns the speaking style of the user from the acquired history.
The history may refer to all interaction records with the user since the intelligent voice device was enabled, or to the records of, for example, a recent period of time. Generally, the more history the intelligent voice device provides, the more accurate and comprehensive the speaking style the cloud server learns for the user.
When the intelligent voice device sends the history to the cloud server is not restricted. For example, every N days (N being a positive integer), the intelligent voice device can send the interaction history with the user accumulated up to the current time to the cloud server, so that the cloud server learns or updates the speaking style of the user according to the acquired history.
After learning the speaking style of the user, the cloud server can apply it in dialogue with the user, so that the user feels a sense of intimacy and interest.
The speaking style may include one or any combination of the following: accent, pet phrases, speaking-format habits, speaking rhythm, use of popular vocabulary, etc.
1) Accent
Many users speak to intelligent voice devices with an accent. For example, users along the southeast coast may speak "Cantonese-accented Mandarin" or "Fujian-accented Mandarin", or even mix Mandarin with high-frequency dialect words. Through speaking-style learning, the device can simulate the user's accent when conversing with the user.
For example, for a Hoklo user chatting with the intelligent voice device in accented Mandarin, the response voice played to the user can carry the same accent.
That is, through accent analysis, the device can simulate the user's accent in dialogue, blending into the conversation with the user better and making it warmer and more intelligent.
2) Pet phrases
Many users habitually use pet phrases. For example, some users habitually add fillers such as "actually", "you are so right" or "not bad" when speaking, and they keep these habits when interacting with the intelligent voice device by voice. Based on, for example, the frequency of occurrence of words in the interaction history, the words serving as the user's pet phrases can be identified, and the device can then simulate the user by using the pet phrases in dialogue.
For example, if the user's pet phrase is "so", the device can use it naturally in dialogue: if the query input by the user is "So, is the weather in Beijing suitable for an outing today?", the response voice played to the user can be "So it is; the weather in Beijing is pretty good today, and the air is suitable for a stroll or an outing."
That is, if analysis shows that the word "so" occurs very frequently, appearing in many dialogues, it can be determined to be one of the user's pet phrases, and the pet phrase can then be spoken occasionally in dialogue with the user, giving the intelligent voice device an anthropomorphic speaking style and enhancing fun and affinity.
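The frequency heuristic described above can be sketched as follows. The thresholds (`min_ratio`, `min_count`) are assumptions: the patent only says pet phrases are identified from their frequency of occurrence in the history, without naming concrete values.

```python
from collections import Counter

def detect_pet_phrases(utterances, min_ratio=0.3, min_count=3):
    """Flag words that appear in a large share of the user's past
    utterances as likely pet phrases (pure frequency heuristic)."""
    counts = Counter()
    for u in utterances:
        counts.update(set(u.lower().split()))  # count each word once per utterance
    n = len(utterances)
    return [w for w, c in counts.items() if c >= min_count and c / n >= min_ratio]

history = [
    "so what is the weather today",
    "so play some music",
    "turn off the light",
    "so remind me tomorrow",
]
print(detect_pet_phrases(history))  # ['so']
```

A production system would also filter out common function words so that "the" or "is" are never flagged as pet phrases.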
3) Speaking-format habits
For example, some users habitually use modal words such as "oh", "eh", "ha", "uh"; some users like short, conclusion-style sentences, while others like complete, richly detailed sentences. To express the same meaning of whether it will rain today, for a user who likes short conclusion-style sentences the response voice played can be "It won't rain today", while for a user who likes complete, detailed sentences it can be "According to today's weather forecast, the weather is fine, and it most probably won't rain."
4) Speaking rhythm (speed)
Different users speak at different speeds: users who speak fast are more used to fast-paced dialogue, while users who speak slowly are more used to slow-paced dialogue. The speaking speed of the user in the interaction history can be used to adaptively adjust the dialogue speed of the intelligent voice device.
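A minimal sketch of such rate adaptation, assuming the user's speaking rate is measured in characters per second from the history and mapped to a TTS playback-speed factor. The 4.0 chars/s baseline and the clamp range are illustrative values, not from the patent.

```python
def rate_factor(history_char_rates, baseline=4.0, lo=0.8, hi=1.25):
    """Map the user's average speaking rate (chars/second over the
    history record) to a playback-speed factor, clamped to a safe range
    so the synthesized voice never becomes unintelligible."""
    if not history_char_rates:
        return 1.0  # no history yet: play at normal speed
    avg = sum(history_char_rates) / len(history_char_rates)
    return max(lo, min(hi, avg / baseline))

print(rate_factor([5.0, 4.6]))  # fast talker -> 1.2
print(rate_factor([3.0]))       # slow talker -> 0.8 (clamped)
```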
5) Use of popular vocabulary
Some users like to use popular (trendy) vocabulary when speaking, while others are relatively conservative and sedate and do not. Whether the user likes popular vocabulary can be judged from the user's frequency of use of such words in the interaction history, and popular vocabulary can then be added to the dialogue as appropriate.
In practical applications, after the cloud server gets a user query sent by the intelligent voice device, it can first perform speech recognition on it to obtain a recognition result in text form; it can then determine the response content from the recognition result in an existing manner and, following existing processing, generate the response voice from the response content through speech-synthesis technology and return it to the intelligent voice device for playback. In this embodiment, after the response content is determined, the response voice is generated by combining the learned speaking style of the user with the response content.
For example, the response content can be played with the learned accent of the user; the user's pet phrases can be added to the response content; the response content can be simplified (for example, by removing qualifiers) according to the learned speaking-format habits; the playback speed of the response voice can be adjusted according to the learned speaking rhythm of the user; and, according to the learned liking for popular vocabulary, some words in the response content can be replaced with the corresponding popular words, e.g. replacing "feel so bad I want to cry" with the catchphrase "blue thin mushroom".
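The style-application step just described can be sketched as a plain text transform over the response content. The qualifier list and the popular-vocabulary mapping below are toy stand-ins, not vocabularies from the patent.

```python
def stylize(content: str, pet_phrase=None, trendy_map=None, simplify=False):
    """Apply learned style traits to a plain response: swap in popular
    vocabulary, optionally drop qualifier words (speaking-format habit),
    and optionally prepend the user's pet phrase."""
    words = content.split()
    if trendy_map:
        # Replace plain words with the user's preferred trendy synonyms.
        words = [trendy_map.get(w, w) for w in words]
    if simplify:
        # Brief, conclusion-style sentences: strip common qualifiers.
        qualifiers = {"probably", "basically", "generally"}
        words = [w for w in words if w not in qualifiers]
    text = " ".join(words)
    if pet_phrase:
        text = f"{pet_phrase}, {text}"
    return text

print(stylize("it will probably not rain today",
              pet_phrase="so", simplify=True))
# so, it will not rain today
```

The TTS stage would additionally apply the accent and the playback-speed factor, which operate on audio rather than text.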
Two) Emotional style
Preferably, the emotional style of the user can be determined according to one or both of the following: the history of interactions between the user and the intelligent voice device, and the real-time interaction content.
Based on the interaction history, the emotional style the user has typically exhibited in the past can be learned, e.g. whether the user is a positive, often very happy person or a negative, gloomy, often unhappy person, and the user's current emotional style can be predicted from the past pattern. Alternatively, the user's current emotional style can be determined based on the real-time interaction content, or the two can be used in combination.
The real-time interaction content may refer to the latest query obtained, or to the queries obtained during the current voice interaction, etc.
The emotional style of the user can be determined through one or any combination of the following sentiment-analysis modes: vocabulary sentiment analysis, sentence-meaning sentiment analysis, and sound-rhythm sentiment analysis.
1) Vocabulary sentiment analysis
Sentiment analysis can be performed on the words in the query. Chinese words carry sentiment: there are commendatory and derogatory terms, positive and negative modal particles, curse words, and so on, and different words represent different emotions. For example, queries expressing the user's arrival home include: flat, emotionless — "I'm back"; happy, positive — "I'm back!"; very happy — "Aha, I'm back!".
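A toy lexicon-based scorer in the spirit of this mode is shown below; a real system would use a full sentiment lexicon with graded polarity, whereas the word lists here are illustrative stand-ins.

```python
def lexicon_sentiment(query: str,
                      positive=("great", "happy", "yay", "aha"),
                      negative=("tired", "sad", "awful")):
    """Crude lexicon-based sentiment: +1 per positive word, -1 per
    negative word, mapped to a coarse label."""
    score = 0
    for w in query.lower().split():
        w = w.strip("!,.?")  # drop trailing punctuation before lookup
        if w in positive:
            score += 1
        elif w in negative:
            score -= 1
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(lexicon_sentiment("aha, I am back home!"))  # positive
print(lexicon_sentiment("I am back"))             # neutral
```

Note that the flat "I'm back" example from the text lands in "neutral", matching the intended distinction.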
2) Sentence-meaning sentiment analysis
Sentiment analysis can also be performed on the words and the complete sentence meaning of the query through natural language processing (NLP, Natural Language Processing). Sentence-meaning sentiment analysis is mainly performed on the basis of vocabulary sentiment analysis.
3) Sound-rhythm sentiment analysis
The sound of the query can be analyzed and compared with the interaction history and with a standard voice-emotion rhythm library, etc., to judge the sound rhythm and predict the emotion.
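One simple way to realize such a comparison is to contrast an utterance's pitch and energy with the user's historical baseline (or a standard reference): markedly raised values suggest excitement, markedly lowered values a subdued mood. The thresholds and baseline figures below are assumptions for illustration only.

```python
def prosody_emotion(pitch_hz, energy, baseline_pitch=180.0, baseline_energy=0.5):
    """Classify emotion from relative deviation of pitch and energy
    against a per-user baseline (toy thresholds)."""
    d_pitch = (pitch_hz - baseline_pitch) / baseline_pitch
    d_energy = (energy - baseline_energy) / baseline_energy
    if d_pitch > 0.15 and d_energy > 0.2:
        return "excited"
    if d_pitch < -0.15 and d_energy < -0.2:
        return "subdued"
    return "neutral"

print(prosody_emotion(220.0, 0.8))  # excited
print(prosody_emotion(150.0, 0.3))  # subdued
```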
The response voice corresponding to the query input by the user can be generated by combining the emotional style of the user with the response content.
For example, if the query input by the user is "I'm back!" with a happy, positive emotional style, and the response content obtained in the existing manner is "Welcome back; what can I help you with?", then according to the processing mode of this embodiment the response content can be adjusted to something warmer, such as "Welcome back! Anything I can help with? I'll take it seriously."
Furthermore, the response voice corresponding to the query input by the user can also be generated by combining the user's speaking style, emotional style and the response content.
For example, if the user has run into trouble and is very unhappy, the response content may use neither popular vocabulary nor the user's pet phrases or accent, and the response voice may only follow the user's speaking rhythm during playback. Conversely, if the user is very happy, popular vocabulary, the user's pet phrases and the like can be used in the response content to make the response voice more interesting.
In addition, different users can be distinguished through voiceprint recognition, thereby realizing personalized presentation for different users.
For example, if the intelligent voice device is a smart speaker shared by a family of three, each of whom may use the device, the different users can be distinguished through voiceprint recognition, and conversational-style learning can be performed separately on each user's own interaction history.
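Dispatching to per-user style profiles might look like the following, assuming voiceprints are represented as embedding vectors compared by cosine similarity; the 3-d vectors, names and threshold are toy values, as the patent does not specify a voiceprint representation.

```python
import math

def cosine(a, b):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def identify_speaker(embedding, profiles, threshold=0.7):
    """Match a voiceprint embedding to the closest enrolled family
    member so that user's own style profile is used; below the
    threshold the speaker is treated as unknown."""
    best, best_sim = None, threshold
    for name, ref in profiles.items():
        sim = cosine(embedding, ref)
        if sim > best_sim:
            best, best_sim = name, sim
    return best

profiles = {"dad": [1.0, 0.1, 0.0], "kid": [0.0, 1.0, 0.2]}
print(identify_speaker([0.9, 0.2, 0.0], profiles))  # dad
```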
Fig. 2 is a flowchart of a second embodiment of the intelligent voice interaction implementation method of the present invention. As shown in Fig. 2, it includes the following implementation.
In 201, a query input by the user during voice interaction is obtained and sent to a cloud server, so that the cloud server generates a response voice corresponding to the query according to the acquired conversational style of the user.
In 202, the response voice from the cloud server is obtained and played.
The conversational style may include one or both of the following: speaking style and emotional style. The speaking style may further include one or any combination of the following: accent, pet phrases, speaking-format habits, speaking rhythm, use of popular vocabulary, etc.
Based on the above introduction, Fig. 3 is a schematic diagram of the interaction among the user, the intelligent voice device and the cloud server of the present invention. As shown in Fig. 3, when the user carries out voice interaction with the intelligent voice device, the device is usually first woken up by a wake word; afterwards, the user can carry out normal voice interaction with the device, inputting queries to it and obtaining the response voices it plays. The intelligent voice device can send each query it gets to the cloud server; the cloud server can learn the user's conversational style, such as speaking style and emotional style, from the user's interaction history with the device obtained from the device, and apply it, i.e. generate the response voice corresponding to each query according to the user's conversational style, and then return the response voice to the intelligent voice device for playback.
It should be noted that, for brevity of description, each of the foregoing method embodiments is expressed as a series of action combinations; however, those skilled in the art should understand that the present invention is not limited by the described action sequence, because according to the present invention certain steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in this specification are preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in a certain embodiment, reference may be made to the relevant descriptions of other embodiments.
In short, with the schemes described in the above method embodiments, response voices can be generated based on the user's conversational style, thereby realizing personalized responses for different users, making voice interaction more perceptive, anthropomorphic and intelligent, and bringing the user an interactive experience that better conforms to human conversation habits.
It is the introduction about embodiment of the method above, below by way of device embodiment, to scheme of the present invention into traveling
One step explanation.
Fig. 4 is a schematic structural diagram of a first embodiment of an intelligent voice interaction realization apparatus according to the present invention. As shown in Fig. 4, the apparatus includes a first processing unit 401 and a second processing unit 402.
The first processing unit 401 is configured to obtain a user's query from an intelligent voice device, the query being input by the user in the course of voice interaction with the intelligent voice device.
The second processing unit 402 is configured to generate, according to the obtained conversational style of the user, a response speech corresponding to the query, and to return the response speech to the intelligent voice device for playback.
The conversational style of the user may include a speaking style, an emotional style, or both.
Preferably, the second processing unit 402 may determine the user's speaking style according to the obtained history of interaction between the user and the intelligent voice device.
That is, the second processing unit 402 may learn the user's speaking style from the interaction history records obtained from the intelligent voice device.
The speaking style may include one or any combination of the following: accent, pet phrases, habitual speech patterns, speech rhythm, use of popular vocabulary, and the like.
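One naive way to approximate part of this learning step (treating pet phrases as words the user repeats often across past queries, a simplification not specified by the patent) could look like this; the function name and thresholds are hypothetical:

```python
from collections import Counter

def learn_pet_phrases(history, min_count=3, top_n=2):
    """Treat words the user repeats at least min_count times across past
    queries as pet phrases. A deliberate simplification of the
    'learn the speaking style from history records' step."""
    counts = Counter(word for query in history for word in query.split())
    frequent = [(w, c) for w, c in counts.most_common() if c >= min_count]
    return [w for w, _ in frequent[:top_n]]

history = [
    "honestly play some jazz",
    "honestly what time is it",
    "honestly turn off the lights",
]
print(learn_pet_phrases(history, min_count=3))  # -> ['honestly']
```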
After obtaining the user's query sent by the intelligent voice device, the second processing unit 402 may first perform speech recognition on it to obtain a textual recognition result, then determine the response content from the recognition result in an existing manner, and then generate the response speech from the response content by speech synthesis or the like, also in an existing manner, and return it to the intelligent voice device for playback. In the present embodiment, after the response content is determined, the response speech may be generated by combining the learned speaking style of the user with the response content.
For example, the response content may be played with an accent matching the learned accent of the user, or the user's learned pet phrases may be added to the response content.
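For instance, adding a learned pet phrase to the response content, and marking the accent to be applied downstream by text-to-speech, could be as simple as the following hypothetical sketch (the style keys and the accent-tag convention are assumptions for illustration):

```python
def apply_speaking_style(response, style):
    """Decorate the response content with elements of the learned style."""
    if style.get("pet_phrase"):
        # Prepend the user's pet phrase to the response content.
        response = f"{style['pet_phrase']}, {response}"
    if style.get("accent"):
        # In a real system the accent would steer the TTS voice;
        # here we only tag the text with it for illustration.
        response = f"[accent={style['accent']}] {response}"
    return response

style = {"pet_phrase": "honestly", "accent": "sichuan"}
print(apply_speaking_style("the weather is sunny today", style))
```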
The second processing unit 402 may also determine the user's emotional style according to one or both of the following obtained information: the history of interaction between the user and the intelligent voice device, and the real-time interaction content.
From the interaction history, the emotional style the user has typically exhibited in the past can be learned, for example whether the user is a positive, usually happy person or a negative, usually unhappy person; the user's current emotional style can then be predicted from the emotional style the user has typically exhibited. Alternatively, the user's current emotional style can be determined from the real-time interaction content. The two may also be used in combination.
Specifically, the second processing unit 402 may determine the user's emotional style through one or any combination of the following sentiment analysis methods: lexical sentiment analysis, sentence-meaning sentiment analysis, and speech prosody sentiment analysis.
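Combining scores from the three analysis modes named above could be sketched as follows. This is purely illustrative: the word lists, the exclamation-mark heuristic, and the pitch threshold are stand-ins for real lexical, sentence-meaning, and prosody analyzers, none of which the patent specifies:

```python
# Hypothetical stand-ins for the three sentiment analysis modes.
POSITIVE = {"great", "happy", "love", "back"}
NEGATIVE = {"awful", "angry", "hate", "trouble"}

def lexical_score(text):
    # Lexical sentiment analysis: count positive vs negative words.
    words = set(text.lower().split())
    return len(words & POSITIVE) - len(words & NEGATIVE)

def sentence_score(text):
    # Stand-in for sentence-meaning analysis: exclamations read as positive.
    return 1 if text.endswith("!") else 0

def prosody_score(pitch_hz):
    # Stand-in for speech-prosody analysis: higher pitch read as positive.
    return 1 if pitch_hz > 200 else -1

def emotion_style(text, pitch_hz):
    # Combine the three modes into a single emotional-style label.
    total = lexical_score(text) + sentence_score(text) + prosody_score(pitch_hz)
    return "positive" if total > 0 else "negative"

print(emotion_style("I love this, it is great!", pitch_hz=230))  # positive
print(emotion_style("this is awful", pitch_hz=120))              # negative
```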
The second processing unit 402 may generate the response speech corresponding to the user's query by combining the user's emotional style with the response content.
For example, suppose the user's query is "I'm back", the user's emotional style is happy and positive, and the response content obtained according to the prior art is "Welcome back, how may I help you". Then, according to the processing manner described in the present embodiment, the response content may be adjusted to something like "Welcome back! Tell me what you need and I'll do my very best".
In addition, the second processing unit 402 may generate the response speech corresponding to the user's query by combining the user's speaking style, emotional style, and the response content.
For example, if the user has run into trouble and is very unhappy, then popular vocabulary, the user's pet phrases, and the user's accent may all be omitted from the response content, and the response speech may simply be played at the user's speech rhythm. Conversely, if the user is very happy, popular vocabulary, the user's pet phrases, and the like may be used in the response content to make the response speech more interesting.
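The gating rule in the example above, which is suppressing playful style elements when the user is unhappy and applying them when the user is happy, could be sketched as below; the dictionary keys and the way style elements are rendered into text are hypothetical:

```python
def style_response(content, speaking_style, emotion):
    """Sketch of the rule from the example above: suppress playful style
    elements when the user is unhappy, apply them when the user is happy."""
    if emotion == "negative":
        # Only follow the user's speech rhythm; keep the wording plain.
        return {"text": content, "rhythm": speaking_style["rhythm"]}
    text = content
    if speaking_style.get("pet_phrase"):
        text = f"{speaking_style['pet_phrase']}, {text}"
    if speaking_style.get("popular_word"):
        text = f"{text} ({speaking_style['popular_word']})"
    return {"text": text, "rhythm": speaking_style["rhythm"]}

style = {"rhythm": "fast", "pet_phrase": "honestly", "popular_word": "awesome"}
print(style_response("welcome back", style, "positive"))
print(style_response("welcome back", style, "negative"))
```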
Fig. 5 is a schematic structural diagram of a second embodiment of an intelligent voice interaction realization apparatus according to the present invention. As shown in Fig. 5, the apparatus includes a third processing unit 501 and a fourth processing unit 502.
The third processing unit 501 is configured to obtain a query input by the user during voice interaction and to send the query to a cloud server, so that the cloud server generates, according to the obtained conversational style of the user, a response speech corresponding to the query.
The fourth processing unit 502 is configured to obtain the response speech from the cloud server and to play it.
The conversational style may include one or both of: a speaking style and an emotional style. The speaking style may further include one or any combination of: accent, pet phrases, habitual speech patterns, speech rhythm, and use of popular vocabulary.
For the specific workflow of the apparatus embodiments shown in Fig. 4 and Fig. 5, reference may be made to the corresponding descriptions in the foregoing method embodiments, which are not repeated here.
Fig. 6 shows a block diagram of an exemplary computer system/server 12 suitable for implementing embodiments of the present invention. The computer system/server 12 shown in Fig. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
As shown in Fig. 6, the computer system/server 12 is embodied in the form of a general-purpose computing device. Its components may include, but are not limited to: one or more processors (processing units) 16, a memory 28, and a bus 18 connecting the different system components (including the memory 28 and the processors 16).
The bus 18 represents one or more of several classes of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer system/server 12 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the computer system/server 12, including volatile and non-volatile media, and removable and non-removable media.
The memory 28 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The computer system/server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, a storage system 34 may be used for reading from and writing to non-removable, non-volatile magnetic media (not shown in Fig. 6, commonly referred to as a "hard disk drive"). Although not shown in Fig. 6, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk"), and an optical disk drive for reading from and writing to a removable non-volatile optical disk (such as a CD-ROM, DVD-ROM, or other optical media), may also be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to carry out the functions of the embodiments of the present invention.
A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present invention.
The computer system/server 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the computer system/server 12, and/or with any device (such as a network card, a modem, etc.) that enables the computer system/server 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. Moreover, the computer system/server 12 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown in Fig. 6, the network adapter 20 communicates with the other modules of the computer system/server 12 through the bus 18. It should be understood that, although not shown in the drawings, other hardware and/or software modules may be used in conjunction with the computer system/server 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
The processor 16 executes the programs stored in the memory 28, thereby performing various functional applications and data processing, for example implementing the methods of the embodiments shown in Fig. 1 or Fig. 2.
The present invention also discloses a computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the methods of the embodiments shown in Fig. 1 and Fig. 2.
Any combination of one or more computer-readable media may be used. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more conductors, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code contained on a computer-readable medium may be transmitted over any suitable medium, including, but not limited to: wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus, methods, and so on may be realized in other manners. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the units is only a division by logical function, and in actual implementation there may be other division manners.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purposes of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be realized in the form of hardware, or in the form of hardware plus software functional units.
The above integrated unit realized in the form of a software functional unit may be stored in a computer-readable storage medium. The above software functional unit is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.
Claims (20)
1. An intelligent voice interaction realization method, comprising:
obtaining a user's query from an intelligent voice device, the query being input by the user in the course of voice interaction with the intelligent voice device;
generating, according to an obtained conversational style of the user, a response speech corresponding to the query, and returning the response speech to the intelligent voice device for playback.
2. The method according to claim 1, wherein the conversational style includes one or both of: a speaking style and an emotional style.
3. The method according to claim 2, wherein obtaining the speaking style of the user includes: determining the speaking style of the user according to a history of interaction between the user and the intelligent voice device.
4. The method according to claim 3, wherein the speaking style includes one or any combination of: accent, pet phrases, habitual speech patterns, speech rhythm, and use of popular vocabulary.
5. The method according to claim 2, wherein obtaining the emotional style of the user includes: determining the emotional style of the user according to one or both of the following information: a history of interaction between the user and the intelligent voice device, and real-time interaction content.
6. The method according to claim 5, wherein determining the emotional style of the user includes: determining the emotional style of the user through one or any combination of the following sentiment analysis methods: lexical sentiment analysis, sentence-meaning sentiment analysis, and speech prosody sentiment analysis.
7. The method according to claim 1, wherein generating, according to the obtained conversational style of the user, the response speech corresponding to the query includes:
obtaining response content corresponding to the query;
generating the response speech corresponding to the query by combining the conversational style of the user with the response content.
8. An intelligent voice interaction realization method, comprising:
obtaining a query input by a user during voice interaction, and sending the query to a cloud server, so that the cloud server generates, according to an obtained conversational style of the user, a response speech corresponding to the query;
obtaining the response speech from the cloud server and playing it.
9. The method according to claim 8, wherein the conversational style includes one or both of: a speaking style and an emotional style.
10. An intelligent voice interaction realization apparatus, comprising a first processing unit and a second processing unit;
the first processing unit being configured to obtain a user's query from an intelligent voice device, the query being input by the user in the course of voice interaction with the intelligent voice device;
the second processing unit being configured to generate, according to an obtained conversational style of the user, a response speech corresponding to the query, and to return the response speech to the intelligent voice device for playback.
11. The apparatus according to claim 10, wherein the conversational style includes one or both of: a speaking style and an emotional style.
12. The apparatus according to claim 11, wherein the second processing unit is further configured to determine the speaking style of the user according to an obtained history of interaction between the user and the intelligent voice device.
13. The apparatus according to claim 12, wherein the speaking style includes one or any combination of: accent, pet phrases, habitual speech patterns, speech rhythm, and use of popular vocabulary.
14. The apparatus according to claim 11, wherein the second processing unit is further configured to determine the emotional style of the user according to one or both of the following obtained information: a history of interaction between the user and the intelligent voice device, and real-time interaction content.
15. The apparatus according to claim 14, wherein the second processing unit determines the emotional style of the user through one or any combination of the following sentiment analysis methods: lexical sentiment analysis, sentence-meaning sentiment analysis, and speech prosody sentiment analysis.
16. The apparatus according to claim 10, wherein the second processing unit obtains response content corresponding to the query and generates the response speech corresponding to the query by combining the conversational style of the user with the response content.
17. An intelligent voice interaction realization apparatus, comprising a third processing unit and a fourth processing unit;
the third processing unit being configured to obtain a query input by a user during voice interaction and to send the query to a cloud server, so that the cloud server generates, according to an obtained conversational style of the user, a response speech corresponding to the query;
the fourth processing unit being configured to obtain the response speech from the cloud server and to play it.
18. The apparatus according to claim 17, wherein the conversational style includes one or both of: a speaking style and an emotional style.
19. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to any one of claims 1 to 9.
20. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810294041.4A CN108711423A (en) | 2018-03-30 | 2018-03-30 | Intelligent sound interacts implementation method, device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108711423A true CN108711423A (en) | 2018-10-26 |
Family
ID=63866477
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810294041.4A Pending CN108711423A (en) | 2018-03-30 | 2018-03-30 | Intelligent sound interacts implementation method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108711423A (en) |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107146610A (en) * | 2017-04-10 | 2017-09-08 | 北京猎户星空科技有限公司 | A kind of determination method and device of user view |
CN109413277A (en) * | 2018-11-20 | 2019-03-01 | 维沃移动通信有限公司 | A kind of speech output method and terminal device |
CN110265021A (en) * | 2019-07-22 | 2019-09-20 | 深圳前海微众银行股份有限公司 | Personalized speech exchange method, robot terminal, device and readable storage medium storing program for executing |
CN111199732A (en) * | 2018-11-16 | 2020-05-26 | 深圳Tcl新技术有限公司 | Emotion-based voice interaction method, storage medium and terminal equipment |
CN111292737A (en) * | 2018-12-07 | 2020-06-16 | 阿里巴巴集团控股有限公司 | Voice interaction and voice awakening detection method, device, equipment and storage medium |
CN111475020A (en) * | 2020-04-02 | 2020-07-31 | 深圳创维-Rgb电子有限公司 | Information interaction method, interaction device, electronic equipment and storage medium |
CN111724789A (en) * | 2019-03-19 | 2020-09-29 | 华为终端有限公司 | Voice interaction method and terminal equipment |
CN111833854A (en) * | 2020-01-08 | 2020-10-27 | 北京嘀嘀无限科技发展有限公司 | Man-machine interaction method, terminal and computer readable storage medium |
CN111862938A (en) * | 2020-05-07 | 2020-10-30 | 北京嘀嘀无限科技发展有限公司 | Intelligent response method, terminal and computer readable storage medium |
CN112181348A (en) * | 2020-08-28 | 2021-01-05 | 星络智能科技有限公司 | Sound style switching method, system, computer equipment and readable storage medium |
CN112445901A (en) * | 2019-09-03 | 2021-03-05 | 上海智臻智能网络科技股份有限公司 | Method and device for setting language of intelligent equipment |
CN112634886A (en) * | 2020-12-02 | 2021-04-09 | 海信电子科技(武汉)有限公司 | Interaction method of intelligent equipment, server, computing equipment and storage medium |
WO2021068467A1 (en) * | 2019-10-12 | 2021-04-15 | 百度在线网络技术(北京)有限公司 | Method and apparatus for recommending voice packet, electronic device, and storage medium |
CN112667796A (en) * | 2021-01-05 | 2021-04-16 | 网易(杭州)网络有限公司 | Dialog reply method and device, electronic equipment and readable storage medium |
CN113053373A (en) * | 2021-02-26 | 2021-06-29 | 上海声通信息科技股份有限公司 | Intelligent vehicle-mounted voice interaction system supporting voice cloning |
CN113689881A (en) * | 2020-05-18 | 2021-11-23 | 北京中关村科金技术有限公司 | Method, device and storage medium for audio interaction aiming at voice image |
CN115101048A (en) * | 2022-08-24 | 2022-09-23 | 深圳市人马互动科技有限公司 | Science popularization information interaction method, device, system, interaction equipment and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102413418A (en) * | 2011-10-13 | 2012-04-11 | 任峰 | Interpreter capable of realizing IVR process through intelligent mobile phone interface |
CN104391673A (en) * | 2014-11-20 | 2015-03-04 | 百度在线网络技术(北京)有限公司 | Voice interaction method and voice interaction device |
CN106328139A (en) * | 2016-09-14 | 2017-01-11 | 努比亚技术有限公司 | Voice interaction method and voice interaction system |
CN106469212A (en) * | 2016-09-05 | 2017-03-01 | 北京百度网讯科技有限公司 | Man-machine interaction method based on artificial intelligence and device |
CN106504743A (en) * | 2016-11-14 | 2017-03-15 | 北京光年无限科技有限公司 | A kind of interactive voice output intent and robot for intelligent robot |
CN106934452A (en) * | 2017-01-19 | 2017-07-07 | 深圳前海勇艺达机器人有限公司 | Robot dialogue method and system |
CN107146610A (en) * | 2017-04-10 | 2017-09-08 | 北京猎户星空科技有限公司 | A kind of determination method and device of user view |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108711423A (en) | Intelligent sound interacts implementation method, device, computer equipment and storage medium | |
CN108962217B (en) | Speech synthesis method and related equipment | |
EP3438972B1 (en) | Information processing system and method for generating speech | |
US20200279553A1 (en) | Linguistic style matching agent | |
US20180203946A1 (en) | Computer generated emulation of a subject | |
CN110491382A (en) | Audio recognition method, device and interactive voice equipment based on artificial intelligence | |
CN112349273B (en) | Speech synthesis method based on speaker, model training method and related equipment | |
CN109189980A (en) | The method and electronic equipment of interactive voice are carried out with user | |
CN108597509A (en) | Intelligent sound interacts implementation method, device, computer equipment and storage medium | |
CN107516511A (en) | The Text To Speech learning system of intention assessment and mood | |
KR20170026593A (en) | Generating computer responses to social conversational inputs | |
CN110347792A (en) | Talk with generation method and device, storage medium, electronic equipment | |
WO2000038808A1 (en) | Information processor, portable device, electronic pet device, recorded medium on which information processing procedure is recorded, and information processing method | |
Latif et al. | Self supervised adversarial domain adaptation for cross-corpus and cross-language speech emotion recognition | |
JP2003521750A (en) | Speech system | |
JPWO2017200074A1 (en) | Dialogue method, dialogue system, dialogue apparatus, and program | |
Zhou et al. | Speech synthesis with mixed emotions | |
Singh | The role of speech technology in biometrics, forensics and man-machine interface. | |
CN113838448B (en) | Speech synthesis method, device, equipment and computer readable storage medium | |
Wang et al. | Comic-guided speech synthesis | |
CN113761268A (en) | Playing control method, device, equipment and storage medium of audio program content | |
KR100917552B1 (en) | Method and system for improving the fidelity of a dialog system | |
JP2009151314A (en) | Information processing device and information processing method | |
WO2017200077A1 (en) | Dialog method, dialog system, dialog device, and program | |
Verma et al. | Animating expressive faces across languages |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right |
Effective date of registration: 20210508 Address after: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing Applicant after: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd. Applicant after: Shanghai Xiaodu Technology Co.,Ltd. Address before: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing Applicant before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd. |