CN108711423A - Intelligent voice interaction implementation method, apparatus, computer device and storage medium - Google Patents
Intelligent voice interaction implementation method, apparatus, computer device and storage medium
- Publication number
- CN108711423A (application CN201810294041.4A)
- Authority
- CN
- China
- Prior art keywords
- user
- query
- intelligent sound
- style
- response
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 40
- 230000004044 response Effects 0.000 claims abstract description 96
- 230000002452 interceptive effect Effects 0.000 claims abstract description 36
- 238000012545 processing Methods 0.000 claims description 45
- 230000008451 emotion Effects 0.000 claims description 19
- 230000033764 rhythmic process Effects 0.000 claims description 18
- 238000004590 computer program Methods 0.000 claims description 5
- 230000003993 interaction Effects 0.000 abstract description 12
- 238000010586 diagram Methods 0.000 description 8
- 230000006870 function Effects 0.000 description 5
- 238000004891 communication Methods 0.000 description 4
- 230000003287 optical effect Effects 0.000 description 4
- 230000009471 action Effects 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 238000003058 natural language processing Methods 0.000 description 3
- 230000015572 biosynthetic process Effects 0.000 description 2
- 230000002093 peripheral effect Effects 0.000 description 2
- 238000003786 synthesis reaction Methods 0.000 description 2
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/22—Interactive procedures; Man-machine interfaces
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Psychiatry (AREA)
- Hospice & Palliative Care (AREA)
- General Health & Medical Sciences (AREA)
- Child & Adolescent Psychology (AREA)
- Computer Networks & Wireless Communication (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses an intelligent voice interaction implementation method, apparatus, computer device and storage medium. The method includes: obtaining a user query from an intelligent voice device, the query being input by the user during voice interaction with the intelligent voice device; generating a response voice corresponding to the query according to the acquired conversational style of the user, and returning the response voice to the intelligent voice device for playback. The scheme of the present invention can generate response voices based on the user's conversational style, thereby realizing personalized responses for different users, making voice interaction more perceptive, anthropomorphic and intelligent, and bringing the user an interactive experience that better conforms to human conversation habits.
Description
【Technical field】
The present invention relates to computer application technologies, and particularly to an intelligent voice interaction implementation method, apparatus, computer device and storage medium.
【Background technology】
Intelligent voice interaction is a new-generation interaction mode based on voice input: the user speaks and obtains a feedback result. With the development and refinement of the technology, intelligent voice devices have become increasingly popular and widely used.
Although current voice-interaction dialogue schemes improve timbre and the like by manually editing response formats in advance, which superficially makes the dialogue closer to human conversation and gives it a certain sense of affinity, the dialogue remains rigid and mechanical, lacking human interest and "intelligence". The device can only answer the user according to fixed policies preset in the cloud, with an obvious gap from human conversation habits; the user feels no sense of involvement, and only simple question-and-answer exchanges are possible, which cannot satisfy the requirements of more advanced human-machine intelligent voice dialogue.
【Invention content】
In view of this, the present invention provides an intelligent voice interaction implementation method, apparatus, computer device and storage medium. The specific technical solution is as follows:
An intelligent voice interaction implementation method, including:
obtaining a user query from an intelligent voice device, the query being input by the user during voice interaction with the intelligent voice device;
generating a response voice corresponding to the query according to the acquired conversational style of the user, and returning the response voice to the intelligent voice device for playback.
According to a preferred embodiment of the present invention, the conversational style includes one or both of the following: speaking style and emotional style.
According to a preferred embodiment of the present invention, obtaining the speaking style of the user includes: determining the speaking style of the user according to the history of interactions between the user and the intelligent voice device.
According to a preferred embodiment of the present invention, the speaking style includes one or any combination of the following: accent, pet phrases, speaking-format habits, speaking rhythm, and use of popular vocabulary.
According to a preferred embodiment of the present invention, obtaining the emotional style of the user includes: determining the emotional style of the user according to one or both of the following: the history of interactions between the user and the intelligent voice device, and the real-time interaction content.
According to a preferred embodiment of the present invention, determining the emotional style of the user includes: determining the emotional style of the user through one or any combination of the following sentiment-analysis modes: vocabulary sentiment analysis, sentence-meaning sentiment analysis, and sound-rhythm sentiment analysis.
According to a preferred embodiment of the present invention, generating the response voice corresponding to the query according to the acquired conversational style of the user includes: obtaining the response content corresponding to the query; and generating the response voice corresponding to the query by combining the conversational style of the user with the response content.
An intelligent voice interaction implementation method, including:
obtaining a query input by the user during voice interaction, and sending the query to a cloud server, so that the cloud server generates a response voice corresponding to the query according to the acquired conversational style of the user;
obtaining the response voice from the cloud server and playing it.
According to a preferred embodiment of the present invention, the conversational style includes one or both of the following: speaking style and emotional style.
An intelligent voice interaction implementation apparatus, including a first processing unit and a second processing unit;
the first processing unit is used for obtaining a user query from an intelligent voice device, the query being input by the user during voice interaction with the intelligent voice device;
the second processing unit is used for generating a response voice corresponding to the query according to the acquired conversational style of the user, and returning the response voice to the intelligent voice device for playback.
According to a preferred embodiment of the present invention, the conversational style includes one or both of the following: speaking style and emotional style.
According to a preferred embodiment of the present invention, the second processing unit is further used for determining the speaking style of the user according to the acquired history of interactions between the user and the intelligent voice device.
According to a preferred embodiment of the present invention, the speaking style includes one or any combination of the following: accent, pet phrases, speaking-format habits, speaking rhythm, and use of popular vocabulary.
According to a preferred embodiment of the present invention, the second processing unit is further used for determining the emotional style of the user according to one or both of the following acquired information: the history of interactions between the user and the intelligent voice device, and the real-time interaction content.
According to a preferred embodiment of the present invention, the second processing unit determines the emotional style of the user through one or any combination of the following sentiment-analysis modes: vocabulary sentiment analysis, sentence-meaning sentiment analysis, and sound-rhythm sentiment analysis.
According to a preferred embodiment of the present invention, the second processing unit obtains the response content corresponding to the query, and generates the response voice corresponding to the query by combining the conversational style of the user with the response content.
An intelligent voice interaction implementation apparatus, including a third processing unit and a fourth processing unit;
the third processing unit is used for obtaining a query input by the user during voice interaction and sending the query to a cloud server, so that the cloud server generates a response voice corresponding to the query according to the acquired conversational style of the user;
the fourth processing unit is used for obtaining the response voice from the cloud server and playing it.
According to a preferred embodiment of the present invention, the conversational style includes one or both of the following: speaking style and emotional style.
A computer device, including a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor implements the method described above when executing the program.
A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method described above.
It can be seen from the above introduction that, with the scheme of the present invention, after a user query from an intelligent voice device is obtained, a response voice corresponding to the query can be generated according to the acquired conversational style of the user, and the response voice is then returned to the intelligent voice device for playback. Compared with the prior art, the scheme of the present invention can generate response voices based on the user's conversational style, thereby realizing personalized responses for different users, making voice interaction more perceptive, anthropomorphic and intelligent, and bringing the user an interactive experience that better conforms to human conversation habits.
【Description of the drawings】
Fig. 1 is a flowchart of a first embodiment of the intelligent voice interaction implementation method of the present invention.
Fig. 2 is a flowchart of a second embodiment of the intelligent voice interaction implementation method of the present invention.
Fig. 3 is a schematic diagram of the interaction among the user, the intelligent voice device and the cloud server of the present invention.
Fig. 4 is a schematic structural diagram of a first embodiment of the intelligent voice interaction implementation apparatus of the present invention.
Fig. 5 is a schematic structural diagram of a second embodiment of the intelligent voice interaction implementation apparatus of the present invention.
Fig. 6 is a block diagram of an exemplary computer system/server 12 suitable for implementing embodiments of the present invention.
【Specific implementation mode】
Users communicate with intelligent voice devices by voice, and users are perceptive and emotional. Users hope the device can be "intelligent": that it can have a human language style, understand human emotions, and conduct dialogue in a perceptive, intelligent and reasonable way.
To this end, the present invention proposes an intelligent voice interaction implementation scheme that intelligently learns the user's conversational style through dialogue with the user and applies it, so as to make voice interaction more personalized, perceptive, and so on.
In order to make the technical solution of the present invention clearer, the scheme of the present invention is further described below with reference to the drawings and embodiments.
Obviously, the described embodiments are some rather than all of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a first embodiment of the intelligent voice interaction implementation method of the present invention. As shown in Fig. 1, it includes the following implementation.
In 101, a user query from an intelligent voice device is obtained; the query is input by the user during voice interaction with the intelligent voice device.
In 102, a response voice corresponding to the query is generated according to the acquired conversational style of the user, and the response voice is returned to the intelligent voice device for playback.
In practical applications, the executing body of the flow shown in Fig. 1 may be a cloud server.
After the intelligent voice device obtains a query input by the user during voice interaction, it can send the query to the cloud server. The cloud server can generate a response voice corresponding to the query according to the acquired conversational style of the user, and then return the response voice to the intelligent voice device, which plays it to the user.
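As a rough illustration of steps 101-102, the following Python sketch models the cloud-server side: a query arrives, the user's learned conversational style is looked up, and a styled response is produced. All class and function names here are hypothetical; the patent does not prescribe any concrete implementation.

```python
from dataclasses import dataclass, field

@dataclass
class ConversationalStyle:
    """Learned per-user style traits (illustrative fields only)."""
    pet_phrases: list = field(default_factory=list)
    speech_rate: float = 1.0          # playback-speed factor for TTS
    likes_trendy_words: bool = False

def generate_response_content(query: str) -> str:
    # Stand-in for the ordinary NLP-based response generation the patent
    # refers to as "the existing manner".
    return "welcome back, how can I help?" if "home" in query else "OK."

def apply_style(content: str, style: ConversationalStyle) -> str:
    # Minimal styling: prepend one of the user's pet phrases, if any.
    if style.pet_phrases:
        content = style.pet_phrases[0] + ", " + content
    return content

def handle_query(query: str, style: ConversationalStyle) -> str:
    """Step 101: the query has been obtained; step 102: generate the
    styled response to return to the device."""
    return apply_style(generate_response_content(query), style)

style = ConversationalStyle(pet_phrases=["Well"])
print(handle_query("I am home", style))  # Well, welcome back, how can I help?
```

In a real deployment the two functions would be backed by the recognition, NLP and speech-synthesis stages described later in the embodiment.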
The conversational style of the user may include a speaking style, an emotional style, or both, which are described below respectively.
One) Speaking style
Preferably, the speaking style of the user can be determined according to the history of interactions between the user and the intelligent voice device. The intelligent voice device can send its interaction history with the user to the cloud server, and the cloud server learns the speaking style of the user from the acquired history.
The history may refer to all interaction records with the user since the intelligent voice device was enabled, or to the records of, for example, a recent period of time. Generally, the more history the intelligent voice device provides, the more accurate and comprehensive the speaking style the cloud server learns for the user.
When the intelligent voice device sends the history to the cloud server is not restricted. For example, every N days (N being a positive integer), the intelligent voice device can send the interaction history with the user accumulated up to the current time to the cloud server, so that the cloud server learns or updates the speaking style of the user according to the acquired history.
After learning the speaking style of the user, the cloud server can apply it in dialogue with the user, so that the user feels a sense of intimacy and interest.
The speaking style may include one or any combination of the following: accent, pet phrases, speaking-format habits, speaking rhythm, use of popular vocabulary, etc.
1) Accent
Many users speak to intelligent voice devices with an accent. For example, users along the southeast coast may speak "Cantonese-accented Mandarin" or "Fujian-accented Mandarin", or even mix Mandarin with high-frequency dialect words. Through speaking-style learning, the device can simulate the user's accent when conversing with the user.
For example, for a Hoklo user chatting with the intelligent voice device in accented Mandarin, the response voice played to the user can carry the same accent.
That is, through accent analysis, the device can simulate the user's accent in dialogue, blending into the conversation with the user better and making it warmer and more intelligent.
2) Pet phrases
Many users habitually use pet phrases. For example, some users habitually add fillers such as "actually", "you are so right" or "not bad" when speaking, and they keep these habits when interacting with the intelligent voice device by voice. Based on, for example, the frequency of occurrence of words in the interaction history, the words serving as the user's pet phrases can be identified, and the device can then simulate the user by using the pet phrases in dialogue.
For example, if the user's pet phrase is "so", the device can use it naturally in dialogue: if the query input by the user is "So, is the weather in Beijing suitable for an outing today?", the response voice played to the user can be "So it is; the weather in Beijing is pretty good today, and the air is suitable for a stroll or an outing."
That is, if analysis shows that the word "so" occurs very frequently, appearing in many dialogues, it can be determined to be one of the user's pet phrases, and the pet phrase can then be spoken occasionally in dialogue with the user, giving the intelligent voice device an anthropomorphic speaking style and enhancing fun and affinity.
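The frequency heuristic described above can be sketched as follows. The thresholds (`min_ratio`, `min_count`) are assumptions: the patent only says pet phrases are identified from their frequency of occurrence in the history, without naming concrete values.

```python
from collections import Counter

def detect_pet_phrases(utterances, min_ratio=0.3, min_count=3):
    """Flag words that appear in a large share of the user's past
    utterances as likely pet phrases (pure frequency heuristic)."""
    counts = Counter()
    for u in utterances:
        counts.update(set(u.lower().split()))  # count each word once per utterance
    n = len(utterances)
    return [w for w, c in counts.items() if c >= min_count and c / n >= min_ratio]

history = [
    "so what is the weather today",
    "so play some music",
    "turn off the light",
    "so remind me tomorrow",
]
print(detect_pet_phrases(history))  # ['so']
```

A production system would also filter out common function words so that "the" or "is" are never flagged as pet phrases.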
3) Speaking-format habits
For example, some users habitually use modal words such as "oh", "eh", "ha", "uh"; some users like short, conclusion-style sentences, while others like complete, richly detailed sentences. To express the same meaning of whether it will rain today, for a user who likes short conclusion-style sentences the response voice played can be "It won't rain today", while for a user who likes complete, detailed sentences it can be "According to today's weather forecast, the weather is fine, and it most probably won't rain."
4) Speaking rhythm (speed)
Different users speak at different speeds: users who speak fast are more used to fast-paced dialogue, while users who speak slowly are more used to slow-paced dialogue. The speaking speed of the user in the interaction history can be used to adaptively adjust the dialogue speed of the intelligent voice device.
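A minimal sketch of such rate adaptation, assuming the user's speaking rate is measured in characters per second from the history and mapped to a TTS playback-speed factor. The 4.0 chars/s baseline and the clamp range are illustrative values, not from the patent.

```python
def rate_factor(history_char_rates, baseline=4.0, lo=0.8, hi=1.25):
    """Map the user's average speaking rate (chars/second over the
    history record) to a playback-speed factor, clamped to a safe range
    so the synthesized voice never becomes unintelligible."""
    if not history_char_rates:
        return 1.0  # no history yet: play at normal speed
    avg = sum(history_char_rates) / len(history_char_rates)
    return max(lo, min(hi, avg / baseline))

print(rate_factor([5.0, 4.6]))  # fast talker -> 1.2
print(rate_factor([3.0]))       # slow talker -> 0.8 (clamped)
```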
5) Use of popular vocabulary
Some users like to use popular (trendy) vocabulary when speaking, while others are relatively conservative and sedate and do not. Whether the user likes popular vocabulary can be judged from the user's frequency of use of such words in the interaction history, and popular vocabulary can then be added to the dialogue as appropriate.
In practical applications, after the cloud server gets a user query sent by the intelligent voice device, it can first perform speech recognition on it to obtain a recognition result in text form; it can then determine the response content from the recognition result in an existing manner and, following existing processing, generate the response voice from the response content through speech-synthesis technology and return it to the intelligent voice device for playback. In this embodiment, after the response content is determined, the response voice is generated by combining the learned speaking style of the user with the response content.
For example, the response content can be played with the learned accent of the user; the user's pet phrases can be added to the response content; the response content can be simplified (for example, by removing qualifiers) according to the learned speaking-format habits; the playback speed of the response voice can be adjusted according to the learned speaking rhythm of the user; and, according to the learned liking for popular vocabulary, some words in the response content can be replaced with the corresponding popular words, e.g. replacing "feel so bad I want to cry" with the catchphrase "blue thin mushroom".
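The style-application step just described can be sketched as a plain text transform over the response content. The qualifier list and the popular-vocabulary mapping below are toy stand-ins, not vocabularies from the patent.

```python
def stylize(content: str, pet_phrase=None, trendy_map=None, simplify=False):
    """Apply learned style traits to a plain response: swap in popular
    vocabulary, optionally drop qualifier words (speaking-format habit),
    and optionally prepend the user's pet phrase."""
    words = content.split()
    if trendy_map:
        # Replace plain words with the user's preferred trendy synonyms.
        words = [trendy_map.get(w, w) for w in words]
    if simplify:
        # Brief, conclusion-style sentences: strip common qualifiers.
        qualifiers = {"probably", "basically", "generally"}
        words = [w for w in words if w not in qualifiers]
    text = " ".join(words)
    if pet_phrase:
        text = f"{pet_phrase}, {text}"
    return text

print(stylize("it will probably not rain today",
              pet_phrase="so", simplify=True))
# so, it will not rain today
```

The TTS stage would additionally apply the accent and the playback-speed factor, which operate on audio rather than text.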
Two) Emotional style
Preferably, the emotional style of the user can be determined according to one or both of the following: the history of interactions between the user and the intelligent voice device, and the real-time interaction content.
Based on the interaction history, the emotional style the user has typically exhibited in the past can be learned, e.g. whether the user is a positive, often very happy person or a negative, gloomy, often unhappy person, and the user's current emotional style can be predicted from the past pattern. Alternatively, the user's current emotional style can be determined based on the real-time interaction content, or the two can be used in combination.
The real-time interaction content may refer to the latest query obtained, or to the queries obtained during the current voice interaction, etc.
The emotional style of the user can be determined through one or any combination of the following sentiment-analysis modes: vocabulary sentiment analysis, sentence-meaning sentiment analysis, and sound-rhythm sentiment analysis.
1) Vocabulary sentiment analysis
Sentiment analysis can be performed on the words in the query. Chinese words carry sentiment: there are commendatory and derogatory terms, positive and negative modal particles, curse words, and so on, and different words represent different emotions. For example, queries expressing the user's arrival home include: flat, emotionless — "I'm back"; happy, positive — "I'm back!"; very happy — "Aha, I'm back!".
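A toy lexicon-based scorer in the spirit of this mode is shown below; a real system would use a full sentiment lexicon with graded polarity, whereas the word lists here are illustrative stand-ins.

```python
def lexicon_sentiment(query: str,
                      positive=("great", "happy", "yay", "aha"),
                      negative=("tired", "sad", "awful")):
    """Crude lexicon-based sentiment: +1 per positive word, -1 per
    negative word, mapped to a coarse label."""
    score = 0
    for w in query.lower().split():
        w = w.strip("!,.?")  # drop trailing punctuation before lookup
        if w in positive:
            score += 1
        elif w in negative:
            score -= 1
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(lexicon_sentiment("aha, I am back home!"))  # positive
print(lexicon_sentiment("I am back"))             # neutral
```

Note that the flat "I'm back" example from the text lands in "neutral", matching the intended distinction.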
2) Sentence-meaning sentiment analysis
Sentiment analysis can also be performed on the words and the complete sentence meaning of the query through natural language processing (NLP, Natural Language Processing). Sentence-meaning sentiment analysis is mainly performed on the basis of vocabulary sentiment analysis.
3) Sound-rhythm sentiment analysis
The sound of the query can be analyzed and compared with the interaction history and with a standard voice-emotion rhythm library, etc., to judge the sound rhythm and predict the emotion.
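One simple way to realize such a comparison is to contrast an utterance's pitch and energy with the user's historical baseline (or a standard reference): markedly raised values suggest excitement, markedly lowered values a subdued mood. The thresholds and baseline figures below are assumptions for illustration only.

```python
def prosody_emotion(pitch_hz, energy, baseline_pitch=180.0, baseline_energy=0.5):
    """Classify emotion from relative deviation of pitch and energy
    against a per-user baseline (toy thresholds)."""
    d_pitch = (pitch_hz - baseline_pitch) / baseline_pitch
    d_energy = (energy - baseline_energy) / baseline_energy
    if d_pitch > 0.15 and d_energy > 0.2:
        return "excited"
    if d_pitch < -0.15 and d_energy < -0.2:
        return "subdued"
    return "neutral"

print(prosody_emotion(220.0, 0.8))  # excited
print(prosody_emotion(150.0, 0.3))  # subdued
```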
The response voice corresponding to the query input by the user can be generated by combining the emotional style of the user with the response content.
For example, if the query input by the user is "I'm back!" with a happy, positive emotional style, and the response content obtained in the existing manner is "Welcome back; what can I help you with?", then according to the processing mode of this embodiment the response content can be adjusted to something warmer, such as "Welcome back! Anything I can help with? I'll take it seriously."
Furthermore, the response voice corresponding to the query input by the user can also be generated by combining the user's speaking style, emotional style and the response content.
For example, if the user has run into trouble and is very unhappy, the response content may use neither popular vocabulary nor the user's pet phrases or accent, and the response voice may only follow the user's speaking rhythm during playback. Conversely, if the user is very happy, popular vocabulary, the user's pet phrases and the like can be used in the response content to make the response voice more interesting.
In addition, different users can be distinguished through voiceprint recognition, thereby realizing personalized presentation for different users.
For example, if the intelligent voice device is a smart speaker shared by a family of three, each of whom may use the device, the different users can be distinguished through voiceprint recognition, and conversational-style learning can be performed separately on each user's own interaction history.
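Dispatching to per-user style profiles might look like the following, assuming voiceprints are represented as embedding vectors compared by cosine similarity; the 3-d vectors, names and threshold are toy values, as the patent does not specify a voiceprint representation.

```python
import math

def cosine(a, b):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def identify_speaker(embedding, profiles, threshold=0.7):
    """Match a voiceprint embedding to the closest enrolled family
    member so that user's own style profile is used; below the
    threshold the speaker is treated as unknown."""
    best, best_sim = None, threshold
    for name, ref in profiles.items():
        sim = cosine(embedding, ref)
        if sim > best_sim:
            best, best_sim = name, sim
    return best

profiles = {"dad": [1.0, 0.1, 0.0], "kid": [0.0, 1.0, 0.2]}
print(identify_speaker([0.9, 0.2, 0.0], profiles))  # dad
```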
Fig. 2 is a flowchart of a second embodiment of the intelligent voice interaction implementation method of the present invention. As shown in Fig. 2, it includes the following implementation.
In 201, a query input by the user during voice interaction is obtained and sent to a cloud server, so that the cloud server generates a response voice corresponding to the query according to the acquired conversational style of the user.
In 202, the response voice from the cloud server is obtained and played.
The conversational style may include one or both of the following: speaking style and emotional style. The speaking style may further include one or any combination of the following: accent, pet phrases, speaking-format habits, speaking rhythm, use of popular vocabulary, etc.
Based on the above introduction, Fig. 3 is a schematic diagram of the interaction among the user, the intelligent voice device and the cloud server of the present invention. As shown in Fig. 3, when the user carries out voice interaction with the intelligent voice device, the device is usually first woken up by a wake word; afterwards, the user can carry out normal voice interaction with the device, inputting queries to it and obtaining the response voices it plays. The intelligent voice device can send each query it gets to the cloud server; the cloud server can learn the user's conversational style, such as speaking style and emotional style, from the user's interaction history with the device obtained from the device, and apply it, i.e. generate the response voice corresponding to each query according to the user's conversational style, and then return the response voice to the intelligent voice device for playback.
It should be noted that, for brevity of description, each of the foregoing method embodiments is expressed as a series of action combinations; however, those skilled in the art should understand that the present invention is not limited by the described action sequence, because according to the present invention certain steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in this specification are preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in a certain embodiment, reference may be made to the relevant descriptions of other embodiments.
In short, with the schemes described in the above method embodiments, response voices can be generated based on the user's conversational style, thereby realizing personalized responses for different users, making voice interaction more perceptive, anthropomorphic and intelligent, and bringing the user an interactive experience that better conforms to human conversation habits.
It is the introduction about embodiment of the method above, below by way of device embodiment, to scheme of the present invention into traveling
One step explanation.
Fig. 4 is a schematic structural diagram of a first embodiment of an intelligent voice interaction realization apparatus according to the present invention. As shown in Fig. 4, the apparatus includes a first processing unit 401 and a second processing unit 402.
The first processing unit 401 is configured to obtain a user's query from an intelligent voice device, the query being input by the user in the course of voice interaction with the intelligent voice device.
The second processing unit 402 is configured to generate, according to the obtained conversational style of the user, a response speech corresponding to the query, and to return the response speech to the intelligent voice device for playback.
The conversational style of the user may include a speaking style, an emotional style, or both.
Preferably, the second processing unit 402 may determine the user's speaking style according to the obtained history of interaction between the user and the intelligent voice device.
That is, the second processing unit 402 may learn the user's speaking style from the interaction history records obtained from the intelligent voice device.
The speaking style may include one or any combination of the following: accent, pet phrases, habitual speech patterns, speech rhythm, use of popular vocabulary, and the like.
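One naive way to approximate part of this learning step (treating pet phrases as words the user repeats often across past queries, a simplification not specified by the patent) could look like this; the function name and thresholds are hypothetical:

```python
from collections import Counter

def learn_pet_phrases(history, min_count=3, top_n=2):
    """Treat words the user repeats at least min_count times across past
    queries as pet phrases. A deliberate simplification of the
    'learn the speaking style from history records' step."""
    counts = Counter(word for query in history for word in query.split())
    frequent = [(w, c) for w, c in counts.most_common() if c >= min_count]
    return [w for w, _ in frequent[:top_n]]

history = [
    "honestly play some jazz",
    "honestly what time is it",
    "honestly turn off the lights",
]
print(learn_pet_phrases(history, min_count=3))  # -> ['honestly']
```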
After obtaining the user's query sent by the intelligent voice device, the second processing unit 402 may first perform speech recognition on it to obtain a textual recognition result, then determine the response content from the recognition result in an existing manner, and then generate the response speech from the response content by speech synthesis or the like, also in an existing manner, and return it to the intelligent voice device for playback. In the present embodiment, after the response content is determined, the response speech may be generated by combining the learned speaking style of the user with the response content.
For example, the response content may be played with an accent matching the learned accent of the user, or the user's learned pet phrases may be added to the response content.
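For instance, adding a learned pet phrase to the response content, and marking the accent to be applied downstream by text-to-speech, could be as simple as the following hypothetical sketch (the style keys and the accent-tag convention are assumptions for illustration):

```python
def apply_speaking_style(response, style):
    """Decorate the response content with elements of the learned style."""
    if style.get("pet_phrase"):
        # Prepend the user's pet phrase to the response content.
        response = f"{style['pet_phrase']}, {response}"
    if style.get("accent"):
        # In a real system the accent would steer the TTS voice;
        # here we only tag the text with it for illustration.
        response = f"[accent={style['accent']}] {response}"
    return response

style = {"pet_phrase": "honestly", "accent": "sichuan"}
print(apply_speaking_style("the weather is sunny today", style))
```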
The second processing unit 402 may also determine the user's emotional style according to one or both of the following obtained information: the history of interaction between the user and the intelligent voice device, and the real-time interaction content.
From the interaction history, the emotional style the user has typically exhibited in the past can be learned, for example whether the user is a positive, usually happy person or a negative, usually unhappy person; the user's current emotional style can then be predicted from the emotional style the user has typically exhibited. Alternatively, the user's current emotional style can be determined from the real-time interaction content. The two may also be used in combination.
Specifically, the second processing unit 402 may determine the user's emotional style through one or any combination of the following sentiment analysis methods: lexical sentiment analysis, sentence-meaning sentiment analysis, and speech prosody sentiment analysis.
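Combining scores from the three analysis modes named above could be sketched as follows. This is purely illustrative: the word lists, the exclamation-mark heuristic, and the pitch threshold are stand-ins for real lexical, sentence-meaning, and prosody analyzers, none of which the patent specifies:

```python
# Hypothetical stand-ins for the three sentiment analysis modes.
POSITIVE = {"great", "happy", "love", "back"}
NEGATIVE = {"awful", "angry", "hate", "trouble"}

def lexical_score(text):
    # Lexical sentiment analysis: count positive vs negative words.
    words = set(text.lower().split())
    return len(words & POSITIVE) - len(words & NEGATIVE)

def sentence_score(text):
    # Stand-in for sentence-meaning analysis: exclamations read as positive.
    return 1 if text.endswith("!") else 0

def prosody_score(pitch_hz):
    # Stand-in for speech-prosody analysis: higher pitch read as positive.
    return 1 if pitch_hz > 200 else -1

def emotion_style(text, pitch_hz):
    # Combine the three modes into a single emotional-style label.
    total = lexical_score(text) + sentence_score(text) + prosody_score(pitch_hz)
    return "positive" if total > 0 else "negative"

print(emotion_style("I love this, it is great!", pitch_hz=230))  # positive
print(emotion_style("this is awful", pitch_hz=120))              # negative
```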
The second processing unit 402 may generate the response speech corresponding to the user's query by combining the user's emotional style with the response content.
For example, suppose the user's query is "I'm back", the user's emotional style is happy and positive, and the response content obtained according to the prior art is "Welcome back, how may I help you". Then, according to the processing manner described in the present embodiment, the response content may be adjusted to something like "Welcome back! Tell me what you need and I'll do my very best".
In addition, the second processing unit 402 may generate the response speech corresponding to the user's query by combining the user's speaking style, emotional style, and the response content.
For example, if the user has run into trouble and is very unhappy, then popular vocabulary, the user's pet phrases, and the user's accent may all be omitted from the response content, and the response speech may simply be played at the user's speech rhythm. Conversely, if the user is very happy, popular vocabulary, the user's pet phrases, and the like may be used in the response content to make the response speech more interesting.
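The gating rule in the example above, which is suppressing playful style elements when the user is unhappy and applying them when the user is happy, could be sketched as below; the dictionary keys and the way style elements are rendered into text are hypothetical:

```python
def style_response(content, speaking_style, emotion):
    """Sketch of the rule from the example above: suppress playful style
    elements when the user is unhappy, apply them when the user is happy."""
    if emotion == "negative":
        # Only follow the user's speech rhythm; keep the wording plain.
        return {"text": content, "rhythm": speaking_style["rhythm"]}
    text = content
    if speaking_style.get("pet_phrase"):
        text = f"{speaking_style['pet_phrase']}, {text}"
    if speaking_style.get("popular_word"):
        text = f"{text} ({speaking_style['popular_word']})"
    return {"text": text, "rhythm": speaking_style["rhythm"]}

style = {"rhythm": "fast", "pet_phrase": "honestly", "popular_word": "awesome"}
print(style_response("welcome back", style, "positive"))
print(style_response("welcome back", style, "negative"))
```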
Fig. 5 is a schematic structural diagram of a second embodiment of an intelligent voice interaction realization apparatus according to the present invention. As shown in Fig. 5, the apparatus includes a third processing unit 501 and a fourth processing unit 502.
The third processing unit 501 is configured to obtain a query input by the user during voice interaction and to send the query to a cloud server, so that the cloud server generates, according to the obtained conversational style of the user, a response speech corresponding to the query.
The fourth processing unit 502 is configured to obtain the response speech from the cloud server and to play it.
The conversational style may include one or both of: a speaking style and an emotional style. The speaking style may further include one or any combination of: accent, pet phrases, habitual speech patterns, speech rhythm, and use of popular vocabulary.
For the specific workflow of the apparatus embodiments shown in Fig. 4 and Fig. 5, reference may be made to the corresponding descriptions in the foregoing method embodiments, which are not repeated here.
Fig. 6 shows a block diagram of an exemplary computer system/server 12 suitable for implementing embodiments of the present invention. The computer system/server 12 shown in Fig. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
As shown in Fig. 6, the computer system/server 12 is embodied in the form of a general-purpose computing device. Its components may include, but are not limited to: one or more processors (processing units) 16, a memory 28, and a bus 18 connecting the different system components (including the memory 28 and the processors 16).
The bus 18 represents one or more of several classes of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer system/server 12 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the computer system/server 12, including volatile and non-volatile media, and removable and non-removable media.
The memory 28 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The computer system/server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, a storage system 34 may be used for reading from and writing to non-removable, non-volatile magnetic media (not shown in Fig. 6, commonly referred to as a "hard disk drive"). Although not shown in Fig. 6, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk"), and an optical disk drive for reading from and writing to a removable non-volatile optical disk (such as a CD-ROM, DVD-ROM, or other optical media), may also be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to carry out the functions of the embodiments of the present invention.
A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present invention.
The computer system/server 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the computer system/server 12, and/or with any device (such as a network card, a modem, etc.) that enables the computer system/server 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. Moreover, the computer system/server 12 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown in Fig. 6, the network adapter 20 communicates with the other modules of the computer system/server 12 through the bus 18. It should be understood that, although not shown in the drawings, other hardware and/or software modules may be used in conjunction with the computer system/server 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
The processor 16 executes the programs stored in the memory 28, thereby performing various functional applications and data processing, for example implementing the methods of the embodiments shown in Fig. 1 or Fig. 2.
The present invention also discloses a computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the methods of the embodiments shown in Fig. 1 and Fig. 2.
Any combination of one or more computer-readable media may be used. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more conductors, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code contained on a computer-readable medium may be transmitted over any suitable medium, including, but not limited to: wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus, methods, and so on may be realized in other manners. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the units is only a division by logical function, and in actual implementation there may be other division manners.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purposes of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be realized in the form of hardware, or in the form of hardware plus software functional units.
The above integrated unit realized in the form of a software functional unit may be stored in a computer-readable storage medium. The above software functional unit is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.
Claims (20)
1. An intelligent voice interaction realization method, comprising:
obtaining a user's query from an intelligent voice device, the query being input by the user in the course of voice interaction with the intelligent voice device;
generating, according to an obtained conversational style of the user, a response speech corresponding to the query, and returning the response speech to the intelligent voice device for playback.
2. The method according to claim 1, wherein the conversational style includes one or both of: a speaking style and an emotional style.
3. The method according to claim 2, wherein obtaining the speaking style of the user includes: determining the speaking style of the user according to a history of interaction between the user and the intelligent voice device.
4. The method according to claim 3, wherein the speaking style includes one or any combination of: accent, pet phrases, habitual speech patterns, speech rhythm, and use of popular vocabulary.
5. The method according to claim 2, wherein obtaining the emotional style of the user includes: determining the emotional style of the user according to one or both of the following information: a history of interaction between the user and the intelligent voice device, and real-time interaction content.
6. The method according to claim 5, wherein determining the emotional style of the user includes: determining the emotional style of the user through one or any combination of the following sentiment analysis methods: lexical sentiment analysis, sentence-meaning sentiment analysis, and speech prosody sentiment analysis.
7. The method according to claim 1, wherein generating, according to the obtained conversational style of the user, the response speech corresponding to the query includes:
obtaining response content corresponding to the query;
generating the response speech corresponding to the query by combining the conversational style of the user with the response content.
8. An intelligent voice interaction realization method, comprising:
obtaining a query input by a user during voice interaction, and sending the query to a cloud server, so that the cloud server generates, according to an obtained conversational style of the user, a response speech corresponding to the query;
obtaining the response speech from the cloud server and playing it.
9. The method according to claim 8, wherein the conversational style includes one or both of: a speaking style and an emotional style.
10. An intelligent voice interaction realization apparatus, comprising a first processing unit and a second processing unit;
the first processing unit being configured to obtain a user's query from an intelligent voice device, the query being input by the user in the course of voice interaction with the intelligent voice device;
the second processing unit being configured to generate, according to an obtained conversational style of the user, a response speech corresponding to the query, and to return the response speech to the intelligent voice device for playback.
11. The apparatus according to claim 10, wherein the conversational style includes one or both of: a speaking style and an emotional style.
12. The apparatus according to claim 11, wherein the second processing unit is further configured to determine the speaking style of the user according to an obtained history of interaction between the user and the intelligent voice device.
13. The apparatus according to claim 12, wherein the speaking style includes one or any combination of: accent, pet phrases, habitual speech patterns, speech rhythm, and use of popular vocabulary.
14. The apparatus according to claim 11, wherein the second processing unit is further configured to determine the emotional style of the user according to one or both of the following obtained information: a history of interaction between the user and the intelligent voice device, and real-time interaction content.
15. The apparatus according to claim 14, wherein the second processing unit determines the emotional style of the user through one or any combination of the following sentiment analysis methods: lexical sentiment analysis, sentence-meaning sentiment analysis, and speech prosody sentiment analysis.
16. The apparatus according to claim 10, wherein the second processing unit obtains response content corresponding to the query and generates the response speech corresponding to the query by combining the conversational style of the user with the response content.
17. An intelligent voice interaction realization apparatus, comprising a third processing unit and a fourth processing unit;
the third processing unit being configured to obtain a query input by a user during voice interaction and to send the query to a cloud server, so that the cloud server generates, according to an obtained conversational style of the user, a response speech corresponding to the query;
the fourth processing unit being configured to obtain the response speech from the cloud server and to play it.
18. The apparatus according to claim 17, wherein the conversational style includes one or both of: a speaking style and an emotional style.
19. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to any one of claims 1 to 9.
20. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810294041.4A CN108711423A (en) | 2018-03-30 | 2018-03-30 | Intelligent sound interacts implementation method, device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108711423A true CN108711423A (en) | 2018-10-26 |
Family
ID=63866477
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810294041.4A Pending CN108711423A (en) | 2018-03-30 | 2018-03-30 | Intelligent sound interacts implementation method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108711423A (en) |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107146610A (en) * | 2017-04-10 | 2017-09-08 | 北京猎户星空科技有限公司 | A kind of determination method and device of user view |
CN109413277A (en) * | 2018-11-20 | 2019-03-01 | 维沃移动通信有限公司 | A kind of speech output method and terminal device |
CN110265021A (en) * | 2019-07-22 | 2019-09-20 | 深圳前海微众银行股份有限公司 | Personalized speech exchange method, robot terminal, device and readable storage medium storing program for executing |
CN111199732A (en) * | 2018-11-16 | 2020-05-26 | 深圳Tcl新技术有限公司 | Emotion-based voice interaction method, storage medium and terminal equipment |
CN111292737A (en) * | 2018-12-07 | 2020-06-16 | 阿里巴巴集团控股有限公司 | Voice interaction and voice awakening detection method, device, equipment and storage medium |
CN111475020A (en) * | 2020-04-02 | 2020-07-31 | 深圳创维-Rgb电子有限公司 | Information interaction method, interaction device, electronic equipment and storage medium |
CN111724789A (en) * | 2019-03-19 | 2020-09-29 | 华为终端有限公司 | Voice interaction method and terminal equipment |
CN111833854A (en) * | 2020-01-08 | 2020-10-27 | 北京嘀嘀无限科技发展有限公司 | Man-machine interaction method, terminal and computer readable storage medium |
CN111862938A (en) * | 2020-05-07 | 2020-10-30 | 北京嘀嘀无限科技发展有限公司 | Intelligent response method, terminal and computer readable storage medium |
CN112181348A (en) * | 2020-08-28 | 2021-01-05 | 星络智能科技有限公司 | Sound style switching method, system, computer equipment and readable storage medium |
CN112445901A (en) * | 2019-09-03 | 2021-03-05 | 上海智臻智能网络科技股份有限公司 | Method and device for setting language of intelligent equipment |
CN112634886A (en) * | 2020-12-02 | 2021-04-09 | 海信电子科技(武汉)有限公司 | Interaction method of intelligent equipment, server, computing equipment and storage medium |
WO2021068467A1 (en) * | 2019-10-12 | 2021-04-15 | 百度在线网络技术(北京)有限公司 | Method and apparatus for recommending voice packet, electronic device, and storage medium |
CN112667796A (en) * | 2021-01-05 | 2021-04-16 | 网易(杭州)网络有限公司 | Dialog reply method and device, electronic equipment and readable storage medium |
CN113053373A (en) * | 2021-02-26 | 2021-06-29 | 上海声通信息科技股份有限公司 | Intelligent vehicle-mounted voice interaction system supporting voice cloning |
CN113689881A (en) * | 2020-05-18 | 2021-11-23 | 北京中关村科金技术有限公司 | Method, device and storage medium for audio interaction aiming at voice image |
CN115101048A (en) * | 2022-08-24 | 2022-09-23 | 深圳市人马互动科技有限公司 | Science popularization information interaction method, device, system, interaction equipment and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102413418A (en) * | 2011-10-13 | 2012-04-11 | 任峰 | Interpreter capable of realizing IVR process through intelligent mobile phone interface |
CN104391673A (en) * | 2014-11-20 | 2015-03-04 | 百度在线网络技术(北京)有限公司 | Voice interaction method and voice interaction device |
CN106328139A (en) * | 2016-09-14 | 2017-01-11 | 努比亚技术有限公司 | Voice interaction method and voice interaction system |
CN106469212A (en) * | 2016-09-05 | 2017-03-01 | 北京百度网讯科技有限公司 | Man-machine interaction method based on artificial intelligence and device |
CN106504743A (en) * | 2016-11-14 | 2017-03-15 | 北京光年无限科技有限公司 | A kind of interactive voice output intent and robot for intelligent robot |
CN106934452A (en) * | 2017-01-19 | 2017-07-07 | 深圳前海勇艺达机器人有限公司 | Robot dialogue method and system |
CN107146610A (en) * | 2017-04-10 | 2017-09-08 | 北京猎户星空科技有限公司 | A kind of determination method and device of user view |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108711423A (en) | Intelligent sound interacts implementation method, device, computer equipment and storage medium | |
CN108962217B (en) | Speech synthesis method and related equipment | |
EP3438972B1 (en) | Information processing system and method for generating speech | |
US20200279553A1 (en) | Linguistic style matching agent | |
US20180203946A1 (en) | Computer generated emulation of a subject | |
CN110491382A (en) | Audio recognition method, device and interactive voice equipment based on artificial intelligence | |
CN112349273B (en) | Speech synthesis method based on speaker, model training method and related equipment | |
CN109189980A (en) | The method and electronic equipment of interactive voice are carried out with user | |
CN108597509A (en) | Intelligent sound interacts implementation method, device, computer equipment and storage medium | |
CN107516511A (en) | The Text To Speech learning system of intention assessment and mood | |
KR20170026593A (en) | Generating computer responses to social conversational inputs | |
CN110347792A (en) | Talk with generation method and device, storage medium, electronic equipment | |
WO2000038808A1 (en) | Information processor, portable device, electronic pet device, recorded medium on which information processing procedure is recorded, and information processing method | |
Latif et al. | Self supervised adversarial domain adaptation for cross-corpus and cross-language speech emotion recognition | |
JP2003521750A (en) | Speech system | |
JPWO2017200074A1 (en) | Dialogue method, dialogue system, dialogue apparatus, and program | |
Zhou et al. | Speech synthesis with mixed emotions | |
Singh | The role of speech technology in biometrics, forensics and man-machine interface. | |
CN113838448B (en) | Speech synthesis method, device, equipment and computer readable storage medium | |
Wang et al. | Comic-guided speech synthesis | |
CN113761268A (en) | Playing control method, device, equipment and storage medium of audio program content | |
KR100917552B1 (en) | Method and system for improving the fidelity of a dialog system | |
JP2009151314A (en) | Information processing device and information processing method | |
WO2017200077A1 (en) | Dialog method, dialog system, dialog device, and program | |
Verma et al. | Animating expressive faces across languages |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right |
Effective date of registration: 20210508 Address after: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing Applicant after: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd. Applicant after: Shanghai Xiaodu Technology Co.,Ltd. Address before: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing Applicant before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd. |