CN110136719A - Method, apparatus and system for realizing intelligent voice dialog
- Publication number
- CN110136719A (application CN201810105481.0A)
- Authority
- CN
- China
- Prior art keywords
- text
- logic
- dialog
- language
- client
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Abstract
The present invention provides a method, apparatus and system for realizing intelligent voice dialog. The method comprises: receiving a voice signal recorded by a client; converting the voice signal into speech text; determining the semantics corresponding to the speech text; determining the language logic corresponding to the semantics; determining the dialog text corresponding to the language logic; synthesizing the audio file corresponding to the dialog text; and sending the audio file to the client. With embodiments of the present invention, English study time is flexible, cost is low, few restrictions are placed on the user's answers, and the user is given an intelligent, interactive human-machine learning experience.
Description
Technical field
The present invention relates to the field of artificial intelligence, and in particular to a method, apparatus and system for realizing intelligent voice dialog.
Background
As people attach increasing importance to learning English, more and more English-learning institutions and English-learning software have emerged.
Typically, to practice spoken English, people pay for offline courses taught by foreign teachers. Such courses run at fixed times, so learning time is inflexible, and they are expensive. The simulated dialogs offered by online learning software, on the other hand, must follow a set flow and present options directly for the user to choose from. This heavily restricts the user's answers and cannot give the user an intelligent, interactive human-machine learning experience.
Summary of the invention
In view of this, the present invention provides a method, apparatus and system for realizing intelligent voice dialog, to solve the problems that English study time is inflexible, that cost is high, and that the user's answers are heavily restricted.
To achieve the above object, the present invention provides the following technical solutions:
According to a first aspect of the present invention, a method for realizing intelligent voice dialog is proposed, the method comprising:
Receiving a voice signal recorded by a client;
Converting the voice signal into speech text;
Determining the semantics corresponding to the speech text;
Determining the language logic corresponding to the semantics;
Determining the dialog text corresponding to the language logic;
Synthesizing the audio file corresponding to the dialog text;
Sending the audio file to the client.
According to a second aspect of the present invention, an apparatus for realizing intelligent voice dialog is proposed, comprising:
A speech receiving module, for receiving a voice signal recorded by a client;
A text conversion module, for converting the voice signal into speech text;
A semantics determining module, for determining the semantics corresponding to the speech text;
A logic determining module, for determining the language logic corresponding to the semantics;
A text determining module, for determining the dialog text corresponding to the language logic;
An audio synthesis module, for synthesizing the audio file corresponding to the dialog text;
An audio sending module, for sending the audio file to the client.
According to a third aspect of the present invention, a system for realizing intelligent voice dialog is proposed, the system comprising a client and a server, wherein:
The client is configured to receive a scene instruction and send the scene instruction to the server;
The server is configured to enable the intelligent voice dialog function based on the scene instruction and to initiate a first round of dialog with the client for the scene corresponding to the scene instruction; and, on receiving a voice signal recorded by the client, to convert the voice signal into speech text, determine the semantics corresponding to the speech text, determine the language logic corresponding to the semantics, determine the dialog text corresponding to the language logic, synthesize the audio file corresponding to the dialog text, and send the audio file to the client;
The client is further configured to receive the audio file and play it.
As the above technical solutions show, the server receives the voice signal recorded by the client, converts the voice signal into speech text, and determines the semantics corresponding to the speech text. The server then determines the language logic from the semantics, determines the corresponding dialog text from the language logic, synthesizes the audio file corresponding to the dialog text, and sends the audio file to the client, so that the client initiates the next round of dialog after playing the audio file. With this method for realizing intelligent voice dialog, learning time is flexible, cost is low, few restrictions are placed on the user's answers, and the user is given an intelligent, interactive human-machine learning experience.
Brief description of the drawings
Fig. 1A is a flow chart of an embodiment of the method for realizing intelligent voice dialog provided by the present invention;
Fig. 1B is a schematic diagram of the internal structure of a server to which the method of Fig. 1A applies;
Fig. 2 is a flow chart of an embodiment of the method for realizing intelligent voice dialog provided by the present invention;
Fig. 3 is a flow chart of another embodiment of the method for realizing intelligent voice dialog provided by the present invention;
Fig. 4 is a flow chart of yet another embodiment of the method for realizing intelligent voice dialog provided by the present invention;
Fig. 5 is a flow chart of yet another embodiment of the method for realizing intelligent voice dialog provided by the present invention;
Fig. 6 is a flow chart of yet another embodiment of the method for realizing intelligent voice dialog provided by the present invention;
Fig. 7 is a flow chart of yet another embodiment of the method for realizing intelligent voice dialog provided by the present invention;
Fig. 8 is a hardware structure diagram of a server provided by the present invention;
Fig. 9 is a block diagram of an embodiment of the apparatus for realizing intelligent voice dialog provided by the present invention;
Fig. 10 is a block diagram of another embodiment of the apparatus for realizing intelligent voice dialog provided by the present invention.
Detailed description of the embodiments
Exemplary embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of apparatuses and methods that are consistent with some aspects of the present invention as detailed in the appended claims.
The terminology used in the present invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. The singular forms "a", "said" and "the" used in the present invention and the appended claims are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the present invention to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present invention, first information may also be called second information, and similarly, second information may also be called first information. Depending on context, the word "if" as used herein may be interpreted as "when", "upon" or "in response to determining".
Fig. 1A is a flow chart of an embodiment of the method for realizing intelligent voice dialog provided by the present invention. The method may be applied in a server and, as shown in Fig. 1A, may include the following steps:
Step 101: receive the voice signal recorded by the client.
Step 102: convert the voice signal into speech text.
Step 103: determine the semantics corresponding to the speech text.
Step 104: determine the language logic corresponding to the semantics.
Step 105: determine the dialog text corresponding to the language logic.
Step 106: synthesize the audio file corresponding to the dialog text.
Step 107: send the audio file to the client.
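The chain of steps 101 to 107 can be pictured as a simple pipeline. The following is only a minimal sketch under stated assumptions: the class name `DialogPipeline`, its field names, and the stub stages (including the fallback sentence) are hypothetical and stand in for real speech recognition, understanding and synthesis components, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DialogPipeline:
    """Hypothetical wiring of steps 102-106; every stage is an injected callable."""
    speech_to_text: Callable[[bytes], str]      # step 102
    text_to_semantics: Callable[[str], str]     # step 103
    semantics_to_logic: Callable[[str], str]    # step 104
    logic_to_text: Callable[[str], str]         # step 105
    text_to_audio: Callable[[str], bytes]       # step 106

    def handle(self, voice_signal: bytes) -> bytes:
        """Take the signal received in step 101, return the audio sent in step 107."""
        speech_text = self.speech_to_text(voice_signal)
        semantics = self.text_to_semantics(speech_text)
        logic = self.semantics_to_logic(semantics)
        dialog_text = self.logic_to_text(logic)
        return self.text_to_audio(dialog_text)


# Stub stages (assumptions): bytes are treated as already-recognized text.
pipeline = DialogPipeline(
    speech_to_text=lambda signal: signal.decode("utf-8"),
    text_to_semantics=lambda text: (
        "wants sirloin steak" if "sirloin" in text.lower() else "unknown"
    ),
    semantics_to_logic=lambda sem: (
        "ask doneness" if sem == "wants sirloin steak" else "ask again"
    ),
    logic_to_text=lambda logic: {
        "ask doneness": "How should we prepare your steak?",
        "ask again": "Sorry, could you say that again?",  # hypothetical fallback
    }[logic],
    text_to_audio=lambda text: text.encode("utf-8"),
)

reply_audio = pipeline.handle(b"I want Sirloin please")
```

Keeping each stage as a separate callable mirrors the way the description treats each step as independently replaceable.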
In step 101, in one embodiment, as those skilled in the art will appreciate, the client displays at least one scene task on its screen. A scene task is an everyday scenario that ends in achieving some goal, for example: ordering a steak in a restaurant, getting a boarding pass at the airport, shopping in a duty-free shop, or checking in at a hotel. The user selects a scene task by tapping the screen; the client receives the scene instruction generated by the tap and sends it to the server. Based on the scene instruction, the server enables the intelligent voice dialog function and initiates the first round of dialog for the corresponding scene. Taking the scene task "order a steak in a restaurant" as an example, the server initiates the first round of dialog for the scene corresponding to that scene instruction, and the client plays an audio file whose content is "What steak do you want?". On the client screen, the audio file may be presented in combination with a text prompt, a picture, an animated image, a short video, and so on. Those skilled in the art will appreciate that the difficulty of the intelligent dialog can be adjusted by choosing different combinations. For example, when the audio file is presented with a picture, the user's listening ability is tested more heavily and the dialog is harder; when the audio file is presented with a text prompt, the user can understand the spoken content more easily by reading the prompt, and the dialog is easier. In each round of dialog, the client starts recording the voice signal when it receives the user's record command; when it receives the instruction that recording is complete, it sends the recorded voice signal to the server, and the server receives it. For the question "What steak do you want?" above, the user might, for example, record through the client a voice signal whose content is "I want Sirloin please".
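As a rough illustration of how a scene instruction might map to a first-round prompt and a presentation combination, consider the sketch below. The scene identifier `order_steak`, the `FirstRound` type and the `easier` flag are hypothetical names introduced only for this example; the text does not prescribe any data layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirstRound:
    prompt: str       # text synthesized into the audio file the client plays
    companion: str    # what the client shows alongside the audio


# Hypothetical scene table: scene instruction -> opening line of the first round.
SCENES = {"order_steak": "What steak do you want?"}


def start_scene(scene_instruction: str, easier: bool) -> FirstRound:
    """Open the first dialog round for the scene named by the scene instruction.

    A text prompt makes the round easier; a picture leans more on listening.
    """
    prompt = SCENES[scene_instruction]
    companion = "text prompt" if easier else "picture"
    return FirstRound(prompt, companion)
```

For example, `start_scene("order_steak", easier=False)` pairs the opening question with a picture, which corresponds to the harder presentation described above.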
In step 102, in one embodiment, the server converts the voice signal into speech text. Continuing from step 101, the server converts the voice signal "I want Sirloin please" into the speech text "I want Sirloin please". For how the server converts a voice signal into speech text, refer to descriptions in the related art; the details are not repeated here.
In step 103, in one embodiment, the server determines the semantics corresponding to the speech text. Those skilled in the art will appreciate that when the user's English is weak and the recorded audio is affected by environmental noise and similar factors, the speech text the server derives from the voice signal may contain missing words, grammatical errors, wrong sentence breaks and similar problems, so the server needs to extract from the speech text the core content that effectively reflects the speaker's intent. Continuing from step 102, the server determines that the verb "want" in the speech text "I want Sirloin please" expresses an affirmative meaning and, combined with the noun "Sirloin" in the same text, indicates that the user wants sirloin steak; the server can therefore determine that the semantics are "wants to eat sirloin steak". For how the server determines the semantics corresponding to the speech text, refer to the description of steps 201 to 202 shown in Fig. 2 below; it is not detailed here.
In step 104, in one embodiment, the server determines the language logic corresponding to the semantics. Language logic means that consecutive turns of the dialog must connect in a way that follows a logical train of thought. For example, when the previous sentence in the dialog is "I want Sirloin please", a logically related next turn could be "state whether sirloin steak is in stock", "ask how the steak should be cooked", or "ask whether other side dishes and drinks are needed". Continuing from step 103, if the server determines that the semantics are "wants to eat sirloin steak", it can determine that the language logic corresponding to "wants to eat sirloin steak" is "ask how the steak should be cooked". For how the server determines the language logic corresponding to the semantics, refer to the description of steps 301 to 302 shown in Fig. 3 below; it is not detailed here.
In step 105, in one embodiment, the server determines the dialog text corresponding to the language logic. Continuing from step 104, if the language logic corresponding to "wants to eat sirloin steak" is "ask how the steak should be cooked", the server determines that the dialog text corresponding to "ask how the steak should be cooked" is "How should we prepare your steak, medium well, medium rare or well done?". For how the server determines the dialog text corresponding to the language logic, refer to the description of steps 401 to 402 shown in Fig. 4 below; it is not detailed here.
In step 106, in one embodiment, the server synthesizes the audio file corresponding to the dialog text. Continuing from step 105, the server synthesizes the dialog text "How should we prepare your steak, medium well, medium rare or well done?" into the corresponding audio file. For how the server synthesizes the audio file corresponding to the dialog text, refer to descriptions in the related art; the details are not repeated here.
In this embodiment of the present invention, the server receives the voice signal recorded by the client, converts the voice signal into speech text, and determines the semantics corresponding to the speech text. The server then determines the language logic from the semantics, determines the corresponding dialog text from the language logic, synthesizes the audio file corresponding to the dialog text, and sends the audio file to the client, so that the client initiates the next round of dialog after playing the audio file. With this method for realizing intelligent voice dialog, learning time is flexible, cost is low, few restrictions are placed on the user's answers, and the user is given an intelligent, interactive human-machine learning experience.
Fig. 1B is a schematic diagram of the internal structure of a server to which the method of Fig. 1A applies. The server 11 in Fig. 1B includes a voice module 111, an understanding module 112, a logic module 113, a text module 114, a content module 115 and an audio module 116. The voice module 111 receives the recorded voice signal sent by the client and converts it into speech text; the understanding module 112 determines the semantics of the speech text; the logic module 113 determines the language logic corresponding to the semantics; the text module 114 determines the dialog text corresponding to the language logic; the content module 115 provides the understanding module 112 and the text module 114 with the corresponding words, phrases and sentences, and provides the logic module 113 with preset logic configurations; the audio module 116 synthesizes the dialog text determined by the text module 114 into an audio file.
Specifically, with reference to steps 101 to 107 of Fig. 1A, the voice module 111 receives the voice signal recorded by the client. If the content of the voice signal is, for example, "I want Sirloin please", the voice module 111 converts it into the speech text "I want Sirloin please". The understanding module 112 combines the verb "want" and the noun "Sirloin" provided by the content module 115 and determines that the semantics are "wants to eat sirloin steak". The logic module 113 then determines the language logic corresponding to these semantics. For example, from the three preset logic configurations provided by the content module 115, namely "state whether sirloin steak is in stock", "ask how the steak should be cooked" and "ask whether other side dishes and drinks are needed", the logic module 113 determines that the language logic corresponding to "wants to eat sirloin steak" is "ask how the steak should be cooked". The text module 114 then determines that the dialog text corresponding to "ask how the steak should be cooked" is "How should we prepare your steak, medium well, medium rare or well done?", and the audio module 116 synthesizes the audio file corresponding to that dialog text.
Those skilled in the art will appreciate that the voice module 111, understanding module 112, logic module 113, text module 114, content module 115 and audio module 116 in the above server are merely examples; the server may also include modules such as a judgment module and a scoring module (not shown in Fig. 1B). The judgment module can be used to judge whether the scene task is complete. For example, taking the scene task "order a steak in a restaurant", when the server judges that the voice signal recorded by the client means "a steak has been ordered", the scene task "order a steak in a restaurant" is complete. The scoring module is used to score the recorded voice signal; for how the server scores the voice signal, refer to the description of step 608 shown in Fig. 6 below, which is not detailed here.
Fig. 2 is a flow chart of an embodiment of the method for realizing intelligent voice dialog provided by the present invention. With reference to Fig. 1A and building on steps 101 to 107, this embodiment illustrates how the server determines the semantics corresponding to the speech text. As shown in Fig. 2, it includes the following steps:
Step 201: based on a first preset selection rule, select at least one keyword from the speech text.
Step 202: determine the semantics based on the at least one keyword.
In step 201, the first preset selection rule may be to select verbs, nouns, personal pronouns, adverbs and so on from the speech text as keywords; different selection rules can be set for different speech texts. When the question begins with "what" or "where", nouns in the speech text are selected preferentially as keywords; when it begins with "who", personal pronouns and nouns in the speech text are selected preferentially; when it begins with "how" or "do", adverbs and similar words in the speech text are selected preferentially. For example, when the question is "Do you want to eat Sirloin?", an "adverb" such as "Yes" or "No" that expresses a clear attitude can be selected from the speech text; if the speech text is "Yes, sure.", the keyword is "Yes". When the question is "Who is your best friends?" and the speech text is "Lily is my best friends.", "Lily", a noun in the speech text that identifies a specific person, can be selected.
In step 202, the server determines the semantics based on the at least one keyword. Continuing from step 201, when the question is "Do you want to eat Sirloin?" and the keyword determined by the server is "Yes", the server can determine that the semantics are "wants to eat sirloin steak".
In this embodiment of the present invention, the server selects at least one keyword from the speech text based on the first preset selection rule and determines the semantics based on the at least one keyword. Setting different first preset selection rules makes the server's semantic understanding more intelligent and more fault-tolerant.
Fig. 3 is a flow chart of another embodiment of the method for realizing intelligent voice dialog provided by the present invention. With reference to Fig. 1B, this embodiment illustrates how the server determines the language logic corresponding to the semantics. As shown in Fig. 3, it includes the following steps:
Step 301: determine at least one preset logic configuration corresponding to the semantics.
Step 302: based on a second preset selection rule, determine the language logic from the at least one preset logic configuration.
In step 301, with reference to Fig. 1B, the content module 115 stores the preset logic configurations and provides them to the logic module 113. Taking the semantics "wants to eat sirloin steak" as an example, the logic module 113 determines from the content module 115 the three preset logic configurations corresponding to these semantics: "state whether sirloin steak is in stock", "ask how the steak should be cooked", and "ask whether other side dishes and drinks are needed".
In step 302, the second preset selection rule may be, for example: select a preset logic configuration that has not yet appeared before the current round of dialog; select preset logic configurations by polling; or select the preset logic configuration that has been used the fewest times. For example, from the three preset logic configurations recorded in the content module 115, "state whether sirloin steak is in stock", "ask how the steak should be cooked" and "ask whether other side dishes and drinks are needed", the logic module 113 selects "ask how the steak should be cooked" by polling as the language logic corresponding to the semantics "wants to eat sirloin steak".
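The three example selection rules can be sketched as methods on a small selector object. This is a minimal illustration, not the patent's implementation: the class `LogicSelector`, its method names and the shortened configuration labels are hypothetical.

```python
from collections import Counter


class LogicSelector:
    """Hypothetical helper that picks one preset logic configuration per round."""

    def __init__(self, configs):
        self.configs = list(configs)
        self.usage = Counter()   # how often each configuration has been chosen
        self._next = 0           # position of the polling cursor

    def round_robin(self):
        """Polling: walk through the configurations in order, wrapping around."""
        choice = self.configs[self._next % len(self.configs)]
        self._next += 1
        self.usage[choice] += 1
        return choice

    def least_used(self):
        """Pick the configuration chosen the fewest times so far (first on ties)."""
        choice = min(self.configs, key=lambda c: self.usage[c])
        self.usage[choice] += 1
        return choice

    def unused_in_dialog(self, history):
        """Pick the first configuration that has not appeared earlier in this dialog."""
        for config in self.configs:
            if config not in history:
                self.usage[config] += 1
                return config
        return None  # every configuration has already appeared


selector = LogicSelector([
    "state stock",
    "ask doneness",
    "ask sides and drinks",
])
```

Calling `selector.round_robin()` twice returns "state stock" and then "ask doneness"; a following `selector.least_used()` returns "ask sides and drinks", the only configuration not yet chosen.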
In this embodiment of the present invention, the server determines at least one preset logic configuration corresponding to the semantics and, based on the second preset selection rule, determines the language logic from the at least one preset logic configuration. Setting a reasonable second preset selection rule and a larger number of preset logic configurations makes the language logic that the server finally determines more varied.
Fig. 4 is a flow chart of yet another embodiment of the method for realizing intelligent voice dialog provided by the present invention. With reference to Fig. 1A and building on steps 101 to 107, this embodiment illustrates how the server determines the dialog text corresponding to the language logic. As shown in Fig. 4, it includes the following steps:
Step 401: determine a preset reply rule based on the voice signal.
Step 402: based on the preset reply rule, determine the dialog text corresponding to the language logic.
In step 401, the preset reply rule is the principle the server follows when determining the dialog text from the language logic. The server may determine the preset reply rule from the voice signal in either of two ways: based on the speech text of the voice signal, or based on the score of the voice signal. Determining the preset reply rule from the speech text means that the server takes the context into account and gives a dialog text that suits the language logic in the specific scene. Determining the preset reply rule from the score means that voice signals with different scores receive dialog texts of different difficulty. Specifically, the server determines a score for the user's language ability from the voice signal, and different scores correspond to different preset reply rules, for example: 0-30 points corresponds to an easier preset reply rule (more hint words are provided); 30-60 points corresponds to a moderate preset reply rule (normal reply); 60-100 points corresponds to a harder preset reply rule (fewer hint words are provided).
In step 402, continuing the example from step 105, the server has determined that the language logic is "ask how the steak should be cooked". With reference to step 401, if the score of the user's language ability is 25 points, the dialog text that the preset reply rule gives for this language logic is "How should we prepare your steak, medium well, medium rare or well done?", where "medium well, medium rare or well done" are the hint words provided. If the score of the user's language ability is 85 points, the dialog text that the preset reply rule gives for this language logic is "How should we prepare your steak?", with no hint words provided.
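The score-based reply rule can be sketched as a banding function followed by a template choice. This is a sketch under stated assumptions: the rule names `easy`, `medium` and `hard` are hypothetical, the boundary scores 30 and 60 are assigned to the higher band because the stated ranges overlap there, and the medium band is assumed to omit hint words because the text gives no example for it.

```python
# Hint words taken from the steak example in the text.
HINTS = "medium well, medium rare or well done"


def reply_rule(score: float) -> str:
    """Map a language-ability score (0-100) to a hypothetical reply-rule name."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score < 30:
        return "easy"      # more hint words
    if score < 60:
        return "medium"    # normal reply
    return "hard"          # fewer hint words


def doneness_question(score: float) -> str:
    """Build the dialog text for the language logic 'ask how the steak should be cooked'."""
    question = "How should we prepare your steak"
    if reply_rule(score) == "easy":
        return f"{question}, {HINTS}?"
    # Assumption: the medium and hard rules both omit the hint words here.
    return f"{question}?"
```

With these assumptions, a score of 25 yields the question with the three doneness options appended, and a score of 85 yields the bare question, matching the two examples given in the text.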
In this embodiment of the present invention, the server determines a preset reply rule based on the voice signal and, based on the preset reply rule, determines the dialog text corresponding to the language logic. Setting reasonable preset reply rules allows the difficulty of the dialog text to be adjusted flexibly.
Fig. 5 is a flow chart of yet another embodiment of the method for realizing intelligent voice dialog provided by the present invention. With reference to Fig. 1A and building on steps 101 to 107, this embodiment illustrates how the server ends the dialog. As shown in Fig. 5, it includes the following steps:
Step 501: judge whether the dialog text is consistent with a preset target text.
Step 502: if the dialog text is consistent with the preset target text, end the dialog.
In steps 501 to 502, the preset target text is a dialog text preset on the server that indicates the scene task is complete. Taking the scene task "order a steak in a restaurant" from Fig. 1A as an example, if the dialog text "Enjoy your meal" is consistent with the preset target text "Enjoy your meal", the server closes the intelligent voice dialog function and ends the dialog.
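The end-of-dialog check in steps 501 to 502 amounts to a text comparison. The sketch below is illustrative only: the text says the two texts must be "consistent" without defining the comparison, so the case- and punctuation-insensitive `normalize` helper is an assumption, and both function names are hypothetical.

```python
def normalize(text: str) -> str:
    """Lowercase, drop surrounding punctuation and collapse whitespace (assumed rule)."""
    return " ".join(text.lower().strip(" .!?").split())


def dialog_finished(dialog_text: str, target_text: str) -> bool:
    """Return True when the dialog text matches the preset target text."""
    return normalize(dialog_text) == normalize(target_text)
```

For the steak scene, `dialog_finished("Enjoy your meal!", "Enjoy your meal")` evaluates to true, at which point the server would close the dialog function.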
In this embodiment of the present invention, the server judges whether the dialog text is consistent with the preset target text; if it is, the server ends the dialog, and the scene task is thereby completed.
Fig. 6 is a flow chart of yet another embodiment of the method for realizing intelligent voice dialog provided by the present invention. With reference to Fig. 1A and building on steps 101 to 107, this embodiment illustrates how the server determines the score of at least one dimension of the voice signal. As shown in Fig. 6, it includes the following steps:
Step 601: receive the voice signal recorded by the client.
Step 602: convert the voice signal into speech text.
Step 603: determine the semantics corresponding to the speech text.
Step 604: determine the language logic corresponding to the semantics.
Step 605: determine the dialog text corresponding to the language logic.
Step 606: synthesize the audio file corresponding to the dialog text.
Step 607: send the audio file to the client.
Step 608: based on preset scoring criteria, determine the score of at least one dimension of the voice signal.
For steps 601 to 607, refer to the description of steps 101 to 107 in Fig. 1A; the details are not repeated here. Note that step 608 may be performed before or after any of the steps that follow step 601; the timing of step 608 is not limited here.
In step 608, the scoring criteria are preset and can score the voice signal along multiple dimensions, including pronunciation, fluency, expression, and independent completion. Taking fluency as an example, the preset scoring criteria may judge the duration of the voice signal recorded by the user; taking pronunciation as an example, they may judge the number of valid words or phrases in the speech text into which the server converted the voice signal. The server scores each dimension of the voice signal to obtain a score for each dimension. By assigning different weights to the dimensions, the server can also score the user's overall performance in the dialog, and it can generate an ability distribution chart covering dimensions such as listening, pronunciation, fluency, expression, and independent completion. It can also provide analysis and improvement suggestions based on the ability distribution, and can pick out and comment on the parts where the user scored lower in dimensions such as pronunciation and expression.
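The weighted overall score and the selection of low-scoring dimensions can be sketched as follows. The text does not give a formula, so the weighted average used here is an assumption, and the function names, sample scores and weights are hypothetical.

```python
def overall_score(scores: dict, weights: dict) -> float:
    """Weighted average of per-dimension scores (assumed aggregation formula)."""
    total_weight = sum(weights[dim] for dim in scores)
    return sum(scores[dim] * weights[dim] for dim in scores) / total_weight


def weakest_dimensions(scores: dict, n: int = 1) -> list:
    """Return the n lowest-scoring dimensions, the candidates for targeted comments."""
    return sorted(scores, key=scores.get)[:n]


# Hypothetical per-dimension scores and weights for one recorded voice signal.
scores = {
    "pronunciation": 80,
    "fluency": 60,
    "expression": 70,
    "independent completion": 90,
}
weights = {
    "pronunciation": 4,
    "fluency": 2,
    "expression": 2,
    "independent completion": 2,
}
```

With these sample values the weighted average is (320 + 120 + 140 + 180) / 10 = 76, and fluency is the dimension that would be singled out for comment.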
In this embodiment of the present invention, the server determines the score of at least one dimension of the voice signal based on the preset scoring criteria, so that the user's dialog ability can be presented intuitively through scores. The server also analyzes and comments on the lower-scoring dimensions, which helps the user study their weak dimensions in a targeted way.
Fig. 7 is a flowchart of another embodiment of the method for realizing intelligent voice dialog provided by the present invention. With reference to Fig. 1A and on the basis of steps 101 to 107, this embodiment illustrates how the server handles a help instruction after receiving it. As shown in Fig. 7, the method includes the following steps:
Step 701: when a help instruction is received, determine at least one reference dialog text based on the current dialog text.
Step 702: send the at least one reference dialog text to the client.
In steps 701 and 702, when the server receives a help instruction sent by the client, the server determines the current dialog text, which is the question that the server has sent to the client and that is currently waiting for the user's answer. Based on the current dialog text, the server determines at least one reference dialog text corresponding to it. For example, the client receives a help instruction generated when the user taps the "ask for help" control on the screen, and the client sends the help instruction to the server. The server determines that the current dialog text is "What steak do you want? We have Rib Eye, Sirloin and T-Bone", and then determines the preset reference dialog texts "I will have the Rib Eye please", "I'd like to try the Sirloin", and "I am ordering the T-Bone". The server sends these reference dialog texts to the client, and the client displays the three reference dialog texts on the screen for the user's reference.
In this embodiment of the present invention, when the server receives a help instruction, it determines at least one reference dialog text based on the current dialog text and sends the at least one reference dialog text to the client. The reference examples serve as prompts and help the user memorize and learn by imitation.
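The handling of a help instruction in steps 701 and 702 can be sketched as a lookup from the current dialog text to its preset reference dialog texts. This is only an illustrative sketch: the table `REFERENCE_TEXTS` and the function `handle_help_instruction` are hypothetical names, and returning an empty list when nothing is preset is an assumption, since the description does not cover that case.

```python
from typing import Dict, List

# Hypothetical table mapping a pending question to its preset reference answers.
# The entry below reuses the steak example from the description.
REFERENCE_TEXTS: Dict[str, List[str]] = {
    "What steak do you want? We have Rib Eye, Sirloin and T-Bone": [
        "I will have the Rib Eye please",
        "I'd like to try the Sirloin",
        "I am ordering the T-Bone",
    ],
}


def handle_help_instruction(current_dialog_text: str) -> List[str]:
    """Return the reference dialog texts for the question awaiting an answer.

    An empty list is returned when no references are preset (an assumption).
    """
    return list(REFERENCE_TEXTS.get(current_dialog_text, []))
```

The server would then send the returned list to the client, which displays it on the screen.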
Corresponding to the above method for realizing intelligent voice dialog, the present invention also provides the hardware structure of the server shown in Fig. 8. Referring to Fig. 8, at the hardware level the server includes a processor, an internal bus, a network interface, memory, and non-volatile storage, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile storage into memory and then runs it, forming the device for realizing intelligent voice dialog at the logical level. Of course, besides a software implementation, the present invention does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the following processing flow is not limited to logic units and may also be hardware or a logic device.
Fig. 9 is a block diagram of an embodiment of the device for realizing intelligent voice dialog provided by the present invention. As shown in Fig. 9, the device for realizing intelligent voice dialog may include a speech reception module 91, a text conversion module 92, a semantic determining module 93, a logic determining module 94, a text determining module 95, an audio synthesis module 96, and an audio sending module 97, wherein:
the speech reception module 91 is used for receiving the voice signal recorded by the client;
the text conversion module 92 is used for converting the voice signal into speech text;
the semantic determining module 93 is used for determining the semantics corresponding to the speech text;
the logic determining module 94 is used for determining the logic of language corresponding to the semantics;
the text determining module 95 is used for determining the dialog text corresponding to the logic of language;
the audio synthesis module 96 is used for synthesizing the audio file corresponding to the dialog text;
the audio sending module 97 is used for sending the audio file to the client.
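The chain formed by modules 92 to 96 can be sketched as a pipeline of injected stages. This is a hedged illustration only: the class `DialogPipeline` and its field names are hypothetical, each stage is a plain callable supplied by the caller, and no real speech recognition or speech synthesis library is assumed. Receiving the voice signal (module 91) and sending the audio file (module 97) are left to the caller.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DialogPipeline:
    """Chains stages that stand in for modules 92-96 of the device."""

    to_text: Callable[[bytes], str]        # text conversion module 92
    to_semantics: Callable[[str], str]     # semantic determining module 93
    to_logic: Callable[[str], str]         # logic determining module 94
    to_dialog_text: Callable[[str], str]   # text determining module 95
    to_audio: Callable[[str], bytes]       # audio synthesis module 96

    def handle(self, voice_signal: bytes) -> bytes:
        """Run one dialog round: voice signal in, synthesized audio out."""
        speech_text = self.to_text(voice_signal)
        semantics = self.to_semantics(speech_text)
        logic = self.to_logic(semantics)
        dialog_text = self.to_dialog_text(logic)
        return self.to_audio(dialog_text)
```

Injecting the stages keeps the sketch independent of any particular recognizer or synthesizer, so each module can be replaced or tested separately.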
Fig. 10 is a block diagram of another embodiment of the device for realizing intelligent voice dialog provided by the present invention. As shown in Fig. 10, on the basis of the embodiment shown in Fig. 9, the semantic determining module 93 includes:
a keyword selection submodule 931, for selecting at least one keyword in the speech text based on a first preset selection rule;
a first determining submodule 932, for determining the semantics based on the at least one keyword.
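The two submodules can be sketched as a keyword filter followed by a table lookup. The sketch is illustrative only: the first preset selection rule is not specified in this passage, so a fixed keyword table (`KEYWORD_SEMANTICS`) is assumed here, and the semantic labels and function names are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical keyword table standing in for the first preset selection rule.
KEYWORD_SEMANTICS: Dict[str, str] = {
    "sirloin": "order_steak",
    "t-bone": "order_steak",
    "rib": "order_steak",
    "bill": "request_bill",
}


def select_keywords(speech_text: str) -> List[str]:
    """Keep, in order of appearance, the words that are known keywords."""
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", speech_text.lower())
    return [word for word in words if word in KEYWORD_SEMANTICS]


def determine_semantics(keywords: List[str]) -> str:
    """Use the first keyword's label; 'unknown' when no keyword was selected."""
    return KEYWORD_SEMANTICS[keywords[0]] if keywords else "unknown"
```

Taking the first keyword is the simplest possible policy; a real rule could weigh several keywords together.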
In one embodiment, the logic determining module 94 includes:
a second determining submodule 941, for determining at least one preset logic configuration corresponding to the semantics;
a third determining submodule 942, for determining the logic of language from the at least one preset logic configuration based on a second preset selection rule.
In one embodiment, the text determining module 95 includes:
a fourth determining submodule 951, for determining a preset answering rule based on the voice signal;
a fifth determining submodule 952, for determining the dialog text corresponding to the logic of language based on the preset answering rule.
In one embodiment, the device for realizing intelligent voice dialog further includes:
a text judgment module 98, for judging whether the dialog text is consistent with a preset target text;
a dialog ending module 99, for ending the dialog if the dialog text is consistent with the preset target text.
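The check performed by modules 98 and 99 can be sketched as a single comparison. The function name `should_end_dialog` is hypothetical, and ignoring case and surrounding whitespace is an assumption: the text only requires the dialog text to be consistent with the preset target text.

```python
def should_end_dialog(dialog_text: str, target_text: str) -> bool:
    """Return True when the dialog text matches the preset target text.

    Case and surrounding whitespace are ignored (an assumption for this sketch).
    """
    return dialog_text.strip().casefold() == target_text.strip().casefold()
```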
In one embodiment, the device for realizing intelligent voice dialog further includes:
a scoring module 100, for determining the score value of at least one dimension of the voice signal based on preset scoring criteria.
In one embodiment, the device for realizing intelligent voice dialog further includes:
a reference text determining module 101, for determining at least one reference dialog text based on the current dialog text when a help instruction is received;
a text sending module 102, for sending the at least one reference dialog text to the client.
For the implementation of the functions and roles of each unit in the above device, refer to the implementation of the corresponding steps in the above method; details are not repeated here.
Since the device embodiments substantially correspond to the method embodiments, refer to the relevant parts of the method embodiments for details. The device embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present invention. Those of ordinary skill in the art can understand and implement it without creative effort.
As can be seen from the above embodiments, in the embodiments of the present invention the server receives the voice signal recorded by the client, converts the voice signal into speech text, and determines the semantics corresponding to the speech text. The server determines the logic of language according to the semantics, determines the corresponding dialog text through the logic of language, and finally synthesizes the audio file corresponding to the dialog text and sends the audio file to the client, so that the next round of dialog is initiated after the client plays the audio file. With this method for realizing intelligent voice dialog, learning time is flexible, the cost is low, there are few restrictions on the user's answers, and the user is provided with an intelligent, interactive human-machine learning experience.
Other embodiments of the present invention will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed here. The present invention is intended to cover any variations, uses, or adaptations of the invention that follow its general principles and include common knowledge or conventional techniques in the art that are not disclosed by the present invention. The specification and examples are to be regarded as illustrative only, and the true scope and spirit of the invention are indicated by the following claims.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, commodity, or device that includes the element.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.
Claims (10)
1. A method for realizing intelligent voice dialog, characterized in that the method comprises:
receiving a voice signal recorded by a client;
converting the voice signal into speech text;
determining the semantics corresponding to the speech text;
determining the logic of language corresponding to the semantics;
determining the dialog text corresponding to the logic of language;
synthesizing the audio file corresponding to the dialog text;
sending the audio file to the client.
2. The method according to claim 1, characterized in that determining the semantics corresponding to the speech text comprises:
selecting at least one keyword in the speech text based on a first preset selection rule;
determining the semantics based on the at least one keyword.
3. The method according to claim 1, characterized in that determining the logic of language corresponding to the semantics comprises:
determining at least one preset logic configuration corresponding to the semantics;
determining the logic of language from the at least one preset logic configuration based on a second preset selection rule.
4. The method according to claim 1, characterized in that determining the dialog text corresponding to the logic of language comprises:
determining a preset answering rule based on the voice signal;
determining the dialog text corresponding to the logic of language based on the preset answering rule.
5. The method according to claim 1, characterized in that the method further comprises:
judging whether the dialog text is consistent with a preset target text;
ending the dialog if the dialog text is consistent with the preset target text.
6. The method according to claim 1, characterized in that the method further comprises:
determining the score value of at least one dimension of the voice signal based on preset scoring criteria.
7. The method according to any one of claims 1 to 6, characterized in that the method further comprises:
when a help instruction is received, determining at least one reference dialog text based on the current dialog text;
sending the at least one reference dialog text to the client.
8. A device for realizing intelligent voice dialog, characterized in that the device comprises:
a speech reception module, for receiving a voice signal recorded by a client;
a text conversion module, for converting the voice signal into speech text;
a semantic determining module, for determining the semantics corresponding to the speech text;
a logic determining module, for determining the logic of language corresponding to the semantics;
a text determining module, for determining the dialog text corresponding to the logic of language;
an audio synthesis module, for synthesizing the audio file corresponding to the dialog text;
an audio sending module, for sending the audio file to the client.
9. The device according to claim 8, characterized in that the semantic determining module comprises:
a keyword selection submodule, for selecting at least one keyword in the speech text based on a first preset selection rule;
a first determining submodule, for determining the semantics based on the at least one keyword.
10. A system for realizing intelligent voice dialog, characterized in that the system comprises a client and a server, wherein:
the client is used for receiving a scene instruction and sending the scene instruction to the server;
the server is used for enabling the intelligent voice dialog function based on the scene instruction, initiating a first round of dialog with the client based on the scene corresponding to the scene instruction, and, when receiving a voice signal recorded by the client, converting the voice signal into speech text, determining the semantics corresponding to the speech text, determining the logic of language corresponding to the semantics, determining the dialog text corresponding to the logic of language, synthesizing the audio file corresponding to the dialog text, and sending the audio file to the client;
the client is also used for receiving the audio file and playing the audio file.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810105481.0A CN110136719B (en) | 2018-02-02 | 2018-02-02 | Method, device and system for realizing intelligent voice conversation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810105481.0A CN110136719B (en) | 2018-02-02 | 2018-02-02 | Method, device and system for realizing intelligent voice conversation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110136719A (en) | 2019-08-16 |
CN110136719B (en) | 2022-01-28 |
Family
ID=67567135
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810105481.0A Active CN110136719B (en) | 2018-02-02 | 2018-02-02 | Method, device and system for realizing intelligent voice conversation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110136719B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021051404A1 (en) * | 2019-09-20 | 2021-03-25 | Beijing Didi Infinity Technology And Development Co., Ltd. | Systems and methods for auxiliary reply |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030236667A1 (en) * | 2002-03-12 | 2003-12-25 | Wen Say Ling | Computer-assisted language listening and speaking teaching system and method with circumstantial shadow and assessment functions |
CN1637740A (en) * | 2003-11-20 | 2005-07-13 | 阿鲁策株式会社 | Conversation control apparatus, and conversation control method |
US20070011005A1 (en) * | 2005-05-09 | 2007-01-11 | Altis Avante | Comprehension instruction system and method |
CN101098366A (en) * | 2006-06-30 | 2008-01-02 | 英华达(南京)科技有限公司 | System and method for on-line interactive learning through network telephone |
CN101496077A (en) * | 2005-05-09 | 2009-07-29 | 阿尔蒂斯阿万特公司 | Comprephension instruction system and method |
US20100223060A1 (en) * | 2009-02-27 | 2010-09-02 | Yao-Yuan Chang | Speech Interactive System And Method |
CN102667889A (en) * | 2009-12-16 | 2012-09-12 | 浦项工科大学校产学协力团 | Apparatus and method for foreign language study |
CN103198831A (en) * | 2013-04-10 | 2013-07-10 | 威盛电子股份有限公司 | Voice control method and mobile terminal device |
CN105575384A (en) * | 2016-01-13 | 2016-05-11 | 广东小天才科技有限公司 | Method, apparatus and equipment for automatically adjusting play resource according to the level of user |
CN105975511A (en) * | 2016-04-27 | 2016-09-28 | 乐视控股(北京)有限公司 | Intelligent dialogue method and apparatus |
CN106558252A (en) * | 2015-09-28 | 2017-04-05 | 百度在线网络技术(北京)有限公司 | By computer implemented spoken language exercise method and device |
US9798799B2 (en) * | 2012-11-15 | 2017-10-24 | Sri International | Vehicle personal assistant that interprets spoken natural language input based upon vehicle context |
Non-Patent Citations (2)
Title |
---|
YASUO NAKATANI: "Identifying Strategies That Facilitate EFL Learners’ Oral Communication: A Classroom Study Using Multiple Data Collection Procedures", 《THE MODERN LANGUAGE JOURNAL》 * |
谈利佳: "浅谈"人机对话"背景下初中英语口语教学策略", 《新校园(中旬刊)》 * |
Also Published As
Publication number | Publication date |
---|---|
CN110136719B (en) | 2022-01-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9318113B2 (en) | Method and apparatus for conducting synthesized, semi-scripted, improvisational conversations | |
WO2019174072A1 (en) | Intelligent robot based training method and apparatus, computer device and storage medium | |
JP6719747B2 (en) | Interactive method, interactive system, interactive device, and program | |
CN112074899A (en) | System and method for intelligent initiation of human-computer dialog based on multimodal sensory input | |
US20140278403A1 (en) | Systems and methods for interactive synthetic character dialogue | |
CN105975511A (en) | Intelligent dialogue method and apparatus | |
CN111290568A (en) | Interaction method and device and computer equipment | |
JP6699010B2 (en) | Dialogue method, dialogue system, dialogue device, and program | |
CN116009748B (en) | Picture information interaction method and device in children interaction story | |
CN109817244A (en) | Oral evaluation method, apparatus, equipment and storage medium | |
KR20070006742A (en) | Language teaching method | |
WO2020070923A1 (en) | Dialogue device, method therefor, and program | |
CN114048299A (en) | Dialogue method, apparatus, device, computer-readable storage medium, and program product | |
CN109903618A (en) | Listening Training method, apparatus, equipment and storage medium | |
CN110136719A (en) | A kind of method, apparatus and system for realizing Intelligent voice dialog | |
JP4085015B2 (en) | STREAM DATA GENERATION DEVICE, STREAM DATA GENERATION SYSTEM, STREAM DATA GENERATION METHOD, AND PROGRAM | |
Darves et al. | Talking to digital fish: Designing effective conversational interfaces for educational software | |
KR20190070683A (en) | Apparatus and method for constructing and providing lecture contents | |
JP6656529B2 (en) | Foreign language conversation training system | |
WO2017200075A1 (en) | Dialog method, dialog system, dialog scenario generation method, dialog scenario generation device, and program | |
Garde | Spotlight on the Audience: Collective Creativity in Recent Documentary and Reality Theatre from Australia and Germany | |
Jahn | Foundational issues in teaching cognitive narratology | |
CN116843805B (en) | Method, device, equipment and medium for generating virtual image containing behaviors | |
JP2001195419A (en) | Information-providing system | |
Shvetcov et al. | Algorithms of Natural Language Dialogue with Intelligent Robot NAO Evolution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||