CN110136719A

CN110136719A - Method, apparatus and system for realizing intelligent voice dialogue

Info

Publication number: CN110136719A
Application number: CN201810105481.0A
Authority: CN
Inventors: 翁翔坚; 林晖; 刘翔; 韩旭
Original assignee: SHANGHAI LIULISHUO INFORMATION TECHNOLOGY Co Ltd
Current assignee: SHANGHAI LIULISHUO INFORMATION TECHNOLOGY Co Ltd
Priority date: 2018-02-02
Filing date: 2018-02-02
Publication date: 2019-08-16
Anticipated expiration: 2038-02-02
Also published as: CN110136719B

Abstract

The present invention provides a method, apparatus and system for realizing intelligent voice dialogue. The method comprises: receiving a voice signal recorded by a client; converting the voice signal into speech text; determining the semantics corresponding to the speech text; determining the language logic corresponding to the semantics; determining the dialog text corresponding to the language logic; synthesizing the audio file corresponding to the dialog text; and sending the audio file to the client. With the embodiments of the present invention, English study time is flexible, the cost is low, few restrictions are placed on the user's answers, and the user is given an intelligent human-machine interactive learning experience.

Description

Method, apparatus and system for realizing intelligent voice dialogue

Technical field

The present invention relates to the field of artificial intelligence, and more particularly to a method, apparatus and system for realizing intelligent voice dialogue.

Background art

As people attach increasing importance to learning English, more and more English-learning institutions and English-learning software have emerged.

To practice spoken English, people usually pay for offline courses taught by foreign teachers. Such courses run at fixed times, so study time is inflexible and the cost is high. The simulated dialogues offered by online learning software, meanwhile, must proceed along a fixed script and present the user with ready-made options to choose from. This heavily restricts how the user can answer and cannot give the user an intelligent human-machine interactive learning experience.

Summary of the invention

In view of this, the present invention provides a method, apparatus and system for realizing intelligent voice dialogue, to solve the problems that English study time is inflexible, the cost is high, and the user's answers are heavily restricted.

To achieve the above object, the present invention provides the following technical solutions:

According to a first aspect of the present invention, a method for realizing intelligent voice dialogue is proposed, the method comprising:

receiving a voice signal recorded by a client;

converting the voice signal into speech text;

determining the semantics corresponding to the speech text;

determining the language logic corresponding to the semantics;

determining the dialog text corresponding to the language logic;

synthesizing the audio file corresponding to the dialog text;

sending the audio file to the client.

According to a second aspect of the present invention, an apparatus for realizing intelligent voice dialogue is proposed, comprising:

a speech receiving module, for receiving a voice signal recorded by a client;

a text conversion module, for converting the voice signal into speech text;

a semantic determining module, for determining the semantics corresponding to the speech text;

a logic determining module, for determining the language logic corresponding to the semantics;

a text determining module, for determining the dialog text corresponding to the language logic;

an audio synthesis module, for synthesizing the audio file corresponding to the dialog text;

an audio sending module, for sending the audio file to the client.

According to a third aspect of the present invention, a system for realizing intelligent voice dialogue is proposed, the system comprising a client and a server, wherein:

the client is configured to receive a scene instruction and send the scene instruction to the server;

the server is configured to turn on the intelligent voice dialogue function based on the scene instruction and to initiate a first round of dialogue with the client for the scene corresponding to the scene instruction; and, on receiving a voice signal recorded by the client, to convert the voice signal into speech text, determine the semantics corresponding to the speech text, determine the language logic corresponding to the semantics, determine the dialog text corresponding to the language logic, synthesize the audio file corresponding to the dialog text, and send the audio file to the client;

the client is further configured to receive the audio file and play the audio file.

As can be seen from the above technical solutions, the server receives the voice signal recorded by the client, converts the voice signal into speech text, and determines the semantics corresponding to the speech text. The server then determines the language logic from the semantics, determines the corresponding dialog text from the language logic, synthesizes the audio file corresponding to the dialog text, and sends the audio file to the client, so that the next round of dialogue begins once the client has played the audio file. With this method for realizing intelligent voice dialogue, study time is flexible, the cost is low, few restrictions are placed on the user's answers, and the user is given an intelligent human-machine interactive learning experience.

Brief description of the drawings

Figure 1A is a flow chart of an embodiment of the method for realizing intelligent voice dialogue provided by the present invention;

Figure 1B is a schematic diagram of the internal structure of the server to which the method of Figure 1A applies;

Fig. 2 is a flow chart of an embodiment of the method for realizing intelligent voice dialogue provided by the present invention;

Fig. 3 is a flow chart of another embodiment of the method for realizing intelligent voice dialogue provided by the present invention;

Fig. 4 is a flow chart of another embodiment of the method for realizing intelligent voice dialogue provided by the present invention;

Fig. 5 is a flow chart of another embodiment of the method for realizing intelligent voice dialogue provided by the present invention;

Fig. 6 is a flow chart of another embodiment of the method for realizing intelligent voice dialogue provided by the present invention;

Fig. 7 is a flow chart of another embodiment of the method for realizing intelligent voice dialogue provided by the present invention;

Fig. 8 is a hardware structure diagram of a server provided by the present invention;

Fig. 9 is a block diagram of an embodiment of the apparatus for realizing intelligent voice dialogue provided by the present invention;

Figure 10 is a block diagram of another embodiment of the apparatus for realizing intelligent voice dialogue provided by the present invention.

Detailed description of the embodiments

Exemplary embodiments are described in detail here, and examples are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the invention as detailed in the appended claims.

The terminology used in the present invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. The singular forms "a", "said" and "the" used in the present invention and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used in the present invention to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the invention, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon" or "in response to determining".

Figure 1A is a flow chart of an embodiment of the method for realizing intelligent voice dialogue provided by the present invention. The method is applied in a server and, as shown in Figure 1A, may include the following steps:

Step 101: Receive the voice signal recorded by the client.

Step 102: Convert the voice signal into speech text.

Step 103: Determine the semantics corresponding to the speech text.

Step 104: Determine the language logic corresponding to the semantics.

Step 105: Determine the dialog text corresponding to the language logic.

Step 106: Synthesize the audio file corresponding to the dialog text.

Step 107: Send the audio file to the client.

In step 101, in one embodiment, the client displays at least one scene task on its screen. A scene task is a real-life scenario that ends in achieving a particular goal, for example: ordering a steak in a restaurant, obtaining a boarding pass at an airport, shopping in a duty-free shop, or checking in at a hotel. The user selects a scene task by tapping the screen; the client receives the scene instruction generated by the tap and sends it to the server. Based on the scene instruction, the server turns on the intelligent voice dialogue function and initiates the first round of dialogue for the corresponding scene. Taking the scene task "order a steak in a restaurant" as an example, the server initiates the first round of dialogue for that scene and the client plays an audio file whose content is "What steak do you want?". On the client screen, the dialogue may be presented as the audio file combined with a text prompt, a picture, an animated image, a short video, and so on. Those skilled in the art will appreciate that the difficulty of the intelligent dialogue can be adjusted by choosing different combinations. For example, presenting the audio file with a picture tests the user's listening ability more, so the dialogue is harder; presenting the audio file with a text prompt lets the user understand the speech more easily by reading the prompt, so the dialogue is easier. In each round of dialogue, the client starts recording the voice signal on receiving the user's record instruction and, on receiving the instruction that recording is complete, sends the recorded voice signal to the server. The server receives the voice signal recorded by the client. For the question "What steak do you want?" above, the user might, for example, record through the client a voice signal whose content is "I want Sirloin please".

In step 102, in one embodiment, the server converts the voice signal into speech text. Continuing from step 101, the server converts the voice signal "I want Sirloin please" into the speech text "I want Sirloin please". For details of how the server converts the voice signal into speech text, reference may be made to the related art; they are not repeated here.

In step 103, in one embodiment, the server determines the semantics corresponding to the speech text. Those skilled in the art will appreciate that the user's English may be weak and that the recording is affected by factors such as environmental noise, so the speech text the server derives from the voice signal may contain missing words, grammatical errors, wrong sentence breaks and similar problems. The server therefore needs to parse out of the speech text the core content that effectively reflects what the user meant. Continuing from step 102, the server determines that the verb "want" in the speech text "I want Sirloin please" expresses an affirmative meaning and, together with the noun "Sirloin", indicates that the user wants sirloin steak; the server can therefore determine that the semantics are "wants to eat sirloin steak". For details of how the server determines the semantics corresponding to the speech text, see the description of steps 201 to 202 shown in Fig. 2 below; they are not detailed here.

In step 104, in one embodiment, the server determines the language logic corresponding to the semantics. Language logic means that the content linking one turn of the dialogue to the next must follow a coherent line of thought. For example, when the previous sentence in the dialogue is "I want Sirloin please", a next sentence that follows logically could "state whether sirloin steak is in stock", "ask how the steak should be cooked" or "ask whether to add side dishes or drinks". Continuing from step 103, if the server determines that the semantics are "wants to eat sirloin steak", it can determine that the corresponding language logic is "ask how the steak should be cooked". For details of how the server determines the language logic corresponding to the semantics, see the description of steps 301 to 302 shown in Fig. 3 below; they are not detailed here.

In step 105, in one embodiment, the server determines the dialog text corresponding to the language logic. Continuing from step 104, if the language logic determined for "wants to eat sirloin steak" is "ask how the steak should be cooked", the server determines that the dialog text corresponding to "ask how the steak should be cooked" is "How should we prepare your steak, medium well, medium rare or well done?". For details of how the server determines the dialog text corresponding to the language logic, see the description of steps 401 to 402 shown in Fig. 4 below; they are not detailed here.

In step 106, in one embodiment, the server synthesizes the audio file corresponding to the dialog text. Continuing from step 105, the server synthesizes the dialog text "How should we prepare your steak, medium well, medium rare or well done?" into the corresponding audio file. For details of how the server synthesizes the audio file corresponding to the dialog text, reference may be made to the related art; they are not repeated here.

In this embodiment of the present invention, the server receives the voice signal recorded by the client, converts the voice signal into speech text, and determines the semantics corresponding to the speech text. The server then determines the language logic from the semantics, determines the corresponding dialog text from the language logic, synthesizes the audio file corresponding to the dialog text, and sends the audio file to the client, so that the next round of dialogue begins once the client has played the audio file. With this method for realizing intelligent voice dialogue, study time is flexible, the cost is low, few restrictions are placed on the user's answers, and the user is given an intelligent human-machine interactive learning experience.

Figure 1B is a schematic diagram of the internal structure of the server to which the method of Figure 1A applies. The server 11 in Figure 1B includes a voice module 111, an understanding module 112, a logic module 113, a text module 114, a content module 115 and an audio module 116. The voice module 111 receives the voice signal recorded by the client and converts it into speech text; the understanding module 112 determines the semantics of the speech text; the logic module 113 determines the language logic corresponding to the semantics; the text module 114 determines the dialog text corresponding to the language logic; the content module 115 supplies the words, phrases and sentences needed by the understanding module 112 and the text module 114, and supplies the preset logic configurations to the logic module 113; the audio module 116 synthesizes the dialog text determined by the text module 114 into an audio file. Specifically, with reference to steps 101 to 107 of Figure 1A above, the voice module 111 receives the voice signal recorded by the client. The content of the voice signal is, for example, "I want Sirloin please", and the voice module 111 converts it into the speech text "I want Sirloin please". The understanding module 112 combines the verb "want" and the noun "Sirloin" provided in the content module 115 and determines that the semantics are "wants to eat sirloin steak". The logic module 113 determines the language logic corresponding to the semantics "wants to eat sirloin steak". For example, from the three preset logic configurations provided by the content module 115, namely "state whether sirloin steak is in stock", "ask how the steak should be cooked" and "ask whether to add side dishes or drinks", the logic module 113 determines that the language logic corresponding to "wants to eat sirloin steak" is "ask how the steak should be cooked". The text module 114 then determines that the dialog text corresponding to "ask how the steak should be cooked" is "How should we prepare your steak, medium well, medium rare or well done?". The audio module 116 synthesizes the audio file corresponding to that dialog text. Those skilled in the art will appreciate that the voice module 111, understanding module 112, logic module 113, text module 114, content module 115 and audio module 116 are merely illustrative; the server may also include other modules such as a judgment module and a scoring module (not shown in Figure 1B). The judgment module can be used to judge whether the scene task is complete. For example, taking the scene task "order a steak in a restaurant", when the server judges that the voice signal recorded by the client means "a steak has been chosen", the scene task "order a steak in a restaurant" is complete. The scoring module is used to score the recorded voice signal; for details of how the server scores it, see the description of step 608 shown in Fig. 6 below, which is not detailed here.
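The module chain of Figure 1B can be sketched as a sequence of small functions. This is a minimal illustration only: the lookup tables stand in for the content module 115, the speech recognition and speech synthesis steps are stubbed with plain byte encoding, and every name here (`handle_turn`, `KEYWORD_SEMANTICS`, `FALLBACK_TEXT` and so on) is hypothetical rather than taken from the patent. The logic module simply takes the first configuration, which is an assumption made to keep the sketch short.

```python
from typing import Dict, FrozenSet, List

# Hypothetical stand-ins for the data held by content module 115.
KEYWORD_SEMANTICS: Dict[FrozenSet[str], str] = {
    frozenset({"want", "sirloin"}): "wants to eat sirloin steak",
}
LOGIC_CONFIGS: Dict[str, List[str]] = {
    "wants to eat sirloin steak": [
        "ask how the steak should be cooked",
        "state whether sirloin steak is in stock",
        "ask whether to add side dishes or drinks",
    ],
}
DIALOG_TEXTS: Dict[str, str] = {
    "ask how the steak should be cooked":
        "How should we prepare your steak, medium well, medium rare or well done?",
}
FALLBACK_TEXT = "Sorry, could you say that again?"  # assumption, not in the source


def voice_module(voice_signal: bytes) -> str:
    """Stub for speech recognition: treats the signal as UTF-8 text."""
    return voice_signal.decode("utf-8")


def understanding_module(speech_text: str) -> str:
    """Map the speech text to semantics via keyword sets."""
    words = {w.strip(".,!?").lower() for w in speech_text.split()}
    for keywords, semantics in KEYWORD_SEMANTICS.items():
        if keywords <= words:
            return semantics
    return "unknown"


def logic_module(semantics: str) -> str:
    """Pick a language logic; this sketch takes the first configuration."""
    configs = LOGIC_CONFIGS.get(semantics, [])
    return configs[0] if configs else "ask the user to repeat"


def text_module(language_logic: str) -> str:
    """Look up the dialog text for the chosen language logic."""
    return DIALOG_TEXTS.get(language_logic, FALLBACK_TEXT)


def audio_module(dialog_text: str) -> bytes:
    """Stub for speech synthesis: returns the text as bytes."""
    return dialog_text.encode("utf-8")


def handle_turn(voice_signal: bytes) -> bytes:
    """Steps 101 to 107: voice signal in, audio file out."""
    speech_text = voice_module(voice_signal)
    semantics = understanding_module(speech_text)
    language_logic = logic_module(semantics)
    dialog_text = text_module(language_logic)
    return audio_module(dialog_text)
```

In a real deployment the two stubs would be replaced by actual recognition and synthesis engines, which the description leaves to the related art.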

Fig. 2 is a flow chart of an embodiment of the method for realizing intelligent voice dialogue provided by the present invention. With reference to Figure 1A and on the basis of steps 101 to 107, this embodiment illustrates how the server determines the semantics corresponding to the speech text. As shown in Fig. 2, it includes the following steps:

Step 201: Based on a first preset selection rule, select at least one keyword from the speech text.

Step 202: Determine the semantics based on the at least one keyword.

In step 201, the first preset selection rule may be to select verbs, nouns, personal pronouns, adverbs and the like in the speech text as keywords; different selection rules can be set for different speech texts. When the question begins with "what" or "where", nouns in the speech text are preferred as keywords; when it begins with "who", personal pronouns and nouns are preferred; when it begins with "how" or "do", adverbs and the like are preferred. For example, when the question is "Do you want to eat Sirloin?", adverbs such as "Yes" or "No" that state an attitude can be selected from the speech text; if the speech text is "Yes, sure.", the keyword is "Yes". When the question is "Who is your best friends?" and the speech text is "Lily is my best friends.", the noun "Lily", which identifies a specific person, can be selected from the speech text.

In step 202, the server determines the semantics based on the at least one keyword. Continuing from step 201, when the question is "Do you want to eat Sirloin?" and the keyword determined by the server is "Yes", the server can determine that the semantics are "wants to eat sirloin steak".

In this embodiment of the present invention, the server selects at least one keyword from the speech text based on the first preset selection rule and determines the semantics based on the at least one keyword. By setting different first preset selection rules, the server can be made more intelligent at semantic understanding and more tolerant of errors in the speech text.
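The selection rule of step 201 can be sketched as a lookup from the question's opening word to the preferred parts of speech. The tiny part-of-speech table below is a hypothetical stand-in for a real tagger, and the names `POS`, `PREFERRED_POS` and `select_keywords` are illustrative assumptions; following the description, "Yes" and "No" are tagged as adverbs.

```python
from typing import Dict, List, Tuple

# Hypothetical miniature part-of-speech lexicon (a real system would use a tagger).
POS: Dict[str, str] = {
    "yes": "adverb",
    "no": "adverb",
    "lily": "noun",
    "sirloin": "noun",
    "i": "pronoun",
    "want": "verb",
}

# First preset selection rule: question opener -> preferred parts of speech.
PREFERRED_POS: Dict[str, Tuple[str, ...]] = {
    "what": ("noun",),
    "where": ("noun",),
    "who": ("pronoun", "noun"),
    "how": ("adverb",),
    "do": ("adverb",),
}
DEFAULT_POS = ("verb", "noun", "pronoun", "adverb")


def select_keywords(question: str, speech_text: str) -> List[str]:
    """Return the words of speech_text whose part of speech the question prefers."""
    opener = question.split()[0].strip(".,!?").lower()
    wanted = PREFERRED_POS.get(opener, DEFAULT_POS)
    tokens = [t.strip(".,!?").lower() for t in speech_text.split()]
    return [t for t in tokens if POS.get(t) in wanted]
```

Because words missing from the lexicon are skipped rather than rejected, a garbled or incomplete transcript still yields whatever keywords it does contain, which reflects the fault tolerance the embodiment aims for.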

Fig. 3 is a flow chart of another embodiment of the method for realizing intelligent voice dialogue provided by the present invention. With reference to Figure 1B, this embodiment illustrates how the server determines the language logic corresponding to the semantics. As shown in Fig. 3, it includes the following steps:

Step 301: Determine at least one preset logic configuration corresponding to the semantics.

Step 302: Based on a second preset selection rule, determine the language logic from the at least one preset logic configuration.

In step 301, with reference to Figure 1B, the content module 115 stores the preset logic configurations and provides them to the logic module 113. Taking the semantics "wants to eat sirloin steak" as an example, the logic module 113 determines from the content module 115 the three preset logic configurations corresponding to those semantics: "state whether sirloin steak is in stock", "ask how the steak should be cooked" and "ask whether to add side dishes or drinks".

In step 302, the second preset selection rule is, for example: select a preset logic configuration that has not appeared before the current round of dialogue; select preset logic configurations by polling; or select the preset logic configuration that has been used the fewest times. For example, from the three preset logic configurations recorded in the content module 115, "state whether sirloin steak is in stock", "ask how the steak should be cooked" and "ask whether to add side dishes or drinks", the logic module 113 selects "ask how the steak should be cooked" by polling as the language logic corresponding to the semantics "wants to eat sirloin steak".

In this embodiment of the present invention, the server determines at least one preset logic configuration corresponding to the semantics and, based on the second preset selection rule, determines the language logic from the at least one preset logic configuration. By setting a reasonable second preset selection rule and providing a larger number of preset logic configurations, the language logic the server finally determines can be made more varied.
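The three example selection rules of step 302 can be sketched as methods on a small selector object. The class name `LogicSelector` and its methods are hypothetical; the description does not say how usage counts or polling positions are stored, so the bookkeeping here is an assumption.

```python
from collections import Counter
from typing import Iterable, List, Optional, Set


class LogicSelector:
    """Chooses a language logic from the preset logic configurations."""

    def __init__(self, configs: Iterable[str]) -> None:
        self.configs: List[str] = list(configs)
        self.usage: Counter = Counter()  # how often each configuration was chosen
        self._next = 0                   # polling position

    def pick_by_polling(self) -> str:
        """Cycle through the configurations in order."""
        choice = self.configs[self._next % len(self.configs)]
        self._next += 1
        self.usage[choice] += 1
        return choice

    def pick_least_used(self) -> str:
        """Choose the configuration used the fewest times (first one on ties)."""
        choice = min(self.configs, key=lambda c: self.usage[c])
        self.usage[choice] += 1
        return choice

    def pick_unseen(self, seen: Set[str]) -> Optional[str]:
        """Choose a configuration that has not yet appeared in this dialogue."""
        for config in self.configs:
            if config not in seen:
                self.usage[config] += 1
                return config
        return None
```

With the three steak configurations from the description, calling `pick_by_polling` twice returns "state whether sirloin steak is in stock" and then "ask how the steak should be cooked".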

Fig. 4 is a flow chart of another embodiment of the method for realizing intelligent voice dialogue provided by the present invention. With reference to Figure 1A and on the basis of steps 101 to 107, this embodiment illustrates how the server determines the dialog text corresponding to the language logic. As shown in Fig. 4, it includes the following steps:

Step 401: Determine a preset answering rule based on the voice signal.

Step 402: Based on the preset answering rule, determine the dialog text corresponding to the language logic.

In step 401, the preset answering rule is the principle the server follows when determining the dialog text from the language logic. The server may determine the preset answering rule from the voice signal in either of two ways: based on the speech text of the voice signal, or based on the score of the voice signal. Determining the rule from the speech text means the server uses the context to give a dialog text that suits the language logic in the specific scene. Determining the rule from the score means the server gives dialog texts of different difficulty for voice signals with different scores. Specifically, the server determines a score for the user's language ability from the voice signal, and different scores correspond to different preset answering rules. For example: 0 to 30 points corresponds to an easier preset answering rule (more hint words are provided); 30 to 60 points corresponds to a moderate preset answering rule (a normal answer); 60 to 100 points corresponds to a harder preset answering rule (fewer hint words are provided).

In step 402, the language logic determined by the server in step 105 is "ask how the steak should be cooked". Continuing from step 401, if the user's language-ability score is 25 points, the dialog text that the preset answering rule gives for this language logic is "How should we prepare your steak, medium well, medium rare or well done?", where "medium well, medium rare or well done" are the hint words provided. If the user's language-ability score is 85 points, the dialog text is "How should we prepare your steak?", with no hint words provided.

In this embodiment of the present invention, the server determines the preset answering rule based on the voice signal and, based on the preset answering rule, determines the dialog text corresponding to the language logic. By setting reasonable preset answering rules, the difficulty of the dialog text can be adjusted flexibly.
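The score bands of step 401 can be sketched as a threshold function. The description's bands overlap at 30 and 60, so this sketch assumes half-open bands (a score of exactly 30 counts as moderate, exactly 60 as hard). The description also does not give the wording used in the moderate band, so the sketch assumes it matches the un-hinted question. The names `answering_rule` and `dialog_text_for` are hypothetical.

```python
QUESTION = "How should we prepare your steak"
HINTS = "medium well, medium rare or well done"


def answering_rule(score: float) -> str:
    """Map a language-ability score (0 to 100) to a preset answering rule."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score < 30:   # assumption: boundary values fall into the higher band
        return "easy"
    if score < 60:
        return "moderate"
    return "hard"


def dialog_text_for(score: float) -> str:
    """Render the 'ask how the steak should be cooked' logic at the right difficulty."""
    if answering_rule(score) == "easy":
        # Easier rule: include the hint words.
        return f"{QUESTION}, {HINTS}?"
    # Assumption: moderate and hard both omit the hint words in this sketch.
    return f"{QUESTION}?"
```

For the two scores used in the description, `dialog_text_for(25)` yields the hinted question and `dialog_text_for(85)` yields the plain one.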

Fig. 5 is a flow chart of another embodiment of the method for realizing intelligent voice dialogue provided by the present invention. With reference to Figure 1A and on the basis of steps 101 to 107, this embodiment illustrates how the server ends the dialogue. As shown in Fig. 5, it includes the following steps:

Step 501: Judge whether the dialog text is consistent with a preset target text.

Step 502: If the dialog text is consistent with the preset target text, end the dialogue.

In steps 501 to 502, the preset target text is a dialog text preset by the server to indicate that the scene task is complete. Taking the scene task "order a steak in a restaurant" from Figure 1A as an example, if the dialog text "Enjoy your meal" is consistent with the preset target text "Enjoy your meal", the server turns off the intelligent voice dialogue function and ends the dialogue.

In this embodiment of the present invention, the server judges whether the dialog text is consistent with the preset target text and, if it is, ends the dialogue, thereby completing the scene task.
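The consistency check of steps 501 to 502 can be sketched as a normalized string comparison. The description does not define "consistent", so ignoring case, punctuation and extra whitespace is an assumption of this sketch, and the function names are hypothetical.

```python
import string


def normalize(text: str) -> str:
    """Lowercase the text, drop ASCII punctuation and collapse whitespace."""
    stripped = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.lower().split())


def dialogue_finished(dialog_text: str, target_text: str) -> bool:
    """Return True when the dialog text matches the preset target text."""
    return normalize(dialog_text) == normalize(target_text)
```

A server loop would call `dialogue_finished` on each newly determined dialog text and close the dialogue function once it returns True.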

Fig. 6 is a flow chart of another embodiment of the method for realizing intelligent voice dialogue provided by the present invention. With reference to Figure 1A and on the basis of steps 101 to 107, this embodiment illustrates how the server determines the score of at least one dimension of the voice signal. As shown in Fig. 6, it includes the following steps:

Step 601: Receive the voice signal recorded by the client.

Step 602: Convert the voice signal into speech text.

Step 603: Determine the semantics corresponding to the speech text.

Step 604: Determine the language logic corresponding to the semantics.

Step 605: Determine the dialog text corresponding to the language logic.

Step 606: Synthesize the audio file corresponding to the dialog text.

Step 607: Send the audio file to the client.

Step 608: Based on a preset scoring standard, determine the score of at least one dimension of the voice signal.

For steps 601 to 607, see the description of steps 101 to 107 in Figure 1A; it is not repeated here. Note that step 608 may be executed before or after any of the steps that follow step 601; its timing is not limited here.

In step 608, the preset scoring standard is set in advance and can score the voice signal along multiple dimensions, including pronunciation, fluency, expression and independent completion. Taking fluency as an example, the preset scoring standard can judge by the duration of the voice signal recorded by the user. Taking pronunciation as an example, it can judge by the number of valid words or phrases the server obtains when converting the voice signal into speech text. The server scores each dimension of the voice signal and obtains a score for each dimension. By assigning different weights to the dimensions, the server can also score the user's overall performance in the dialogue, and it can generate an ability distribution chart covering dimensions such as listening, pronunciation, fluency, expression and independent completion. The server can also provide analysis and improvement suggestions based on the ability distribution, and can select and comment on the lower-scoring parts of dimensions such as pronunciation and expression.

In this embodiment of the present invention, the server determines the score of at least one dimension of the voice signal based on the preset scoring standard, so the user's dialogue ability can be quantified and displayed intuitively through the scores. The server also analyzes and comments on the lower-scoring dimensions, which helps the user study their weak dimensions in a targeted way.
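The weighted overall score and the selection of weak dimensions in step 608 can be sketched as follows. The description does not give a formula or concrete weights, so the weighted average and the example values are assumptions, and the names `overall_score` and `weakest_dimensions` are hypothetical.

```python
from typing import Dict, List


def overall_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of the per-dimension scores."""
    total_weight = sum(weights[d] for d in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[d] * weights[d] for d in scores) / total_weight


def weakest_dimensions(scores: Dict[str, float], n: int = 1) -> List[str]:
    """Return the n lowest-scoring dimensions, lowest first."""
    return sorted(scores, key=lambda d: scores[d])[:n]


# Illustrative values only; the description gives no concrete scores or weights.
example_scores = {
    "pronunciation": 80,
    "fluency": 60,
    "expression": 70,
    "independent completion": 90,
}
example_weights = {
    "pronunciation": 4,
    "fluency": 2,
    "expression": 2,
    "independent completion": 2,
}
```

With these example values, `weakest_dimensions(example_scores)` picks out fluency as the dimension to comment on.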

Fig. 7 is a flow chart of another embodiment of the method for realizing intelligent voice dialogue provided by the present invention. With reference to Figure 1A and on the basis of steps 101 to 107, this embodiment illustrates how the server handles a help instruction after receiving it. As shown in Fig. 7, it includes the following steps:

Step 701: On receiving a help instruction, determine at least one reference dialog text based on the current dialog text.

Step 702: Send the at least one reference dialog text to the client.

In step 701- step 702, when server receive client transmission seek help instruct when, server determination works as Preceding dialog text, current session text are the problem of current waiting user that server is sent to client answers.Server base In current session text determine it is corresponding with the current dialog text at least one refer to dialog text, for example, client reception The instruction of seeking help is sent to server by the instruction of seeking help that " requesting help " control for clicking screen to user generates, client, Server determine current session text be " What steak do you want? We have Rib Eye, Sirloin and T-Bone ", then server determines that at least one is preset with reference to dialog text " I will have the Rib Eye please","I'd like to try the Sirloin","I am ordering the T-Bone".Server is to client End, which is sent, refers to dialog text " I will have the Rib Eye please ", " I ' d like to try the Sirloin ", " I am ordering the T-Bone ", client are shown this three kinds with reference to dialog text in screen, With for reference.

In this embodiment of the present invention, when the server receives a help request instruction, it determines at least one reference dialog text based on the current dialog text and sends it to the client. This provides reference examples that serve as prompts and helps the user memorize and learn by imitation.
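Steps 701 and 702 can be sketched roughly as a lookup from the current dialog text to its preset reference dialog texts. This is only an illustration: the table `REFERENCE_TEXTS` and the function `handle_help_request` are hypothetical names, and an exact-match lookup is an assumption about how the preset texts are stored.

```python
from typing import Dict, List

# Hypothetical preset table: current dialog text -> reference dialog texts.
REFERENCE_TEXTS: Dict[str, List[str]] = {
    "What steak do you want? We have Rib Eye, Sirloin and T-Bone": [
        "I will have the Rib Eye please",
        "I'd like to try the Sirloin",
        "I am ordering the T-Bone",
    ],
}


def handle_help_request(current_dialog_text: str) -> List[str]:
    """Step 701: determine the reference dialog texts for the current dialog text."""
    # An unknown dialog text yields an empty list instead of raising an error.
    return list(REFERENCE_TEXTS.get(current_dialog_text, []))


# Step 702 would send this list to the client for display.
references = handle_help_request(
    "What steak do you want? We have Rib Eye, Sirloin and T-Bone"
)
```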

Corresponding to the method for above-mentioned realization Intelligent voice dialog, the invention also provides the hardware of server shown in Fig. 8 Structure chart.Referring to FIG. 8, in hardware view, the server include processor, internal bus, network interface, memory and it is non-easily The property lost memory, is also possible that hardware required for other business certainly.Processor is read pair from nonvolatile memory Then the computer program answered is run into memory, the device for realizing Intelligent voice dialog is formed on logic level.Certainly, it removes Except software realization mode, other implementations, such as the side of logical device or software and hardware combining is not precluded in the present invention Formula etc., that is to say, that the executing subject of following process flow is not limited to each logic unit, is also possible to hardware or patrols Collect device.

Fig. 9 is a block diagram of an embodiment of the device for realizing intelligent voice dialog provided by the present invention. As shown in Fig. 9, the device for realizing intelligent voice dialog may include: a speech reception module 91, a text conversion module 92, a semantic determining module 93, a logic determining module 94, a text determining module 95, an audio synthesis module 96, and an audio sending module 97, wherein:

Speech reception module 91, for receiving the voice signal recorded by the client;

Text conversion module 92, for converting the voice signal into speech text;

Semantic determining module 93, for determining the semantics corresponding to the speech text;

Logic determining module 94, for determining the language logic corresponding to the semantics;

Text determining module 95, for determining the dialog text corresponding to the language logic;

Audio synthesis module 96, for synthesizing the audio file corresponding to the dialog text;

Audio sending module 97, for sending the audio file to the client.
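The chain formed by modules 92 to 96 can be sketched as a pipeline of interchangeable stages. This is a minimal sketch under stated assumptions: the class `DialogPipeline`, its field names, and the toy stand-in stages below are all hypothetical. In particular, decoding and encoding UTF-8 bytes merely stands in for real speech recognition and speech synthesis, which this description does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class DialogPipeline:
    to_text: Callable[[bytes], str]       # text conversion module 92
    to_semantics: Callable[[str], str]    # semantic determining module 93
    to_logic: Callable[[str], str]        # logic determining module 94
    to_dialog_text: Callable[[str], str]  # text determining module 95
    synthesize: Callable[[str], bytes]    # audio synthesis module 96

    def handle(self, voice_signal: bytes) -> bytes:
        """Turn a received voice signal into the audio file to send back."""
        speech_text = self.to_text(voice_signal)
        semantics = self.to_semantics(speech_text)
        logic = self.to_logic(semantics)
        dialog_text = self.to_dialog_text(logic)
        return self.synthesize(dialog_text)


# Toy stand-ins so the sketch runs; every value here is an assumption.
REPLIES: Dict[str, str] = {
    "ask_doneness": "How would you like it cooked?",
    "ask_repeat": "Sorry, could you say that again?",
}

pipeline = DialogPipeline(
    to_text=lambda signal: signal.decode("utf-8"),
    to_semantics=lambda text: "order_steak" if "steak" in text.lower() else "unknown",
    to_logic=lambda sem: "ask_doneness" if sem == "order_steak" else "ask_repeat",
    to_dialog_text=lambda logic: REPLIES[logic],
    synthesize=lambda text: text.encode("utf-8"),
)
audio_file = pipeline.handle(b"I want a steak")
```

Passing each stage in as a callable mirrors the statement that the modules are logical units that may be realized in software, hardware, or a combination of both.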

Fig. 10 is a block diagram of another embodiment of the device for realizing intelligent voice dialog provided by the present invention. As shown in Fig. 10, on the basis of the embodiment shown in Fig. 9, the semantic determining module 93 includes:

Keyword selection submodule 931, for selecting at least one keyword in the speech text based on a first preset selection rule;

First determining submodule 932, for determining the semantics based on the at least one keyword.
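Submodules 931 and 932 might be sketched as follows. The first preset selection rule is assumed here to be "keep the words that appear in a preset vocabulary", and the semantics are assumed to be a label looked up from the first matching keyword. Both assumptions, and all names and table contents, are illustrative only.

```python
import re
from typing import Dict, List, Set

# Hypothetical preset vocabulary and keyword-to-semantics table.
VOCABULARY: Set[str] = {"sirloin", "rib", "bone"}
KEYWORD_TO_SEMANTICS: Dict[str, str] = {
    "sirloin": "choose_sirloin",
    "rib": "choose_rib_eye",
    "bone": "choose_t_bone",
}


def choose_keywords(speech_text: str, vocabulary: Set[str]) -> List[str]:
    """Submodule 931: keep the words of the speech text found in the vocabulary."""
    words = re.findall(r"[a-z']+", speech_text.lower())
    return [word for word in words if word in vocabulary]


def determine_semantics(keywords: List[str], table: Dict[str, str]) -> str:
    """Submodule 932: map the first recognized keyword to a semantic label."""
    for keyword in keywords:
        if keyword in table:
            return table[keyword]
    return "unknown"


keywords = choose_keywords("I'd like to try the Sirloin", VOCABULARY)
semantics = determine_semantics(keywords, KEYWORD_TO_SEMANTICS)
```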

In one embodiment, the logic determining module 94 includes:

Second determining submodule 941, for determining at least one preset logic configuration corresponding to the semantics;

Third determining submodule 942, for determining the language logic from the at least one preset logic configuration based on a second preset selection rule.
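One possible reading of submodules 941 and 942 is sketched below. The second preset selection rule is assumed to be "pick the candidate with the highest priority". This rule, the `LogicConfig` type, and the table contents are hypothetical, since the description does not state what the rule is.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LogicConfig:
    name: str
    priority: int  # hypothetical ranking used by the selection rule


# Hypothetical table: semantics -> candidate preset logic configurations.
PRESET_CONFIGS: Dict[str, List[LogicConfig]] = {
    "choose_sirloin": [
        LogicConfig("confirm_order", 1),
        LogicConfig("ask_doneness", 2),
    ],
}


def select_language_logic(semantics: str) -> Optional[str]:
    """Look up the candidates (941), then pick one by priority (942)."""
    candidates = PRESET_CONFIGS.get(semantics, [])
    if not candidates:
        return None
    return max(candidates, key=lambda config: config.priority).name


logic = select_language_logic("choose_sirloin")
```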

In one embodiment, the text determining module 95 includes:

Fourth determining submodule 951, for determining a preset answer rule based on the voice signal;

Fifth determining submodule 952, for determining the dialog text corresponding to the language logic based on the preset answer rule.

In one embodiment, the device for realizing intelligent voice dialog further includes:

Text judgment module 98, for judging whether the dialog text is consistent with a preset target text;

Dialog ending module 99, for ending the dialog if the dialog text is consistent with the preset target text;

Scoring module 100, for determining a score for at least one dimension of the voice signal based on a preset scoring standard;

Reference text determining module 101, for determining at least one reference dialog text based on the current dialog text when a help request instruction is received;

Text sending module 102, for sending the at least one reference dialog text to the client.
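The behaviour of the text judgment module 98 and the dialog ending module 99 might be sketched as a loop that stops once a dialog text matches the preset target text. Treating "consistent" as equality after lowercasing and collapsing whitespace is an assumption, and the function names and sample texts are hypothetical.

```python
from typing import Iterable, List


def _normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace before comparing."""
    return " ".join(text.lower().split())


def run_until_target(dialog_texts: Iterable[str], target_text: str) -> List[str]:
    """Collect dialog texts in order, ending once one matches the target text."""
    sent: List[str] = []
    for text in dialog_texts:
        sent.append(text)
        if _normalize(text) == _normalize(target_text):
            break  # module 99: the dialog ends here
    return sent


sent = run_until_target(
    ["What steak do you want?", "Enjoy your meal!", "Anything else?"],
    "enjoy your  meal!",
)
```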

For details of how the functions and effects of each unit in the above device are realized, refer to the realization of the corresponding steps in the above method, which is not repeated here.

Since the device embodiment basically corresponds to the method embodiment, refer to the description of the method embodiment for the relevant details. The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units. That is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present invention. Those of ordinary skill in the art can understand and implement it without creative effort.

As can be seen from the above embodiments, in the embodiments of the present invention the server receives the voice signal recorded by the client, converts the voice signal into speech text, and determines the semantics corresponding to the speech text. The server determines the language logic according to the semantics, determines the corresponding dialog text from the language logic, and finally synthesizes the audio file corresponding to the dialog text and sends the audio file to the client, so that the next round of dialog is initiated after the client plays the audio file. With this method for realizing intelligent voice dialog, learning time is flexible, cost is low, and few restrictions are placed on the user's answers, providing the user with an intelligent human-computer interactive learning experience.

Other embodiments of the present invention will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. The present invention is intended to cover any variations, uses, or adaptations of the present invention that follow its general principles and include common knowledge or conventional technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only, and the true scope and spirit of the present invention are indicated by the following claims.

It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device that includes the element.

The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A method for realizing intelligent voice dialog, characterized in that the method comprises:

Receiving the voice signal recorded by a client;

Converting the voice signal into speech text;

Determining the semantics corresponding to the speech text;

Determining the language logic corresponding to the semantics;

Determining the dialog text corresponding to the language logic;

Synthesizing the audio file corresponding to the dialog text;

Sending the audio file to the client.

2. The method according to claim 1, characterized in that determining the semantics corresponding to the speech text comprises:

Selecting at least one keyword in the speech text based on a first preset selection rule;

Determining the semantics based on the at least one keyword.

3. The method according to claim 1, characterized in that determining the language logic corresponding to the semantics comprises:

Determining at least one preset logic configuration corresponding to the semantics;

Determining the language logic from the at least one preset logic configuration based on a second preset selection rule.

4. The method according to claim 1, characterized in that determining the dialog text corresponding to the language logic comprises:

Determining a preset answer rule based on the voice signal;

Determining the dialog text corresponding to the language logic based on the preset answer rule.

5. The method according to claim 1, characterized in that the method further comprises:

Judging whether the dialog text is consistent with a preset target text;

Ending the dialog if the dialog text is consistent with the preset target text.

6. The method according to claim 1, characterized in that the method further comprises:

Determining a score for at least one dimension of the voice signal based on a preset scoring standard.

7. The method according to any one of claims 1-6, characterized in that the method further comprises:

When a help request instruction is received, determining at least one reference dialog text based on the current dialog text;

Sending the at least one reference dialog text to the client.

8. A device for realizing intelligent voice dialog, characterized in that the device comprises:

A speech reception module, for receiving the voice signal recorded by a client;

A text conversion module, for converting the voice signal into speech text;

A semantic determining module, for determining the semantics corresponding to the speech text;

A logic determining module, for determining the language logic corresponding to the semantics;

A text determining module, for determining the dialog text corresponding to the language logic;

An audio synthesis module, for synthesizing the audio file corresponding to the dialog text;

An audio sending module, for sending the audio file to the client.

9. The device according to claim 8, characterized in that the semantic determining module comprises:

A keyword selection submodule, for selecting at least one keyword in the speech text based on a first preset selection rule;

A first determining submodule, for determining the semantics based on the at least one keyword.

10. A system for realizing intelligent voice dialog, characterized in that the system comprises a client and a server, wherein:

The client is configured to receive a scene instruction and send the scene instruction to the server;

The server is configured to enable the intelligent voice dialog function based on the scene instruction, initiate a first round of dialog to the client based on the scene corresponding to the scene instruction, and, upon receiving the voice signal recorded by the client, convert the voice signal into speech text, determine the semantics corresponding to the speech text, determine the language logic corresponding to the semantics, determine the dialog text corresponding to the language logic, synthesize the audio file corresponding to the dialog text, and send the audio file to the client;

The client is further configured to receive the audio file and play the audio file.