CN111881254A - Method and device for generating dialogs, electronic equipment and storage medium
- Publication number
- CN111881254A (application CN202010525697.XA)
- Authority
- CN
- China
- Prior art keywords
- target
- conversation
- dialog
- strategy
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
Abstract
The application discloses a dialog generation method and apparatus, an electronic device, and a storage medium, relates to the field of artificial intelligence, and can use deep learning technology. The specific implementation scheme is as follows: mining a target scene of the current dialog based on the dialog context information of a user; mining a target policy for the current dialog based on the target scene; and generating a target dialog according to the dialog context information, the target policy, and a pre-established dialog set. With the method and apparatus, no manual writing of dialogs is needed: the target dialog is generated automatically from the user's dialog context information, the target policy of the current dialog, and the pre-established dialog set, which effectively saves labor cost, shortens dialog generation time, and improves generation efficiency. By contrast, the existing scheme applies only to scenes for which dialogs have already been written, covers only some chat scenarios, and therefore has a narrow application range.
Description
Technical Field
The present application relates to the field of computer technologies, in particular to the field of artificial intelligence where deep learning technology can be used, and specifically to a dialog generation method and apparatus, an electronic device, and a storage medium.
Background
Chat conversation between people and intelligent devices is an important interaction mode of the intelligent era and a future trend. For an intelligent device, how it responds and replies to a person's utterances is crucial to the overall chat experience.
At present, when people chat with intelligent devices, the common dialog generation method of the device relies on manual writing: the utterances of chatting users are first organized and counted, and dedicated staff are then arranged to write the corresponding replies.
However, this reliance on manual writing makes dialog generation time-consuming, labor-intensive, and inefficient.
Disclosure of Invention
To solve the above technical problem, the present application provides a dialog generation method and apparatus, an electronic device, and a storage medium.
According to an aspect of the present application, there is provided a dialog generation method, wherein the method comprises:
mining a target scene of the current dialog based on dialog context information of a user;
mining a target policy for the current dialog based on the target scene;
and generating a target dialog according to the dialog context information, the target policy, and a pre-established dialog set.
According to another aspect of the present application, there is provided a dialog generation apparatus, wherein the apparatus comprises:
a scene mining module, configured to mine a target scene of the current dialog based on dialog context information of a user;
a policy mining module, configured to mine a target policy for the current dialog based on the target scene;
and a generating module, configured to generate a target dialog according to the dialog context information, the target policy, and a pre-established dialog set.
According to still another aspect of the present application, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to yet another aspect of the present application, there is provided a non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method as described above.
The technology of the present application overcomes the defects of the prior art: no manual writing of dialogs is needed, and the target dialog is generated automatically from the user's dialog context information, the target policy of the current dialog, and the pre-established dialog set, which effectively saves labor cost, shortens dialog generation time, and improves generation efficiency. By contrast, the existing manual writing method applies only to scenes for which dialogs have already been written, covers only some chat scenarios, and has a narrow application range.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present application;
FIG. 2 is a schematic diagram according to a second embodiment of the present application;
FIG. 3 is a schematic illustration according to a third embodiment of the present application;
FIG. 4 is a schematic illustration according to a fourth embodiment of the present application;
FIG. 5 is a block diagram of an electronic device for implementing the dialog generation method according to an embodiment of the present application.
Detailed Description
The following describes exemplary embodiments of the present application with reference to the accompanying drawings; various details of the embodiments are included to aid understanding and should be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
FIG. 1 is a schematic diagram according to a first embodiment of the present application. As shown in FIG. 1, this embodiment provides a dialog generation method, which may specifically include the following steps:
s101, mining a target scene of a current conversation based on the conversation text information of a user;
s102, mining a target strategy adopted by the current conversation based on a target scene;
s103, generating a target conversation according to the conversation text information, the target strategy and the pre-established conversation set.
The execution body of the dialog generation method of this embodiment is a dialog generation apparatus, which may be a stand-alone electronic entity, such as a human-machine intelligent dialog system, or an application integrated in software that runs on hardware similar to a computer device when in use. The apparatus of this embodiment conducts a dialog with the user and generates a target dialog based on the user's dialog context information; that is, the target dialog is the dialog response information generated with the dialog generation method of this embodiment.
Specifically, the dialog generation apparatus of this embodiment may mine a target scene of the current dialog based on the user's dialog context information. It should be noted that a large amount of historical dialog information may be collected in advance and analyzed to mine all possible dialog scenes; the number of pre-mined scenes is not limited. For example, scenes such as a start scene and an end scene can be mined according to the position of the dialog in the whole chat, and scenes such as a positive-emotion scene and a negative-emotion scene can be mined according to the user's emotion in the dialog. Other dialog scenes may be mined in the same way and are not enumerated here. On this basis, the target scene of the current dialog can be mined from all the pre-mined possible scenes in combination with the user's dialog context information.
In practical applications, each dialog scene may admit one, two, or more dialog policies. For example, in a negative-emotion scene of the user, the policy may be consoling, encouraging, or listening to complaints; in a scene starting the dialog, the policy may be welcoming, suggesting, or inquiring. In this embodiment, after the target scene of the current dialog is acquired, the target policy that the current dialog should adopt can be mined based on the target scene. Finally, the target dialog is generated according to the dialog context information, the target policy, and the pre-established dialog set.
The dialog set of this embodiment may consist of numerous dialog corpus pairs, each comprising a piece of dialog context information and a piece of dialog response information. For example, the dialog set may collect a large volume of dialog corpora by organizing the dialogs of multiple products and the dialogs written by experts; users may then be invited to screen the collected corpora and rate their preference, online effect verification is performed at the same time, and a certain amount of high-quality dialog corpora is finally screened into the dialog set. Users and experts are then invited to label the dialog policy of each high-quality corpus pair; the dialog policies are summarized and generalized from these labels, so the resulting dialog set includes policy information for every corpus pair.
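For ease of understanding, the dialog set described above may be illustrated with the following minimal Python sketch. The sketch is merely illustrative and does not limit the implementation; the field names (`context`, `response`, `policy`) and the sample entries are assumptions of this description rather than the actual data format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CorpusPair:
    """One labeled corpus pair in the dialog set."""
    context: str   # dialog context information (the user's utterance)
    response: str  # dialog response information (the reply)
    policy: str    # the dialog policy labeled for this reply

# A tiny illustrative dialog set; a real set would hold a large
# volume of screened, high-quality corpus pairs.
DIALOG_SET = [
    CorpusPair("I feel down today", "That sounds hard. I am here with you.", "console"),
    CorpusPair("I feel down today", "Cheer up! Tomorrow will be better.", "encourage"),
    CorpusPair("I am back", "Welcome back! How was your day?", "welcome"),
]
```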
In the dialog generation method of this embodiment, a target scene of the current dialog is mined based on the user's dialog context information; a target policy for the current dialog is mined based on the target scene; and the target dialog is generated according to the dialog context information, the target policy, and the pre-established dialog set. Compared with the prior art, the target dialog is generated automatically, without manual writing, from the user's dialog context information, the target policy of the current dialog, and the pre-established dialog set, which effectively saves labor cost, shortens dialog generation time, and improves generation efficiency. By contrast, the existing manual writing method applies only to scenes for which dialogs have already been written, covers only some chat scenarios, and has a narrow application range.
FIG. 2 is a schematic diagram according to a second embodiment of the present application. As shown in FIG. 2, the dialog generation method of this embodiment is described in more detail on the basis of the technical solution of the embodiment shown in FIG. 1, and may specifically include the following steps:
s201, receiving conversation text information of a user;
the speech generating device of the embodiment can be applied to a man-machine intelligent conversation system, and can receive conversation text information of a user, namely Query sent by the user. In this embodiment, the received information on the dialog of the user may be in a form of text or voice. In this embodiment, a voice recognition technology may be adopted to recognize the information on the user's dialog as characters for subsequent dialog generation.
S202, mining a target scene of the current dialog based on the user's dialog context information;
For example, in this embodiment, step S202 may be implemented in either of the following ways:
(1) performing semantic analysis on the user's dialog context information to identify the target scene of the current dialog; or
(2) mining the target scene of the current dialog according to the user's dialog context information and a scene mining model trained in advance based on deep learning technology.
In mode (1), semantic analysis may be performed on the user's dialog context information to identify the target scene of the current dialog. For example, if the user's context information is "I am back", the target scene of the current dialog may be identified as a start scene; if it is "this is so annoying", a negative-emotion scene; if it is "I want to rest", a leaving scene; if it is "I am happy today", a positive-emotion scene; and so on.
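For ease of understanding, mode (1) may be illustrated with the following minimal keyword-rule sketch over the examples above. The cue lists and scene names are illustrative assumptions; a practical system would rely on full semantic analysis rather than surface keywords.

```python
SCENE_RULES = {
    "start":            ["i am back", "hello"],
    "negative_emotion": ["annoying", "troublesome", "sad"],
    "leaving":          ["want to rest", "good night"],
    "positive_emotion": ["happy", "glad"],
}

def mine_scene_by_rules(context: str) -> str:
    """Mode (1): identify the target scene from simple semantic cues."""
    text = context.lower()
    for scene, cues in SCENE_RULES.items():
        if any(cue in text for cue in cues):
            return scene
    return "chitchat"  # fallback scene when no rule matches

print(mine_scene_by_rules("I am back"))         # -> start
print(mine_scene_by_rules("I am happy today"))  # -> positive_emotion
```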
In mode (2), the user's dialog context information is input directly, without semantic analysis, into a scene mining model trained in advance based on deep learning technology, and the model outputs the target scene corresponding to that dialog context information.
To train the scene mining model based on deep learning technology, dialog context information corresponding to various scenes may be mined in advance as training data; each piece of training data comprises one piece of dialog context information and the manually labeled scene analyzed and annotated for it. During training, the dialog context information of each piece is input into the scene mining model, which predicts the corresponding scene; the prediction is compared with the manual label, and if they differ, the model parameters are adjusted so that the prediction agrees with the label. The model is trained continuously on numerous pieces of training data in this way until the predicted scene agrees with the manually labeled scene over a preset number of consecutive rounds, at which point the model parameters, and thus the model, can be determined. The preset number of consecutive rounds may be 100, 200, or another number, which is not detailed here.
When mining the target scene of the current dialog, the user's dialog context information is input into the trained scene mining model, which predicts and outputs the mined target scene of the current dialog.
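For ease of understanding, the following sketch uses a lightweight classifier (TF-IDF plus logistic regression from scikit-learn) as a stand-in for the deep-learning scene mining model; the tiny training set is an illustrative assumption, whereas a real model would be a deep network trained on a large labeled corpus as described above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Manually labeled (dialog context, scene) training data.
train_texts  = ["I am back", "this is so annoying", "I want to rest",
                "I am happy today"]
train_scenes = ["start", "negative_emotion", "leaving", "positive_emotion"]

# TF-IDF + logistic regression as a lightweight stand-in for the
# deep-learning scene mining model.
scene_miner = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=200))
scene_miner.fit(train_texts, train_scenes)

print(scene_miner.predict(["so annoying today"])[0])  # expected: negative_emotion
```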
With either of the two modes above, the target scene of the current dialog can be mined accurately, improving the accuracy of the subsequently generated target dialog.
S203, acquiring at least one dialog policy corresponding to the target scene according to a pre-mined correspondence between dialog scenes and dialog policies;
In this embodiment, for each dialog scene, a correspondence between the dialog scene and its dialog policies is established in advance; the correspondence may be one-to-one, one-to-two, or one-to-many. In this way, at least one dialog policy corresponding to the target scene can be acquired according to the pre-mined correspondence.
S204, selecting the target policy for the current dialog from the at least one dialog policy;
It should be noted that the at least one dialog policy comprises all the policies selectable in the current target scene. In practical applications, if a dialog in the target scene occurs for the first time, it is not yet known which policy the user prefers, and one of the at least one dialog policy may be selected at random as the target policy for the current dialog.
Optionally, in this embodiment, the target policy for the current dialog may instead be mined from the at least one dialog policy according to the user's attribute information, so as to select the policy most likely to interest the user. For example, if the user uses the intelligent interactive system in which the dialog generation apparatus of this embodiment resides, the system stores the user's attribute information, such as the user's historical dialogs, the policy adopted in the same target scene, and the user's response emotion under that policy. If the user's response emotion was satisfaction, the policy the user was satisfied with can be adopted as the target policy the next time the same target scene occurs. For example, if in a certain user's historical dialogs an encouraging policy was adopted in a negative-emotion scene and the user was satisfied, encouragement can be adopted as the target policy for the current dialog in that scene.
Optionally, there are many other ways to select the target policy from the at least one dialog policy. For example, it may be detected whether the user's historical dialogs include every dialog policy corresponding to the current scene; if not, a policy not yet used in the historical dialogs may be selected as the target policy, so as to probe the user's response emotion to each policy. If the historical dialogs already include every policy of the current scene, then, to avoid fatigue, it may be detected whether the same target policy has been used repeatedly for the current target scene; if the number of dialogs adopting that policy reaches a preset threshold, another policy of the current target scene is used instead, so that the user does not grow bored of hearing the same kind of response. Alternatively, a policy may be selected based on the probability of each policy being selected in the historical dialogs. In short, the target policy may be selected with reference to the historical dialogs, or in combination with the probability that each policy should be selected; this is not limited here.
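For ease of understanding, the selection heuristics described above (reuse a policy the user liked, probe unseen policies, rotate away from an over-used one) may be sketched as follows; the scene-to-policy table and the repeat threshold are illustrative assumptions.

```python
import random

SCENE_TO_POLICIES = {  # pre-mined correspondence: scene -> candidate policies
    "negative_emotion": ["console", "encourage", "listen"],
    "start": ["welcome", "suggest", "ask"],
}

def select_target_policy(scene, history, liked=None, max_repeats=3):
    """Pick the target policy for the current dialog.

    history -- policies used for this scene in the user's past dialogs
    liked   -- a policy the user previously responded to with satisfaction
    """
    candidates = SCENE_TO_POLICIES[scene]
    if liked in candidates:          # reuse a policy the user was happy with
        return liked
    unseen = [p for p in candidates if p not in history]
    if unseen:                       # probe a policy not tried before
        return random.choice(unseen)
    last = history[-1]
    if history[-max_repeats:] == [last] * max_repeats:
        candidates = [p for p in candidates if p != last] or [last]
    return random.choice(candidates)

print(select_target_policy("negative_emotion", history=[], liked=None))
```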
With the method of this embodiment, the target policy for the current dialog can be selected accurately and reasonably from the at least one dialog policy, improving the accuracy of the subsequently generated target dialog. Moreover, the target policy can be further mined based on the user's attribute information, further ensuring its reasonableness and accuracy.
Steps S203 and S204 of this embodiment are one implementation of step S102 of the embodiment shown in FIG. 1.
S205, acquiring, from the dialog set, multiple pieces of dialog response information that correspond to the dialog context information and reply with the target policy, according to the dialog context information and the target policy;
The construction of the dialog set in this embodiment is the same as in the embodiment shown in FIG. 1; see the description of that embodiment for details, which are not repeated here.
Because each dialog corpus pair in the dialog set is labeled with its corresponding policy, retrieving the dialog set with the dialog context information returns all the dialog response information corresponding to that context, together with the policy of each response. All responses whose policy is the target policy can then be screened from the retrieval result, yielding multiple pieces of dialog response information.
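For ease of understanding, this retrieve-then-screen step may be sketched as follows; the word-overlap match is an illustrative stand-in for a real retrieval engine, and the dialog set is assumed to be (context, response, policy) triples as in the earlier sketch.

```python
def retrieve_responses(context, target_policy, dialog_set):
    """Retrieve replies whose context resembles the user's, then keep
    only replies labeled with the target policy."""
    words = set(context.lower().split())
    return [resp for ctx, resp, policy in dialog_set
            if policy == target_policy
            and words & set(ctx.lower().split())]

dialog_set = [
    ("I feel down today", "That sounds hard. I am here with you.", "console"),
    ("I feel down today", "Cheer up! Tomorrow will be better.", "encourage"),
]
print(retrieve_responses("I feel so down today", "console", dialog_set))
```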
S206, generating the target dialog according to the multiple pieces of dialog response information;
For example, step S206 may include the following steps:
(a) acquiring any two pieces of response information from the multiple pieces of dialog response information;
This embodiment gives an example of generating the target dialog from any two pieces of response information acquired from the multiple pieces of dialog response information. In this step, the two pieces may be acquired at random.
(b) judging whether a semantic conflict exists between the two acquired pieces of response information; if not, executing step (c); otherwise, returning to step (a) to acquire two pieces of response information anew;
Specifically, whether a semantic conflict exists between the two pieces of response information can be identified through semantic analysis. When a semantic conflict exists, the two pieces cannot conveniently be combined into the target dialog, so the process returns to step (a) to acquire two new pieces.
(c) generating the target dialog according to the two acquired pieces of response information.
For example, when implementing step (c), the two pieces of response information may be merged and redundancy removed to obtain the target dialog; or the target dialog may be generated using a dialog generation model trained in advance based on deep learning technology and the two pieces of response information. In use, the two acquired pieces of response information can be input directly into the dialog generation model, which generates and outputs the target dialog based on them.
Similarly, when training the dialog generation model based on deep learning technology, multiple pieces of training data can be mined in advance, each comprising two pieces of dialog response information and a target dialog generated manually from them. During training, the two pieces of response information of each piece of training data are input into the dialog generation model, which predicts the generated target dialog; the prediction is compared with the manually generated target dialog, and if they differ, the model parameters are adjusted so that the two tend to agree. The model is trained continuously on numerous pieces of training data in this way until the predicted target dialog agrees with the manually produced one over a preset number of consecutive rounds, at which point the model parameters, and thus the model, can be determined. The preset number of consecutive rounds may be 100, 200, or another number, which is not detailed here.
When generating the target dialog, the two acquired pieces of dialog response information are input into the trained dialog generation model, which predicts and outputs the generated target dialog.
Both ways of generating the target dialog effectively ensure its conciseness and accuracy, avoid ambiguity, and ensure the quality of the generated dialog.
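For ease of understanding, steps (a) to (c) may be sketched as follows; the negation heuristic is an illustrative stand-in for real semantic conflict detection, and the model-based alternative of step (c) is not shown.

```python
import itertools

def conflicts(a: str, b: str) -> bool:
    """Crude stand-in for the semantic-conflict check of step (b):
    flag the pair when exactly one reply contains a negation word.
    A real system would compare meanings, not surface cues."""
    neg = {"not", "no", "never", "don't"}
    return bool(neg & set(a.lower().split())) != bool(neg & set(b.lower().split()))

def merge_dedup(a: str, b: str) -> str:
    """Step (c), option 1: concatenate two replies and drop sentences
    that repeat verbatim."""
    seen, parts = set(), []
    for sent in (a + " " + b).replace("!", ".").split("."):
        key = sent.strip().lower()
        if key and key not in seen:
            seen.add(key)
            parts.append(sent.strip())
    return ". ".join(parts) + "."

def generate_target_dialog(candidates):
    """Steps (a)-(c): try pairs until one passes the conflict check.
    Assumes candidates is non-empty."""
    for a, b in itertools.combinations(candidates, 2):
        if not conflicts(a, b):
            return merge_dedup(a, b)
    return candidates[0]  # fall back to a single reply

print(generate_target_dialog(
    ["That sounds hard. I am here with you.",
     "I am here with you! Tomorrow will be better."]))
# -> "That sounds hard. I am here with you. Tomorrow will be better."
```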
Steps S205 and S206 of this embodiment are one implementation of step S103 of the embodiment shown in FIG. 1.
S207, feeding back the target dialog;
Specifically, the target dialog is fed back to the user.
S208, acquiring the user's response emotion based on the target dialog;
Specifically, the user's response emotion can be obtained by analyzing the response the user makes to the target dialog; for example, the response emotion may be satisfaction or dissatisfaction with the target dialog.
S209, detecting whether the user's response emotion is dissatisfaction; if so, executing step S210; otherwise, if the user's response emotion is satisfaction, determining that the dialog with the user can continue under the current target policy, and ending.
S210, updating the target policy;
In this embodiment, when updating the target policy, a policy other than the target policy previously selected in step S204 may be chosen from the at least one policy corresponding to the target scene as the updated target policy. Alternatively, a target policy more likely to interest the user may be reselected based on the user's attribute information; see the description of step S204 above for details.
S211, updating the target dialog according to the dialog context information, the user's response emotion, the updated target policy, and the dialog set.
To better conform to conversational habits, the updated target dialog may comprise two parts: a preamble connection and the main body of the target dialog. The preamble connection is intended to link more naturally with the preceding turn; for example, it may be a connecting sentence generated from the user's response emotion to bridge the previous turn and the updated body, such as "Sorry, it seems you are still unhappy; then how about ..." followed by the updated main body of the target dialog. The updated main body is generated in the same way as described above and is not repeated here.
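For ease of understanding, the composition of the updated target dialog from a preamble connection and a regenerated main body may be sketched as follows; the emotion labels and preamble wordings are illustrative assumptions.

```python
PREAMBLES = {  # connective openers keyed by the user's response emotion
    "dissatisfied": "Sorry, it seems that didn't help. ",
    "still_sad":    "I can tell you are still unhappy. Then how about this: ",
}

def update_target_dialog(response_emotion: str, new_body: str) -> str:
    """Prepend a preamble connection so the updated target dialog
    follows on naturally from the previous turn."""
    return PREAMBLES.get(response_emotion, "") + new_body

print(update_target_dialog("dissatisfied",
                           "How about I recommend a song to cheer you up?"))
```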
For example, suppose the user's dialog context information is "I'm in a bad mood today" and the above technical solution yields listening as the target policy; the target dialog may then be "Little master, is something making you unhappy? Tell me, and I'll help you chase the worries away." If the user's feedback on this target dialog is "Why don't you understand me at all", it can be detected that the user's response emotion is dissatisfaction, and the target policy can be updated, for example to an encouraging policy. The target dialog is then updated according to the user's response emotion, the updated target policy, and the dialog set; for example, the updated target dialog may be "Little master, I'm sorry. Don't be sad; how about I recommend a song for you?" In practical applications, if the user is still dissatisfied with the updated policy, the target policy and the target dialog can continue to be updated in a similar way, so as to improve the user's satisfaction.
It should be noted that steps S207 to S211 of this embodiment are a supplement to this embodiment, allowing the target policy of the dialog to be adjusted in time. Alternatively, the target dialog of the next turn may be generated not through steps S207 to S211 but through steps S201 to S206.
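For ease of understanding, one complete turn, including the optional feedback loop of steps S207 to S211, may be tied together as follows; all component functions are injected stand-ins for the sketches above, and the cap of two updates is an illustrative assumption.

```python
def dialog_turn(context, mine_scene, mine_policy, generate,
                response_emotion, max_updates=2):
    """Run one dialog turn: S201-S206, then the S207-S211 loop while
    the user's response emotion is dissatisfaction."""
    scene = mine_scene(context)                  # S202
    tried = set()
    policy = mine_policy(scene, tried)           # S203-S204
    reply = generate(context, policy)            # S205-S206
    for _ in range(max_updates):                 # S207-S211
        if response_emotion(reply) != "dissatisfied":
            break
        tried.add(policy)
        policy = mine_policy(scene, tried)       # S210: update the policy
        reply = "Sorry about that. " + generate(context, policy)  # S211
    return reply

reply = dialog_turn(
    "I feel down today",
    mine_scene=lambda c: "negative_emotion",
    mine_policy=lambda s, tried: "console" if "console" not in tried else "encourage",
    generate=lambda c, p: {"console": "I am here with you.",
                           "encourage": "How about a song to cheer you up?"}[p],
    response_emotion=lambda r: "dissatisfied" if "here with you" in r else "satisfied",
)
print(reply)  # -> Sorry about that. How about a song to cheer you up?
```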
With the above technical solution of this embodiment, an accurate target scene can be acquired; a target policy for the current dialog is then acquired accurately and reasonably based on the target scene; finally, multiple pieces of dialog response information that correspond to the dialog context information and reply with the target policy are acquired from the dialog set, and the target dialog is generated from them, effectively ensuring the quality and efficiency of dialog generation. In addition, this embodiment can further detect the user's response emotion, so that when the user is dissatisfied, the target policy is adjusted in time and the target dialog is updated, effectively improving the intelligence of the artificial-intelligence dialog system, enhancing the user experience, and increasing user stickiness within the system.
FIG. 3 is a schematic diagram according to a third embodiment of the present application. As shown in FIG. 3, the dialog generation apparatus 300 of this embodiment includes:
the scene mining module 301, configured to mine a target scene of the current dialog based on the user's dialog context information;
the policy mining module 302, configured to mine a target policy for the current dialog based on the target scene;
and the generating module 303, configured to generate a target dialog according to the dialog context information, the target policy, and a pre-established dialog set.
The implementation principle and technical effect of dialog generation by the modules of the dialog generation apparatus 300 of this embodiment are the same as those of the related method embodiments; see the related method embodiments for details, which are not repeated here.
FIG. 4 is a schematic diagram according to a fourth embodiment of the present application. As shown in FIG. 4, the dialog generation apparatus 300 of this embodiment further describes the technical solution of the present application in more detail on the basis of the embodiment shown in FIG. 3.
In the dialog generation apparatus 300 of this embodiment, the scene mining module 301 is configured to:
perform semantic analysis on the user's dialog context information to identify the target scene of the current dialog; or
mine the target scene of the current dialog according to the user's dialog context information and a scene mining model trained in advance based on deep learning technology.
Further optionally, as shown in FIG. 4, in the dialog generation apparatus 300 of this embodiment, the policy mining module 302 includes:
a policy acquiring unit 3021, configured to acquire at least one dialog policy corresponding to the target scene according to a pre-mined correspondence between dialog scenes and dialog policies;
a selecting unit 3022, configured to select the target policy for the current dialog from the at least one dialog policy.
Further optionally, the selecting unit 3022 is configured to:
mine the target policy for the current dialog from the at least one dialog policy according to the user's attribute information.
In the dialog generation apparatus 300 of this embodiment, the generating module 303 includes:
an information acquiring unit 3031, configured to acquire, from the dialog set, multiple pieces of dialog response information that correspond to the dialog context information and reply with the target policy, according to the dialog context information and the target policy;
a generating unit 3032, configured to generate the target dialog according to the multiple pieces of dialog response information.
Further optionally, the generating unit 3032 is configured to:
acquire any two pieces of response information from the multiple pieces of dialog response information;
judge whether a semantic conflict exists between the two acquired pieces of response information;
and, if no conflict exists, generate the target dialog according to the two acquired pieces of response information.
Further optionally, the generating unit 3032 is configured to:
merge the two pieces of response information and remove redundancy to obtain the target dialog; or
generate the target dialog using a dialog generation model trained in advance based on deep learning technology and the two pieces of response information.
Further optionally, as shown in FIG. 4, the dialog generation apparatus 300 of this embodiment further includes a feedback module 304 and an emotion acquiring module 305;
the feedback module 304 is configured to feed back the target dialog;
the emotion acquiring module 305 is configured to acquire the user's response emotion based on the target dialog;
the policy mining module 302 is further configured to update the target policy if the user's response emotion is dissatisfaction;
the generating module 303 is further configured to update the target dialog according to the dialog context information, the user's response emotion, the updated target policy, and the dialog set.
The implementation principle and technical effect of dialog generation by the modules of the dialog generation apparatus 300 of this embodiment are likewise the same as those of the related method embodiments; see the related method embodiments for details, which are not repeated here.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
FIG. 5 is a block diagram of an electronic device implementing the dialog generation method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant as examples only and are not meant to limit implementations of the present application described and/or claimed herein.
As shown in FIG. 5, the electronic device includes: one or more processors 501, a memory 502, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used, as desired, along with multiple memories. Also, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In FIG. 5, one processor 501 is taken as an example.
The memory 502, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the dialog generation method in the embodiments of the present application (e.g., the related modules shown in FIG. 3 and FIG. 4). By running the non-transitory software programs, instructions, and modules stored in the memory 502, the processor 501 executes the various functional applications and data processing of the server, i.e., implements the dialog generation method in the above method embodiments.
The memory 502 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the electronic device implementing the dialog generation method, and the like. Further, the memory 502 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 502 optionally includes memory located remotely from the processor 501, which may be connected via a network to the electronic device implementing the dialog generation method. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device implementing the dialog generation method may further include: an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503 and the output device 504 may be connected by a bus or other means, and fig. 5 illustrates the connection by a bus as an example.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of an electronic apparatus implementing the dialog generation method, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or other input device. The output devices 504 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the embodiments of the present application, a target scene of the current dialog is mined based on the user's dialog context information; a target policy for the current dialog is mined based on the target scene; and the target dialog is generated according to the dialog context information, the target policy, and the pre-established dialog set. Compared with the prior art, the target dialog is generated automatically, without manual writing, from the user's dialog context information, the target policy of the current dialog, and the pre-established dialog set, which effectively saves labor cost, shortens dialog generation time, and improves generation efficiency. By contrast, the existing manual writing method applies only to scenes for which dialogs have already been written, covers only some chat scenarios, and has a narrow application range.
According to the technical solution of the embodiments of the present application, an accurate target scene can be acquired; a target policy for the current dialog is then acquired accurately and reasonably based on the target scene; finally, multiple pieces of dialog response information that correspond to the dialog context information and reply with the target policy are acquired from the dialog set, and the target dialog is generated from them, effectively ensuring the quality and efficiency of dialog generation. In addition, in the embodiments of the present application, the user's response emotion can be further detected, so that when the user is dissatisfied, the target policy is adjusted in time and the target dialog is updated, effectively improving the intelligence of the artificial-intelligence dialog system, enhancing the user experience, and increasing user stickiness within the system.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders; this is not limited herein, as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (18)
1. A dialog generation method, wherein the method comprises:
mining a target scene of the current dialog based on dialog context information of a user;
mining a target policy for the current dialog based on the target scene;
and generating a target dialog according to the dialog context information, the target policy, and a pre-established dialog set.
2. The method of claim 1, wherein mining the target scene of the current dialog based on the dialog context information of the user comprises:
performing semantic analysis on the dialog context information of the user to identify the target scene of the current dialog; or
mining the target scene of the current dialog according to the dialog context information of the user and a scene mining model trained in advance based on deep learning technology.
3. The method of claim 1, wherein mining a target policy for the current dialog based on the target scene comprises:
acquiring at least one dialog policy corresponding to the target scene according to a pre-mined correspondence between dialog scenes and dialog policies;
and selecting the target policy for the current dialog from the at least one dialog policy.
4. The method of claim 3, wherein selecting the target policy for the current dialog from the at least one dialog policy comprises:
mining the target policy for the current dialog from the at least one dialog policy according to attribute information of the user.
5. The method of claim 1, wherein generating a target dialog according to the dialog context information, the target policy, and a pre-established dialog set comprises:
acquiring, from the dialog set, multiple pieces of dialog response information that correspond to the dialog context information and reply with the target policy, according to the dialog context information and the target policy;
and generating the target dialog according to the multiple pieces of dialog response information.
6. The method of claim 5, wherein generating the target dialog according to the multiple pieces of dialog response information comprises:
acquiring any two pieces of response information from the multiple pieces of dialog response information;
judging whether a semantic conflict exists between the two acquired pieces of response information;
and, if no conflict exists, generating the target dialog according to the two acquired pieces of response information.
7. The method of claim 6, wherein generating the target dialog according to the two acquired pieces of response information comprises:
merging the two pieces of response information and removing redundancy to obtain the target dialog; or
generating the target dialog using a dialog generation model trained in advance based on deep learning technology and the two pieces of response information.
8. The method of any one of claims 1-7, wherein after generating the target dialog according to the dialog context information, the target policy, and the pre-established dialog set, the method further comprises:
feeding back the target dialog;
acquiring a response emotion of the user based on the target dialog;
if the response emotion of the user is dissatisfaction, updating the target policy;
and updating the target dialog according to the dialog context information, the response emotion of the user, the updated target policy, and the dialog set.
9. A dialog generation apparatus, wherein the apparatus comprises:
a scene mining module, configured to mine a target scene of the current dialog based on dialog context information of a user;
a policy mining module, configured to mine a target policy for the current dialog based on the target scene;
and a generating module, configured to generate a target dialog according to the dialog context information, the target policy, and a pre-established dialog set.
10. The apparatus of claim 9, wherein the scene mining module is configured to:
perform semantic analysis on the dialog context information of the user to identify the target scene of the current dialog; or
mine the target scene of the current dialog according to the dialog context information of the user and a scene mining model trained in advance based on deep learning technology.
11. The apparatus of claim 9, wherein the policy mining module comprises:
a policy acquiring unit, configured to acquire at least one dialog policy corresponding to the target scene according to a pre-mined correspondence between dialog scenes and dialog policies;
a selecting unit, configured to select the target policy for the current dialog from the at least one dialog policy.
12. The apparatus of claim 11, wherein the selecting unit is configured to:
mine the target policy for the current dialog from the at least one dialog policy according to attribute information of the user.
13. The apparatus of claim 9, wherein the generating module comprises:
an information acquiring unit, configured to acquire, from the dialog set, multiple pieces of dialog response information that correspond to the dialog context information and reply with the target policy, according to the dialog context information and the target policy;
and a generating unit, configured to generate the target dialog according to the multiple pieces of dialog response information.
14. The apparatus of claim 13, wherein the generating unit is configured to:
acquire any two kinds of dialog reply information from the plurality of kinds of dialog reply information;
judge whether the two acquired kinds of dialog reply information have a semantic conflict; and
if no semantic conflict exists, generate the target dialog according to the two acquired kinds of dialog reply information.
15. The apparatus of claim 14, wherein the generating unit is further configured to:
merge the two kinds of dialog reply information and remove redundancy to obtain the target dialog; or
generate the target dialog from the two kinds of dialog reply information using a dialog generation model pre-trained based on a deep learning technique.
16. The apparatus of any one of claims 9-15, further comprising a feedback module and an emotion acquisition module, wherein:
the feedback module is configured to feed back the target dialog to the user;
the emotion acquisition module is configured to acquire a response emotion of the user to the target dialog;
the strategy mining module is further configured to update the target strategy if the response emotion of the user is dissatisfaction; and
the generating module is further configured to update the target dialog according to the dialog context information, the response emotion of the user, the updated target strategy, and the dialog set.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor, wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010525697.XA CN111881254B (en) | 2020-06-10 | 2020-06-10 | Dialog generation method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111881254A true CN111881254A (en) | 2020-11-03 |
CN111881254B CN111881254B (en) | 2024-08-09 |
Family
ID=73157111
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010525697.XA (granted as CN111881254B, active) | 2020-06-10 | 2020-06-10 | Dialog generation method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111881254B (en) |
Patent Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150179170A1 (en) * | 2013-12-20 | 2015-06-25 | Microsoft Corporation | Discriminative Policy Training for Dialog Systems |
US20190206406A1 (en) * | 2016-05-20 | 2019-07-04 | Nippon Telegraph And Telephone Corporation | Dialogue method, dialogue system, dialogue apparatus and program |
US20180090132A1 (en) * | 2016-09-28 | 2018-03-29 | Toyota Jidosha Kabushiki Kaisha | Voice dialogue system and voice dialogue method |
CN107644641A (en) * | 2017-07-28 | 2018-01-30 | 深圳前海微众银行股份有限公司 | Session operational scenarios recognition methods, terminal and computer-readable recording medium |
CN107612814A (en) * | 2017-09-08 | 2018-01-19 | 北京百度网讯科技有限公司 | Method and apparatus for generating candidate's return information |
CN108597509A (en) * | 2018-03-30 | 2018-09-28 | 百度在线网络技术(北京)有限公司 | Intelligent sound interacts implementation method, device, computer equipment and storage medium |
CN109002510A (en) * | 2018-06-29 | 2018-12-14 | 北京百度网讯科技有限公司 | A kind of dialog process method, apparatus, equipment and medium |
CN109033257A (en) * | 2018-07-06 | 2018-12-18 | 中国平安人寿保险股份有限公司 | Talk about art recommended method, device, computer equipment and storage medium |
CN109299320A (en) * | 2018-10-30 | 2019-02-01 | 上海智臻智能网络科技股份有限公司 | A kind of information interacting method, device, computer equipment and storage medium |
CN110046221A (en) * | 2019-03-01 | 2019-07-23 | 平安科技(深圳)有限公司 | A kind of machine dialogue method, device, computer equipment and storage medium |
CN110008322A (en) * | 2019-03-25 | 2019-07-12 | 阿里巴巴集团控股有限公司 | Art recommended method and device under more wheel session operational scenarios |
CN110347863A (en) * | 2019-06-28 | 2019-10-18 | 腾讯科技(深圳)有限公司 | Talk about art recommended method and device and storage medium |
CN110647621A (en) * | 2019-09-27 | 2020-01-03 | 支付宝(杭州)信息技术有限公司 | Method and device for selecting dialogs in robot customer service guide conversation |
CN111026932A (en) * | 2019-12-20 | 2020-04-17 | 北京百度网讯科技有限公司 | Man-machine conversation interaction method and device, electronic equipment and storage medium |
CN111241357A (en) * | 2020-01-14 | 2020-06-05 | 中国平安人寿保险股份有限公司 | Dialogue training method, device, system and storage medium |
Non-Patent Citations (1)
Title |
---|
SONG Haoyu; ZHANG Weinan; LIU Ting: "DQN-based Policy Learning for Open-Domain Multi-Turn Dialogue", Journal of Chinese Information Processing (中文信息学报), no. 07, 15 July 2018 (2018-07-15) *
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113656562A (en) * | 2020-11-27 | 2021-11-16 | 话媒(广州)科技有限公司 | Multi-round man-machine psychological interaction method and device |
WO2022126734A1 (en) * | 2020-12-14 | 2022-06-23 | 美的集团股份有限公司 | Voice interaction processing method and apparatus, electronic device, and storage medium |
CN113037932A (en) * | 2021-02-26 | 2021-06-25 | 北京百度网讯科技有限公司 | Reply message generation method and device, electronic equipment and storage medium |
CN113037932B (en) * | 2021-02-26 | 2022-09-23 | 北京百度网讯科技有限公司 | Reply message generation method and device, electronic equipment and storage medium |
CN113065363A (en) * | 2021-03-22 | 2021-07-02 | 联想(北京)有限公司 | Method and device for processing session information |
CN113326704A (en) * | 2021-06-03 | 2021-08-31 | 清华大学 | Emotion support conversation generation method and system based on comprehensive strategy |
CN113326704B (en) * | 2021-06-03 | 2022-07-19 | 北京聆心智能科技有限公司 | Emotion support conversation generation method and system based on comprehensive strategy |
CN114443828A (en) * | 2022-02-09 | 2022-05-06 | 北京百度网讯科技有限公司 | Training method and device of universal dialogue model, electronic equipment and medium |
CN116662503A (en) * | 2023-05-22 | 2023-08-29 | 深圳市新美网络科技有限公司 | Private user scene phone recommendation method and system thereof |
CN116662503B (en) * | 2023-05-22 | 2023-12-29 | 深圳市新美网络科技有限公司 | Private user scene phone recommendation method and system thereof |
CN117956082A (en) * | 2024-03-27 | 2024-04-30 | 福建博士通信息股份有限公司 | Electric marketing method combined with AI voice |
CN117956082B (en) * | 2024-03-27 | 2024-06-07 | 福建博士通信息股份有限公司 | Electric marketing method combined with AI voice |
Similar Documents
Publication | Title |
---|---|
CN111881254A (en) | Method and device for generating dialogs, electronic equipment and storage medium |
KR102596446B1 (en) | Modality learning on mobile devices |
US10217463B2 (en) | Hybridized client-server speech recognition |
CN114503115A (en) | Generating rich action items |
US11238872B2 (en) | Method and apparatus for managing agent interactions with enterprise customers |
CN111666380A (en) | Intelligent calling method, device, equipment and medium |
US11068519B2 (en) | Conversation oriented machine-user interaction |
CN110717327A (en) | Title generation method and device, electronic equipment and storage medium |
CN111177355B (en) | Man-machine conversation interaction method and device based on search data and electronic equipment |
WO2018118546A1 (en) | Systems and methods for an emotionally intelligent chat bot |
CN111783468B (en) | Text processing method, device, equipment and medium |
CN112951275B (en) | Voice quality inspection method and device, electronic equipment and medium |
CN111324727A (en) | User intention recognition method, device, equipment and readable storage medium |
KR20230027302A (en) | Customize the content of your communications |
CN111639168A (en) | Multi-turn conversation processing method and device, electronic equipment and storage medium |
CN110738997B (en) | Information correction method and device, electronic equipment and storage medium |
US20210191952A1 (en) | Human-machine dialog method and apparatus, and device |
CN111105800A (en) | Voice interaction processing method, device, equipment and medium |
CN112270168B (en) | Method and device for predicting emotion style of dialogue, electronic equipment and storage medium |
CN111680517A (en) | Method, apparatus, device and storage medium for training a model |
CN114625855A (en) | Method, apparatus, device and medium for generating dialogue information |
CN112269862B (en) | Text role labeling method, device, electronic equipment and storage medium |
CN114416943B (en) | Training method and device for dialogue model, electronic equipment and storage medium |
CN110717340A (en) | Recommendation method and device, electronic equipment and storage medium |
US20220076677A1 (en) | Voice interaction method, device, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |