CN114756664A

CN114756664A - Generation method, device and equipment of reply content

Info

Publication number: CN114756664A
Application number: CN202210332122.5A
Authority: CN
Inventors: 徐志坚; 袁春阳; 陈首名; 曾轲
Original assignee: Beijing Sankuai Online Technology Co Ltd
Current assignee: Beijing Sankuai Online Technology Co Ltd
Priority date: 2022-03-30
Filing date: 2022-03-30
Publication date: 2022-07-15

Abstract

The application discloses a method, a device and equipment for generating reply content, and belongs to the technical field of man-machine conversation. The method comprises the following steps: acquiring content to be replied; analyzing the content to be replied to obtain an analysis result, wherein the analysis result comprises a target intention; determining, based on the target intention, a conversation strategy that targets a set purpose, wherein the set purpose comprises at least one of semantic relevance, maximum number of conversation turns and task migration completion degree; and generating the reply content based on the content to be replied and the conversation strategy. Because the reply content is generated with the help of a conversation strategy that targets the set purpose, it takes into account multiple conversation goals, namely the maximum number of conversation turns, semantic relevance and task migration completion degree, and the different conversation strategies make the method provided by the embodiments of the application suitable for multiple contexts.

Description

Generation method, device and equipment of reply content

Technical Field

The embodiment of the application relates to the technical field of man-machine conversation, in particular to a method, a device and equipment for generating reply content.

Background

With the progress of man-machine conversation technology, the answers of conversation robots to users' questions have become more natural. Two types of conversation are common in man-machine conversation: chat-type conversations and task-type conversations. A chat-type conversation is one in which the conversation robot generates replies with the goal of maximizing the number of conversation turns, and a task-type conversation is one in which the conversation robot generates replies with the goal of maximizing the task migration completion degree.

In the related technology, after receiving content that needs a reply, the conversation robot judges the type of the current round of conversation by combining the history record of the current conversation. When the current round is judged to be a chat-type conversation, the conversation robot calls a chat-type conversation engine to generate a reply with the goal of maximizing the number of conversation turns; when the current round is judged to be a task-type conversation, the conversation robot calls a task-type conversation engine to generate a reply with the goal of maximizing the task migration completion degree.

However, in the above technology, selecting the dialog engine according to the dialog type is inflexible: once a dialog engine has been selected, the dialog robot can only produce the type of reply that the selected engine supports.

Disclosure of Invention

The embodiment of the application provides a reply content generation method, a reply content generation device and reply content generation equipment, which can be used for solving the problems in the related art. The technical scheme is as follows:

In one aspect, an embodiment of the present application provides a method for generating reply content, where the method includes:

acquiring content to be replied;

analyzing the content to be replied to obtain an analysis result, wherein the analysis result comprises a target intention;

based on the target intention, determining a conversation strategy taking a set purpose as a target, wherein the set purpose comprises at least one of semantic relevance, maximum number of conversation turns and task migration completion degree;

and generating reply content based on the content to be replied and the conversation strategy.
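The four steps above can be sketched as a small pipeline. This is only an illustrative outline, not the implementation described in the application: the names `ParseResult`, `generate_reply`, and the `toy_*` helpers are hypothetical, and the keyword rule in `toy_parse` merely stands in for a real intent recognizer.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ParseResult:
    target_intent: str                  # e.g. "watch_movie" or "chat" (hypothetical labels)
    emotion_type: Optional[str] = None  # optional, per the text
    topic: Optional[str] = None         # optional, per the text


def generate_reply(
    content: str,
    parse: Callable[[str], ParseResult],
    choose_strategy: Callable[[ParseResult], str],
    realize: Callable[[str, str], str],
) -> str:
    """Parse the content, choose a conversation strategy, then generate the reply."""
    result = parse(content)
    strategy = choose_strategy(result)
    return realize(content, strategy)


# Toy stand-ins (assumptions) so that the sketch runs end to end.
def toy_parse(content: str) -> ParseResult:
    intent = "watch_movie" if "movie" in content.lower() else "chat"
    return ParseResult(target_intent=intent)


def toy_choose_strategy(result: ParseResult) -> str:
    return "task_goal_trigger" if result.target_intent != "chat" else "chitchat"


def toy_realize(content: str, strategy: str) -> str:
    return f"[{strategy}] reply to: {content}"
```

Passing the three stages in as callables keeps the strategy decision separate from both parsing and generation, which is the property the method relies on to serve chat-type and task-type turns with one flow.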

In a possible implementation manner, the parsing result further includes at least one of an emotion type and a topic corresponding to the content to be replied; the determining a dialog strategy targeting a set purpose based on the target intention comprises: determining a conversation strategy taking a set purpose as a target based on the target intention and at least one of the emotion type, the topic and a relevance, wherein the relevance represents the likelihood that each service corresponding to the content to be replied is selected, and the relevance is positively correlated with that likelihood.

In a possible implementation manner, before determining a dialog strategy targeting a set purpose based on the target intention and at least one of the emotion type, the topic and the relevance, the method further includes: acquiring first target information corresponding to the content to be replied, wherein the first target information comprises at least one of time, type and frequency of service corresponding to the content to be replied; and determining the corresponding relevancy of the content to be replied based on the first target information.

In a possible implementation manner, the parsing the content to be replied to obtain a parsing result includes: performing intention identification on the content to be replied to obtain the target intention; performing emotion recognition on the content to be replied to obtain an emotion type; performing field identification on the content to be replied to obtain a target field related to the content to be replied; and acquiring a knowledge graph corresponding to the target field, detecting the part of the content to be replied that matches the content of an entity node of the knowledge graph, and taking the matching part as a topic of conversation.

In one possible implementation, the determining a conversation strategy targeting a set purpose based on the target intent and at least one of the emotion type, the topic, the relevance includes: inputting the target intent and at least one of the emotion type, the topic, the relevancy into a target conversation strategy model, determining, by the target conversation strategy model, the conversation strategy based on the target intent and at least one of the emotion type, the topic, the relevancy.

In one possible implementation, before the inputting the target intent and at least one of the emotion type, the topic, the relevancy into a target conversation strategy model, the method further comprises: simulating a conversation behavior based on the current conversation strategy; analyzing the state of the conversation behavior under the set purpose, and calculating a reward value based on the state, wherein the state comprises the task migration completion degree and the number of conversation turns corresponding to the conversation behavior, or the state comprises the task migration completion degree, the number of conversation turns and the semantic relevance corresponding to the conversation behavior; and constraining, based on the reward value, an initial conversation strategy model to adjust the conversation strategy, so as to obtain the target conversation strategy model.

In one possible implementation manner, the constraining, based on the reward value, the initial conversation strategy model to adjust the conversation strategy so as to obtain the target conversation strategy model includes: when the reward value is greater than a reward value threshold, the initial conversation strategy model keeps the current conversation strategy unchanged, and the initial conversation strategy model is used as the target conversation strategy model; or, when the reward value is not greater than the reward value threshold, the initial conversation strategy model changes the current conversation strategy into another conversation strategy, the initial conversation strategy model is updated, and the updated conversation strategy model is used as the target conversation strategy model.
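The threshold rule above can be illustrated with a short sketch. The text does not give a reward formula, so `reward` below is an assumed placeholder (the mean of the available state components, each scaled to [0, 1]); `DialogState`, `adjust_strategy`, and the `max_turns` cap are likewise hypothetical. Picking the first alternative strategy stands in for whatever model update is actually used.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DialogState:
    task_completion: float                      # task migration completion degree, assumed in [0, 1]
    turns: int                                  # number of conversation turns so far
    semantic_relevance: Optional[float] = None  # optional component, assumed in [0, 1]


def reward(state: DialogState, max_turns: int = 20) -> float:
    """Assumed reward: mean of the state components that are present."""
    parts = [state.task_completion, min(state.turns, max_turns) / max_turns]
    if state.semantic_relevance is not None:
        parts.append(state.semantic_relevance)
    return sum(parts) / len(parts)


def adjust_strategy(current: str, candidates: Sequence[str],
                    reward_value: float, threshold: float) -> str:
    """Keep the current strategy if the reward exceeds the threshold, else switch."""
    if reward_value > threshold:
        return current
    others = [s for s in candidates if s != current]
    # Placeholder choice: take the first alternative instead of a learned update.
    return others[0] if others else current
```

Note that a reward exactly equal to the threshold counts as "not greater than" and therefore triggers a switch, matching the wording of the rule.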

In one possible implementation, the simulating dialog behavior based on the current dialog policy includes: simulating normal conversation behavior based on the current conversation strategy, the normal conversation behavior being conversation behavior in a positive sample; or, simulating an abnormal dialog behavior based on the current dialog policy, the abnormal dialog behavior being a dialog behavior in a negative sample.

In a possible implementation manner, before generating reply content based on the content to be replied and the conversation policy, the method further includes: acquiring a history record of the conversation; generating reply content based on the content to be replied and the conversation strategy, wherein the generating reply content comprises: generating the reply content based on the content to be replied, the conversation strategy and second target information, wherein the second target information comprises at least one of topics and the history records.

In one possible implementation manner, the second target information includes the topic and the history record, and the generating reply content based on the content to be replied, the conversation strategy and the second target information includes: encoding the content to be replied, the history record, the topic and the conversation strategy to obtain a first vector set corresponding to the content to be replied, a second vector set corresponding to the history record, a third vector set corresponding to the conversation strategy and a fourth vector set corresponding to the topic; performing weighted combination on the first vector set based on the second vector set, the third vector set and the fourth vector set to obtain a first vector weighted set; obtaining sub-topic entity nodes corresponding to the topic, and taking the contents of the sub-topic entity nodes as sub-topics, wherein the sub-topic entity nodes comprise those entity nodes of the knowledge graph whose path length to the entity node matching the topic is smaller than a length threshold; encoding the sub-topics to obtain a fifth vector set; and generating the reply content based on the fifth vector set and the first vector weighted set.
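The sub-topic selection step can be sketched as a breadth-first search over the knowledge graph. The sketch assumes the graph is an undirected adjacency list and that path length is counted in edges; neither detail is stated in the text, and the function name `sub_topics` is hypothetical.

```python
from collections import deque
from typing import Dict, List


def sub_topics(graph: Dict[str, List[str]], topic: str, length_threshold: int) -> List[str]:
    """Return entity nodes whose shortest path to `topic` is shorter than the threshold."""
    dist = {topic: 0}
    queue = deque([topic])
    while queue:
        node = queue.popleft()
        if dist[node] + 1 >= length_threshold:
            continue  # neighbours would sit at or beyond the threshold
        for neighbour in graph.get(node, []):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return sorted(n for n in dist if n != topic)


# Placeholder graph with made-up entity names.
toy_graph = {
    "Movie A": ["Director X", "Actor Y"],
    "Director X": ["Movie A", "Movie B"],
    "Actor Y": ["Movie A"],
    "Movie B": ["Director X"],
}
```

With a threshold of 2 only direct neighbours of the topic qualify; raising it to 3 also admits nodes two edges away, which widens the pool of sub-topics available to the generator.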

In one possible implementation, the performing a weighted combination on the first vector set based on the second vector set, the third vector set, and the fourth vector set to obtain a first vector weighted set includes: embedding the third vector set and the fourth vector set into the second vector set to obtain a dialogue strategy representation; and performing weighted combination on the first vector set based on the conversation strategy representation to obtain the first vector weighted set.

In one possible implementation, the generating the reply content based on the fifth vector set and the first vector weighted set includes: applying dynamic attention over the first vector weighted set using the fifth vector set to obtain a dynamic first vector weighted set; and decoding the dynamic first vector weighted set to obtain the reply content.
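One way to picture the weighting and attention steps is with plain dot-product attention. The text does not specify the embedding or attention operators, so everything below is an assumption: "embedding" is modelled as adding the mean strategy and topic vectors to each history vector, weights are softmax-normalized dot products, and the function names are hypothetical. Decoding is omitted.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def strategy_representation(history: List[Vector], strategy: List[Vector],
                            topic: List[Vector]) -> Vector:
    """Assumed embedding: add mean strategy/topic vectors to each history vector, then pool."""
    s, t = mean(strategy), mean(topic)
    fused = [[h + a + b for h, a, b in zip(row, s, t)] for row in history]
    return mean(fused)


def weight_by(query: Vector, vectors: List[Vector]) -> List[Vector]:
    """Scale each vector by its softmax-normalized dot-product score against the query."""
    weights = softmax([dot(v, query) for v in vectors])
    return [[w * x for x in v] for w, v in zip(weights, vectors)]


def attend(queries: List[Vector], keys: List[Vector]) -> List[Vector]:
    """Each query (sub-topic vector) attends over the keys (weighted content vectors)."""
    out = []
    for q in queries:
        weights = softmax([dot(q, k) for k in keys])
        out.append([sum(w * k[i] for w, k in zip(weights, keys)) for i in range(len(keys[0]))])
    return out
```

In this reading, `weight_by` produces the first vector weighted set from the dialogue strategy representation, and `attend` produces the dynamic version that would then be passed to a decoder.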

In a possible implementation manner, the conversation strategy is chatting, or topic following, or starting a new related topic, or triggering a task conversation goal, or executing a task conversation process, or interrupting the conversation topic; executing the task conversation process is a conversation strategy used when the conversation strategy of the previous round of conversation was triggering the task conversation goal; starting a new related topic is a conversation strategy used when the conversation strategy of the previous round of conversation was interrupting the conversation topic; topic following is a conversation strategy used when the topics of the previous round of conversation include the topic of the current round.
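The six strategies and the three preconditions above can be written down as an enumeration plus a validity check. The identifiers (`Strategy`, `is_allowed`) are hypothetical, and the sketch assumes that strategies without a stated precondition are always permitted.

```python
from enum import Enum
from typing import Optional, Set


class Strategy(Enum):
    CHITCHAT = "chitchat"
    TOPIC_FOLLOWING = "topic_following"
    NEW_RELATED_TOPIC = "new_related_topic"
    TASK_GOAL_TRIGGER = "task_goal_trigger"
    TASK_PROCESS_EXECUTION = "task_process_execution"
    TOPIC_INTERRUPTION = "topic_interruption"


def is_allowed(candidate: Strategy, prev_strategy: Optional[Strategy],
               prev_topics: Set[str], current_topic: Optional[str]) -> bool:
    """Check the preconditions stated for three of the six strategies."""
    if candidate is Strategy.TASK_PROCESS_EXECUTION:
        return prev_strategy is Strategy.TASK_GOAL_TRIGGER
    if candidate is Strategy.NEW_RELATED_TOPIC:
        return prev_strategy is Strategy.TOPIC_INTERRUPTION
    if candidate is Strategy.TOPIC_FOLLOWING:
        return current_topic is not None and current_topic in prev_topics
    return True  # assumption: no precondition is stated for the remaining strategies
```

A check like this could be used to filter the candidate strategies before a strategy model scores them, so that the model never proposes, say, executing a task process that was never triggered.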

In another aspect, an apparatus for generating reply content is provided, the apparatus including:

the acquisition module is used for acquiring the content to be replied;

the parsing module is used for parsing the content to be replied to obtain a parsing result, and the parsing result comprises a target intention;

a determining module, configured to determine, based on the target intent, a conversation policy targeting a set purpose, where the set purpose includes at least one of semantic relevance, maximum conversation turn, and task migration completion;

and the generating module is used for generating the reply content based on the content to be replied and the conversation strategy.

In a possible implementation manner, the parsing result further includes at least one of an emotion type and a topic corresponding to the content to be replied; the determining module is used for determining a conversation strategy targeting a set purpose based on the target intention and at least one of the emotion type, the topic and a relevance, wherein the relevance represents the likelihood that each service corresponding to the content to be replied is selected, and the relevance is positively correlated with that likelihood.

In a possible implementation manner, the determining module is further configured to obtain first target information corresponding to the content to be replied, where the first target information includes at least one of time, type, and number of times of a service corresponding to the content to be replied; and determining the corresponding relevancy of the content to be replied based on the first target information.

In a possible implementation manner, the parsing module is configured to perform intent recognition on the content to be replied to obtain the target intent; perform emotion recognition on the content to be replied to obtain an emotion type; perform field identification on the content to be replied to obtain a target field related to the content to be replied; and acquire a knowledge graph corresponding to the target field, detect the part of the content to be replied that matches the content of an entity node of the knowledge graph, and take the matching part as a topic of conversation.

In one possible implementation, the determining module is configured to input the target intent and at least one of the emotion type, the topic, and the relevancy into a target conversation strategy model, and the conversation strategy is determined by the target conversation strategy model based on the target intent and at least one of the emotion type, the topic, and the relevancy.

In one possible implementation, the apparatus further includes:

the simulation module is used for simulating conversation behaviors based on the current conversation strategy;

the analysis module is used for analyzing the state of the conversation behavior under the set purpose and calculating a reward value based on the state, wherein the state comprises the task migration completion degree and the number of conversation turns corresponding to the conversation behavior, or the state comprises the task migration completion degree, the number of conversation turns and the semantic relevance corresponding to the conversation behavior;

and the adjusting module is used for constraining, based on the reward value, an initial conversation strategy model to adjust the conversation strategy, so as to obtain the target conversation strategy model.

In a possible implementation manner, the adjusting module is configured so that, when the reward value is greater than a reward value threshold, the initial conversation strategy model keeps the current conversation strategy unchanged and is used as the target conversation strategy model; or, when the reward value is not greater than the reward value threshold, the initial conversation strategy model changes the current conversation strategy into another conversation strategy, the initial conversation strategy model is updated, and the updated conversation strategy model is used as the target conversation strategy model.

In one possible implementation, the simulation module is configured to simulate a normal conversation behavior based on the current conversation policy, the normal conversation behavior being a conversation behavior in a positive sample; or, simulating an abnormal dialogue behavior based on the current dialogue strategy, wherein the abnormal dialogue behavior is a dialogue behavior in a negative sample.

In a possible implementation manner, the generating module is further configured to obtain a history record of the current session; the generation module is used for generating the reply content based on the content to be replied, the conversation strategy and second target information, and the second target information comprises at least one of topics and the history records.

In a possible implementation manner, the second target information includes the topic and the history record, and the generating module is configured to encode the content to be replied, the history record, the topic and the conversation strategy, so as to obtain a first vector set corresponding to the content to be replied, a second vector set corresponding to the history record, a third vector set corresponding to the conversation strategy and a fourth vector set corresponding to the topic; perform weighted combination on the first vector set based on the second vector set, the third vector set and the fourth vector set to obtain a first vector weighted set; obtain sub-topic entity nodes corresponding to the topic, and take the content of the sub-topic entity nodes as sub-topics, wherein the sub-topic entity nodes comprise those entity nodes of the knowledge graph whose path length to the entity node matching the topic is less than a length threshold; encode the sub-topics to obtain a fifth vector set; and generate the reply content based on the fifth vector set and the first vector weighted set.

In a possible implementation manner, the generating module is configured to embed the third vector set and the fourth vector set into the second vector set, so as to obtain a dialog policy representation; and performing weighted combination on the first vector set based on the conversation strategy representation to obtain a first vector weighted set.

In a possible implementation manner, the generating module is configured to apply dynamic attention over the first vector weighted set using the fifth vector set, so as to obtain a dynamic first vector weighted set; and decode the dynamic first vector weighted set to obtain the reply content.

In a possible implementation manner, the conversation strategy is chatting, or topic following, or starting a new related topic, or triggering a task conversation goal, or executing a task conversation process, or interrupting the conversation topic; executing the task conversation process is a conversation strategy used when the conversation strategy of the previous round of conversation was triggering the task conversation goal; starting a new related topic is a conversation strategy used when the conversation strategy of the previous round of conversation was interrupting the conversation topic; topic following is a conversation strategy used when the topics of the previous round of conversation include the topic of the current round.

In another aspect, a computer device is provided, where the computer device includes a processor and a memory, where the memory stores at least one computer program, and the at least one computer program is loaded by the processor and executed to enable the computer device to implement any one of the above methods for generating reply content.

In another aspect, a computer-readable storage medium is provided, where at least one computer program is stored, and the at least one computer program is loaded by a processor and executed to enable a computer to implement any one of the above methods for generating reply content.

In another aspect, a computer program product or computer program is also provided, comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the computer device executes any one of the above methods for generating reply content.

The technical scheme provided by the embodiment of the application at least has the following beneficial effects:

According to the technical scheme provided by the embodiment of the application, the content to be replied is acquired and analyzed to obtain an analysis result comprising the target intention. A conversation strategy targeting a set purpose is determined based on the target intention, and the reply content generated with the help of that conversation strategy takes into account multiple conversation goals such as the maximum number of conversation turns, semantic relevance and task migration completion degree; the different conversation strategies allow the reply content generation method provided by the embodiment of the application to adapt flexibly to various contexts.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings required for the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic diagram of an implementation environment of a reply content generation method provided in an embodiment of the present application;

Fig. 2 is a flowchart of a method for generating reply content according to an embodiment of the present application;

Fig. 3 is an overall framework schematic diagram of a reply content generation method provided in an embodiment of the present application;

Fig. 4 is a partial schematic diagram of a movie knowledge graph provided by an embodiment of the present application;

Fig. 5 is a schematic diagram of a reinforcement-learning-based dialogue strategy provided by an embodiment of the present application;

Fig. 6 is a schematic diagram of an analysis of semantic relevance provided by an embodiment of the present application;

Fig. 7 is a schematic diagram of a process for generating reply content according to an embodiment of the present application;

Fig. 8 is a schematic diagram of a device for generating reply content according to an embodiment of the present application;

Fig. 9 is a schematic diagram of a server provided in an embodiment of the present application;

Fig. 10 is a schematic structural diagram of a terminal according to an embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the present application clearer, the embodiments of the present application are described in detail below with reference to the accompanying drawings.

It should be noted that information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals referred to in this application are authorized by the user or sufficiently authorized by various parties, and the collection, use, and processing of the relevant data is required to comply with relevant laws and regulations and standards in relevant countries and regions. For example, the content to be replied to referred to in the present application is obtained under sufficient authorization.

Referring to fig. 1, a schematic diagram of an implementation environment of a method provided in an embodiment of the present application is shown. The implementation environment may include: a terminal 11 and a server 12.

The terminal 11 may receive the content to be replied, and may upload the content to be replied to the server 12. The server 12 can process the content to be replied from the terminal 11 and send a corresponding reply back to the terminal 11. The terminal 11 can receive the information transmitted by the server 12 and feed the information back to the user, and the terminal 11 can also process the received content to be replied and make a corresponding reply and feed the reply back to the user.

Optionally, the terminal 11 may be any electronic product capable of man-machine interaction with a user through one or more of a keyboard, a touch pad, a touch screen, a remote controller, voice interaction or handwriting equipment, such as a PC (Personal Computer), a mobile phone, a smart phone, a PDA (Personal Digital Assistant), a wearable device, a PPC (Pocket PC), a tablet computer, a smart car, a smart television, a smart speaker, and the like. The server 12 may be a single server, a server cluster composed of a plurality of servers, or a cloud computing service center. The terminal 11 and the server 12 establish a communication connection through a wired or wireless network.

It should be understood by those skilled in the art that the above-mentioned terminal 11 and server 12 are only examples, and other existing or future terminals or servers may be suitable for the present application and are included within the scope of the present application and are herein incorporated by reference.

Based on the implementation environment shown in fig. 1, the embodiment of the present application provides a method for generating reply content, and takes application of the method to the terminal 11 as an example. As shown in fig. 2, the method provided in this embodiment of the present application may include the following steps 201 to 204.

In step 201, content to be replied to is obtained.

In one possible implementation manner, the content to be replied includes what the user says in a conversation between the conversation robot and the user. Illustratively, the content to be replied may be a new question, for example, the user asks: "What is the weather like?" Illustratively, the content to be replied may also be small talk, for example: "The weather is nice today."

In step 202, the content to be replied is analyzed to obtain an analysis result, and the analysis result includes the target intention.

In a possible implementation manner, the process of parsing the content to be replied to obtain the parsing result is as follows: and identifying the intention of the content to be replied to obtain the corresponding target intention.

In a possible implementation manner, performing intent recognition on the content to be replied to obtain the corresponding target intent includes, but is not limited to, recognizing the target intent contained in the content to be replied in a deep-learning-based manner. The intent recognition of the content to be replied corresponds to the natural language understanding in fig. 3. The embodiment of the application does not limit the manner of determining the target intention in the content to be replied, and any manner capable of determining the target intention in the content to be replied can be applied to the application. The target intention is one of a set of manually preset intentions. Illustratively, the target intentions include: watching a movie, looking up weather, booking a hotel, booking an airline ticket, chatting, interrupting a topic, etc.

In one possible implementation manner, the process of determining the target intention contained in the content to be replied in the deep-learning-based manner includes: performing word segmentation on the content to be replied; encoding the segmented content to be replied into an intention vector set; extracting a first feature of the intention vector set; and classifying the first feature against the manually labeled true target intention of the content to be replied, with the classification result constrained by a first loss function, to obtain the target intention of the user.

Illustratively, the content to be replied is: "Are there any good movies recently?" Word segmentation of the content to be replied yields five words: "recently", "are there", "good", "movie", and a sentence-final question particle. The word "recently" is encoded into a first intention vector, "are there" into a second intention vector, "good" into a third intention vector, "movie" into a fourth intention vector, and the question particle into a fifth intention vector. Feature extraction is performed on the intention vector set formed by the first through fifth intention vectors to obtain the first feature. The manually labeled true intention of this content to be replied is watching a movie. When the first feature is classified, the first loss function constrains the classification so that the predicted target intention approaches the manually labeled true intention, and the target intention finally obtained by classifying the first feature is watching a movie.
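The segment, encode, extract, classify flow above can be sketched with a tiny softmax classifier trained by cross-entropy. All specifics here are assumptions for illustration: the label set `INTENTS`, the random fixed token vectors, mean pooling as the "feature extraction", and cross-entropy as the "first loss function" are not taken from the application, which does not name a particular network or loss.

```python
import math
import random
from typing import Dict, List

INTENTS = ["watch_movie", "check_weather", "chat"]  # hypothetical label set
DIM = 4

random.seed(0)
embeddings: Dict[str, List[float]] = {}   # token -> intention vector (random, fixed)
weights = [[0.0] * DIM for _ in INTENTS]  # one row of classifier weights per intent


def embed(token: str) -> List[float]:
    if token not in embeddings:
        embeddings[token] = [random.uniform(-1, 1) for _ in range(DIM)]
    return embeddings[token]


def features(tokens: List[str]) -> List[float]:
    """First feature: mean of the token vectors (a stand-in for a learned extractor)."""
    vecs = [embed(t) for t in tokens]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def predict(feat: List[float]) -> List[float]:
    logits = [sum(w * x for w, x in zip(row, feat)) for row in weights]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_step(tokens: List[str], label: str, lr: float = 0.5) -> float:
    """One gradient step on the cross-entropy loss; returns the loss before the update."""
    feat = features(tokens)
    probs = predict(feat)
    target = INTENTS.index(label)
    loss = -math.log(probs[target])
    for k, row in enumerate(weights):
        grad = probs[k] - (1.0 if k == target else 0.0)
        for j in range(DIM):
            row[j] -= lr * grad * feat[j]
    return loss


# "?" stands in for the sentence-final question particle.
tokens = ["recently", "are there", "good", "movie", "?"]
losses = [train_step(tokens, "watch_movie") for _ in range(20)]
```

Each step pushes the predicted distribution toward the labeled intent, which is the sense in which the loss "constrains" the classification result; a practical system would train over many labeled utterances rather than one.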

Optionally, in addition to the target intention, the parsing result further includes at least one of the emotion type and the topic corresponding to the content to be replied. Parsing the content to be replied then further includes: performing emotion recognition on the content to be replied to obtain a corresponding emotion type; performing field identification on the content to be replied to obtain a target field related to the content to be replied; acquiring a knowledge graph corresponding to the target field; and detecting the part of the content to be replied that matches the content of an entity node of the knowledge graph, and taking the matching part as the topic of the conversation.
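The topic detection step can be illustrated as plain substring matching against the entity names of the domain knowledge graph. This is a minimal sketch under that assumption; the function name `detect_topics` and the placeholder entity names are hypothetical, and a real system might normalize text or use an entity linker instead.

```python
from typing import Iterable, List


def detect_topics(content: str, entity_names: Iterable[str]) -> List[str]:
    """Return entity names that occur verbatim in the content, longest first."""
    hits = [name for name in entity_names if name and name in content]
    return sorted(hits, key=lambda n: (-len(n), n))


# Placeholder entity names standing in for nodes of a movie knowledge graph.
movie_entities = ["Movie A", "Director X", "Actor Y"]
```

Sorting longest-first is a design choice of this sketch: when one entity name contains another, the more specific match is listed first.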

In a possible implementation manner, performing emotion recognition on the content to be replied to obtain the corresponding emotion type includes, but is not limited to, recognizing the emotion type contained in the content to be replied in a deep-learning-based manner. Emotion recognition of the content to be replied corresponds to the natural language understanding in fig. 3. The embodiment of the application does not limit the manner of determining the emotion type contained in the content to be replied, and any manner capable of determining the emotion type contained in the content to be replied can be applied to the application. The emotion types are preset manually; optionally, the manually preset emotion types include: positive emotions, negative emotions, neutral emotions, and the like.

In one possible implementation manner, the process of identifying the emotion type contained in the content to be replied in the deep-learning-based manner includes: performing word segmentation on the content to be replied; encoding the segmented content to be replied into an emotion vector set; extracting a second feature of the emotion vector set; and classifying the second feature against the manually labeled true emotion type of the content to be replied, with the classification result constrained by a second loss function, to obtain the emotion type contained in the content to be replied.

Illustratively, the content to be replied is: The weather is really great today. Word segmentation of the content to be replied yields: today, weather, really great. The word 'today' is encoded into a first emotion vector, the word 'weather' into a second emotion vector, and the word 'really great' into a third emotion vector. Feature extraction is performed on the emotion vector set formed by the first to third emotion vectors to obtain the second feature. When the second feature is classified, the manually labeled real emotion type of the content to be replied is positive emotion. The classification of the second feature is constrained by the second loss function, which drives the emotion type obtained by classification toward the manually labeled real emotion type, so that the emotion type obtained by classifying the second feature is positive emotion.
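
The steps above (encode the segmented words, extract a feature, classify, constrain with a loss) can be sketched as follows. This is a minimal illustration, not the network used in the application: the toy `EMBEDDINGS` table, the mean-pooling feature extractor, the linear softmax classifier and the cross-entropy loss standing in for the "second loss function" are all assumptions made for the example.

```python
import math

# Hypothetical toy embeddings; a real system would learn these vectors.
EMBEDDINGS = {
    "today": [0.1, 0.0],
    "weather": [0.0, 0.1],
    "really great": [0.9, 0.8],
}
LABELS = ["positive", "negative", "neutral"]

def encode(tokens):
    """Map each segmented word to its emotion vector (zeros if unknown)."""
    return [EMBEDDINGS.get(t, [0.0, 0.0]) for t in tokens]

def extract_feature(vectors):
    """Mean-pool the emotion vector set into one feature vector."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(feature, weights):
    """Linear classifier with one weight row per emotion label."""
    scores = [sum(w * f for w, f in zip(row, feature)) for row in weights]
    return softmax(scores)

def cross_entropy(probs, true_label):
    """Assumed stand-in for the second loss function."""
    return -math.log(probs[LABELS.index(true_label)])

def train_step(feature, weights, true_label, lr=0.5):
    """One gradient-descent step on the cross-entropy loss."""
    probs = classify(feature, weights)
    for k, row in enumerate(weights):
        grad = probs[k] - (1.0 if LABELS[k] == true_label else 0.0)
        for i in range(len(row)):
            row[i] -= lr * grad * feature[i]

tokens = ["today", "weather", "really great"]  # already segmented
feature = extract_feature(encode(tokens))
weights = [[0.0, 0.0] for _ in LABELS]
loss_before = cross_entropy(classify(feature, weights), "positive")
for _ in range(50):
    train_step(feature, weights, "positive")
probs = classify(feature, weights)
loss_after = cross_entropy(probs, "positive")
predicted = LABELS[probs.index(max(probs))]
```

The loss pulls the predicted distribution toward the manually labeled type, which is the role the text assigns to the second loss function.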

In a possible implementation manner, performing field recognition on the content to be replied to obtain the target field related to the content to be replied includes, but is not limited to, recognizing the target field related to the content to be replied in a deep learning-based manner. Field recognition of the content to be replied corresponds to natural language understanding in fig. 3. The embodiment of the application does not limit the manner of obtaining the target field related to the content to be replied; any manner capable of obtaining that target field can be applied. The target field is a field preset manually. Illustratively, the target fields include: movies, animals, food, etc.

In one possible implementation manner, the process of recognizing the target field related to the content to be replied in a deep learning-based manner includes: performing word segmentation on the content to be replied; encoding the segmented content to be replied into a field vector set; extracting a third feature of the field vector set; and classifying the third feature based on the manually labeled real target field of the content to be replied, with the classification result constrained by a third loss function, to obtain the target field related to the content to be replied.

Illustratively, the content to be replied is: Which movies are showing recently. Word segmentation of the content to be replied yields: recently, are, which, movies, showing. The word 'recently' is encoded into a first field vector, the word 'are' into a second field vector, the word 'which' into a third field vector, the word 'movies' into a fourth field vector, and the word 'showing' into a fifth field vector. Feature extraction is performed on the field vector set formed by the first to fifth field vectors to obtain the third feature. When the third feature is classified, the manually labeled real target field of the content to be replied is movies. The classification of the third feature is constrained by the third loss function, which drives the target field obtained by classification toward the manually labeled real target field, so that the target field obtained by classifying the third feature is movies.

In one possible implementation, the content to be replied relates to more than one target field. Illustratively, the content to be replied is: The fox in movie 1 is very lovely. Field recognition of this content yields two target fields: movies and animals.
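
One way to let a single input map to several target fields is to score each field independently and keep every field above a threshold. The sketch below assumes this multi-label formulation; the `DOMAIN_WEIGHTS` table, the bias and the threshold are hypothetical values chosen for illustration, and the application does not state how multiple fields are produced.

```python
import math

# Hypothetical per-field token weights; in practice these would be learned.
DOMAIN_WEIGHTS = {
    "movie": {"movie": 3.0, "showing": 2.0},
    "animal": {"fox": 3.0},
    "food": {"noodles": 3.0},
}
BIAS = -1.5  # assumed shared bias so that unrelated fields score below 0.5

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def detect_domains(tokens, threshold=0.5):
    """Return every field whose independent sigmoid score exceeds the threshold."""
    result = []
    for domain, weights in DOMAIN_WEIGHTS.items():
        score = BIAS + sum(weights.get(t, 0.0) for t in tokens)
        if sigmoid(score) > threshold:
            result.append(domain)
    return result

tokens = ["the", "fox", "in", "movie", "1", "is", "very", "lovely"]
domains = detect_domains(tokens)
```

Independent sigmoid scores, unlike a single softmax, do not force the fields to compete, so "movie" and "animal" can both be selected for the same sentence.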

In a possible implementation manner, acquiring the knowledge graph corresponding to the target field includes, but is not limited to, extracting the knowledge graph corresponding to the target field from a knowledge base, based on the target field obtained by performing field recognition on the content to be replied. Acquiring the knowledge graph corresponding to the target field corresponds to topic detection in fig. 3. The knowledge base is a manually created data repository containing the knowledge required by the conversation robot. Illustratively, the content to be replied is: Which movies are showing recently. If the target field related to the content to be replied is movies, the corresponding movie knowledge graph is acquired from the knowledge base according to that target field. Part of the movie knowledge graph is shown in fig. 4. It should be noted that fig. 4 is only intended to assist understanding of the embodiment of the present application and does not limit the content included in the movie knowledge graph.

In a possible implementation manner, detecting the part of the content to be replied that is the same as the content of an entity node of the knowledge graph, and taking that part as the topic of the conversation, includes: comparing the content to be replied with the knowledge graph acquired for it, and taking the part of the content to be replied that is the same as the entity node content of the knowledge graph as the topic of the current round of conversation. Detecting topics in the content to be replied corresponds to topic detection in fig. 3. Illustratively, the content to be replied is: The fox in movie 1 is very lovely. The target fields obtained by field recognition are movies and animals, and the movie knowledge graph and the animal knowledge graph are acquired from the knowledge base accordingly. Comparing the movie knowledge graph and the animal knowledge graph with the content to be replied yields the matching parts 'movie 1' and 'fox', which are taken as the topics of the current round of conversation.
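
The comparison described above can be sketched as a lookup of entity node contents inside the content to be replied. The `KNOWLEDGE_BASE` mapping and its entity names are hypothetical, and plain case-insensitive substring matching is an assumption; the application does not specify the matching procedure.

```python
# Hypothetical knowledge base: target field -> entity node contents of its graph.
KNOWLEDGE_BASE = {
    "movie": {"movie 1", "movie 2", "director A"},
    "animal": {"fox", "rabbit"},
}

def detect_topics(content, domains):
    """Return entity node contents that literally appear in the content."""
    lowered = content.lower()
    topics = []
    for domain in domains:
        # sorted() only makes the output order deterministic.
        for entity in sorted(KNOWLEDGE_BASE.get(domain, set())):
            if entity.lower() in lowered:
                topics.append(entity)
    return topics

topics = detect_topics("The fox in movie 1 is very lovely.", ["movie", "animal"])
```

Substring matching is the simplest choice but would also match "movie 1" inside "movie 10"; a tokenized or entity-linked match would avoid that at the cost of more machinery.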

In step 203, based on the target intent, a dialog strategy targeting a set purpose is determined, the set purpose including at least one of semantic relevance, maximum dialog turn, and task migration completion.

In one possible implementation, determining a dialog strategy targeting a set purpose based on the target intention includes: inputting the target intention into a target conversation strategy model, which determines the conversation strategy based on the target intention. The set purpose comprises at least one of semantic relevance, maximum conversation turn and task migration completion degree. Determining the conversation strategy targeting the set purpose based on the target intention corresponds to the conversation strategy Top-Level (top-level learning) in fig. 3.

In another possible implementation manner, in order to obtain a more accurate conversation strategy, the conversation strategy is determined not only from the target intention but also from at least one of the emotion type, the topic and the relevancy corresponding to the content to be replied. The relevancy represents the possibility that each type of service corresponding to the content to be replied is selected, and is positively correlated with that possibility. In this case, determining a dialog strategy targeting a set purpose based on the target intention includes: determining a conversation strategy targeting the set purpose based on the target intention and at least one of the emotion type, the topic and the relevancy.

The emotion type and the topic can be obtained when the content to be replied is parsed, whereas the relevancy needs to be obtained before it is used. Therefore, before the conversation strategy targeting the set purpose is determined based on the target intention and at least one of the emotion type, the topic and the relevancy, the method further includes: acquiring first target information corresponding to the content to be replied, wherein the first target information includes at least one of the time, type and number of times of the service corresponding to the content to be replied; and determining the relevancy corresponding to the content to be replied based on the first target information. Determining the relevancy based on the first target information corresponds to the relevancy calculation in fig. 3.

In an exemplary embodiment, the service corresponding to the content to be replied includes the services experienced by the user corresponding to the content to be replied. In another exemplary embodiment, the service corresponding to the content to be replied includes the services experienced by that user within a set time. Illustratively, the set time is three months, which is not limited by the embodiments of the present application. For example, the services experienced by the user include watching movies; the more times the user has watched movies, the higher the relevancy of the movie-watching service type corresponding to the content to be replied.
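
A minimal sketch of a relevancy calculation from the first target information (time, type and number of times of service) follows. The application does not give a formula; here relevancy is assumed to be each service type's share of the user's service records inside the set time window, and the record format and the 90-day window are hypothetical.

```python
from datetime import date, timedelta

def relevance_by_service(records, today, window_days=90):
    """records: list of (service_type, date used).

    Returns service_type -> share of uses inside the time window.
    """
    cutoff = today - timedelta(days=window_days)
    counts = {}
    for service_type, used_on in records:
        if used_on >= cutoff:
            counts[service_type] = counts.get(service_type, 0) + 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: c / total for t, c in counts.items()}

records = [
    ("watch_movie", date(2022, 3, 1)),
    ("watch_movie", date(2022, 3, 15)),
    ("order_food", date(2022, 2, 20)),
    ("watch_movie", date(2021, 6, 1)),  # older than the window, ignored
]
relevance = relevance_by_service(records, today=date(2022, 3, 30))
```

Under this assumption a service used more often within the window receives a higher relevancy, which matches the positive correlation the text describes.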

In some embodiments, determining a conversation strategy targeting a set purpose based on the target intention and at least one of the emotion type, the topic and the relevancy includes: inputting the target intention and at least one of the emotion type, the topic and the relevancy into a target conversation strategy model, which determines the conversation strategy based on these inputs.

The target conversation strategy model needs to be obtained before the target intention and at least one of the emotion type, the topic and the relevancy are input into it. Thus, in a possible implementation, before these inputs are fed into the target conversation strategy model, the method further comprises: simulating a conversation behavior based on the current conversation strategy; analyzing the state of the conversation behavior under the set purpose, and calculating a reward value based on the state, wherein the state comprises the task migration completion degree and the conversation turn corresponding to the conversation behavior, or the state comprises the task migration completion degree, the conversation turn and the semantic relevance corresponding to the conversation behavior; and constraining an initial conversation strategy model to adjust the conversation strategy based on the reward value, to obtain the target conversation strategy model.

In an exemplary embodiment, simulating dialog behavior based on the current dialog policy includes: simulating normal dialog behavior based on the current dialog policy; or simulating abnormal dialog behavior based on the current dialog policy. Abnormal dialog behavior appears in each round of simulated dialog according to a set probability. Simulating dialog behavior corresponds to the simulated dialog behavior in fig. 5, wherein the user module (usermodel) is used for simulating normal dialog behavior, and the abnormal module (errormodel) is used for simulating abnormal dialog behavior in each round according to the set probability. Illustratively, the set probability may be 0.2 for the first round of dialog and 0.25 for the second round, or 0 for the first round, 0 for the second round and 0.25 for the third round, etc. The embodiment of the application does not limit the probability with which abnormal dialog behavior occurs in each round; it can be set according to an empirical value.
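
The per-round sampling between the two simulators can be sketched as below. The function names `user_model` and `error_model` echo the module names in the text, but their bodies, the fixed utterances and the use of a seeded `random.Random` are assumptions for illustration only.

```python
import random

def user_model(round_index):
    """Hypothetical simulator of normal dialog behavior."""
    return "Where should we go for a walk?"

def error_model(round_index):
    """Hypothetical simulator of abnormal dialog behavior."""
    return "I don't want to chat."

def simulate_round(round_index, abnormal_probs, rng):
    """Pick the abnormal simulator with the probability set for this round."""
    # Assumption: rounds beyond the list reuse the last configured probability.
    p = abnormal_probs[min(round_index, len(abnormal_probs) - 1)]
    if rng.random() < p:
        return "abnormal", error_model(round_index)
    return "normal", user_model(round_index)

rng = random.Random(0)             # seeded only to make the sketch repeatable
abnormal_probs = [0.0, 0.0, 0.25]  # one of the schedules given in the text
dialog = [simulate_round(i, abnormal_probs, rng) for i in range(3)]
```

With a probability of 0 the first two rounds are always normal; only the third round can produce an abnormal utterance.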

In an exemplary embodiment, normal conversation behavior is the conversation behavior in a positive sample, and abnormal conversation behavior is the conversation behavior in a negative sample. Illustratively, during a normal conversation, the simulated conversation behavior is: The weather is really great today. The conversation robot replies to the simulated conversation behavior with: Yes, the weather is nice today and well suited to going out; would you like to go out for a walk? At this point, the simulated normal conversation behavior may be: Where should we go for a walk? The simulated abnormal conversation behavior may be: I don't want to chat. Adding simulated abnormal conversation behavior to the simulated conversation behavior can increase the extensibility and generalization capability of the target conversation strategy model.

In an exemplary embodiment, when the set purpose is the maximum dialog turn and the task migration completion degree, analyzing the state of the dialog behavior under the set purpose includes: analyzing the task migration completion degree corresponding to the dialog behavior and analyzing the dialog turn corresponding to the dialog behavior. When the set purpose is the maximum dialog turn, the task migration completion degree and the semantic relevance, analyzing the state of the dialog behavior under the set purpose includes: analyzing the semantic relevance between the dialog behavior simulated in the current round and the reply of the dialog robot in the previous round, analyzing the task migration completion degree corresponding to the dialog behavior, and analyzing the dialog turn corresponding to the dialog behavior. The state of the simulated dialog behavior under the set purpose corresponds to the dialog state (dialogstate) in fig. 5.

In an exemplary embodiment, the semantic relevance between the dialog behavior simulated in the current round and the reply of the dialog robot in the previous round is analyzed based on an SI (Sentence Interaction) network. As shown in fig. 6, the process of analyzing semantic relevance based on the SI network includes: performing word segmentation on the dialog behavior simulated in the current round and on the reply of the dialog robot in the previous round, and converting the words into word vectors; after processing by the embedding layer, performing interaction calculation on the word vectors of the simulated dialog behavior and the word vectors of the previous reply to obtain an interaction matrix; and extracting features of the interaction matrix, and obtaining, based on the features, a semantic relevance score between the dialog behavior simulated in the current round and the reply of the dialog robot in the previous round. Optionally, a CNN (Convolutional Neural Network) may be used to extract the features of the interaction matrix.
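
The interaction step can be illustrated with a small sketch that builds the interaction matrix from word vectors and reduces it to one score. Dot products as the interaction function and row-wise max pooling followed by averaging are assumptions; in particular, the pooling here is a simple stand-in for the CNN feature extractor mentioned in the text, not a convolutional network.

```python
def interaction_matrix(a_vectors, b_vectors):
    """Entry [i][j] is the dot product of word i in A with word j in B."""
    return [
        [sum(x * y for x, y in zip(a, b)) for b in b_vectors]
        for a in a_vectors
    ]

def relevance_score(matrix):
    """Assumed feature extractor: best match per row, averaged over rows."""
    row_max = [max(row) for row in matrix]
    return sum(row_max) / len(row_max)

# Hypothetical unit-length word vectors for the two utterances.
simulated_behavior = [[1.0, 0.0], [0.0, 1.0]]
previous_reply = [[1.0, 0.0], [0.6, 0.8]]

matrix = interaction_matrix(simulated_behavior, previous_reply)
score = relevance_score(matrix)
```

Each row of the matrix shows how strongly one word of the simulated behavior matches every word of the previous reply, so a higher pooled score indicates closer semantic relevance under these assumptions.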

In an exemplary embodiment, analyzing the task migration completion degree corresponding to the dialog behavior refers to analyzing the task migration stage to which the dialog behavior currently belongs, and thereby obtaining the task migration completion degree corresponding to the dialog behavior. The task migration stages are set manually, and different stages correspond to different task migration completion degrees. The embodiment of the application does not limit how the task migration stages are set; they can be set according to experience. In one possible implementation, analyzing the dialog turn corresponding to the dialog behavior refers to determining which round of dialog under the current dialog strategy the dialog behavior simulated in the current round belongs to. Illustratively, the dialog behavior simulated in the current round is the 5th round of dialog under the current dialog strategy.

In one possible implementation, calculating the reward value based on the state of the dialog behavior under the set purpose includes: calculating the reward value as a weighted combination of the state components obtained by analyzing the dialog behavior simulated in the current round. Calculating the reward value based on the analysis results corresponds to the reward mechanism (reward model) in fig. 5. In one possible implementation, different dialog strategies correspond to different weight assignments for calculating the reward value. Exemplarily, when the dialog strategy is chatting, the weight of the dialog turn in the reward value is larger than the weight of the task migration completion degree; when the dialog strategy is executing the task dialog flow, the weight of the dialog turn in the reward value is smaller than the weight of the task migration completion degree.
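
A weighted reward of this kind can be sketched as follows. The weight values, the strategy keys and the normalization of every state component to the range 0 to 1 are assumptions; the text only fixes the ordering of the turn and task weights for the two strategies shown.

```python
# Hypothetical weights per dialog strategy (each row sums to 1).
REWARD_WEIGHTS = {
    "chatting": {"turns": 0.6, "task": 0.1, "semantic": 0.3},
    "execute_task_flow": {"turns": 0.1, "task": 0.6, "semantic": 0.3},
}

def reward(strategy, state):
    """Weighted sum of the state components for the given strategy."""
    weights = REWARD_WEIGHTS[strategy]
    return sum(weights[name] * state[name] for name in weights)

# Assumed state: many turns so far, little task progress, relevant replies.
state = {"turns": 0.5, "task": 0.2, "semantic": 0.8}
chat_reward = reward("chatting", state)
task_reward = reward("execute_task_flow", state)
```

For this state, chatting is rewarded more than executing the task flow because the turn count dominates the chatting weights while task completion dominates the task-flow weights.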

In one possible implementation, constraining the initial conversation strategy model to adjust the conversation strategy based on the reward value, to obtain the target conversation strategy model, includes: when the reward value is greater than a reward value threshold, the initial conversation strategy model keeps the current conversation strategy unchanged, and the initial conversation strategy model is used as the target conversation strategy model; or, when the reward value is not greater than the reward value threshold, the initial conversation strategy model changes the current conversation strategy into another conversation strategy, the initial conversation strategy model is updated, and the updated conversation strategy model is used as the target conversation strategy model. The embodiment of the application does not limit the size of the reward value threshold; it can be set according to experience. Constraining the initial conversation strategy model to adjust the conversation strategy based on the reward value corresponds to the dialog policy model (dialog policy module) in fig. 5.
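
The threshold rule can be sketched as below. The application does not say how "another conversation strategy" is chosen, so `next_strategy`, which simply takes the next entry in a fixed list, is a hypothetical placeholder, as are the strategy identifiers.

```python
STRATEGIES = [
    "chatting",
    "topic_following",
    "start_new_related_topic",
    "trigger_task_target",
    "execute_task_flow",
    "interrupt_topic",
]

def next_strategy(current):
    """Hypothetical choice of another strategy: the next one in the list."""
    i = STRATEGIES.index(current)
    return STRATEGIES[(i + 1) % len(STRATEGIES)]

def adjust(current, reward_value, reward_threshold):
    """Keep the strategy if the reward exceeds the threshold, else switch."""
    if reward_value > reward_threshold:
        return current
    return next_strategy(current)

kept = adjust("chatting", reward_value=0.7, reward_threshold=0.5)
changed = adjust("chatting", reward_value=0.5, reward_threshold=0.5)
```

A reward equal to the threshold counts as "not greater than" and therefore triggers a switch, following the wording of the rule.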

In an exemplary embodiment, the dialog policy includes: chatting, topic following, starting a new related topic, triggering a task conversation target, executing a task conversation process and interrupting a conversation topic.

In an exemplary embodiment, when the target intention obtained according to the content to be replied is chatting, the conversation strategy of the current round is determined to be chatting. In one possible implementation, topic following is a conversation strategy used when the topic of the previous round of conversation contains the topic of the current round of conversation. Illustratively, when the target intention obtained according to the content to be replied is chatting, the emotion type is positive emotion, the topic is a movie, the topic of the previous round of conversation includes the movie, and the relevancy between the first target information corresponding to the content to be replied and the life service is lower than a relevancy threshold, the conversation strategy of the current round is determined to be topic following. The relevancy threshold is a set value, which is not limited in the embodiment of the present application.

In one possible implementation, starting a new related topic is a conversation strategy used after the conversation topic has been interrupted. Illustratively, when the target intention obtained according to the content to be replied is chatting, the emotion type is negative emotion, and the conversation strategy used in the previous round of conversation was interrupting the conversation topic, the conversation strategy of the current round is determined to be starting a new related topic.

In an exemplary embodiment, when the target intention obtained according to the content to be replied is interrupting the topic and the emotion type is negative emotion, the conversation strategy of the current round is determined to be interrupting the conversation topic. Illustratively, when the target intention is watching a movie, the emotion type is positive emotion, the topic is a movie, and the relevancy between the first target information corresponding to the content to be replied and the life service of watching a movie is the highest and is greater than the relevancy threshold, the conversation strategy of the current round is determined to be triggering a task conversation target.

In one possible implementation, executing the task dialog flow is a dialog strategy used after the strategy of triggering a task dialog target has been executed. Illustratively, when the target intention obtained according to the content to be replied is ordering a movie ticket, the topic is a movie, the relevancy between the first target information corresponding to the content to be replied and the life service of watching a movie is the highest and is greater than the relevancy threshold, and the strategy of the previous round of dialog was triggering a task dialog target, the dialog strategy of the current round is determined to be executing the task dialog flow. In an exemplary embodiment, when the target intention obtained according to the content to be replied is interrupting the topic, the dialog strategy of the current round is determined to be interrupting the dialog topic.
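
The illustrative cases above can be read as a decision table. The hand-written rules below only mirror those examples to make the input and output of the decision concrete; they are not the learned target conversation strategy model, and the identifiers, the rule order and the default relevancy threshold of 0.5 are assumptions.

```python
def pick_strategy(intent, emotion, topic, prev_topics, prev_strategy,
                  relevance, threshold=0.5):
    """Rule-based mirror of the examples; a trained model would replace this."""
    if intent == "interrupt_topic":
        return "interrupt_topic"
    if prev_strategy == "trigger_task_target" and relevance > threshold:
        return "execute_task_flow"
    if intent != "chat" and emotion == "positive" and relevance > threshold:
        return "trigger_task_target"
    if (intent == "chat" and emotion == "negative"
            and prev_strategy == "interrupt_topic"):
        return "start_new_related_topic"
    if intent == "chat" and topic in prev_topics and relevance < threshold:
        return "topic_following"
    return "chatting"

follow = pick_strategy("chat", "positive", "movie", {"movie"}, "chatting", 0.2)
new_topic = pick_strategy("chat", "negative", "movie", set(),
                          "interrupt_topic", 0.2)
trigger = pick_strategy("watch_movie", "positive", "movie", {"movie"},
                        "topic_following", 0.9)
execute = pick_strategy("order_movie_ticket", "positive", "movie", {"movie"},
                        "trigger_task_target", 0.9)
```

Each call corresponds to one of the examples in the text: topic following, starting a new related topic, triggering a task conversation target, and executing the task conversation flow.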

In step 204, reply content is generated based on the content to be replied and the conversation policy.

In one possible implementation manner, generating the reply content based on the content to be replied and the conversation policy includes: encoding the conversation policy and the content to be replied to obtain a corresponding conversation policy representation and a first vector set, and applying attention to the first vector set by using the conversation policy representation, so that the content in the first vector set related to the conversation policy representation receives more focus. The attended first vector set is then decoded to obtain the reply content. Generating the reply content based on the content to be replied and the conversation policy corresponds to the reply generation Turn-Level (reply learning) in fig. 3.
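
The attention step can be sketched with dot-product attention, where the conversation policy representation acts as the query over the first vector set. Dot-product scoring with a softmax is an assumption (the application does not name the scoring function), and the vectors are toy values; encoding and decoding are omitted.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(policy_repr, vector_set):
    """Weight each vector by its dot-product similarity to the policy vector."""
    scores = [sum(p * x for p, x in zip(policy_repr, v)) for v in vector_set]
    weights = softmax(scores)
    dim = len(vector_set[0])
    context = [
        sum(w * v[i] for w, v in zip(weights, vector_set)) for i in range(dim)
    ]
    return weights, context

policy_repr = [1.0, 0.0]  # hypothetical encoded conversation policy
first_vector_set = [[0.1, 0.9], [0.9, 0.1], [0.5, 0.5]]
weights, context = attend(policy_repr, first_vector_set)
```

The vector most aligned with the policy representation receives the largest weight, so it contributes most to the context passed on to decoding.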

In another possible implementation manner, before generating the reply content based on the content to be replied and the dialog policy, the method further includes: acquiring the history record of the conversation. In this case, generating the reply content based on the content to be replied and the conversation strategy includes: generating the reply content based on the content to be replied, the conversation strategy and second target information, wherein the second target information comprises at least one of the topic and the history record.

In an exemplary embodiment, the second target information includes the topic and the history record, and generating the reply content based on the content to be replied, the second target information and the conversation strategy includes: encoding the content to be replied, the history record, the topic and the conversation strategy to obtain a first vector set corresponding to the content to be replied, a second vector set corresponding to the history record, a third vector set corresponding to the topic and a fourth vector set corresponding to the conversation strategy; performing weighted combination on the first vector set based on the second vector set, the third vector set and the fourth vector set to obtain a first vector weighted set; obtaining, in the knowledge graph corresponding to the topic, the sub-topic entity nodes whose path length to the entity node having the same content as the topic is less than a length threshold, and taking the content of the sub-topic entity nodes as sub-topics; encoding the sub-topics to obtain a fifth vector set; and generating the reply content based on the fifth vector set and the first vector weighted set.

In one possible implementation manner, performing weighted combination on the first vector set based on the second vector set, the third vector set and the fourth vector set to obtain the first vector weighted set includes: embedding the third vector set and the fourth vector set into the second vector set to obtain a conversation strategy representation; and performing weighted combination on the first vector set based on the conversation strategy representation to obtain the first vector weighted set. Generating the reply content based on the fifth vector set and the first vector weighted set includes: dynamically attending to the first vector weighted set by using the fifth vector set to obtain a dynamic first vector weighted set; and decoding the dynamic first vector weighted set to obtain the reply content.

The process of generating the reply content based on the content to be replied, the history record, the topic and the conversation strategy corresponds to fig. 7, and the topic map in fig. 7 corresponds to the sub-topics obtained based on the topic.

In one possible implementation, performing weighted combination on the first vector set by using the conversation strategy representation to obtain the first vector weighted set includes: focusing on part of the content to be replied represented by the first vector set, based on the information about the conversation strategy, the topic and the history record carried by the conversation strategy representation. Focusing means that the focused content carries more weight during decoding. The weighted combination of the first vector set with the conversation strategy representation corresponds to the attention mechanism (attention) in fig. 7.

In a possible implementation manner, obtaining, in the knowledge graph corresponding to the topic, the sub-topic entity nodes whose path length to the entity node having the same content as the topic is less than a length threshold, and taking the content of those nodes as sub-topics, means obtaining sub-topics whose similarity to the topic is greater than a similarity threshold. The fifth vector set obtained by encoding the sub-topics dynamically attends to the first vector weighted set, so that during decoding the reply can refer to the sub-topics rather than only to the topic in the content to be replied, and the reply content is ultimately generated within a controllable range. The knowledge required for generating the reply is provided by a knowledge base established in advance by humans, and the skills required in the reply are provided by a skill base established in advance by humans. Illustratively, the skill base includes the skill of ordering movie tickets. Illustratively, the generated reply is: Are you sure you want to order a ticket for movie 1 this evening? The skill required in this reply is ordering movie tickets, which is provided by the skill base.
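
Selecting sub-topic entity nodes by path length can be sketched as a breadth-first search that stops once the length threshold is reached. The adjacency-list `GRAPH`, its node names and the threshold of 3 are hypothetical; path length is assumed to mean the number of edges on the shortest path.

```python
from collections import deque

# Hypothetical fragment of a movie knowledge graph (undirected adjacency lists).
GRAPH = {
    "movie 1": ["director A", "actor B"],
    "director A": ["movie 1", "movie 2"],
    "actor B": ["movie 1", "movie 3"],
    "movie 2": ["director A"],
    "movie 3": ["actor B", "actor C"],
    "actor C": ["movie 3"],
}

def sub_topics(graph, topic, length_threshold):
    """Nodes whose shortest path to the topic node is shorter than the threshold."""
    dist = {topic: 0}
    queue = deque([topic])
    while queue:
        node = queue.popleft()
        if dist[node] + 1 >= length_threshold:
            continue  # neighbours would reach or exceed the threshold
        for neighbour in graph.get(node, []):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return [n for n in dist if n != topic]

found = sub_topics(GRAPH, "movie 1", length_threshold=3)
```

With a threshold of 3, nodes one or two edges away from "movie 1" are returned, while "actor C", three edges away, is excluded.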

In another exemplary embodiment, the second target information is the history record, and generating the reply content based on the content to be replied, the second target information and the conversation strategy includes: encoding the content to be replied, the history record and the conversation strategy to obtain a first vector set corresponding to the content to be replied, a second vector set corresponding to the history record and a fourth vector set corresponding to the conversation strategy; embedding the fourth vector set into the second vector set to obtain a conversation strategy representation; performing weighted combination on the first vector set based on the conversation strategy representation to obtain a first vector weighted set; and decoding the first vector weighted set to obtain the reply content.

In another exemplary embodiment, the second target information is the topic, and generating the reply content based on the content to be replied, the second target information and the conversation strategy includes: encoding the content to be replied, the topic and the conversation strategy to obtain a first vector set corresponding to the content to be replied, a third vector set corresponding to the topic and a fourth vector set corresponding to the conversation strategy; embedding the fourth vector set into the third vector set to obtain a conversation strategy representation; performing weighted combination on the first vector set based on the conversation strategy representation to obtain a first vector weighted set; obtaining, in the knowledge graph corresponding to the topic, the sub-topic entity nodes whose path length to the entity node having the same content as the topic is less than a length threshold, and taking the content of the sub-topic entity nodes as sub-topics; encoding the sub-topics to obtain a fifth vector set; dynamically attending to the first vector weighted set by using the fifth vector set to obtain a dynamic first vector weighted set; and decoding the dynamic first vector weighted set to obtain the reply content.

In the embodiment of the application, the parsing result including the target intention is obtained by acquiring and parsing the content to be replied. A conversation strategy targeting a set purpose is determined based on the target intention. The reply content generated with such a conversation strategy takes into account multiple conversation targets, namely the maximum conversation turn, the semantic relevance and the task migration completion degree, and the different conversation strategies make the reply content generation method provided by the embodiment of the application suitable for a variety of contexts.

Referring to fig. 8, an embodiment of the present application provides an apparatus for generating reply content, where the apparatus includes:

an obtaining module 801, configured to obtain content to be replied;

a parsing module 802, configured to parse the content to be replied to obtain a parsing result, where the parsing result includes a target intention;

a determining module 803, configured to determine, based on the target intent, a dialog strategy targeting a set purpose, where the set purpose includes at least one of semantic relevance, maximum dialog turn, and task migration completion;

the generating module 804 is configured to generate the reply content based on the content to be replied and the conversation policy.
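
The four modules form a simple pipeline, which can be sketched as a class that chains four callables. The class name `ReplyDevice`, the callable signatures and the stub lambdas are hypothetical; each stub stands in for the corresponding module's real logic.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ReplyDevice:
    obtain: Callable[[], str]                   # obtaining module 801
    parse: Callable[[str], Dict[str, str]]      # parsing module 802
    determine: Callable[[Dict[str, str]], str]  # determining module 803
    generate: Callable[[str, str], str]         # generating module 804

    def run(self) -> str:
        content = self.obtain()
        parsing_result = self.parse(content)
        strategy = self.determine(parsing_result)
        return self.generate(content, strategy)

# Stub implementations, for illustration only.
device = ReplyDevice(
    obtain=lambda: "The weather is really great today.",
    parse=lambda content: {"intent": "chat"},
    determine=lambda result: (
        "chatting" if result["intent"] == "chat" else "execute_task_flow"
    ),
    generate=lambda content, strategy: f"[{strategy}] reply to: {content}",
)
reply = device.run()
```

Passing the modules in as callables keeps the pipeline order fixed while letting each module be replaced independently, for example by a trained conversation strategy model in place of the stub `determine`.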

In a possible implementation manner, the parsing result further includes at least one of an emotion type and a topic corresponding to the content to be replied; the determining module 803 is configured to determine a conversation strategy targeting a set purpose based on a target intention and at least one of an emotion type, a topic, and a relevance, where the relevance is used to indicate a possibility that each type of service corresponding to the content to be replied is selected, and the relevance is positively correlated with the possibility that each type of service is selected.

In a possible implementation manner, the determining module 803 is further configured to obtain first target information corresponding to the content to be replied, where the first target information includes at least one of time, type, and number of times of a service corresponding to the content to be replied; and determining the corresponding relevancy of the content to be replied based on the first target information.

In a possible implementation manner, the parsing module 802 is configured to perform intent recognition on content to be replied to obtain a target intent; carrying out emotion recognition on the content to be replied to obtain an emotion type; performing field identification on the content to be replied to obtain a target field related to the content to be replied; acquiring a knowledge graph corresponding to the target field, detecting a part of the content to be replied, which is the same as the content of an entity node of the knowledge graph, and taking the part of the content, which is the same as the content, as a topic of conversation.
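The topic detection step described above can be sketched as a lookup against the entity nodes of the knowledge graph for the identified field. The `KNOWLEDGE_GRAPHS` table, its entries, and the case-insensitive substring match are illustrative assumptions; a production system would likely use a proper entity linker.

```python
# Hypothetical per-field entity node names; real graphs would also hold edges.
KNOWLEDGE_GRAPHS = {
    "food": ["hot pot", "Sichuan cuisine", "sushi"],
    "travel": ["hotel", "scenic spot"],
}


def detect_topics(content, domain):
    """Return entity node names that appear verbatim in the content, longest first."""
    text = content.lower()
    nodes = KNOWLEDGE_GRAPHS.get(domain, [])
    hits = [name for name in nodes if name.lower() in text]
    return sorted(hits, key=len, reverse=True)
```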

In one possible implementation, the determining module 803 is configured to input at least one of emotion type, topic, relevancy, and target intent into the target conversation strategy model, and the target conversation strategy model determines the conversation strategy based on the at least one of emotion type, topic, relevancy, and target intent.

In one possible implementation, the apparatus further includes:

a simulation module, configured to simulate a conversation behavior based on the current conversation strategy;

an analysis module, configured to analyze the state of the conversation behavior under the set purpose and calculate a reward value based on the state, where the state includes the task migration degree and the conversation turn corresponding to the conversation behavior, or the state includes the task migration degree, the conversation turn, and the semantic relevance corresponding to the conversation behavior;

an adjusting module, configured to constrain the initial conversation strategy model to adjust the conversation strategy based on the reward value, to obtain the target conversation strategy model.

In a possible implementation manner, the adjusting module is configured to: when the reward value is greater than a reward value threshold, have the initial conversation strategy model keep the current conversation strategy unchanged, and use the initial conversation strategy model as the target conversation strategy model; or, when the reward value is not greater than the reward value threshold, have the initial conversation strategy model change the current conversation strategy to another conversation strategy, update the initial conversation strategy model, and use the updated conversation strategy model as the target conversation strategy model.
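The threshold rule above can be sketched as follows. The weighted-sum reward, the weights, the state keys, and the round-robin choice of "another" strategy are assumptions for illustration; the embodiment only specifies that the strategy is kept when the reward value exceeds the threshold and changed otherwise, and the sketch omits the model parameter update.

```python
# Hypothetical subset of strategies used only for this sketch.
STRATEGIES = ["chat", "topic_following", "start_new_related_topic"]


def reward(state, weights=(1.0, 0.1, 1.0)):
    """Weighted sum over the state; semantic relevance is optional, as in the text."""
    w_task, w_turn, w_sem = weights
    return (
        w_task * state["task_migration"]
        + w_turn * state["turns"]
        + w_sem * state.get("semantic_relevance", 0.0)
    )


def adjust(current, state, threshold):
    """Keep the current strategy if the reward exceeds the threshold, else switch."""
    if reward(state) > threshold:
        return current
    idx = STRATEGIES.index(current)
    return STRATEGIES[(idx + 1) % len(STRATEGIES)]
```

A learned policy would replace the round-robin switch with a choice driven by the updated model.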

In one possible implementation, the simulation module is configured to simulate a normal conversation behavior based on the current conversation strategy, the normal conversation behavior being a conversation behavior in a positive sample; or to simulate an abnormal conversation behavior based on the current conversation strategy, the abnormal conversation behavior being a conversation behavior in a negative sample.

In a possible implementation manner, the generating module 804 is further configured to obtain a history record of the current conversation, and to generate the reply content based on the content to be replied, the conversation strategy, and second target information, where the second target information includes at least one of the topic and the history record.

In a possible implementation manner, the second target information includes the topic and the history record, and the generating module 804 is configured to: encode the content to be replied, the history record, the topic, and the conversation strategy to obtain a first vector set corresponding to the content to be replied, a second vector set corresponding to the history record, a third vector set corresponding to the conversation strategy, and a fourth vector set corresponding to the topic; perform weighted combination on the first vector set based on the second vector set, the third vector set, and the fourth vector set to obtain a first vector weighted set; obtain sub-topic entity nodes corresponding to the topic and take the contents of the sub-topic entity nodes as sub-topics, where the sub-topic entity nodes include entity nodes in the knowledge graph whose path length to the entity node having the same content as the topic is less than a length threshold; encode the sub-topics to obtain a fifth vector set; and generate the reply content based on the fifth vector set and the first vector weighted set.
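The selection of sub-topic entity nodes can be sketched as a breadth-first search over the knowledge graph that keeps every node whose shortest path length from the topic node is below the length threshold. The adjacency-dictionary representation and the example graph in the test are assumptions made for this sketch.

```python
from collections import deque


def sub_topic_nodes(graph, topic, length_threshold):
    """Return nodes whose shortest path length from `topic` is >= 1 and < length_threshold.

    graph: dict mapping an entity node to a list of neighboring entity nodes.
    """
    if topic not in graph:
        return []
    dist = {topic: 0}
    queue = deque([topic])
    found = []
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in dist:
                continue  # already reached by a path that is no longer
            dist[neighbor] = dist[node] + 1
            if dist[neighbor] < length_threshold:
                found.append(neighbor)
                queue.append(neighbor)
    return found
```

Because breadth-first search visits nodes in order of increasing distance, the first time a node is reached gives its shortest path length, so nodes at or beyond the threshold are never expanded.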

In a possible implementation manner, the generating module 804 is configured to embed the third vector set and the fourth vector set into the second vector set, so as to obtain a dialog policy representation; and performing weighted combination on the first vector set based on the conversation strategy representation to obtain a first vector weighted set.

In one possible implementation manner, the generating module 804 is configured to apply dynamic attention to the first vector weighted set by using the fifth vector set, to obtain a dynamic first vector weighted set; and to decode the dynamic first vector weighted set to obtain the reply content.
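The embodiment does not give the attention formula. One possible reading is sketched below: each vector of the first vector weighted set is scored by its mean dot product with the sub-topic vectors (the fifth vector set), the scores are normalized with a softmax, and each vector is rescaled by its weight. The scoring function and the rescaling are assumptions, and the decoding step is omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(weighted_set, sub_topic_set):
    """Rescale each vector of weighted_set by its softmax-normalized affinity
    to the sub-topic vectors. Returns (rescaled vectors, attention weights)."""
    scores = [
        sum(dot(v, q) for q in sub_topic_set) / len(sub_topic_set)
        for v in weighted_set
    ]
    weights = softmax(scores)
    rescaled = [[w * x for x in v] for w, v in zip(weights, weighted_set)]
    return rescaled, weights
```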

In a possible implementation manner, the conversation strategy is chatting, topic following, starting a new related topic, triggering a task conversation goal, executing a task conversation process, or interrupting a conversation topic. Executing a task conversation process is a conversation strategy used when the conversation strategy of the previous turn is triggering a task conversation goal; starting a new related topic is a conversation strategy used when the conversation strategy of the previous turn is interrupting a conversation topic; topic following is a conversation strategy used when the topics of the previous turn include the topic of the current turn.
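The preconditions on the strategies can be sketched as a small validity check. The string labels and the function signature are hypothetical; only the three conditions themselves come from the description above, and strategies without a stated condition are treated as always allowed.

```python
# Strategy -> strategy that must have been used in the previous turn.
REQUIRED_PREVIOUS = {
    "execute_task_dialog_process": "trigger_task_dialog_goal",
    "start_new_related_topic": "interrupt_dialog_topic",
}


def is_allowed(strategy, previous_strategy, previous_topics=(), current_topic=None):
    """Check whether `strategy` may be used given the previous turn."""
    if strategy in REQUIRED_PREVIOUS:
        return previous_strategy == REQUIRED_PREVIOUS[strategy]
    if strategy == "topic_following":
        # The previous turn's topics must contain the current turn's topic.
        return current_topic in previous_topics
    return True
```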

It should be noted that, when the apparatus provided in the foregoing embodiment implements the functions thereof, only the division of the functional modules is illustrated, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus and method embodiments provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments for details, which are not described herein again.

Fig. 9 is a schematic structural diagram of a server according to an embodiment of the present application. The server may vary considerably depending on its configuration or performance, and may include one or more processors 901 and one or more memories 902, where the one or more memories 902 store at least one computer program, and the at least one computer program is loaded and executed by the one or more processors 901, so as to enable the server to implement the method for generating the reply content according to the foregoing method embodiments. Illustratively, the processor 901 is a Central Processing Unit (CPU). Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for performing input and output, and the server may also include other components for implementing the functions of the device, which are not described herein again.

Fig. 10 is a schematic structural diagram of a terminal according to an embodiment of the present application. The terminal may be a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. A terminal may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.

Generally, a terminal includes: a processor 1501 and a memory 1502.

Processor 1501 may include one or more processing cores, such as a 4-core processor, an 8-core processor, or the like. The processor 1501 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). Processor 1501 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also referred to as a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 1501 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, processor 1501 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.

The memory 1502 may include one or more computer-readable storage media, which may be non-transitory. The memory 1502 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 1502 is configured to store at least one instruction for execution by the processor 1501 to cause the terminal to implement the method for generating reply content provided by the method embodiments of the present application.

In some embodiments, the terminal may further include: a peripheral interface 1503 and at least one peripheral. The processor 1501, memory 1502, and peripheral interface 1503 may be connected by buses or signal lines. Various peripheral devices may be connected to peripheral interface 1503 via buses, signal lines, or circuit boards. Specifically, the peripheral device includes: at least one of a radio frequency circuit 1504, a display 1505, a camera assembly 1506, audio circuitry 1507, a positioning assembly 1508, and a power supply 1509.

The peripheral interface 1503 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 1501 and the memory 1502. In some embodiments, the processor 1501, memory 1502, and peripheral interface 1503 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1501, the memory 1502, and the peripheral interface 1503 may be implemented on separate chips or circuit boards, which is not limited in this embodiment.

The Radio Frequency circuitry 1504 is used to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuitry 1504 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 1504 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1504 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1504 can communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, various generation mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1504 may also include NFC (Near Field Communication) related circuits, which are not limited in this application.

The display screen 1505 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1505 is a touch display screen, the display screen 1505 also has the ability to capture touch signals on or over the surface of the display screen 1505. The touch signal may be input to the processor 1501 as a control signal for processing. In this case, the display screen 1505 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display 1505 may be one, provided on the front panel of the terminal; in other embodiments, the display 1505 may be at least two, each disposed on a different surface of the terminal or in a folded design; in other embodiments, the display 1505 may be a flexible display, disposed on a curved surface or a folded surface of the terminal. Even further, the display 1505 may be configured in a non-rectangular irregular pattern, i.e., a shaped screen. The Display 1505 can be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), and other materials.

Camera assembly 1506 is used to capture images or video. Optionally, camera assembly 1506 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 1506 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.

The audio circuitry 1507 may include a microphone and speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 1501 for processing or inputting the electric signals to the radio frequency circuit 1504 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones can be arranged at different parts of the terminal respectively. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 1501 or the radio frequency circuit 1504 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuitry 1507 may also include a headphone jack.

The positioning component 1508 is used to determine the current geographic location of the terminal to implement navigation or LBS (Location Based Service). The positioning component 1508 may be based on the Global Positioning System (GPS) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.

A power supply 1509 is used to supply power to the various components in the terminal. The power supply 1509 may be alternating current, direct current, disposable batteries, or rechargeable batteries. When the power supply 1509 includes a rechargeable battery, the rechargeable battery may support wired charging or wireless charging. The rechargeable battery can also be used to support fast charge technology.

In some embodiments, the terminal also includes one or more sensors 1510. The one or more sensors 1510 include, but are not limited to: acceleration sensor 1511, gyro sensor 1512, pressure sensor 1513, fingerprint sensor 1514, optical sensor 1515, and proximity sensor 1516.

The acceleration sensor 1511 may detect the magnitude of acceleration on three coordinate axes of a coordinate system established with the terminal. For example, the acceleration sensor 1511 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 1501 may control the display screen 1505 to display the user interface in a landscape view or a portrait view based on the gravitational acceleration signal collected by the acceleration sensor 1511. The acceleration sensor 1511 may also be used for acquisition of motion data of a game or a user.

The gyroscope sensor 1512 may detect a body direction and a rotation angle of the terminal, and the gyroscope sensor 1512 may cooperate with the acceleration sensor 1511 to acquire a 3D motion of the user on the terminal. The processor 1501, based on the data collected by the gyroscope sensor 1512, may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.

The pressure sensor 1513 may be provided on a side frame of the terminal and/or under the display 1505. When the pressure sensor 1513 is disposed on the side frame of the terminal, the holding signal of the user to the terminal can be detected, and the processor 1501 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 1513. When the pressure sensor 1513 is disposed at a lower layer of the display screen 1505, the processor 1501 controls the operability control on the UI interface in accordance with the pressure operation of the user on the display screen 1505. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.

The fingerprint sensor 1514 is configured to capture a fingerprint of the user, and the processor 1501 identifies the user based on the fingerprint captured by the fingerprint sensor 1514, or the fingerprint sensor 1514 identifies the user based on the captured fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 1501 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings, etc. The fingerprint sensor 1514 may be disposed on the front, back, or side of the terminal. When a physical key or vendor Logo (trademark) is provided on the terminal, the fingerprint sensor 1514 may be integrated with the physical key or vendor Logo.

The optical sensor 1515 is used to collect ambient light intensity. In one embodiment, processor 1501 may control the brightness of display screen 1505 based on the intensity of ambient light collected by optical sensor 1515. Specifically, when the ambient light intensity is high, the display brightness of the display screen 1505 is adjusted up; when the ambient light intensity is low, the display brightness of the display screen 1505 is adjusted down. In another embodiment, the processor 1501 may also dynamically adjust the shooting parameters of the camera assembly 1506 based on the ambient light intensity collected by the optical sensor 1515.

A proximity sensor 1516, also known as a distance sensor, is typically provided on the front panel of the terminal. The proximity sensor 1516 is used to measure the distance between the user and the front surface of the terminal. In one embodiment, when the proximity sensor 1516 detects that the distance between the user and the front surface of the terminal gradually decreases, the processor 1501 controls the display 1505 to switch from a bright-screen state to a dark-screen state; when the proximity sensor 1516 detects that the distance between the user and the front surface of the terminal gradually increases, the processor 1501 controls the display 1505 to switch from the dark-screen state to the bright-screen state.

Those skilled in the art will appreciate that the configuration shown in fig. 10 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.

In an exemplary embodiment, a computer-readable storage medium is further provided, in which at least one computer program is stored, the at least one computer program being loaded and executed by a processor of a computer device, so as to make a computer implement any one of the above-mentioned methods for generating reply content.

In one possible implementation, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.

In an exemplary embodiment, a computer program product or computer program is also provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the computer device executes any one of the above methods for generating reply content.

It should be understood that reference herein to "a plurality" means two or more. "And/or" describes the association relationship of the associated objects and indicates that three relationships are possible; for example, "A and/or B" may indicate: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.

The above description is only an exemplary embodiment of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like that are made within the principle of the present application should be included in the protection scope of the present application.

Claims

1. A method for generating reply content, the method comprising:

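acquiring content to be replied;

analyzing the content to be replied to obtain an analysis result, wherein the analysis result comprises a target intention;

determining a conversation strategy taking a set purpose as a target based on the target intention, wherein the set purpose comprises at least one of semantic relevance, maximum conversation turn and task migration completion;

and generating the reply content based on the content to be replied and the conversation strategy.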

2. The method according to claim 1, wherein the parsing result further includes at least one of an emotion type and a topic corresponding to the content to be replied;

the determining a dialog strategy targeting a set purpose based on the target intention comprises:

and determining a conversation strategy taking a set purpose as a target based on the target intention and at least one of the emotion type, the topic and the relevance, wherein the relevance is used for representing the possibility that various services corresponding to the content to be replied are selected, and the relevance is positively correlated with the possibility that various services are selected.

3. The method of claim 2, wherein prior to determining a dialog strategy targeting a set purpose based on the target intent and at least one of the emotion type, the topic, and relevance, further comprising:

acquiring first target information corresponding to the content to be replied, wherein the first target information comprises at least one of time, type and frequency of service corresponding to the content to be replied;

and determining the relevancy corresponding to the content to be replied based on the first target information.

4. The method according to claim 2 or 3, wherein the parsing the content to be replied to obtain a parsing result comprises:

performing intention identification on the content to be replied to obtain the target intention;

performing emotion recognition on the content to be replied to obtain an emotion type;

performing field identification on the content to be replied to obtain a target field related to the content to be replied;

and acquiring a knowledge graph corresponding to the target field, detecting the part of the content to be replied, which has the same content as the entity node of the knowledge graph, and taking the part of the content, which has the same content, as a topic of conversation.

5. The method of claim 2 or 3, wherein the determining a conversation strategy targeting a set purpose based on the target intention and at least one of the emotion type, the topic, and the relevance comprises:

inputting the target intent and at least one of the emotion type, the topic, the relevancy into a target conversation strategy model, the conversation strategy determined by the target conversation strategy model based on the target intent and at least one of the emotion type, the topic, the relevancy.

6. The method of claim 5, wherein before the inputting the target intent and at least one of the emotion type, the topic, and the relevancy into a target conversation strategy model, the method further comprises:

simulating a conversation behavior based on the current conversation strategy;

analyzing the state of the conversation behavior under the set purpose, and calculating a reward value based on the state, wherein the state comprises the task migration degree and the conversation turn corresponding to the conversation behavior, or the state comprises the task migration degree, the conversation turn and the semantic relevance corresponding to the conversation behavior;

and constraining an initial conversation strategy model to adjust the conversation strategy based on the reward value, to obtain the target conversation strategy model.

7. The method of claim 6, wherein the constraining an initial conversation strategy model to adjust the conversation strategy based on the reward value, to obtain the target conversation strategy model, comprises:

when the reward value is larger than a reward value threshold value, the initial conversation strategy model keeps the current conversation strategy unchanged, and the initial conversation strategy model is used as the target conversation strategy model;

or when the reward value is not greater than the reward value threshold value, the initial conversation strategy model changes the current conversation strategy into another conversation strategy, updates the initial conversation strategy model, and takes the updated conversation strategy model as the target conversation strategy model.

8. The method of claim 6, wherein simulating conversation behavior based on the current conversation policy comprises:

simulating normal conversation behavior based on the current conversation strategy, the normal conversation behavior being conversation behavior in a positive sample;

or, simulating an abnormal dialog behavior based on the current dialog policy, the abnormal dialog behavior being a dialog behavior in a negative example.

9. The method according to any one of claims 1-3 and 5-8, wherein before generating reply content based on the content to be replied and the conversation policy, the method further comprises:

acquiring a history record of the conversation;

the generating reply content based on the content to be replied and the conversation strategy comprises:

generating the reply content based on the content to be replied, the conversation strategy and second target information, wherein the second target information comprises at least one of the topic and the history record.

10. The method of claim 9, wherein the second target information comprises the topic and the history, and wherein generating reply content based on the content to reply, the conversation policy, and the second target information comprises:

encoding the content to be replied, the history record, the topic and the conversation strategy to obtain a first vector set corresponding to the content to be replied, a second vector set corresponding to the history record, a third vector set corresponding to the conversation strategy and a fourth vector set corresponding to the topic;

performing weighted combination on the first vector set based on the second vector set, the third vector set and the fourth vector set to obtain a first vector weighted set;

obtaining sub-topic entity nodes corresponding to the topic, and taking the content of the sub-topic entity nodes as sub-topics, wherein the sub-topic entity nodes comprise entity nodes, among the entity nodes of the knowledge graph, whose path length to the entity node having the same content as the topic is less than a length threshold;

encoding the sub-topics to obtain a fifth vector set;

generating the reply content based on the fifth vector set and the first vector weighted set.

11. The method of claim 10, wherein the performing weighted combination on the first vector set based on the second vector set, the third vector set, and the fourth vector set to obtain a first vector weighted set comprises:

embedding the third vector set and the fourth vector set into the second vector set to obtain a dialogue strategy representation;

and performing weighted combination on the first vector set based on the conversation strategy representation to obtain a first vector weighted set.

12. The method of claim 10, wherein generating the reply content based on the fifth vector set and the first vector weighted set comprises:

applying dynamic attention to the first vector weighted set by using the fifth vector set to obtain a dynamic first vector weighted set;

and decoding the dynamic first vector weighted set to obtain the reply content.

13. The method of any one of claims 1-3, 5-8, 10-12, wherein the conversation strategy is chat, or topic following, or initiating a new related topic, or triggering a task conversation goal, or performing a task conversation process, or interrupting a conversation topic;

the performing a task conversation process is a conversation strategy used when the conversation strategy of the previous turn is the triggering a task conversation goal;

the initiating a new related topic is a conversation strategy used when the conversation strategy of the previous turn is the interrupting a conversation topic;

the topic following is a conversation strategy used when the topics of the previous turn include the topic of the current turn.

14. An apparatus for generating reply content, the apparatus comprising:

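an acquisition module, configured to acquire content to be replied;

a parsing module, configured to parse the content to be replied to obtain a parsing result, where the parsing result includes a target intention;

a determining module, configured to determine, based on the target intent, a dialog strategy targeting a set purpose, where the set purpose includes at least one of semantic relevance, a maximum dialog turn, and task migration completion;

a generating module, configured to generate the reply content based on the content to be replied and the dialog strategy.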

15. A computer device comprising a processor and a memory, wherein at least one computer program is stored in the memory, and the at least one computer program is loaded and executed by the processor to cause the computer device to implement the method for generating reply content according to any one of claims 1 to 13.