CN116541504B - Dialog generation method, device, medium and computing equipment - Google Patents

Info

Publication number: CN116541504B
Application number: CN202310765756.4A
Authority: CN (China)
Prior art keywords: dialogue, reply, preset, target user, text
Legal status: Active (granted)
Other versions: CN116541504A (Chinese)
Inventors: 杨家铭, 郑叔亮, 李文珏
Assignee: Beijing Lingxin Intelligent Technology Co., Ltd.

Classifications

    • G06F 16/3329 - Natural language query formulation or dialogue systems (information retrieval; querying of unstructured textual data)
    • G06F 16/335 - Querying with filtering based on additional data, e.g. user or group profiles
    • G06N 20/00 - Machine learning
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a dialogue generation method, apparatus, medium, and computing device, wherein the dialogue generation method comprises the following steps: acquiring a dialogue request; determining a dialogue scene based on the historical dialogue record; acquiring at least one item of preset feature data from a plurality of preset feature data based on the dialogue scene, and constructing a target user portrait; determining the matching degree between each candidate reply text and the target user portrait; performing confidence ranking on the target reply texts in the preset reply modes; and generating a personalized reply text based on the dialogue scene, the target user portrait, and the target reply text with the highest confidence. The personalized reply text obtained by this dialogue generation method better matches the target user portrait in the dialogue scene, the selected preset reply mode has higher confidence, and the direction of personalized processing is more accurate.

Description

Dialog generation method, device, medium and computing equipment
Technical Field
The present invention relates to the field of dialog generation, and in particular, to a dialog generation method, apparatus, medium, and computing device.
Background
At present, intelligent chat robots have achieved initial success in industry: in fields such as intelligent customer service, personal assistants, e-commerce, and insurance, they can replace a large amount of human customer service work, handling simple business processing and customer support.
However, current intelligent chat robots can only produce mechanical replies according to preset scripts for certain fixed scenes, and cannot reply in a personalized way for different users and different scenes.
To address the stiff, impersonal reply content of intelligent chat robots, methods exist that personalize and embellish reply content according to a user portrait. However, most of these methods simply follow the user portrait and personalize in the same direction in every dialogue scene, which is clearly inaccurate. For example, the user portrait may show that the user has a lively personality, but in serious, formal dialogue scenes (such as medical consultation or job-seeking inquiry), personalizing the reply in a lively direction is inappropriate. How to perform personalized replies accurately is therefore a problem to be solved urgently.
Disclosure of Invention
The main object of the present invention is to provide a dialogue generation method, apparatus, medium, and computing device, aiming to solve the problem raised in the background section of how to perform personalized replies accurately.
In order to achieve the above object, the present invention provides a dialog generating method, including:
Acquiring a dialogue request, wherein the dialogue request comprises a historical dialogue record of a target user and a plurality of preset characteristic data;
based on a plurality of preset reply modes, respectively acquiring a plurality of candidate reply texts in each preset reply mode;
determining a dialogue scene based on the historical dialogue record;
acquiring at least one preset feature data from the plurality of preset feature data based on the dialogue scene, and constructing a target user portrait conforming to the dialogue scene;
determining the matching degree between each candidate reply text and the target user portrait;
performing confidence degree sequencing on the target reply texts in each preset reply mode, wherein each target reply text in each preset reply mode is a candidate reply text with the highest matching degree with the target user portrait in a plurality of candidate reply texts in the preset reply mode;
and generating personalized reply text based on the dialogue scene, the target user portrait and the target reply text with highest confidence.
In an embodiment of the present application, the plurality of preset reply modes include: a knowledge question-answer reply mode, a common question answering mode, a dialogue generation mode, and a rule dialogue mode.
In an embodiment of the present application, the dialogue scenes include: a customer service scene, a question-answer scene, and a chit-chat scene.
In an embodiment of the present application, the plurality of preset feature data includes: age, target product familiarity, target information acquisition channel, domain knowledge, learning objectives, information requirements, language style, interests, emotional state, topic preferences, social preferences.
In an embodiment of the present application, the obtaining, based on the dialog scene, at least one preset feature data from the plurality of preset feature data, and constructing a target user portrait that conforms to the dialog scene includes:
in a customer service scene, selecting the age, the familiarity of a target product and a target information acquisition channel of the target user, and constructing the target user portrait;
selecting domain knowledge, learning targets, information requirements and language styles of the target user under a question-answer scene, and constructing the target user portrait;
and in the chit-chat scene, selecting the interests, emotional state, topic preferences, and social preferences of the target user, and constructing the target user portrait.
In this embodiment of the present application, the determining the matching degree between each candidate reply text and the target user portrait includes:
Inputting the target user portrait and each candidate reply text into a first preset classification model, and determining the matching degree of each candidate reply text and the target user portrait;
the first preset classification model is obtained through training of a first training set, the first training set comprises a plurality of first training samples, each first training sample comprises a first context, and the user portrait corresponding to the first context.
In this embodiment of the present application, the performing confidence ranking on the target reply text in each preset reply mode includes:
forming a dialogue pair from the target reply text in each preset reply mode and the information to be replied to in the historical dialogue record;
inputting each dialogue pair and the historical dialogue record into a second preset classification model, and determining the relevance of each dialogue pair and the historical dialogue record; the second preset classification model is obtained through training of a second training set, the second training set comprises a plurality of second training samples and a plurality of third training samples, the second training samples comprise a first dialogue pair and a second context related to the first dialogue pair, and the third training samples comprise a second dialogue pair and a third context unrelated to the second dialogue pair;
And determining the confidence level of the target reply text corresponding to each dialogue pair based on the correlation between each dialogue pair and the historical dialogue record.
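The ranking procedure above can be sketched as follows. The second preset classification model is a trained model in the patent; here a simple word-overlap score stands in for it, and all function and variable names are illustrative assumptions:

```python
def rank_by_confidence(message, target_replies, history, relevance_model=None):
    """Confidence-rank the target reply texts of the preset reply modes.

    `target_replies` maps each preset reply mode to its best-matching reply;
    `relevance_model` scores a (message, reply) dialogue pair against the
    historical dialogue record (stand-in for the second preset classifier).
    """
    if relevance_model is None:
        def relevance_model(pair, hist):
            # Proxy: word overlap between the reply and the history.
            reply_tokens = set(pair[1].lower().split())
            hist_tokens = set(" ".join(hist).lower().split())
            return len(reply_tokens & hist_tokens)
    scored = [(relevance_model((message, reply), history), mode, reply)
              for mode, reply in target_replies.items()]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored  # (confidence, mode, reply), highest confidence first
```

In use, the reply whose dialogue pair is most relevant to the historical dialogue record ends up first, which is the reply the later personalization step operates on.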
In this embodiment of the present application, the generating, based on the dialog scene, the target user portrait, and the target reply text with the highest confidence, a personalized reply text includes:
in the customer service scene, based on the target user portrait, performing personalized processing on the target reply text with the highest confidence in the following processing directions: professional, friendly, clear, guided, and durable;
in the question-answer scene, based on the target user portrait, performing personalized processing on the target reply text with the highest confidence in the following processing directions: concise, detailed explanation, gentle guidance, illustration by example, and interactive exploration;
and in the chit-chat scene, based on the target user portrait, performing personalized processing on the target reply text with the highest confidence in the following processing directions: humor, interest resonance, information recommendation, exploratory dialogue, and emotional care.
In this embodiment of the present application, the generating, based on the dialog scene, the target user portrait, and the target reply text with the highest confidence, a personalized reply text includes:
Inputting the dialogue scene, the target user portrait and the target reply text with highest confidence into a preset language model to generate a personalized reply text;
the preset language model is obtained through training of a third training set, the third training set comprises a plurality of fourth training samples, each fourth training sample comprises a text to be rewritten and a target rewritten text of the text to be rewritten in each processing direction, the processing directions at least comprise each personalized processing direction in each dialogue scene, and the training process of the preset language model is as follows:
acquiring a rewritten text based on the preset language model, the text to be rewritten, and any processing direction;
obtaining a rewrite loss based on the rewritten text and a target rewritten text corresponding to the text to be rewritten in the processing direction;
and if the rewrite loss is larger than a preset value, updating the rewrite parameters of the preset language model until the rewrite loss is not larger than the preset value.
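The training procedure above can be sketched as the following loop. The model, optimizer, and loss function are stand-ins (the patent does not specify their implementations); parameters are updated while the rewrite loss exceeds the preset value:

```python
def train_rewriter(model, optimizer, samples, loss_fn, threshold=0.1, max_steps=1000):
    """Train until the rewrite loss is not greater than the preset value.

    `model(text, direction)` returns a rewritten text; `samples` holds
    (text_to_rewrite, processing_direction, target_rewritten_text) triples.
    """
    worst = 0.0
    for _ in range(max_steps):
        worst = 0.0
        for text, direction, target in samples:
            rewritten = model(text, direction)     # acquire rewritten text
            loss = loss_fn(rewritten, target)      # rewrite loss vs target text
            if loss > threshold:
                optimizer.step(loss)               # update rewrite parameters
            worst = max(worst, loss)
        if worst <= threshold:                     # loss <= preset value: stop
            break
    return worst
```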
In an embodiment of the present application, before obtaining the plurality of candidate reply texts in each preset reply mode, the method further includes:
Detecting violations based on the historical dialog records;
if the historical dialogue record is detected to contain violating content, replying with first preset information.
In an embodiment of the present application, before obtaining the plurality of candidate reply texts in each preset reply mode, the method further includes:
and judging whether the content to be replied to in the historical dialogue record meets a first preset condition, and if so, replying with second preset information, wherein the first preset condition characterizes whether the content to be replied to is non-chat information.
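The two pre-checks described above (violation detection, then the non-chat first preset condition) can be sketched as follows; the banned-word lexicon, the two preset replies, and the non-chat heuristic are illustrative assumptions:

```python
FIRST_PRESET_REPLY = "Sorry, this content cannot be discussed."       # illustrative
SECOND_PRESET_REPLY = "Let me route you to the appropriate service."  # illustrative
BANNED_WORDS = {"badword1", "badword2"}                               # placeholder lexicon

def is_non_chat(message):
    # Placeholder first preset condition: commands or empty input are non-chat.
    return message.startswith("/") or not message.strip()

def pre_check(history):
    """Run both pre-checks before any candidate reply text is generated.

    Returns the preset reply to send immediately, or None to proceed.
    """
    text = " ".join(history).lower()
    if any(word in text for word in BANNED_WORDS):  # violation detection
        return FIRST_PRESET_REPLY
    last_message = history[-1] if history else ""
    if is_non_chat(last_message):                   # first preset condition
        return SECOND_PRESET_REPLY
    return None
```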
In an embodiment of the present application, after generating the personalized reply text, the method further includes:
and updating the personalized reply text to the historical dialogue record of the target user.
In an embodiment of the present application, the session request includes a plurality of processing tasks; the dialog generation method further comprises the following steps:
creating a dependency relationship between each processing task;
determining the dependency sequence of each processing task based on the dependency relationship;
and processing the plurality of processing tasks according to the dependency sequence.
In this embodiment of the present application, the creating a dependency relationship between each processing task includes:
Creating a dependency graph comprising a plurality of nodes, each node representing a processing task; a dependency link is connected between nodes having a dependency relationship and indicates the dependency between the two connected nodes.
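The dependency graph and processing order described above can be sketched with a topological sort (Kahn's algorithm); the task names and the edge representation below are illustrative:

```python
from collections import defaultdict, deque

def dependency_order(tasks, links):
    """Determine a processing order that respects every dependency link.

    `links` is a list of (prerequisite, dependent) pairs, i.e. the dependency
    links connecting nodes of the dependency graph. The result places every
    task after all of its prerequisites.
    """
    indegree = {task: 0 for task in tasks}
    adjacent = defaultdict(list)
    for prerequisite, dependent in links:
        adjacent[prerequisite].append(dependent)
        indegree[dependent] += 1
    queue = deque(task for task in tasks if indegree[task] == 0)
    order = []
    while queue:
        task = queue.popleft()
        order.append(task)
        for dependent in adjacent[task]:
            indegree[dependent] -= 1
            if indegree[dependent] == 0:
                queue.append(dependent)
    if len(order) != len(tasks):
        raise ValueError("cyclic dependency between processing tasks")
    return order
```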
The embodiment of the application also provides a dialogue generating device, which comprises:
an acquisition module, used for acquiring a dialogue request, wherein the dialogue request comprises a historical dialogue record of a target user and a plurality of preset feature data;
the processing module is used for respectively acquiring a plurality of candidate reply texts in each preset reply mode based on a plurality of preset reply modes;
determining a dialogue scene based on the historical dialogue record;
constructing a target user portrait conforming to the dialogue scene based on the dialogue scene and the plurality of preset feature data;
determining the matching degree between each candidate reply text and the target user portrait;
performing confidence degree sequencing on the target reply texts in each preset reply mode, wherein the target reply text in each preset reply mode is a candidate reply text with the highest matching degree with the target user portrait in the preset reply mode;
and generating personalized reply text based on the dialogue scene, the target user portrait and the target reply text with highest confidence.
The embodiments of the present application also propose a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements any of the methods described above.
Embodiments of the present application also propose a computing device comprising a processor, the processor implementing any of the methods described above when executing a computer program stored in a memory.
According to the embodiments of the present application, a dialogue scene is determined from the historical dialogue record, and part of the feature data is selected from the plurality of preset feature data based on the dialogue scene to serve as the target user portrait of the target user in that dialogue scene. The candidate reply texts in all preset reply modes are then ranked by matching degree against the target user portrait, so that the target reply text best matching the target user portrait can be obtained in each preset reply mode; according to the confidence of the target reply texts in the preset reply modes, the reply text that best matches both the dialogue scene and the target user portrait can then be obtained. In addition, when the candidate reply text is personalized based on the dialogue scene and the target user portrait, it is personalized not on all the feature data of the target user but on the target user portrait formed from the part of the feature data that fits the dialogue scene, so the direction of personalization is more accurate. The personalized reply text obtained by the method of the embodiments of the present application thus better matches the target user portrait in the dialogue scene, the selected preset reply mode has higher confidence, and the personalized processing direction is more accurate.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to the structures shown in these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a step diagram of an embodiment of a dialog generation method according to the present invention;
FIG. 2 is a flow chart of a method for generating a dialogue according to an embodiment of the present invention;
FIG. 3 is a framework diagram of a chat robot or chat system in accordance with an embodiment of the invention;
FIG. 4 is a block diagram of a dialogue generating device according to an embodiment of the present invention;
FIG. 5 is a block diagram of one embodiment of a medium of the present invention;
FIG. 6 is a block diagram of one embodiment of a computing device of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
The principles and spirit of the present invention will be described below with reference to several exemplary embodiments. It should be understood that these embodiments are presented merely to enable those skilled in the art to better understand and practice the invention and are not intended to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It will be appreciated by those skilled in the art that embodiments of the present invention may be implemented as an apparatus, device, method or computer program product. Accordingly, the present disclosure may be embodied in the following forms, namely: complete hardware, complete software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to the embodiment of the invention, a dialogue generating method, a dialogue generating device, a dialogue generating medium and a computing device are provided.
Exemplary method
Referring to fig. 1 and 2 in combination, the present exemplary embodiment provides a method for generating a dialogue, including the following steps:
step S100: a dialogue request is obtained, the dialogue request comprising a historical dialogue record of a target user and a plurality of preset feature data.
In this embodiment of the present application, the dialogue request may be information by which a user requests a dialogue with a chat robot or chat system. The dialogue request may include the historical dialogue record between the target user and the chat robot or chat system, and the information to be replied to of the target user can be identified from the historical dialogue record.
In addition, the dialog request may also include a plurality of preset feature data for the target user. For example, in the embodiment of the present application, the preset feature data may include: age, target product familiarity, target information acquisition channel, domain knowledge, learning objectives, information requirements, language style, interests, emotional state, topic preferences, social preferences.
The age refers to the age of the target user, which may be the specific age in years or the age range of the target user, for example: teenager, middle-aged, or elderly.
Target product familiarity represents how familiar the target user is with the product or service in use, for example: in a buying-and-selling scenario, how familiar the target user is with the products to be bought or sold.
Target information acquisition channel, representing which channel the target user uses to acquire information, such as: telephone, online chat, email, etc.
Domain knowledge, representing the level of expertise a target user has in various domains, such as: expertise level in science, history, sports, etc.
Learning objectives, representing the learning goals of the target user, such as: seeking answers to specific questions, broadening knowledge, or solving practical problems.
Information requirements, representing the type of information required by the target user, such as: factual answers, interpreted answers, specific examples, and the like.
Language style, representing the target user's manner of speech and expression habits, such as: formal, professional, colloquial, or casual.
Hobbies, representing the interests of the target user, such as: sports, music, movies, food, scenery, etc.
Emotional states, representing emotional conditions of the target user, such as: happy, depressed, curious, etc.
Topic preferences, representing topics of interest to the target user, such as: weather, travel, science and technology, etc.
Social preferences, representing the social preferences of the target user, such as: a preference for friendly, humorous conversations, or for serious, rational discussions.
The preset feature data may be filled in by the target user, for example during an initialization process, or may be inferred from the target user's historical chat records.
Step S200: based on a plurality of preset reply modes, a plurality of candidate reply texts in each preset reply mode are respectively obtained.
In the embodiment of the present application, the preset reply modes include the following four modes: a knowledge question-answer reply mode, a common question answering mode, a dialogue generation mode, and a rule dialogue mode.
In the knowledge question-answer reply mode, the information to be replied can be replied to obtain a knowledge question-answer reply result, for example, the method can be carried out by the following steps:
firstly, entity extraction is carried out on the information to be replied to obtain an extraction result;
Then, replying the extraction result according to a preset knowledge graph to obtain a knowledge question-answer replying result;
when the extraction result cannot be answered from the preset knowledge graph, a third-party platform may be called to answer the extraction result and obtain the knowledge question-answer reply result. In addition, a plurality of knowledge question-answer reply results may be obtained, for example a preset number of them (e.g., 5).
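The knowledge question-answer reply mode above can be sketched as follows. Entity extraction and the knowledge graph are stood in by a toy dictionary lookup, and the third-party platform by a callable; all names are illustrative assumptions:

```python
def extract_entities(message, known_entities):
    """Toy entity extraction: keep words that appear in the known-entity set."""
    return [word for word in message.lower().split() if word in known_entities]

def knowledge_qa_replies(message, knowledge_graph, third_party=None, limit=5):
    """Answer extracted entities from the preset knowledge graph; fall back
    to the third-party platform callable when the graph has no answer."""
    replies = []
    for entity in extract_entities(message, set(knowledge_graph)):
        replies.extend(knowledge_graph[entity])
    if not replies and third_party is not None:
        replies = third_party(message)
    return replies[:limit]  # at most a preset number of results (e.g. 5)
```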
In the common question answering mode, the information to be replied to can be answered to obtain a common-question answer result; for example, in the embodiment of the application, the following steps are performed:
firstly, extracting sentence vectors of information to be replied;
and then acquiring a plurality of candidate replies from a preset question library, in order of cosine similarity from high to low.
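The sentence-vector retrieval above can be sketched as follows. A real system would use a trained sentence encoder; here a toy bag-of-characters vector stands in, and the question library is an illustrative dictionary:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def embed(text):
    """Toy bag-of-characters sentence vector (26 letter counts)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def retrieve_candidates(message, question_library, top_k=3):
    """Rank preset questions by cosine similarity to the message, high to
    low, and return the corresponding answers as candidate replies."""
    query = embed(message)
    ranked = sorted(question_library.items(),
                    key=lambda item: cosine(query, embed(item[0])),
                    reverse=True)
    return [answer for _, answer in ranked[:top_k]]
```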
In the dialogue generation mode, a dialogue can be generated according to the information to be replied to, and the dialogue can be continued according to the subsequent input of the target user. For example, in the embodiment of the present application, the following steps may be performed:
firstly, extracting sentence vectors of the information to be replied to;
and then, according to sentence vectors of the information to be replied, acquiring a plurality of candidate replies from a preset reply library according to the sequence of cosine similarity from high to low.
In the rule dialogue mode, the information to be replied to can be answered to obtain a corresponding reply result; for example, in the embodiment of the application, this can be performed through the following steps:
firstly, extracting sentence vectors of information to be replied;
and then, according to the sentence vectors of the information to be replied to, acquiring a plurality of candidate replies from a preset question library in order of cosine similarity from high to low, based on the answers corresponding to the questions in the library.
For another example, in other embodiments, the candidate replies may be obtained by searching for preset key questions and preset keywords. The information to be replied to may be searched for the preset key questions and preset keywords, and if a preset key question or preset keyword is found in the information to be replied to, the reply is made based on the preset reply associated with that preset key question or preset keyword.
In another embodiment, in the rule dialogue mode, the information to be replied to may first be searched for preset key questions and preset keywords, and if no relevant preset key question or preset keyword is found, the candidate replies may be obtained by extracting sentence vectors of the information to be replied to. For information to be replied to that contains obvious keywords or key questions, compared with directly extracting sentence vectors, this can reduce the amount of computation to a certain extent and speed up the response. In addition, the preset key-question search and the preset keyword search can be arranged in parallel, i.e., performed simultaneously, to further improve the response speed.
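The keyword-first arrangement described above can be sketched as follows: the cheap keyword search runs first, and only when no preset keyword matches does the costlier retrieval run. The fuzzy-string fallback stands in for the sentence-vector path, and all names are illustrative assumptions:

```python
import difflib

def vector_fallback(message, question_library, top_k=3):
    """Stand-in for the sentence-vector retrieval path (fuzzy string
    similarity instead of a trained sentence encoder)."""
    ranked = sorted(question_library.items(),
                    key=lambda item: difflib.SequenceMatcher(
                        None, message.lower(), item[0].lower()).ratio(),
                    reverse=True)
    return [answer for _, answer in ranked[:top_k]]

def rule_dialogue_reply(message, keyword_rules, question_library):
    """Cheap keyword search first; fall back to retrieval only when no
    preset keyword matches, saving computation on obvious queries."""
    text = message.lower()
    for keyword, preset_reply in keyword_rules.items():
        if keyword in text:
            return [preset_reply]
    return vector_fallback(message, question_library)
```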
Step S300: based on the historical dialog records, a dialog scene is determined.
In the embodiment of the application, the dialogue scene may be determined based on the dialogue record between the start of the current dialogue and the information to be replied to. For example, the target user may hold long-term, intermittent dialogues with the chat robot, and dialogues at different times may belong to different dialogue scenes; since the information to be replied to belongs to the most recent dialogue, the dialogue scene can be determined based on the dialogue record from the start of the most recent dialogue up to the information to be replied to.
In the embodiment of the application, the dialogue scenes include: a customer service scene, a question-answer scene, and a chit-chat scene. The customer service scene mainly covers scenes such as online shopping and after-sales service; the question-answer scene mainly covers scenes such as knowledge question answering and information inquiry; and the chit-chat scene mainly covers casual chat without a specific purpose.
In addition, the dialogue scene characterizes the scene of the current dialogue based on the dialogue record from the start of the current dialogue up to the information to be replied to, whereas the preset reply modes mentioned in step S200 operate mainly on the information to be replied to itself, with different preset reply modes selected to generate targeted candidate reply texts.
Step S400: and acquiring at least one preset characteristic data from the plurality of preset characteristic data based on the dialogue scene, and constructing a target user portrait conforming to the dialogue scene.
The preset feature data describe the target user's character, preferences, and expression habits from multiple aspects, but any user may have different language styles in different scenes, and certain specific occasions require replies in a specific language style. In the prior art, blind personalized modification is made simply according to the user portrait (all the feature data of the target user) without distinguishing scenes, and the personalized replies produced in some scenes are obviously inappropriate.
In the embodiment of the present application, after the dialogue scene is determined in step S300, part of the feature data is selected from the preset feature data of the target user according to the dialogue scene to construct the target user portrait. The specific dialogue scene is thereby taken into account, and in the process of generating the personalized reply according to the target user portrait, the direction of personalization is more accurate.
In the embodiment of the present application, three dialog scenes are divided in step S300, so for the three dialog scenes, three different target user portraits may be constructed based on different preset feature data, as follows:
In the customer service scene, the age of the target user, the target user's familiarity with the target product, and the target information acquisition channel are selected to construct the target user portrait.
The customer service scene mainly concerns answers given to the target user about products, services, and after-sales support, and how to personalize the answers is chosen mainly based on age, target product familiarity, and the target information acquisition channel. For example, according to age, a teenager can be given a direct and quick reply, while an elderly person needs a polite and patient reply. According to target product familiarity, it can be judged whether the target user is a novice user who needs basic introduction and guidance, or an experienced user who needs in-depth technical support or explanation of advanced functions. Different target information acquisition channels likewise require different communication and response modes: for information acquired by email, a reply mode that returns detailed information by email can be selected, while for telephone and online chat, a real-time reply mode can be selected.
Constructing the target user portrait in the customer service scene from the target user's age, familiarity with the target product, and target information acquisition channel therefore allows accurate personalized replies to be generated during personalization. By contrast, in the same customer service scene, the prior art personalizes on the basis of all the data reflecting the user portrait, which is clearly not accurate enough. For example, suppose the user asks how to use a product, and the target user's social preference is humorous conversation; a reply personalized on that social preference might be a joking one, which is obviously inappropriate here.
In the question-answer scene, the domain knowledge, learning targets, information requirements, and language style of the target user are selected to construct the target user portrait. Selecting these feature data to build the target user portrait for the question-answer scene enables personalized replies that fit the scene to be generated accurately in the subsequent personalized reply generation process.
In the chit-chat scene, the interests, emotional state, topic preferences, and social preferences of the target user are selected to construct the target user portrait. Selecting these feature data to build the target user portrait for the chit-chat scene likewise enables personalized replies that fit the dialogue scene to be generated accurately in the subsequent personalized reply generation process.
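The scene-dependent portrait construction described above amounts to selecting a feature subset per scene. The sketch below illustrates this as a simple lookup; the scene names and feature keys are illustrative stand-ins, not identifiers from the patent.

```python
# Hypothetical sketch of scene-dependent portrait construction.
# Scene names and feature keys are illustrative, not from the patent.
SCENE_FEATURES = {
    "customer_service": ["age", "product_familiarity", "info_channel"],
    "question_answer": ["domain_knowledge", "learning_goal", "info_need", "language_style"],
    "chit_chat": ["interests", "emotional_state", "topic_preference", "social_preference"],
}

def build_target_portrait(scene, preset_features):
    """Keep only the preset feature data relevant to the given dialogue scene."""
    keys = SCENE_FEATURES[scene]
    return {k: preset_features[k] for k in keys if k in preset_features}

features = {"age": 67, "product_familiarity": "novice", "info_channel": "email",
            "interests": ["gardening"], "social_preference": "humor"}
portrait = build_target_portrait("customer_service", features)
```

Note how features irrelevant to the scene (interests, social preference) are dropped, which is exactly the scene-awareness argued for above.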
Step S500: determining the matching degree between each candidate reply text and the target user portrait.
In the embodiment of the application, the matching degree between each candidate reply text in each preset reply mode and the target user portrait can be determined through a first preset classification model.
The first preset classification model is obtained through training on a first training set; the first training set comprises a plurality of first training samples, and each first training sample comprises a first context and the user portrait corresponding to that first context.
Specifically, the first training samples may be obtained from the historical dialogue records of different target users or from an open-source dialogue dataset. For example, a first context (one preceding and one following utterance, or several of each) and the corresponding user portrait are obtained from the open-source dialogue dataset, where the user portrait may include all or part of the preset feature data mentioned in step S100. The first context serves as the dialogue sample and the user portrait as the sample label, and both are input into the first preset classification model for training, so that the model learns the mapping between first contexts and the individual preset feature data.
After each preset reply mode generates a plurality of candidate reply texts, each candidate reply text together with the target user portrait for the dialogue scene is input into the first preset classification model, which outputs the matching degree between each candidate reply text and the target user portrait in that specific dialogue scene.
For example, suppose the chit-chat scene is determined in step S300 and, in step S400, the target user portrait for that scene is constructed from the target user's interests, emotional state, topic preferences, and social preferences. After a candidate reply text and this portrait are input into the trained first preset classification model, the model judges the matching degree between the candidate reply text and each of the four feature data, then combines these four matching degrees into the overall matching degree between the candidate reply text and the target user portrait in this specific dialogue scene.
After the matching degree between every candidate reply text in each preset reply mode and the target user portrait has been determined, the candidate reply texts within each preset reply mode can be ranked by matching degree. The candidate with the highest matching degree in a given preset reply mode is the reply text that mode generates for this dialogue scene, i.e., the target reply text of that preset reply mode.
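The per-feature matching and per-mode ranking of step S500 can be sketched as follows. The keyword-overlap scorer is only a stand-in for the trained first preset classification model, and all mode names and data are illustrative.

```python
# Stand-in for the first preset classification model: score each candidate
# against every portrait feature, average the per-feature matching degrees,
# and keep the best candidate per preset reply mode.

def feature_match(candidate, feature_value):
    # Placeholder scorer: fraction of feature keywords found in the candidate.
    words = set(str(feature_value).lower().split())
    hits = sum(1 for w in words if w in candidate.lower())
    return hits / len(words) if words else 0.0

def portrait_match(candidate, portrait):
    scores = [feature_match(candidate, v) for v in portrait.values()]
    return sum(scores) / len(scores)

def best_per_mode(candidates_by_mode, portrait):
    return {mode: max(cands, key=lambda c: portrait_match(c, portrait))
            for mode, cands in candidates_by_mode.items()}

portrait = {"interests": "movies music", "topic_preference": "weather"}
candidates = {"retrieval": ["You could watch movies at home.", "Try the new cafe."],
              "generative": ["Stay in and listen to music.", "Go hiking."]}
targets = best_per_mode(candidates, portrait)
```

A real system would replace `feature_match` with the classifier's learned scores; the ranking and per-mode selection logic stays the same.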
Step S600: ranking the target reply texts of the preset reply modes by confidence, where the target reply text of each preset reply mode is the candidate reply text with the highest matching degree with the target user portrait among the plurality of candidate reply texts of that preset reply mode.
Step S500 yields, for each preset reply mode, the candidate reply text that best matches the target user portrait, i.e., the target reply text of that mode; in this step, the one with the highest confidence among these is selected as the final reply text.
In the embodiment of the present application, the confidence of the target reply text of each preset reply mode may be obtained through the following steps S610 to S630:
Step S610: forming a dialogue pair from the target reply text of each preset reply mode and the information to be replied in the historical dialogue record.
Since the four preset reply modes yield four target reply texts, four dialogue pairs can be formed, one from each target reply text together with the information to be replied.
Step S620: inputting each dialogue pair together with the historical dialogue record into a second preset classification model, and determining the correlation between each dialogue pair and the historical dialogue record. The second preset classification model is obtained through training on a second training set; the second training set comprises a plurality of second training samples and a plurality of third training samples, where a second training sample comprises a first dialogue pair and a second context related to it, and a third training sample comprises a second dialogue pair and a third context unrelated to it.
The second preset classification model is also trained in advance; the first dialogue pairs with their related second contexts and the second dialogue pairs with their unrelated third contexts can be obtained from an open-source dialogue dataset.
For example, suppose a certain open-source dialogue dataset includes the following conversations:
Dialog A:
User: What is the weather like today?
Chatbot: Storms, with a force-5 northeast wind.
User: Is this weather suitable for camping outdoors?
Chatbot: Not really.
User: What is this weather suitable for?
Chatbot: You could watch movies and listen to music at home.
Dialog B:
User: What food is Beijing known for?
Chatbot: Beijing roast duck, luzhu, and so on.
User: Which one tastes better?
Chatbot: That varies from person to person.
User: Is that store's roast duck good?
Chatbot: The roast duck at the XXX store is rated higher.
For obtaining a first dialogue pair and the second context related to it:
If the first dialogue pair is: "User: What is this weather suitable for? Chatbot: You could watch movies and listen to music at home."
Then the second context related to it may be: "User: What is the weather like today? Chatbot: Storms, with a force-5 northeast wind. User: Is this weather suitable for camping outdoors? Chatbot: Not really."
If the first dialogue pair is: "User: Is that store's roast duck good? Chatbot: The roast duck at the XXX store is rated higher."
Then the second context related to it may be: "User: What food is Beijing known for? Chatbot: Beijing roast duck, luzhu, and so on. User: Which one tastes better? Chatbot: That varies from person to person."
It can be seen that the correlation between each of these two first dialogue pairs and its corresponding second context is high; if a value between 0 and 1 is used to represent correlation, 1 can be chosen as the correlation between each first dialogue pair and its corresponding context.
For obtaining a second dialogue pair and a third context unrelated to it:
If the second dialogue pair is: "User: What is this weather suitable for? Chatbot: You could watch movies and listen to music at home."
Then an unrelated third context may be: "User: What food is Beijing known for? Chatbot: Beijing roast duck, luzhu, and so on. User: Which one tastes better? Chatbot: That varies from person to person."
For another example, if the second dialogue pair is: "User: Is that store's roast duck good? Chatbot: The roast duck at the XXX store is rated higher."
Then an unrelated third context may be: "User: What is the weather like today? Chatbot: Storms, with a force-5 northeast wind. User: Is this weather suitable for camping outdoors? Chatbot: Not really."
It can be seen that the correlation between each of these two second dialogue pairs and its respective third context is low; if a value between 0 and 1 is used to represent correlation, 0 can be chosen as the correlation between each second dialogue pair and its corresponding third context.
In the above manner, a plurality of first dialogue pairs with their related second contexts, and a plurality of second dialogue pairs with their unrelated third contexts, can be collected, each annotated with a correlation value between 0 and 1. The second preset classification model is then trained on these dialogue pairs, contexts, and correlation values, so that it learns the mapping from an input dialogue pair and context to a correlation value.
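Assembling the second training set described above can be sketched as follows: a dialogue pair with its own preceding context gets correlation 1 (a second training sample), while pairing it with another conversation's context gets 0 (a third training sample). Variable names and the abbreviated dialogue turns are ours, not the patent's.

```python
# Illustrative construction of positive (label 1) and negative (label 0)
# pair/context samples from separate conversations.

def make_samples(conversations):
    """conversations: list of turn lists; each list ends with the dialogue
    pair, and everything before it is the related context."""
    samples = []
    for i, conv in enumerate(conversations):
        pair, context = conv[-1], conv[:-1]
        samples.append({"pair": pair, "context": context, "label": 1})
        for j, other in enumerate(conversations):
            if j != i:  # a different conversation's context is unrelated
                samples.append({"pair": pair, "context": other[:-1], "label": 0})
    return samples

dialog_a = ["U: What is the weather today?", "B: Storms, force-5 northeast wind.",
            "U: What does this weather suit? / B: Watching movies at home."]
dialog_b = ["U: What food is Beijing known for?", "B: Roast duck, luzhu, and so on.",
            "U: Is that store's roast duck good? / B: The XXX store scores higher."]
samples = make_samples([dialog_a, dialog_b])
```

With two conversations this yields two positive and two negative samples, mirroring the 1/0 correlation labeling above.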
Therefore, after each dialogue pair formed in step S610 from a target reply text and the information to be replied is input into the trained second preset classification model together with the historical dialogue record, the model outputs the correlation value between that dialogue pair and the historical dialogue record. The larger the correlation value, the higher the correlation, and the higher the confidence of the target reply text of the corresponding preset reply mode.
Step S630: determining the confidence of the target reply text corresponding to each dialogue pair based on the correlation between that dialogue pair and the historical dialogue record.
In this embodiment of the present application, the confidence of the target reply text of each preset reply mode is determined from the correlation value, obtained in step S620, between its dialogue pair and the historical dialogue record. The target reply text with the highest confidence is the final target reply text, and its preset reply mode is the mode most suitable for replying to the information to be replied in the historical dialogue record.
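The S610–S630 flow can be sketched end to end: form a dialogue pair per preset reply mode, score its relevance to the history, and keep the reply with the highest confidence. The word-overlap scorer below merely stands in for the trained second preset classification model; all data are illustrative.

```python
# Minimal sketch of confidence-based selection over preset reply modes.

def relevance(pair_text, history):
    # Placeholder relevance: count of words shared with the history.
    pair_words = set(pair_text.lower().split())
    hist_words = set(" ".join(history).lower().split())
    return len(pair_words & hist_words)

def pick_final_reply(message, target_replies, history):
    pairs = {mode: f"{message} {reply}" for mode, reply in target_replies.items()}
    confidences = {mode: relevance(text, history) for mode, text in pairs.items()}
    best_mode = max(confidences, key=confidences.get)
    return best_mode, target_replies[best_mode]

history = ["what is the weather today", "storms and strong wind"]
message = "is this weather good for camping"
replies = {"retrieval": "the weather is too stormy for camping",
           "generative": "beijing roast duck is famous"}
mode, final = pick_final_reply(message, replies, history)
```

The on-topic weather reply shares more context with the history than the off-topic one, so its mode wins, which is the behavior the correlation value is meant to capture.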
Step S700: generating a personalized reply text based on the dialogue scene, the target user portrait, and the target reply text with the highest confidence.
With the dialogue scene determined in step S300, the target user portrait matching that scene determined in step S400, and the target reply text with the highest confidence determined in step S600, step S700 personalizes that target reply text based on the dialogue scene and the matching target user portrait to generate the personalized reply.
In this embodiment of the present application, for different dialogue scenes the target reply text with the highest confidence may be personalized in different directions according to the target user portrait, for example:
In the customer service scene, the target reply text with the highest confidence is personalized, based on the target user portrait, along the following processing directions: professional, friendly, clear, guiding, and patient. Which of these directions to use is determined from the target user portrait. For example, if the portrait indicates that the target user is elderly, personalization proceeds in the friendly and patient directions; if it indicates high product familiarity, in the professional direction; and if it indicates low product familiarity, in the guiding and clear directions.
In the question-answer scene, the target reply text with the highest confidence is personalized, based on the target user portrait, along the following processing directions: concise, detailed explanation, gentle guidance, example illustration, and interactive exploration. For example, if the portrait indicates high expertise across fields, personalization proceeds in the concise direction; if it indicates low expertise, in the detailed explanation and gentle guidance directions; if it indicates the user wants to broaden their knowledge, in the interactive exploration direction; and if it indicates the user wants concrete examples, in the example illustration direction.
In the chit-chat scene, the target reply text with the highest confidence is personalized, based on the target user portrait, along the following processing directions: humor, interest resonance, information recommendation, exploring dialogue, and emotional care. For example, personalization proceeds toward interest resonance or information recommendation according to the interests in the target user portrait, toward humor or exploring dialogue according to the social preferences, and toward emotional care according to the emotional state.
It should be noted that the personalization in each of the above directions under each dialogue scene (professional, friendly, clear, guiding, patient, concise, detailed explanation, gentle guidance, example illustration, interactive exploration, humor, interest resonance, information recommendation, exploring dialogue, emotional care) may be implemented by a preset language model.
The preset language model is obtained through training on a third training set. The third training set comprises a plurality of fourth training samples; each fourth training sample comprises a text to be rewritten and the target rewritten texts of that text in the various processing directions, which include at least all the personalization directions under the various dialogue scenes.
The text to be rewritten may be a word, a short sentence, or a long sentence composed of several short sentences. The processing directions are the personalization directions under the various dialogue scenes, namely professional, friendly, clear, guiding, patient, concise, detailed explanation, gentle guidance, example illustration, interactive exploration, humor, interest resonance, information recommendation, exploring dialogue, and emotional care. The target rewritten text in each processing direction is the expected result of rewriting the text to be rewritten in that direction. For example, the text to be rewritten is "This item is 20% off", the processing direction is "friendly", and the target rewritten text is "This item is 20% off, feel free to take a look!". The target rewritten texts can be written manually or obtained from an open-source dataset, and texts to be rewritten can likewise be derived from the target rewritten texts.
In the embodiment of the present application, the training process of the preset language model is as follows:
First, a rewritten text is obtained from the preset language model, the text to be rewritten, and a chosen processing direction.
For example, in the embodiment of the present application, suppose the text to be rewritten is "This item is 20% off" and the processing direction is "friendly"; after they are input into the preset language model, a rewritten text is obtained, say "This item is only 20% off, you'll miss out if you don't buy it".
Second, the rewrite loss is obtained from the rewritten text and the target rewritten text of the text to be rewritten in that processing direction.
The rewrite loss is the loss between the rewritten text obtained in the first step ("This item is only 20% off, you'll miss out if you don't buy it") and the target rewritten text ("This item is 20% off, feel free to take a look!"); it can be calculated, for example, via cosine similarity or Euclidean distance. The obtained rewrite loss is compared with a preset value, which can be set based on experience. If the rewrite loss is not larger than the preset value, the generated rewritten text meets expectations; if it is larger, the generated rewritten text differs too much from the target rewritten text and does not meet expectations.
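A minimal sketch of the rewrite-loss computation: embed the generated rewrite and the target rewrite as bag-of-words vectors and take 1 minus cosine similarity as the loss, compared against a preset threshold. A real model would use learned embeddings; the vectors, threshold, and sample strings here are purely illustrative.

```python
import math

def bow_vector(text, vocab):
    # Bag-of-words counts over a shared vocabulary.
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def rewrite_loss(generated, target):
    vocab = sorted(set(generated.lower().split()) | set(target.lower().split()))
    a, b = bow_vector(generated, vocab), bow_vector(target, vocab)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm  # 0 when identical, 1 when disjoint

PRESET_VALUE = 0.5  # illustrative experience-based threshold
loss_same = rewrite_loss("this item is 20% off", "this item is 20% off")
loss_far = rewrite_loss("this item is 20% off", "beijing has many sights")
```

An identical rewrite gives near-zero loss (meets expectations), while an unrelated one exceeds the preset value and would trigger a parameter update.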
Third, if the rewrite loss is larger than the preset value, the rewrite parameters of the preset language model are updated until the rewrite loss is not larger than the preset value.
Specifically, the rewrite parameters of the preset language model are updated; after updating, the rewritten text is generated again, the rewrite loss is recalculated, and the comparison with the preset value is repeated. If the loss still exceeds the preset value, the rewrite parameters continue to be updated until it no longer does, at which point the preset language model can rewrite a text in the given direction into a rewritten text that meets the expected target.
Each processing direction is trained according to the first, second, and third steps above, finally yielding the rewrite parameters of the preset language model for each personalization direction.
The processing direction can be determined from the dialogue scene and the target user portrait; the trained language model then selects the rewrite parameters for that direction and rewrites the target reply text with the highest confidence in that direction, generating the rewritten text, i.e., the personalized reply text.
In the embodiment of the present application, the preset language model includes a neural network, which may have the following structure:
After the dialogue scene, the target user portrait, and the target reply text with the highest confidence are input into the preset language model, the input layer determines the processing direction from the dialogue scene and the target user portrait, such as rewriting in the friendly direction or in the concise direction. The input layer also receives the target reply text with the highest confidence, i.e., the text to be rewritten. After obtaining the processing direction and the text to be rewritten, the input layer sends the processing direction to the rewrite encoding layer and the text to be rewritten to the embedding layer.
After receiving the processing direction, the rewrite encoding layer obtains the language style features associated with that direction, comprising a style vocabulary and an emotion polarity. The style vocabulary contains words related to the language style of the processing direction; for the friendly direction, for example, it contains words associated with a friendly style. Emotion polarity refers to the emotional direction of the rewrite. In the embodiment of the present application, emotion polarity is either friendly or non-emotional: friendly polarity means rewriting with friendly emotion, while non-emotional means rewriting as plain statements without emotional coloring. The emotion polarity of the following processing directions is classified as friendly: friendly, patient, gentle guidance, humor, emotional care, interest resonance. The emotion polarity of the following processing directions is classified as non-emotional: professional, clear, guiding, concise, detailed explanation, example illustration, interactive exploration, information recommendation, exploring dialogue.
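The emotion-polarity grouping just stated can be written down directly as a lookup table. The direction names below are English renderings of the patent's fifteen processing directions; the function name is ours.

```python
# Transcription of the stated grouping: each processing direction maps to
# either "friendly" or "non-emotional" polarity.
EMOTION_POLARITY = {d: "friendly" for d in
                    ["friendly", "patient", "gentle guidance", "humor",
                     "emotional care", "interest resonance"]}
EMOTION_POLARITY.update({d: "non-emotional" for d in
                         ["professional", "clear", "guiding", "concise",
                          "detailed explanation", "example illustration",
                          "interactive exploration", "information recommendation",
                          "exploring dialogue"]})

def polarity(direction):
    return EMOTION_POLARITY[direction]
```

Six friendly and nine non-emotional directions cover all fifteen personalization directions listed earlier.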
In addition, the rewrite encoding layer employs an attention mechanism and propagates the obtained language style features and emotion polarity into each subsequent encoding layer, ensuring that keywords related to the rewriting direction receive higher attention weights. The preset language model thus keeps its attention focused on keywords or context relevant to the rewriting direction and captures direction-related information more accurately.
After receiving the text to be rewritten, the embedding layer splits it into characters or words and embeds a mark for each, so that the mark indicates the semantic and grammatical information of that character or word. For example, suppose the text to be rewritten is "今天天气真好" ("the weather is really nice today"); splitting yields the characters 今, 天, 天, 气, 真, and 好, i.e., five distinct characters, since 天 appears twice. Identical characters share the same mark: 今 is marked "0", 天 is marked "1", 气 is marked "2", 真 is marked "3", and 好 is marked "4".
The position encoding layer position-encodes the characters or words based on the marks embedded by the embedding layer, following the original order of the characters or words in the text to be rewritten; for example, "今天天气真好" yields the marked sequence "011234".
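The marking and ordering just described can be sketched in a few lines, using the patent's own example string "今天天气真好"; identical characters reuse one mark, and positions follow the original order. The function names are illustrative.

```python
# Sketch of the embedding-layer marking and position coding described above.

def mark_tokens(chars):
    marks, table = [], {}
    for ch in chars:
        if ch not in table:
            table[ch] = len(table)   # first occurrence gets the next mark
        marks.append(table[ch])
    return marks

def position_codes(chars):
    return list(range(len(chars)))   # original order in the text

text = list("今天天气真好")
marks = mark_tokens(text)            # the repeated character 天 reuses mark 1
positions = position_codes(text)
```

The mark sequence 0 1 1 2 3 4 matches the patent's "011234" example, while the position codes simply run 0 through 5.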
The multi-head self-attention layer receives the emotion polarity obtained by the rewrite encoding layer and, based on that polarity, uses the attention mechanism of the rewrite encoding layer to judge the dependency and importance between any two codes from their position ordering and from the semantic and grammatical information of each code.
Dependency means the degree to which two characters or words are generated consecutively during generation. For example, "0" represents 今 and "1" represents 天; from the semantic and grammatical information of 今 and 天 it can be judged that "0" and "1" have high dependency, i.e., they tend to be generated consecutively. By contrast, "0" and "4", i.e., 今 and 好, have low dependency both grammatically and semantically, so they are not generated consecutively.
Importance refers to how much a character or word contributes to the semantics of the whole text to be rewritten. For example, from the semantics of "今天天气真好" it is known that 天气 ("weather") and 好 ("nice") are the important parts.
When judging dependency and importance, the multi-head self-attention layer still keeps higher attention weights on keywords related to the emotion polarity, i.e., it always maintains the emotion polarity of the processing direction while combining the semantic and grammatical information of each character and word. The multi-head self-attention layer performs this in parallel across multiple attention heads (emotion polarity, semantic information, grammatical information, position information), so that the preset language model can capture the relationships among the different attention heads.
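For concreteness, the dependency weighting inside each attention head follows the standard scaled dot-product attention mechanism; the pure-Python sketch below shows that mechanism in general and is not the patent's specific parameterization.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(queries, keys, values):
    # Scaled dot-product attention: each query attends over all keys.
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

# Toy 2-token example: the query aligns strongly with the first key,
# so the output is dominated by the first value vector.
Q = [[1.0, 0.0]]
K = [[10.0, 0.0], [0.0, 10.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
result = attention(Q, K, V)
```

The strongly matching key receives nearly all of the attention weight, which is how "high dependency" between two positions is expressed numerically.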
The feed-forward network layer applies a nonlinear transformation to the output of the multi-head self-attention layer, giving the model stronger expressive capability.
The rewrite generation decoding layer, after the dependency and importance between any two codes are determined, guides the preset language model to generate a reply conforming to the processing direction. In addition, a rewrite-level metric can be set during decoding to constrain the generated results, and training uses a rewrite optimization objective.
The output layer projects the output of the rewrite generation decoding layer into a space the size of the vocabulary, activates it through a classification network to produce a probability distribution over the vocabulary, and generates the rewritten text according to that distribution.
In addition, in another embodiment, a normalization layer can be placed after each of the rewrite encoding layer, position encoding layer, multi-head self-attention layer, and feed-forward network layer to normalize each layer's output, which benefits the training stability of the preset language model.
In another embodiment, the preset language model may be trained with a fourth training set comprising a plurality of fourth samples, each of which includes a dialogue text, the dialogue scene corresponding to that dialogue text, and the target user portrait matching that scene.
The dialogue text may be a sentence, a dialogue pair, or a continuous context; the dialogue scene is the scene to which that text belongs, and the target user portrait is the portrait of the target user corresponding to that text.
After training on the fourth samples, the preset language model learns how such texts respond to the dialogue scene and the target user portrait. When the dialogue scene, the target user portrait, and the target reply text with the highest confidence are input into the trained preset language model, its encoder encodes each of them into feature vectors, determines a personalized response from the feature vectors of the dialogue scene and the target user portrait, adds that response into the feature vector of the target reply text with the highest confidence, and decodes it to finally generate the personalized reply. Because the personalized response has been added into the feature vector of the target reply text with the highest confidence, the decoded text already contains the personalized response corresponding to the dialogue scene and the target user portrait.
In summary, the method and device of the present application determine the dialogue scene from the historical dialogue record, select part of the preset feature data based on that scene as the target user portrait for the scene, and then rank the candidate reply texts of each preset reply mode by their matching degree with that portrait, obtaining the target reply text of each preset reply mode that best matches the portrait. The reply that best fits the scene is then obtained from the confidence of the target reply texts of the preset reply modes. Moreover, when the reply text is personalized based on the dialogue scene and the target user portrait, the personalization is based not on all feature data of the target user but on the portrait formed from the partial feature data matching the scene, so the personalization direction is more accurate. The personalized reply text obtained by the method of the embodiment of the application therefore better matches the target user portrait in the dialogue scene, the selected preset reply mode has higher confidence, and the personalization direction is more accurate.
In an embodiment of the present application, before obtaining the plurality of candidate reply texts in each preset reply mode, the method further includes:
detecting violations based on the historical dialogue record;
if the historical dialogue record is detected to contain violating content, replying with first preset information.
For example, in the embodiment of the present application, violation information is harmful content such as suicide-related information or pornographic and other prohibited information. If the detection finds one or more pieces of violation information, the first preset information (for example, a notice that the content violates the rules) is replied directly to the target user; in that case, no reply based on the preset reply modes is needed.
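The pre-reply violation check can be sketched as a short-circuit filter; the keyword list and preset message below are placeholders, since the patent does not specify the detector's implementation.

```python
# Illustrative violation check performed before generating candidate replies:
# if the history contains flagged content, reply with the first preset
# information directly instead of running the preset reply modes.
FIRST_PRESET_INFO = "Sorry, this content violates our policy."
VIOLATION_KEYWORDS = ["forbidden_topic_a", "forbidden_topic_b"]  # placeholders

def check_and_reply(history):
    for turn in history:
        if any(k in turn for k in VIOLATION_KEYWORDS):
            return FIRST_PRESET_INFO
    return None  # no violation: proceed with the preset reply modes

ok = check_and_reply(["hello", "nice weather"])
blocked = check_and_reply(["tell me about forbidden_topic_a"])
```

Returning `None` signals the pipeline to continue to the normal candidate-generation path.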
In an embodiment of the present application, before obtaining the plurality of candidate reply texts in each preset reply mode, the method further includes:
judging whether the content to be replied to in the historical dialogue record meets a first preset condition, and if so, replying with second preset information, where the first preset condition can represent whether the content to be replied to is non-chat information.
The non-chat information includes, but is not limited to, a conversation-ending phrase.
Taking the ending phrase as an example, ending phrases include "bye-bye", "goodbye", "see you next time", and the like. If a statement indicating the end of the conversation is detected, no reply based on the preset reply modes is needed, and the second preset information can be replied to the target user directly, for example: "goodbye", "looking forward to seeing you next time", and so on.
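A minimal sketch of this first-preset-condition check, assuming a simple phrase list; the phrases and the reply string are illustrative, not the embodiment's actual values:

```python
# Illustrative ending-phrase check for the first preset condition; the
# phrase list and the second preset information are assumptions.
ENDING_PHRASES = ("bye", "bye-bye", "goodbye", "see you next time")
SECOND_PRESET_REPLY = "Goodbye, looking forward to seeing you next time!"

def is_non_chat(message: str) -> bool:
    """First preset condition: the content to be replied to is non-chat
    information, narrowed here to a conversation-ending phrase."""
    return message.strip().lower() in ENDING_PHRASES

def handle_non_chat(message: str):
    """Reply the second preset information directly for non-chat input."""
    return SECOND_PRESET_REPLY if is_non_chat(message) else None
```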
FIG. 3 illustrates a specific dialogue generation framework (SDM) provided by an embodiment of the present invention, in which the front-end API (application programming interface) service may communicate with the Session layer by means of the Hypertext Transfer Protocol (HTTP), Google Remote Procedure Call (gRPC), or the like, passing in a user identifier and a user message specific to the front-end application, where the user identifier includes, but is not limited to, a user name.
All user inputs are stored in an input queue in the user's private space, and the messages in the queue are processed sequentially. Thus, from the front-end perspective, the sending and receiving of messages is asynchronous, which both accommodates continuous input from the user and guarantees the logical order of the persisted conversation history, i.e., the order of the conversation history stored in the database.
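The per-user private space with a sequentially processed input queue can be sketched as below; the `UserSession` class and the use of a worker thread are assumptions about one possible implementation, not the framework's actual design:

```python
import queue
import threading

class UserSession:
    """Per-user private space: messages go into an input queue and are
    processed strictly in order, so sending is asynchronous from the
    front end's point of view while the persisted history keeps the
    logical order of the conversation."""

    def __init__(self):
        self.inputs = queue.Queue()
        self.history = []  # stand-in for the persisted dialogue history
        worker = threading.Thread(target=self._process_loop, daemon=True)
        worker.start()

    def send(self, message: str) -> None:
        """Asynchronous from the caller's perspective."""
        self.inputs.put(message)

    def _process_loop(self) -> None:
        while True:
            message = self.inputs.get()
            self.history.append(message)  # stand-in for BOT processing
            self.inputs.task_done()
```

`inputs.join()` can be used to wait until every queued message has been processed in order.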
In an embodiment of the present application, after generating the personalized reply text, the method further includes:
and updating the personalized reply text to the historical dialogue record of the target user.
As shown in FIG. 3, if the information to be replied to is a chat message, the latest dialogue history in memory is updated and taken as the input of the chat robot (BOT) or chat system; after the reply is obtained, the reply is added to the latest dialogue history, and the latest dialogue history is written into the output queue of the target user's private space.
In another embodiment, to improve overall performance, the last N rounds of dialogue are kept in memory, while earlier dialogue is saved in a database (the dialogue history service, Chat Context DB). If the length of the latest dialogue history in memory exceeds N, the earlier rounds are removed from the latest dialogue history and written to the database.
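A minimal sketch of this memory/database split, with a plain list standing in for the Chat Context DB (the class and its interface are illustrative assumptions):

```python
from collections import deque

class ChatContext:
    """Keep the last n rounds in memory; evict earlier rounds to the
    database (a plain list stands in for the Chat Context DB here)."""

    def __init__(self, n: int):
        self.n = n
        self.recent = deque()  # latest dialogue history in memory
        self.db = []           # stand-in for the dialogue-history database

    def append(self, round_: tuple) -> None:
        self.recent.append(round_)
        while len(self.recent) > self.n:
            self.db.append(self.recent.popleft())  # oldest round to DB
```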
In another embodiment, to further ensure the reliability of the system and prevent the dialogue information in memory from being lost due to a system failure, a write-ahead log mechanism (a WAL file management system) is introduced: before the dialogue information enters the input queue and the output queue, a global WAL service is called to ensure that the dialogue is recorded in a specially designed log system, so that after the system fails and recovers, the dialogue information can be restored automatically.
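A toy version of the write-ahead idea: each record is appended and flushed to an append-only log before it is enqueued, so the log can be replayed after a crash. The JSON-lines file format and the `WalService` interface are assumptions, not the patent's actual log system:

```python
import json
import os

class WalService:
    """Minimal write-ahead log: append (and fsync) each record before it
    enters the input or output queue, and replay the log on recovery."""

    def __init__(self, path: str):
        self.path = path

    def log(self, record: dict) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # make the record durable before enqueueing

    def replay(self) -> list:
        """Read back every logged record after a failure and restart."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f]
```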
In another embodiment of the present application, in order to improve the performance of the chat robot or chat system in a highly concurrent scenario, the internal framework of the chat robot or chat system may also be implemented in an asynchronous manner.
In an embodiment of the present application, the dialogue request includes a plurality of processing tasks, and the dialogue generation method further includes the following steps:
creating dependency relationships among the processing tasks;
determining the dependency sequence of the processing tasks based on the dependency relationships;
and processing the plurality of processing tasks according to the dependency sequence.
As for creating the dependency relationships among the processing tasks, in this embodiment of the present application they may be created by means of a preset framework. For example, a node object and an identifier are defined for each processing task based on the preset framework, the identifier representing the processing task of each node; an adjacency list and an adjacency matrix between the nodes are then built by means of the preset framework, where the adjacency list and the adjacency matrix can represent the dependency relationships among the processing tasks of the nodes; the dependency relationships among the processing tasks of the nodes are then determined by programming against the preset framework.
A dependency graph comprising a plurality of nodes is created, each node representing a processing task; a dependency link is connected between nodes that have a dependency relationship, and the dependency link can indicate the dependency relationship between the two nodes it connects.
As for determining the dependency sequence of the processing tasks based on the dependency relationships, in this embodiment of the present application topological sorting may be performed, for example by sorting the dependency graph with a depth-first-search (DFS) topological sorting method, to determine the dependency sequence among the processing tasks.
After determining the dependency sequence between the processing tasks of the respective nodes, the respective processing tasks may be processed in the dependency sequence.
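The three steps above can be sketched with a DFS-based topological sort. The dict-of-lists graph representation and the task names are illustrative assumptions, not the preset framework's node objects and adjacency structures:

```python
def topological_sequence(tasks, deps):
    """Return a processing order in which every task appears after all of
    the tasks it depends on. `deps[t]` lists the tasks t depends on."""
    order, visiting, done = [], set(), set()

    def visit(task):
        if task in done:
            return
        if task in visiting:
            raise ValueError(f"cyclic dependency involving {task}")
        visiting.add(task)
        for dep in deps.get(task, []):  # walk dependencies first (DFS)
            visit(dep)
        visiting.discard(task)
        done.add(task)
        order.append(task)

    for task in tasks:
        visit(task)
    return order
```

With hypothetical tasks where ranking depends on retrieval and portrait construction, and replying depends on ranking, the sort guarantees each task is processed only after its dependencies.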
Exemplary apparatus
Having described the method of the exemplary embodiments of the present invention, an exemplary dialogue generating apparatus 100 of the present invention is described next. As shown in FIG. 4, in an example of the present application, the apparatus includes:
an acquisition module, configured to acquire a dialogue request, where the dialogue request includes a historical dialogue record of a target user and a plurality of preset feature data;
a processing module, configured to acquire, based on a plurality of preset reply modes, a plurality of candidate reply texts in each preset reply mode;
determining a dialogue scene based on the historical dialogue record;
acquiring at least one preset feature data from the plurality of preset feature data based on the dialogue scene, and constructing a target user portrait conforming to the dialogue scene;
determining the matching degree between each candidate reply text and the target user portrait;
performing confidence degree sequencing on the target reply texts in each preset reply mode, wherein each target reply text in each preset reply mode is a candidate reply text with the highest matching degree with the target user portrait in a plurality of candidate reply texts in the preset reply mode;
and generating personalized reply text based on the dialogue scene, the target user portrait and the target reply text with highest confidence.
In an embodiment of the present application, the plurality of preset reply modes include: a knowledge question-answering reply mode, a frequently-asked-question reply mode, a dialogue generation mode, and a rule-based dialogue mode.
In an embodiment of the present application, the dialog scenario includes: customer service scene, question-answering scene, and chat scene.
In an embodiment of the present application, the plurality of preset feature data includes: age, target product familiarity, target information acquisition channel, domain knowledge, learning objectives, information requirements, language style, interests, emotional state, topic preferences, social preferences.
In an embodiment of the present application, the processing module is configured to:
in a customer service scene, selecting the age, the familiarity of a target product and a target information acquisition channel of the target user, and constructing the target user portrait;
selecting domain knowledge, learning targets, information requirements and language styles of the target user under a question-answer scene, and constructing the target user portrait;
and under the chatting scene, selecting interests, emotional states, topic preferences and social preferences of the target user, and constructing the target user portrait.
In an embodiment of the present application, the processing module is configured to:
Inputting the target user portrait and each candidate reply text into a first preset classification model, and determining the matching degree of each candidate reply text and the target user portrait;
the first preset classification model is obtained through training of a first training set, the first training set comprises a plurality of first training samples, each first training sample comprises a first context, and the user portrait corresponding to the first context.
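The matching step above can be sketched as follows; `match_model` stands in for the first preset classification model, whose actual interface is not given in the text, so the callable signature is an assumption:

```python
def best_match(candidates, portrait, match_model):
    """Score every candidate reply text against the target user portrait
    and return the best-matching candidate with its matching degree.
    `match_model` is any callable (portrait, candidate) -> float."""
    scores = [match_model(portrait, c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return candidates[best], scores[best]
```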
In an embodiment of the present application, the processing module is configured to:
forming, for each preset reply mode, a dialogue pair from the target reply text in that preset reply mode and the information to be replied to in the historical dialogue record;
inputting each dialogue pair and the historical dialogue record into a second preset classification model, and determining the relevance of each dialogue pair and the historical dialogue record; the second preset classification model is obtained through training of a second training set, the second training set comprises a plurality of second training samples and a plurality of third training samples, the second training samples comprise a first dialogue pair and a second context related to the first dialogue pair, and the third training samples comprise a second dialogue pair and a third context unrelated to the second dialogue pair;
And determining the confidence level of the target reply text corresponding to each dialogue pair based on the correlation between each dialogue pair and the historical dialogue record.
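This confidence ranking can be sketched as below; `relevance_model` stands in for the second preset classification model, and its signature is an assumption made for illustration:

```python
def rank_by_confidence(target_replies, to_reply, history, relevance_model):
    """Pair each preset reply mode's target reply with the information to
    be replied to, score each pair's relevance to the dialogue history,
    and rank the modes by that confidence, highest first.
    `relevance_model` is any callable (pair, history) -> float."""
    scored = {
        mode: relevance_model((to_reply, reply), history)
        for mode, reply in target_replies.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
```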
In an embodiment of the present application, the processing module is configured to:
in the customer service scene, based on the target user portrait, personalizing the candidate reply text with the highest confidence in the following processing directions: professional, friendly, clear, guiding, and patient;
in the question-answering scene, based on the target user portrait, personalizing the candidate reply text with the highest confidence in the following processing directions: concise summary, detailed explanation, gentle guidance, illustration by example, and interactive exploration;
and in the chatting scene, based on the target user portrait, personalizing the candidate reply text with the highest confidence in the following processing directions: humor, interest resonance, information recommendation, exploratory dialogue, and emotional care.
In an embodiment of the present application, the processing module is configured to:
inputting the dialogue scene, the target user portrait and the target reply text with highest confidence into a preset language model to generate a personalized reply text;
The preset language model is obtained through training of a third training set, the third training set comprises a plurality of fourth training samples, each fourth training sample comprises a text to be rewritten and a target rewritten text of the text to be rewritten in each processing direction, the processing directions at least comprise each personalized processing direction in each dialogue scene, and the training process of the preset language model is as follows:
acquiring a rewritten text based on the preset language model, the text to be rewritten, and any one of the processing directions;
obtaining a rewrite loss based on the rewritten text and a target rewritten text corresponding to the text to be rewritten in the processing direction;
and if the rewrite loss is larger than a preset value, updating the rewrite parameters of the preset language model until the rewrite loss is not larger than the preset value.
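The training loop can be sketched as below with a toy stand-in for the preset language model; the `generate`/`update` interface and the 0/1 loss are assumptions made only to show the loop's control flow, not the model's real training procedure:

```python
def train_rewriter(model, samples, loss_fn, preset_value=0.1, max_steps=100):
    """For each (text, direction, target) sample: generate a rewrite,
    compute the rewrite loss against the target, and keep updating the
    model while the loss exceeds the preset value."""
    for text, direction, target in samples:
        for _ in range(max_steps):
            loss = loss_fn(model.generate(text, direction), target)
            if loss <= preset_value:
                break
            model.update(loss)
    return model

class ToyRewriter:
    """Stand-in model that 'converges' after a few updates."""
    def __init__(self, target):
        self.target, self.steps = target, 0
    def generate(self, text, direction):
        return self.target if self.steps >= 3 else text
    def update(self, loss):
        self.steps += 1

def zero_one_loss(rewritten, target):
    return 0.0 if rewritten == target else 1.0
```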
In an embodiment of the present application, before obtaining a plurality of candidate reply texts in each of the preset reply modes, the processing module is configured to:
detecting violations based on the historical dialogue record;
if the historical dialogue record is detected to contain violating content, replying with first preset information.
In an embodiment of the present application, before obtaining a plurality of candidate reply texts in each of the preset reply modes, the processing module is configured to:
judging whether the content to be replied to in the historical dialogue record meets a first preset condition, and if so, replying with second preset information, where the first preset condition can represent whether the content to be replied to is non-chat information.
In an embodiment of the present application, after generating the personalized reply text, the processing module is configured to:
and updating the personalized reply text to the historical dialogue record of the target user.
In an embodiment of the present application, the dialogue request includes a plurality of processing tasks, and the processing module is configured to:
creating a dependency relationship between each processing task;
determining the dependency sequence of each processing task based on the dependency relationship;
and processing the plurality of processing tasks according to the dependency sequence.
In an embodiment of the present application, the processing module is configured to: creating a dependency graph comprising a plurality of nodes, each node representing a processing task, a dependency link being connected between nodes having a dependency relationship, the dependency link being capable of indicating the dependency relationship of two nodes connected to each other.
According to the apparatus, by means of the processing module, a dialogue scene is first determined based on the historical dialogue record, part of the feature data is then selected from a plurality of preset feature data based on the dialogue scene to serve as the target user portrait of the target user in that scene, and the candidate reply texts in the preset reply modes are then ranked by their matching degree with the target user portrait in the dialogue scene, so that the target reply text that best matches the target user portrait is obtained in each preset reply mode, and the candidate reply text that best matches the target user portrait in the scene can then be selected according to the confidence of the target reply text in each preset reply mode. In addition, when the candidate reply text is personalized based on the dialogue scene and the target user portrait, it is personalized not on the basis of all the feature data of the target user but on the basis of a target user portrait formed from the part of the feature data that fits the dialogue scene, so the direction of personalization is more accurate. That is, the personalized reply text obtained by the dialogue generating apparatus of the embodiments of the present application matches the target user portrait in the dialogue scene more closely, the confidence of the selected preset reply mode is higher, and the direction of personalized processing is more accurate.
Exemplary Medium
Having described the methods and apparatus of exemplary embodiments of the present invention, a computer-readable storage medium of exemplary embodiments of the present invention is described next with reference to fig. 5.
Referring to FIG. 5, a computer-readable storage medium is shown as an optical disc 70, on which a computer program (i.e., a program product) is stored. When executed by a processor, the program implements the steps described in the method embodiments above, for example: acquiring a dialogue request, where the dialogue request includes a historical dialogue record of a target user and a plurality of preset feature data; acquiring, based on a plurality of preset reply modes, a plurality of candidate reply texts in each preset reply mode; determining a dialogue scene based on the historical dialogue record; acquiring at least one piece of preset feature data from the plurality of preset feature data based on the dialogue scene, and constructing a target user portrait conforming to the dialogue scene; determining the matching degree between each candidate reply text and the target user portrait; ranking the target reply texts in the preset reply modes by confidence, where the target reply text in each preset reply mode is the candidate reply text, among the plurality of candidate reply texts in that preset reply mode, with the highest matching degree with the target user portrait; and generating personalized reply text based on the dialogue scene, the target user portrait, and the target reply text with the highest confidence. The specific implementation of each step is not repeated here.
It should be noted that examples of the computer readable storage medium may also include, but are not limited to, a phase change memory (PRAM), a Static Random Access Memory (SRAM), a Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a flash memory, or other optical or magnetic storage medium, which will not be described in detail herein.
Exemplary computing device
Having described the method, apparatus, and medium of the exemplary embodiments of the present invention, a computing device 80 of the exemplary embodiments of the present invention is next described with reference to FIG. 6.
FIG. 6 illustrates a block diagram of an exemplary computing device 80 suitable for use in implementing embodiments of the invention, the computing device 80 may be a computer system or a server. The computing device 80 shown in fig. 6 is merely an example and should not be taken as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 6, components of computing device 80 may include, but are not limited to: one or more processors or processing units 801, a system memory 802, and a bus 803 that connects the various system components (including the system memory 802 and processing units 801).
Computing device 80 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computing device 80 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 802 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 8021 and/or cache memory 8022. Computing device 80 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, ROM 8023 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, commonly referred to as a "hard disk drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable nonvolatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media), may be provided. In such cases, each drive may be coupled to bus 803 via one or more data medium interfaces. The system memory 802 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the embodiments of the invention.
A program/utility 8025 having a set (at least one) of program modules 8024 may be stored, for example, in system memory 802, and such program modules 8024 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 8024 generally perform the functions and/or methods in the embodiments described herein.
The computing device 80 may also communicate with one or more external devices 804 (e.g., keyboard, pointing device, display, etc.). Such communication may be through an input/output (I/O) interface. Moreover, computing device 80 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through network adapter 806. As shown in fig. 6, network adapter 806 communicates with other modules of computing device 80 (e.g., processing unit 801, etc.) over bus 803. It should be appreciated that although not shown in fig. 6, other hardware and/or software modules may be used in connection with computing device 80.
The processing unit 801 executes various functional applications and data processing by running programs stored in the system memory 802, for example: acquiring a dialogue request, where the dialogue request includes a historical dialogue record of a target user and a plurality of preset feature data; acquiring, based on a plurality of preset reply modes, a plurality of candidate reply texts in each preset reply mode; determining a dialogue scene based on the historical dialogue record; acquiring at least one piece of preset feature data from the plurality of preset feature data based on the dialogue scene, and constructing a target user portrait conforming to the dialogue scene; determining the matching degree between each candidate reply text and the target user portrait; ranking the target reply texts in the preset reply modes by confidence, where the target reply text in each preset reply mode is the candidate reply text, among the plurality of candidate reply texts in that preset reply mode, with the highest matching degree with the target user portrait; and generating personalized reply text based on the dialogue scene, the target user portrait, and the target reply text with the highest confidence. The specific implementation of each step is not repeated here.
Furthermore, although the operations of the methods of the present invention are depicted in the drawings in a particular order, this is not required to either imply that the operations must be performed in that particular order or that all of the illustrated operations be performed to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform.
While the spirit and principles of the present invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, nor does the division into aspects imply that features in those aspects cannot be combined to advantage; such division is merely for convenience of description. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the invention, and all equivalent structural changes made by the description of the present invention and the accompanying drawings or direct/indirect application in other related technical fields are included in the scope of the invention.

Claims (17)

1. A dialog generation method, comprising:
acquiring a dialogue request, wherein the dialogue request comprises a historical dialogue record of a target user and a plurality of preset characteristic data;
based on a plurality of preset reply modes, respectively acquiring a plurality of candidate reply texts in each preset reply mode;
determining a dialogue scene based on the dialogue record, within the historical dialogue record, from the start of the most recent dialogue to the information to be replied to;
acquiring at least one preset feature data from the plurality of preset feature data based on the dialogue scene, and constructing a target user portrait conforming to the dialogue scene;
determining the matching degree between each candidate reply text and the target user portrait;
performing confidence degree sequencing on the target reply texts in each preset reply mode, wherein each target reply text in each preset reply mode is a candidate reply text with the highest matching degree with the target user portrait in a plurality of candidate reply texts in the preset reply mode;
and generating personalized reply text based on the dialogue scene, the target user portrait and the target reply text with highest confidence.
2. The dialog generation method of claim 1, wherein the plurality of preset reply modes include: a knowledge question-answering reply mode, a frequently-asked-question reply mode, a dialogue generation mode, and a rule-based dialogue mode.
3. The dialog generation method of claim 1, the dialog scene comprising: customer service scene, question-answering scene, and chat scene.
4. A dialog generation method as claimed in claim 3, the plurality of preset feature data comprising: age, target product familiarity, target information acquisition channel, domain knowledge, learning objectives, information requirements, language style, interests, emotional state, topic preferences, social preferences.
5. The dialog generation method of claim 4, wherein the obtaining at least one preset feature data from the plurality of preset feature data based on the dialog scene, and constructing the target user representation conforming to the dialog scene, comprises:
in a customer service scene, selecting the age, the familiarity of a target product and a target information acquisition channel of the target user, and constructing the target user portrait;
selecting domain knowledge, learning targets, information requirements and language styles of the target user under a question-answer scene, and constructing the target user portrait;
and under the chatting scene, selecting interests, emotional states, topic preferences and social preferences of the target user, and constructing the target user portrait.
6. The dialog generation method of claim 1, wherein the determining the matching degree between each candidate reply text and the target user portrait comprises:
Inputting the target user portrait and each candidate reply text into a first preset classification model, and determining the matching degree of each candidate reply text and the target user portrait;
the first preset classification model is obtained through training of a first training set, the first training set comprises a plurality of first training samples, each first training sample comprises a first context, and the user portrait corresponding to the first context.
7. The dialog generation method of claim 1, wherein the confidence ranking of the target reply text in each of the preset reply modes includes:
forming, for each preset reply mode, a dialogue pair from the target reply text in that preset reply mode and the information to be replied to in the historical dialogue record;
inputting each dialogue pair and the historical dialogue record into a second preset classification model, and determining the relevance of each dialogue pair and the historical dialogue record; the second preset classification model is obtained through training of a second training set, the second training set comprises a plurality of second training samples and a plurality of third training samples, the second training samples comprise a first dialogue pair and a second context related to the first dialogue pair, and the third training samples comprise a second dialogue pair and a third context unrelated to the second dialogue pair;
And determining the confidence level of the target reply text corresponding to each dialogue pair based on the correlation between each dialogue pair and the historical dialogue record.
8. The dialog generation method of claim 2, wherein the generating personalized reply text based on the dialog scene, the target user representation, and the target reply text with the highest confidence comprises:
in the customer service scene, based on the target user portrait, personalizing the candidate reply text with the highest confidence in the following processing directions: professional, friendly, clear, guiding, and patient;
in the question-answering scene, based on the target user portrait, personalizing the candidate reply text with the highest confidence in the following processing directions: concise summary, detailed explanation, gentle guidance, illustration by example, and interactive exploration;
and in the chatting scene, based on the target user portrait, personalizing the candidate reply text with the highest confidence in the following processing directions: humor, interest resonance, information recommendation, exploratory dialogue, and emotional care.
9. The dialog generation method of claim 8, wherein the generating personalized reply text based on the dialog scene, the target user representation, and the target reply text with the highest confidence comprises:
Inputting the dialogue scene, the target user portrait and the target reply text with highest confidence into a preset language model to generate a personalized reply text;
the preset language model is obtained through training of a third training set, the third training set comprises a plurality of fourth training samples, each fourth training sample comprises a text to be rewritten and a target rewritten text of the text to be rewritten in each processing direction, the processing directions at least comprise each personalized processing direction in each dialogue scene, and the training process of the preset language model is as follows:
acquiring a rewritten text based on the preset language model, the text to be rewritten, and any one of the processing directions;
obtaining a rewrite loss based on the rewritten text and a target rewritten text corresponding to the text to be rewritten in the processing direction;
and if the rewrite loss is larger than a preset value, updating the rewrite parameters of the preset language model until the rewrite loss is not larger than the preset value.
10. The dialog generation method of claim 1, further comprising, before obtaining the plurality of candidate reply texts in each of the preset reply modes:
performing violation detection based on the historical dialogue record;
and if the historical dialogue record is detected to contain violating content, replying with first preset information.
11. The dialog generation method of claim 10, further comprising, before obtaining the plurality of candidate reply texts in each of the preset reply modes:
judging whether the content to be replied to in the historical dialogue record meets a first preset condition, and if so, replying with second preset information, wherein the first preset condition characterizes whether the content to be replied to is non-chat information.
12. The dialog generation method of claim 1, further comprising, after generating the personalized reply text:
appending the personalized reply text to the historical dialogue record of the target user.
13. The dialog generation method of claim 1, wherein the dialog request comprises a plurality of processing tasks, the dialog generation method further comprising:
creating dependency relationships among the processing tasks;
determining a dependency order of the processing tasks based on the dependency relationships;
and processing the plurality of processing tasks according to the dependency order.
14. The dialog generation method of claim 13, wherein creating the dependency relationships among the processing tasks comprises:
creating a dependency graph comprising a plurality of nodes, each node representing a processing task, with a dependency link connecting any two nodes that have a dependency relationship, the dependency link indicating the dependency relationship between the two connected nodes.
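Claims 13-14 describe ordering processing tasks via a dependency graph; one standard realization of such ordering is Kahn's topological sort, sketched below with illustrative task names (the patent does not name a specific algorithm):

```python
from collections import defaultdict, deque

def topological_order(tasks, dependency_links):
    """Return a processing order where every task runs after its prerequisites.

    dependency_links is a list of (prerequisite, dependent) pairs, i.e. the
    dependency links connecting nodes of the dependency graph.
    """
    indegree = {t: 0 for t in tasks}
    successors = defaultdict(list)
    for prereq, dependent in dependency_links:
        successors[prereq].append(dependent)
        indegree[dependent] += 1
    ready = deque(t for t in tasks if indegree[t] == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in successors[task]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(tasks):
        raise ValueError("dependency links contain a cycle")
    return order
```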
15. A dialog generation device comprising:
an acquisition module configured to acquire a dialog request, the dialog request comprising a historical dialogue record of a target user and a plurality of preset feature data;
a processing module configured to:
acquire, based on a plurality of preset reply modes, a plurality of candidate reply texts in each preset reply mode;
determine a dialogue scene based on the historical dialogue record;
construct a target user portrait conforming to the dialogue scene based on the dialogue scene and the plurality of preset feature data;
determine a degree of matching between each candidate reply text and the target user portrait;
rank by confidence the target reply texts of the preset reply modes, wherein the target reply text of each preset reply mode is the candidate reply text in that mode with the highest degree of matching with the target user portrait;
and generate a personalized reply text based on the dialogue scene, the target user portrait, and the highest-confidence target reply text.
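The processing-module steps of claim 15 can be sketched as a single pipeline; every callable below is a hypothetical placeholder for a step the claim leaves abstract (scene detection, portrait construction, matching, confidence scoring, and personalization):

```python
def generate_reply(history, feature_data, reply_modes,
                   detect_scene, build_portrait, match, confidence, personalize):
    """Pipeline sketch of claim 15; all callables are illustrative placeholders."""
    scene = detect_scene(history)
    portrait = build_portrait(scene, feature_data)
    # For each preset reply mode, keep the candidate that best matches the portrait.
    targets = [max(mode.candidates(history), key=lambda c: match(c, portrait))
               for mode in reply_modes]
    # Rank the per-mode target replies by confidence and personalize the best one.
    best = max(targets, key=confidence)
    return personalize(scene, portrait, best)
```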
16. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of claims 1-14.
17. A computing device comprising a processor configured to implement the method of any of claims 1-14 when executing a computer program stored in a memory.
CN202310765756.4A 2023-06-27 2023-06-27 Dialog generation method, device, medium and computing equipment Active CN116541504B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310765756.4A CN116541504B (en) 2023-06-27 2023-06-27 Dialog generation method, device, medium and computing equipment


Publications (2)

Publication Number Publication Date
CN116541504A CN116541504A (en) 2023-08-04
CN116541504B true CN116541504B (en) 2024-02-06

Family

ID=87447408

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310765756.4A Active CN116541504B (en) 2023-06-27 2023-06-27 Dialog generation method, device, medium and computing equipment

Country Status (1)

Country Link
CN (1) CN116541504B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117221451B (en) * 2023-09-27 2024-04-26 杭州龙席网络科技股份有限公司 Customer service response system and method based on artificial intelligence
CN117216229A (en) * 2023-11-08 2023-12-12 支付宝(杭州)信息技术有限公司 Method and device for generating customer service answers

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105183848A (en) * 2015-09-07 2015-12-23 百度在线网络技术(北京)有限公司 Human-computer chatting method and device based on artificial intelligence
CN114996433A (en) * 2022-08-08 2022-09-02 北京聆心智能科技有限公司 Dialog generation method, device and equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016167424A1 (en) * 2015-04-16 2016-10-20 주식회사 플런티코리아 Answer recommendation device, and automatic sentence completion system and method



Similar Documents

Publication Publication Date Title
US10536402B2 (en) Context-sensitive generation of conversational responses
US11669918B2 (en) Dialog session override policies for assistant systems
CN107846350B (en) Method, computer readable medium and system for context-aware network chat
CN107943998B (en) Man-machine conversation control system and method based on knowledge graph
US10162816B1 (en) Computerized system and method for automatically transforming and providing domain specific chatbot responses
CN108304439B (en) Semantic model optimization method and device, intelligent device and storage medium
CN110121706B (en) Providing responses in a conversation
CN116541504B (en) Dialog generation method, device, medium and computing equipment
US10713317B2 (en) Conversational agent for search
WO2018224034A1 (en) Intelligent question answering method, server, terminal and storage medium
CN114930363A (en) Generating active content for an assistant system
US11250839B2 (en) Natural language processing models for conversational computing
CN117219080A (en) Virtual assistant for generating personalized responses within a communication session
US20220374605A1 (en) Continuous Learning for Natural-Language Understanding Models for Assistant Systems
KR101891498B1 (en) Method, computer device and computer readable recording medium for multi domain service resolving the mixture of multi-domain intents in interactive ai agent system
Ali et al. Automatic text‐to‐gesture rule generation for embodied conversational agents
US20230306205A1 (en) System and method for personalized conversational agents travelling through space and time
CN110110218B (en) Identity association method and terminal
CN117194646A (en) Question and answer method and device and electronic equipment
FR3089324A1 (en) Method for determining a conversational agent on a terminal
KR102120748B1 (en) Method and computer readable recording medium for providing bookmark search service stored with hierachical dialogue flow management model based on context
US20230259541A1 (en) Intelligent Assistant System for Conversational Job Search
CN114048319B (en) Humor text classification method, device, equipment and medium based on attention mechanism
US20240037339A1 (en) Domain-specific named entity recognition via graph neural networks
US20240143678A1 (en) Intelligent content recommendation within a communication session

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant