CN117216223A - Dialogue text generation method and device, storage medium and electronic equipment - Google Patents

Dialogue text generation method and device, storage medium and electronic equipment

Info

Publication number: CN117216223A
Application number: CN202311289708.9A
Authority: CN (China)
Prior art keywords: target, dialogue, text, model, attribute
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventor: 陈春全
Current Assignee: Tencent Technology Shenzhen Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Tencent Technology Shenzhen Co Ltd
Events: application filed by Tencent Technology Shenzhen Co Ltd; priority to CN202311289708.9A; publication of CN117216223A; status pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Machine Translation (AREA)

Abstract

The application discloses a dialogue text generation method and device, a storage medium, and an electronic device. The method comprises: obtaining an initial dialogue text and a target dialogue attribute in a target dialogue scene; inputting the initial dialogue text into a target dialogue model for an unperturbed forward pass and outputting an unperturbed dialogue text, where the unperturbed dialogue text contains tokens (segmented words) that do not conform to the target dialogue attribute; inputting the representation vector of each token of the unperturbed dialogue text into a target attribute classifier, determining the loss parameters of the target attribute classifier, and updating a history information matrix with the loss parameters; and inputting the initial dialogue text into the target dialogue model for a perturbed forward pass and outputting a perturbed dialogue text, in which the tokens conform to the target dialogue attribute. The method can be applied in the field of natural language processing and solves the technical problems of low efficiency and low accuracy in generating dialogue text that conforms to attribute constraints.

Description

Dialogue text generation method and device, storage medium and electronic equipment
Technical Field
The present application relates to the field of computers, and in particular, to a method and apparatus for generating a dialog text, a storage medium, and an electronic device.
Background
Currently, the main approaches for making a dialogue model generate content that conforms to given attributes and styles fall into three types: template- and rule-based methods, conditional dialogue generation models, and attribute-classifier-based screening methods. Template- and rule-based methods generate text from predefined templates or rules; they require a large number of hand-designed templates and rules, adapt poorly to complex and varied generation tasks, and the generated text may be stiff and monotonous. A conditional dialogue generation model requires collecting a large amount of training data with attribute and style labels, and the model's control capability may be limited by the diversity of the dialogue training data. Attribute-classifier-based screening cannot produce a satisfactory reply that meets the attribute constraint if none of the candidate replies generated by the dialogue model matches the target attribute and style.
Therefore, the related art still suffers from the technical problems of low efficiency and low accuracy in generating dialogue text that conforms to attribute constraints.
No effective solution to the above problems has yet been proposed.
Disclosure of Invention
The embodiments of the application provide a dialogue text generation method and device, a storage medium, and an electronic device, which at least solve the technical problems of low efficiency and low accuracy in generating dialogue text that conforms to attribute constraints.
According to one aspect of the embodiments of the present application, there is provided a dialogue text generation method, comprising: acquiring an initial dialogue text and a target dialogue attribute in a target dialogue scene, where the target dialogue attribute indicates the dialogue style of the target dialogue scene; inputting the initial dialogue text into a target dialogue model for an unperturbed forward pass and outputting an unperturbed dialogue text, where the unperturbed dialogue text contains tokens that do not conform to the target dialogue attribute and is composed of the tokens output by the target dialogue model, the target dialogue model determines each output token step by step over time steps from the input tokens and a history information matrix and determines the corresponding dialogue text from the output tokens, the target dialogue model contains a plurality of attention modules, and the history information matrix contains a key-value pair for each of the attention modules; inputting the representation vector of each token of the unperturbed dialogue text into a target attribute classifier, determining the loss parameters of the target attribute classifier, and updating the history information matrix with the loss parameters, where a loss parameter represents the degree to which the token corresponding to a representation vector conforms to the target dialogue attribute; and inputting the initial dialogue text into the target dialogue model for a perturbed forward pass and outputting a perturbed dialogue text, where the perturbed dialogue text is the dialogue text output by the target dialogue model after the history information matrix has been updated with the loss parameters, and every token in the perturbed dialogue text conforms to the target dialogue attribute.
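The patent gives no reference implementation; the two-pass scheme above (an unperturbed forward pass, a classifier loss on the resulting representations, a gradient update of the history matrix, then a perturbed forward pass) can nevertheless be illustrated with a deliberately tiny numpy sketch. The linear "model", the unit attribute direction `a`, and the scalar attribute target of 1.0 are invented stand-ins, not the actual dialogue model or attribute classifier:

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab = 8, 5
V = rng.normal(size=(vocab, d))   # toy output projection of the "dialogue model"
a = np.ones(d) / np.sqrt(d)       # assumed unit-norm attribute direction of the classifier
x = rng.normal(size=d)            # embedding of the initial dialogue text

def hidden(H):
    # the history information matrix H additively shifts the hidden state
    return x + H

def attribute_loss(H):
    # squared error of the classifier's attribute score against a target score of 1.0
    s = a @ hidden(H)
    return (s - 1.0) ** 2

def attribute_grad(H):
    # analytic gradient of the loss with respect to the history matrix
    s = a @ hidden(H)
    return 2.0 * (s - 1.0) * a

# 1) unperturbed forward pass: H is whatever cache the model produced (zeros here)
H = np.zeros(d)
loss_unperturbed = attribute_loss(H)

# 2) update the history matrix with the classifier's loss gradient
alpha = 0.05
for _ in range(500):
    H = H - alpha * attribute_grad(H)

# 3) perturbed forward pass: same input text, updated history matrix
loss_perturbed = attribute_loss(H)
logits_perturbed = V @ hidden(H)  # logits now favour attribute-conforming tokens
```

Only the cached history is moved; the model weights `V` stay frozen, which mirrors the claim that the perturbed pass reuses the same model with an updated history information matrix.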
According to another aspect of the embodiments of the present application, there is also provided a dialogue text generation device, comprising: an acquisition module, configured to acquire an initial dialogue text and a target dialogue attribute in a target dialogue scene, where the target dialogue attribute indicates the dialogue style of the target dialogue scene; a first processing module, configured to input the initial dialogue text into a target dialogue model for an unperturbed forward pass and output an unperturbed dialogue text, where the unperturbed dialogue text contains tokens that do not conform to the target dialogue attribute and is composed of the tokens output by the target dialogue model, the target dialogue model determines each output token step by step over time steps from the input tokens and a history information matrix and determines the corresponding dialogue text from the output tokens, the target dialogue model contains a plurality of attention modules, and the history information matrix contains a key-value pair for each of the attention modules; an updating module, configured to input the representation vector of each token of the unperturbed dialogue text into a target attribute classifier, determine the loss parameters of the target attribute classifier, and update the history information matrix with the loss parameters, where a loss parameter represents the degree to which the token corresponding to a representation vector conforms to the target dialogue attribute; and a second processing module, configured to input the initial dialogue text into the target dialogue model for a perturbed forward pass and output a perturbed dialogue text, where the perturbed dialogue text is the dialogue text output by the target dialogue model after the history information matrix has been updated with the loss parameters, and every token in the perturbed dialogue text conforms to the target dialogue attribute.
Optionally, the device is further configured to: input the initial dialogue text into the target dialogue model to generate t first representation vectors, where the target dialogue model contains i layers of attention modules, a first representation vector is the output of the i layers of attention modules, and the target dialogue model outputs t tokens over t time steps in an iterative loop: at time step j+1, the (j+1)-th token is determined by feeding the j-th token output by the target dialogue model, together with the j-th history information matrix, back into the target dialogue model, where the j-th history information matrix represents the i key-value pairs corresponding to time step j, the i key-value pairs correspond one-to-one to the i layers of attention modules, the t first representation vectors determine the unperturbed dialogue text composed of the t tokens, i is an integer greater than or equal to 2, t is an integer greater than or equal to j+1, and j is an integer greater than or equal to 1; input the t first representation vectors into the target attribute classifier, determine t classification results, and update a target history information matrix, where the target history information matrix is the history information matrix corresponding to a target token output by the target dialogue model, and the target tokens are the tokens that the t classification results determine do not conform to the target dialogue attribute; and input the initial dialogue text into the target dialogue model and determine t second representation vectors using the updated target history information matrix to generate the perturbed dialogue text, where the second representation vectors are generated in the same way as the first representation vectors.
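The iterative, time-step-by-time-step decoding described above, in which each attention layer keeps a growing key-value cache and the (j+1)-th token is produced from the j-th token plus that cache, can be sketched as follows. This is a toy single-head, greedy-decoding illustration with random weights, not the patent's actual model:

```python
import numpy as np

rng = np.random.default_rng(1)
d, vocab, layers, t = 4, 6, 2, 5
E = rng.normal(size=(vocab, d))      # token embeddings
Wk = rng.normal(size=(layers, d, d)) # per-layer key projections
Wv = rng.normal(size=(layers, d, d)) # per-layer value projections
Wo = rng.normal(size=(vocab, d))     # output head

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# history information matrix: one (keys, values) pair per attention layer
cache = [{"K": np.zeros((0, d)), "V": np.zeros((0, d))} for _ in range(layers)]

token = 0                            # start token
tokens = []
for step in range(t):
    h = E[token]
    for layer in range(layers):
        # append this step's key and value to the layer's cache
        k, v = Wk[layer] @ h, Wv[layer] @ h
        cache[layer]["K"] = np.vstack([cache[layer]["K"], k])
        cache[layer]["V"] = np.vstack([cache[layer]["V"], v])
        # attend over everything cached so far
        attn = softmax(cache[layer]["K"] @ h / np.sqrt(d))
        h = h + attn @ cache[layer]["V"]
    token = int(np.argmax(Wo @ h))   # greedy choice of token j+1
    tokens.append(token)
```

Perturbing the entries of `cache` before re-running this loop is exactly the lever the method pulls to steer the output without touching the model weights.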
Optionally, the device is configured to input the t first representation vectors into a pre-trained target attribute classifier, determine t classification results, and update the target history information matrix by: inputting the t first representation vectors into the target attribute classifier and determining t loss parameters, where the t classification results correspond one-to-one to the t loss parameters; determining, from the t loss parameters, the target loss parameter corresponding to the n-th time step, where a target loss parameter is a loss parameter that does not satisfy a preset loss condition and n is a positive integer less than or equal to t; and updating the target history information matrix according to the target loss parameter.
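Picking out the time steps whose loss fails the preset condition takes only a couple of lines; the loss values and the 0.1 threshold below are made up for illustration:

```python
import numpy as np

# per-time-step classifier losses for the t tokens of the unperturbed reply
losses = np.array([0.02, 0.61, 0.05, 0.93, 0.01])
threshold = 0.1  # assumed preset loss condition

# time steps whose loss does not satisfy the condition supply the
# "target loss parameters"; only their history matrices get updated
target_steps = np.flatnonzero(losses > threshold)
```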
Optionally, the device is configured to update the target history information matrix according to the target loss parameter by: acquiring an initial trainable tensor; determining the gradient of the target attribute classifier from the target loss parameter and training the initial tensor with the gradient until a target tensor is obtained, where the target tensor is the update value of the target history information matrix; and taking the sum of the target history information matrix and the target tensor as the updated target history information matrix.
Optionally, the device is configured to determine the gradient of the target attribute classifier from the target loss parameter and train the initial tensor with the gradient to obtain the target tensor by: acquiring a predetermined training step size, where the value of the training step size is positively correlated with the magnitude by which the initial tensor is adjusted in each training round; and training the initial tensor round by round with the training step size, updating the target history information matrix with the tensor obtained after each round of training, recalculating the loss parameter and gradient corresponding to the n-th time step after each update, and, when the loss parameter corresponding to the n-th time step satisfies the preset loss condition, taking the tensor obtained in that round as the target tensor, where the tensor after each round of training is jointly determined by the tensor before that round, the gradient of that round, and the update step size.
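The update procedure of the two preceding paragraphs (an initial trainable tensor, trained round by round with a fixed step size until the n-th step's loss satisfies the condition, then added to the history matrix) might look like the following sketch. The quadratic classifier loss and the unit attribute direction are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 6
a = np.ones(d) / np.sqrt(d)     # assumed attribute direction (unit norm)
H = rng.normal(size=d)          # history matrix slice for time step n

def loss_of(Hn):
    # classifier loss against an assumed attribute target score of 1.0
    return (a @ Hn - 1.0) ** 2

delta = np.zeros(d)             # initial trainable tensor
step_size = 0.05                # predetermined training step size
rounds = 0
while loss_of(H + delta) > 1e-4 and rounds < 10_000:
    grad = 2.0 * (a @ (H + delta) - 1.0) * a  # gradient for this round
    delta -= step_size * grad                 # tensor after this round
    rounds += 1

# updated matrix = sum of the original matrix and the trained target tensor
H_updated = H + delta
```

The tensor after each round depends only on the tensor before the round, that round's gradient, and the step size, matching the claim's update rule.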
Optionally, the device is configured to input the initial dialogue text into the target dialogue model for a perturbed forward pass and output the perturbed dialogue text by: inputting the initial dialogue text into the target dialogue model to generate t second representation vectors, where the history information matrix corresponding to the n-th time step has been updated; inputting the t second representation vectors into the target attribute classifier and re-determining t classification results, where the re-determined t classification results indicate that the corresponding tokens conform to the target dialogue attribute; and performing a sampling operation on the t second representation vectors to obtain the perturbed dialogue text.
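The final sampling operation, mapping each perturbed representation to a probability distribution over the vocabulary and drawing a token from it, can be sketched as follows (the four logit values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# toy vocabulary logits for one time step of the perturbed forward pass
logits = np.array([0.2, 3.1, -0.5, 1.4])
probs = softmax(logits)

# sampling operation over the vocabulary yields the perturbed token
token = int(rng.choice(len(probs), p=probs))
```

Sampling (rather than argmax) preserves some diversity in the perturbed reply while the perturbation itself keeps the distribution biased toward attribute-conforming tokens.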
Optionally, the device is further configured to: perform a normalization operation on the t first representation vectors and the t second representation vectors to obtain a first probability distribution and a second probability distribution; determine a target divergence value from the first probability distribution and the second probability distribution, where the target divergence value measures the degree of difference between the two distributions; and update the target history information matrix with the goal of minimizing the target divergence value.
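The divergence regularisation above is consistent with a Kullback-Leibler divergence between the normalised (softmaxed) token distributions before and after perturbation; KL is an assumption here, since the claim only speaks of a "target divergence value". A minimal sketch:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl(p, q):
    # KL(p || q): degree of difference between the two distributions
    return float(np.sum(p * np.log(p / q)))

# normalised token distributions before and after perturbation (toy logits)
p_unperturbed = softmax(np.array([2.0, 0.5, -1.0]))
q_perturbed = softmax(np.array([1.8, 0.9, -0.7]))

divergence = kl(p_unperturbed, q_perturbed)
```

Adding this term to the update objective keeps the perturbed model close to the unperturbed one, so the reply stays fluent while the attribute is steered.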
Optionally, the device is further configured to: obtain unlabeled dialogue corpus data; preprocess the dialogue corpus data to obtain a target dialogue corpus, where the target dialogue corpus contains dialogues between two parties; and pre-train an initial dialogue model with the target dialogue corpus to obtain the target dialogue model, where both the initial dialogue model and the target dialogue model apply a bidirectional attention mechanism to the input text and a unidirectional attention mechanism to the output text.
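The attention pattern in the last clause (bidirectional over the input text, unidirectional over the output text) corresponds to a prefix-LM-style mask. A sketch of building such a mask, where 1 means the row position may attend to the column position (the sizes 3 and 2 are arbitrary):

```python
import numpy as np

def prefix_lm_mask(n_input, n_output):
    """1 = may attend, 0 = masked."""
    n = n_input + n_output
    mask = np.tril(np.ones((n, n), dtype=int))  # unidirectional (causal) base
    mask[:n_input, :n_input] = 1                # input prefix attends bidirectionally
    return mask

mask = prefix_lm_mask(n_input=3, n_output=2)
```

Input positions see the whole input; output positions see the input plus only the already-generated outputs, which is what lets the same model both understand the context and decode left to right.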
Optionally, the device is configured to obtain the initial dialogue text and the target dialogue attribute in the target dialogue scene by: acquiring account data of a target account, where the target account is an account participating in the target dialogue scene; and determining the target dialogue attribute from the account data.
Optionally, the device is configured to determine the target dialogue attribute from the account data in at least one of the following ways: acquiring interaction data of the target account and determining the target dialogue attribute as a dialogue attribute related to the interaction data; acquiring the search history of the target account and determining the target dialogue attribute as a dialogue attribute related to the search history; and acquiring the emotion type of the initial dialogue text and determining the target dialogue attribute as the dialogue attribute matching that emotion type.
According to a further aspect of the embodiments of the present application, there is also provided a computer-readable storage medium having a computer program stored therein, wherein the computer program is configured to perform the above-described method of generating dialog text when run.
According to yet another aspect of embodiments of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the electronic device performs the method of generating the dialog text as above.
According to still another aspect of the embodiments of the present application, there is also provided an electronic device including a memory, in which a computer program is stored, and a processor configured to execute the above-described dialog text generation method by the computer program.
In the embodiments of the application, an initial dialogue text and a target dialogue attribute are acquired in a target dialogue scene, where the target dialogue attribute indicates the dialogue style of the target dialogue scene; the initial dialogue text is input into a target dialogue model for an unperturbed forward pass and an unperturbed dialogue text is output, where the unperturbed dialogue text contains tokens that do not conform to the target dialogue attribute and is composed of the tokens output by the target dialogue model, the target dialogue model determines each output token step by step over time steps from the input tokens and a history information matrix and determines the corresponding dialogue text from the output tokens, the target dialogue model contains a plurality of attention modules, and the history information matrix contains a key-value pair for each of the attention modules; the representation vector of each token of the unperturbed dialogue text is input into the target attribute classifier, the loss parameters of the target attribute classifier are determined, and the history information matrix is updated with the loss parameters, where a loss parameter represents the degree to which the token corresponding to a representation vector conforms to the target dialogue attribute; and the initial dialogue text is input into the target dialogue model for a perturbed forward pass and a perturbed dialogue text is output, where the perturbed dialogue text is the dialogue text output by the target dialogue model after the history information matrix has been updated with the loss parameters, and every token in the perturbed dialogue text conforms to the target dialogue attribute.
In addition, a chatbot with specific attributes and styles can be customized according to the user's preferences and needs, enhancing the user's immersion and participation and improving the interactive experience. Replies with comforting, encouraging, empathetic, and similar attributes can also be generated according to the user's emotional state, providing emotional support and psychological comfort. By achieving text generation with controllable attributes and styles, the dialogue model can better serve different scenes and user needs, increasing the value and breadth of application of the product.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic illustration of an application environment of an alternative dialog text generation method in accordance with an embodiment of the present application;
FIG. 2 is a flow diagram of an alternative method of generating dialog text in accordance with an embodiment of the present application;
FIG. 3 is a schematic diagram of an alternative dialog text generation method in accordance with an embodiment of the present application;
FIG. 4 is a schematic diagram of yet another alternative method of generating dialog text in accordance with an embodiment of the present application;
FIG. 5 is a schematic diagram of yet another alternative method of generating dialog text in accordance with an embodiment of the application;
FIG. 6 is a schematic diagram of yet another alternative method of generating dialog text in accordance with an embodiment of the present application;
FIG. 7 is a schematic diagram of yet another alternative method of generating dialog text in accordance with an embodiment of the application;
FIG. 8 is a schematic diagram of yet another alternative method of generating dialog text in accordance with an embodiment of the present application;
FIG. 9 is a schematic diagram of yet another alternative method of generating dialog text in accordance with an embodiment of the present application;
FIG. 10 is a schematic diagram of an alternative dialog text generation device in accordance with an embodiment of the present application;
FIG. 11 is a schematic diagram of an alternative dialog text generation product in accordance with an embodiment of the present application;
FIG. 12 is a schematic structural view of an alternative electronic device according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art may better understand the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of protection of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some of the terms appearing in the description of the embodiments of the application are explained as follows:
transformer: a deep learning model structure based on an attention mechanism.
Controllable style: generating, during text generation, text that meets given attribute and style requirements. These attributes may include emotion (e.g., positive, negative, neutral), topic (e.g., science, sports, travel), and so on.
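For reference, the operation at the heart of the Transformer named above is scaled dot-product attention; a minimal numpy sketch with random toy matrices and a single head:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(4)
Q = rng.normal(size=(3, 4))  # queries
K = rng.normal(size=(3, 4))  # keys
V = rng.normal(size=(3, 4))  # values
out = attention(Q, K, V)
```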
The application is illustrated below with reference to examples:
according to an aspect of the embodiments of the present application, a dialogue text generation method is provided. Optionally, in this embodiment, the method may be applied in a hardware environment composed of a server 101 and a terminal device 103 as shown in fig. 1. As shown in fig. 1, the server 101 is connected to the terminal device 103 through a network and can provide services to the terminal device or to an application 107 installed on it, where the application 107 may be a video application, an instant messaging application, a browser application, an educational application, a game application, and the like. A database 105 may be provided on the server or separately from it to provide data storage services for the server 101, for example a game data storage server. The network may include, but is not limited to, a wired network or a wireless network, where the wired network includes local area networks, metropolitan area networks, and wide area networks, and the wireless network includes Bluetooth, Wi-Fi, and other networks enabling wireless communication. The terminal device 103 may be a terminal configured with the application 107 and may include, but is not limited to, at least one of: a mobile phone (such as an Android or iOS phone), a notebook computer, a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, a desktop computer, a smart television, a smart voice interaction device, a smart home appliance, a vehicle-mounted terminal, an aircraft, a Virtual Reality (VR) terminal, an Augmented Reality (AR) terminal, a Mixed Reality (MR) terminal, or another electronic device. The server may be a single server, a server cluster composed of multiple servers, or a cloud server.
Alternatively, in this embodiment, the above dialogue text generation method may also be implemented by a server alone, for example the server 101 shown in fig. 1, or jointly by the terminal device and the server.
The above is merely an example, and the present embodiment is not particularly limited.
Optionally, as shown in fig. 2, the dialogue text generation method may be performed by an electronic device, where the electronic device may be a terminal device or a server, and includes:
s202, acquiring an initial dialogue text and a target dialogue attribute in a target dialogue scene, wherein the target dialogue attribute is used for indicating the dialogue style of the target dialogue scene;
alternatively, in this embodiment, the target dialogue scene may include, but is not limited to, intelligent interaction scenarios implemented with artificial intelligence, such as intelligent customer service, intelligent guidance, intelligent after-sales service, chatbots, and so on.
In particular, Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that the machines have the functions of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer Vision (CV) is a science that studies how to make machines "see"; more specifically, it uses cameras and computers instead of human eyes to recognize and measure targets and perform further graphic processing, so that the computer produces images more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric techniques such as face recognition and fingerprint recognition.
In an exemplary embodiment, the target dialogue scene may be obtained by inputting an image, recognizing the image, and converting the result into a dialogue scene corresponding to natural language.
The key technologies of speech technology (Speech Technology) are automatic speech recognition (ASR), text-to-speech synthesis (TTS), and voiceprint recognition. Enabling computers to listen, see, speak, and feel is the future direction of human-computer interaction, and voice is expected to become one of the principal modes of human-computer interaction.
In an exemplary embodiment, the target dialogue scene may be obtained by inputting speech, recognizing it, and converting the result into a dialogue scene corresponding to natural language.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics; research in this field involves natural language, i.e., the language people use daily, so it is closely related to linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, and knowledge graph techniques.
In an exemplary embodiment, the target dialogue scene may be obtained by inputting text and using that text directly as the initial dialogue text.
Machine Learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied throughout all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
Automatic driving technology generally comprises high-precision maps, environment perception, behavior decision-making, path planning, motion control, and other technologies, and has broad application prospects.
In an exemplary embodiment, the target dialog scene may be a dialog scene corresponding to a voice or text interaction with an application.
With the research and advancement of artificial intelligence technology, it is being studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, unmanned aerial vehicles, robots, smart medical care, and smart customer service. It is believed that, as technology develops, artificial intelligence will be applied in more fields and show increasing value.
Alternatively, in this embodiment, the initial dialogue text may be understood as text input by the user, where the text may be entered via a physical keyboard, a mouse, a virtual keyboard, or the like, and may also include, but is not limited to, text recognized from voice or image input.
It should be noted that the target dialogue scene may include, but is not limited to, a specific environment or scenario in which the robot dialogue system communicates with the user. The initial dialogue text refers to the initial text with which the robot communicates with the user in that dialogue scene.
Optionally, in this embodiment, the target dialogue attribute is used to indicate the dialogue style of the target dialogue scene. The dialogue style refers to characteristics of the dialogue such as manner of expression, language, and word habits. The target dialogue attributes may include, but are not limited to, the following: formal, informal, friendly, serious, humorous, etc.
Of course, the above target dialogue attribute may also include, but is not limited to, the type of dialogue, e.g., a feminine or masculine robot persona, and may also be set to relaxed, professional, interesting, etc.
In one exemplary embodiment, the target dialogue scene is a self-service bank robot, and the user needs to perform account inquiry and transfer operations. The initial dialogue text may be business information entered by the user, and the target dialogue attributes may be formal and serious. In this scenario, the robot's replies should follow banking standards, with a formal tone and a serious manner of expression.
In another exemplary embodiment, the target dialogue scene is the voice assistant of a smart speaker, and the user needs to ask for weather information and play music. The initial dialogue text may be a question or instruction spoken by the user, and the target dialogue attributes may be informal and friendly. In this scenario, the robot may reply using informal, spoken-language expressions that are friendly and intimate, making communication with the user easy.
In yet another exemplary embodiment, the target dialogue scene is the control system of a smart home device, and the user needs to control the switching and adjustment of home appliances through voice commands. The initial dialogue text may be a specific instruction spoken by the user, and the target dialogue attributes may be concise and practical. In this scenario, the robot's replies should be concise and clear, providing only the necessary information so that the user can complete the operation quickly.
Through the above examples, by acquiring the initial dialogue text and the target dialogue attribute in the target dialogue scene, the robot's manner of expression and reply content in the dialogue can be guided, so that users' needs and expectations are better met.
S204, inputting the initial dialogue text into a target dialogue model for undisturbed forward propagation, and outputting an undisturbed dialogue text, wherein the undisturbed dialogue text includes word segments that do not conform to the target dialogue attribute, the undisturbed dialogue text is composed of word segments output by the target dialogue model, the target dialogue model is used for determining output word segments step by step over time steps according to the input word segments and a history information matrix and for determining the corresponding dialogue text according to the output word segments, the target dialogue model includes a plurality of attention modules, and the history information matrix includes a key-value pair corresponding to each attention module in the plurality of attention modules;
Alternatively, in this embodiment, the target dialogue model may include, but is not limited to, a Transformer model. A common Transformer model is BERT (Bidirectional Encoder Representations from Transformers), which adopts the Transformer architecture and learns language representations from a large-scale text corpus by unsupervised learning. BERT can be fine-tuned for various natural language processing tasks, including text classification, named entity recognition, question answering, and the like. By encoding the input text into a sequence of word vectors, which are then processed by a multi-layer Transformer encoder, BERT is able to capture the contextual relationships between words, resulting in a better semantic representation. In the fine-tuning stage, BERT can learn a representation for a particular task by performing supervised learning on task data. For example, in a text classification task, BERT can predict the class of a text by adding a classification layer on top of the language representations learned during the pre-training phase. By using the Transformer model, BERT achieves significant performance improvements on multiple natural language processing tasks and has become one of the important models in the field of natural language processing.
Alternatively, in this embodiment, the foregoing forward propagation may be understood as the process of passing input data through the layers of the network, applying weights and biases, and finally obtaining an output. In forward propagation, the input data X, the weights W, and the biases b are known, and the final output Y is obtained through a series of calculations. That is, forward propagation can be understood as feeding the raw data into the neural network model and obtaining the corresponding output result.
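For illustration only, forward propagation can be sketched as follows with a generic two-layer network (the layer sizes, activation, and random weights are illustrative assumptions, not the structure of the dialogue model):

```python
import numpy as np

def forward(X, weights, biases):
    """Forward propagation: pass the input through each layer,
    applying that layer's weights W and bias b, to get the output Y."""
    Y = X
    for W, b in zip(weights, biases):
        Y = np.tanh(Y @ W + b)  # simple fully-connected layer with tanh
    return Y

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))              # 3 samples, 4 features
weights = [rng.standard_normal((4, 5)), rng.standard_normal((5, 2))]
biases = [np.zeros(5), np.zeros(2)]
Y = forward(X, weights, biases)
```

The design point is simply that X, W, and b are all known in advance, so the output Y is fully determined by the chain of layer computations.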
It should be noted that fig. 3 is a schematic diagram of an alternative method for generating dialogue text according to an embodiment of the present application. As shown in fig. 3, a decoder structure of the Transformer model is depicted, where the initial dialogue text is input into the model from below to obtain the corresponding probability distribution; this is the common forward propagation manner.
Alternatively, in this embodiment, the above-mentioned undisturbed dialogue text may be understood as the dialogue text obtained by inputting the initial dialogue text into the pre-trained target dialogue model, where the style of the dialogue text is not controlled and word segments that do not conform to the target dialogue attribute are allowed to appear.
In an exemplary embodiment, fig. 4 is a schematic diagram of another alternative dialogue text generation method according to an embodiment of the present application. As shown in fig. 4, it includes dialogue text 1, dialogue text 2, dialogue text 3, and dialogue text 4, where dialogue text 1 and dialogue text 3 are initial dialogue texts entered by the user, and dialogue text 2 and dialogue text 4 are undisturbed dialogue texts automatically generated by the Transformer model. Because the user selects the "serious" dialogue attribute, and dialogue text 2 and dialogue text 4 both include words that do not conform to this attribute, these texts are the aforementioned undisturbed dialogue texts.
Alternatively, in this embodiment, the above-mentioned target dialogue model may be understood as a model for generating dialogue text, which determines output word segments step by step over time steps according to the input word segments and the history information matrix, and determines the corresponding dialogue text according to the output word segments, including but not limited to determining the output word segments step by step and concatenating them into the final dialogue text.
In the target dialogue model, the input word segments are the context information of the dialogue text and may include the previous dialogue content. The history information matrix is a matrix recording the dialogue history; at each time step the model takes the word segment output at the previous step as input and produces the next output word segment, and the dialogue is generated by determining and concatenating the output word segments step by step.
In an exemplary embodiment, the target dialogue model includes a plurality of attention modules, and the history information matrix includes a key-value pair corresponding to each of the plurality of attention modules. This can be understood as follows: each attention module processes its input using its attention parameters (W_Q, W_K, W_V) to obtain the representation vectors Q, K, V of that layer, where K and V constitute the above-mentioned key-value pair.
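A minimal sketch of how one attention module could derive its key-value pair from its input (the hidden size, token count, and random projection matrices are illustrative assumptions, not the patent's parameters):

```python
import numpy as np

def attention_kv(x, W_Q, W_K, W_V):
    """Project the layer input x with the attention parameters;
    (K, V) is the key-value pair stored in the history information matrix."""
    Q = x @ W_Q
    K = x @ W_K
    V = x @ W_V
    return Q, (K, V)

rng = np.random.default_rng(0)
d = 8                                   # illustrative hidden size
x = rng.standard_normal((5, d))         # 5 word segments processed so far
W_Q, W_K, W_V = (rng.standard_normal((d, d)) for _ in range(3))
Q, (K, V) = attention_kv(x, W_Q, W_K, W_V)
```

In a multi-layer model, each layer repeats this projection on the previous layer's output, so the history information matrix holds one (K, V) pair per attention module.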
It should be noted that the method may include, but is not limited to, using a KV Cache to optimize large-model inference performance. This technique trades space for time and improves inference performance without affecting computational accuracy in any way.
Illustratively, given an input text, the model outputs an answer of length N, which requires N inference passes. That is, the model outputs only one token at a time; the output token is concatenated with the input tokens and used as the input of the next inference pass, and this process is repeated until a terminator is encountered. In each time step of inference, a token sequence is input and converted by the Embedding layer into a three-dimensional tensor [b, s, h]; after computation, the result is mapped to the vocabulary space by the logits layer, and the output tensor has dimensions [b, s, vocab_size]. The output token of the current round is concatenated with the input tokens and used as the input of the next round. Since the input of round j+1 differs from that of round j only by one newly added token, with all other inputs the same, the inference of round j+1 inevitably repeats part of the computation of round j. The KV Cache stores the reusable computation results of the current round so that they can be read directly in the next round.
After the KV Cache is enabled, the inference process can be divided into two stages:
S1, pre-filling stage: while the first output token is being computed, the Cache is empty; during this computation, the Key cache and Value cache of each Transformer layer need to be computed and stored, and the Cache is filled when the first token is output.
S2, KV Cache usage stage: from the second output token to the last, the Cache already holds values; each round of inference only needs to read the Cache, and the new Key and Value computed in the current round are appended to the Cache.
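The two stages S1 and S2 can be sketched as follows. This is a toy single-layer illustration: the embedding function, projection matrices, and token ids are assumptions, and a real implementation caches K and V per layer and per attention head:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
W_K = rng.standard_normal((d, d))
W_V = rng.standard_normal((d, d))

def embed(token_id):
    # toy embedding: a deterministic vector per token id
    return np.random.default_rng(token_id).standard_normal(d)

# S1 pre-filling: compute and cache K, V for every prompt token at once
prompt = [3, 7, 9]
K_cache = np.stack([embed(t) @ W_K for t in prompt])
V_cache = np.stack([embed(t) @ W_V for t in prompt])

# S2 usage: each newly generated token only appends one K/V row,
# instead of recomputing K and V for the whole sequence
for new_token in [2, 5]:
    k_new = embed(new_token) @ W_K
    v_new = embed(new_token) @ W_V
    K_cache = np.vstack([K_cache, k_new[None, :]])
    V_cache = np.vstack([V_cache, v_new[None, :]])
```

After the loop the cache covers all five tokens, but rounds 2 and onward each performed only one projection, which is the space-for-time trade described above.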
In one exemplary embodiment, the target dialogue model is a dialogue text generation model implemented using deep learning techniques. According to the input dialogue text and the history information matrix, it can generate new dialogue text step by step while maintaining semantic consistency with the input dialogue text.
The target dialogue model is composed of a plurality of attention modules, each of which is responsible for generating output word segments according to the input word segments and the history information matrix. At each time step, the attention module combines the current word segment with the key-value pairs in the history information matrix to calculate a weight vector, which guides the generation of the next word segment. The history information matrix is composed of the key-value pairs corresponding to the attention modules among the plurality of attention modules. At each time step, the attention module calculates attention weights based on the current word segment and the history information matrix, which determine which key-value entries are important for generating the current word segment. By inputting the initial dialogue text into the target dialogue model for undisturbed forward propagation, the output undisturbed dialogue text is obtained. The undisturbed dialogue text includes word segments that do not conform to the target dialogue attribute.
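As a hedged sketch of how a weight vector over the cached key-value pairs might be computed for the current word segment (scaled dot-product attention is assumed here; the patent does not fix the scoring function):

```python
import numpy as np

def attention_weights(q, K_cache):
    """Softmax of the current query against all cached keys: the
    weights indicate which history entries matter for this step."""
    scores = K_cache @ q / np.sqrt(len(q))
    e = np.exp(scores - scores.max())   # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(3)
q = rng.standard_normal(8)              # query for the current word segment
K_cache = rng.standard_normal((6, 8))   # 6 cached history keys
w = attention_weights(q, K_cache)
```

The resulting weights are non-negative and sum to 1, so they can be used directly to mix the cached values when generating the next word segment.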
Illustratively, assume that the initial dialogue text entered is "the weather is good today", and the target dialogue attribute requires the words of the output dialogue text to correspond to "positive emotion". At the first time step, the target dialogue model calculates an attention weight based on the initial dialogue text and the history information matrix and determines the generation probability of words corresponding to positive emotion. If the attention weight is high, words corresponding to "positive emotion" are generated preferentially to satisfy the target dialogue attribute.
S206, respectively inputting the characterization vectors corresponding to the word segments in the undisturbed dialogue text into a target attribute classifier, respectively determining loss parameters of the target attribute classifier, and updating the history information matrix using the loss parameters, wherein the loss parameters are used for indicating the degree to which the word segments corresponding to the characterization vectors conform to the target dialogue attribute;
Optionally, in this embodiment, the characterization vector corresponding to the above-mentioned word segment is generated by the above-mentioned plurality of attention modules. Specifically, in the case of i attention modules, the characterization vector generated by the i-th attention module is the characterization vector corresponding to the word segment, where the input of each attention module is the output of the previous attention module. The undisturbed dialogue text here refers to a preprocessed text that retains the key content of the dialogue by removing redundant information; this processing aims to extract the main information of the dialogue to reduce interference with the classifier.
In order to better represent the semantic information of the word segments, this embodiment adopts characterization vectors as the representation of the word segments. A characterization vector is usually generated by a pre-trained word vector model and can better capture the semantics and context information of the words. The target attribute classifier is a model for determining whether a word segment conforms to the target dialogue attribute. The classifier can be a traditional machine learning method or a deep learning model; a suitable classifier is selected according to specific requirements. The loss parameter measures the degree to which the word segment corresponding to a characterization vector conforms to the target dialogue attribute. By inputting the characterization vectors into the target attribute classifier, the resulting loss parameters can be used to update the history information matrix, providing references and optimization for subsequent classification.
Illustratively, assume that a restaurant dialogue is classified for attributes such as "service attitude". First, the dialogue text is preprocessed to remove non-key information.

Next, each word segment is converted into a characterization vector and input into the target attribute classifier. The classifier determines the optimization direction according to the loss parameters and judges whether the word segment conforms to the target attribute.

For example, for the sentence "the service attitude is good, but the dish quality is average", "service attitude" and "dish quality" are input as target attributes to the classifier. The classifier judges according to the characterization vectors and the loss parameters, obtaining loss function values that reflect the degree to which the word segments conform to the target attributes. Based on the loss function values, the history information matrix can be updated to provide references and optimization for subsequent classification. The history information matrix may contain information such as the historical classification results and loss parameters of each word segment, which assist in training and optimizing the classifier. Through continuous iteration and updating, the accuracy of attribute classification of dialogue text can be improved.
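As an illustration of turning characterization vectors into loss parameters, the following uses a toy binary attribute classifier (logistic regression with random weights; the actual classifier in this application is unspecified, so every name and shape here is an assumption):

```python
import numpy as np

def attribute_loss(h, w, b):
    """Toy binary attribute classifier p(a|h) = sigmoid(w.h + b).
    The loss parameter is the negative log-likelihood of the desired
    attribute: the smaller the loss, the better the match."""
    p = 1.0 / (1.0 + np.exp(-(h @ w + b)))
    return -np.log(p), p

rng = np.random.default_rng(2)
d = 6
w = rng.standard_normal(d)
token_vectors = rng.standard_normal((4, d))   # one vector per word segment
losses = [attribute_loss(h, w, 0.0)[0] for h in token_vectors]
```

Each word segment thus receives its own loss parameter, which is the quantity later used to decide whether the history information matrix needs to be perturbed at that time step.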
Alternatively, in this embodiment, updating the history information matrix using the loss parameters may be understood as follows: an attribute classifier is added after the Transformer model, the output vector of the last Transformer layer is taken as the input of the attribute classifier, and the attribute loss is calculated (the attribute classifier calculates a loss parameter indicating whether the currently generated text meets the desired attribute; the larger the loss parameter, the less the text meets the attribute, and the smaller it is, the better the text meets the attribute). The model parameters of the Transformer model are frozen, and back-propagation of the attribute classifier's loss is used to update the history information matrix H_t of each time step of the Transformer model, rather than updating the model parameters.
Specifically, a trainable tensor T_t with the same shape as H_t can be inserted (as a perturbation of H_t). While back-propagating the loss of the attribute classifier, the model parameters of the dialogue model are frozen: the gradients of the model parameters are not computed, and only the gradient of the trainable tensor T_t is computed.
The value of T_t is regarded as an update of H_t, and the state H_t + T_t is used to shift the generated probability distribution so that the generated text is more likely to possess the desired attribute a. T_t is initialized to 0 and updated with the gradient of the attribute classifier model, which measures how well the generated text possesses the desired attribute a. Rewriting the attribute classifier model p(a|x) as p(a | H_t + T_t), the update process of T_t is:

T_t ← T_t + α · ∇_{T_t} log p(a | H_t + T_t)

where α is the update step size. After T_t is computed through this update process, the state H_t can be perturbed. The update step size can be understood as a learning rate: the larger its value, the larger the update amplitude.
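A minimal numeric sketch of this update rule, with a toy logistic classifier standing in for p(a | H_t + T_t) (the shapes, the classifier, and the step size are illustrative assumptions):

```python
import numpy as np

def p_attr(h, w):
    """Toy attribute classifier p(a|h) = sigmoid(w.h)."""
    return 1.0 / (1.0 + np.exp(-(h @ w)))

def update_T(T, H, w, alpha):
    """One update step: T <- T + alpha * grad_T log p(a | H + T).
    For log sigmoid(w.h), the gradient w.r.t. h (and hence T) is (1 - p) * w."""
    p = p_attr(H + T, w)
    return T + alpha * (1.0 - p) * w

d = 5
H = np.zeros(d)                 # frozen history state (toy stand-in for H_t)
w = np.ones(d)
T = np.zeros(d)                 # T_t initialized to 0, as in the text
p_before = p_attr(H + T, w)
for _ in range(10):
    T = update_T(T, H, w, alpha=0.1)
p_after = p_attr(H + T, w)
```

Only T changes across iterations; H and the classifier weights stay frozen, and the probability of the desired attribute under H + T rises with each step.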
S208, inputting the initial dialogue text into the target dialogue model for perturbed forward propagation, and outputting a perturbed dialogue text, wherein the perturbed dialogue text represents the dialogue text output by the target dialogue model after the history information matrix has been updated using the loss parameters, and each word segment in the perturbed dialogue text conforms to the target dialogue attribute.
Alternatively, in this embodiment, the initial dialogue text may be input into the target dialogue model for perturbed forward propagation to output the perturbed dialogue text. "Perturbed" here means that the history information matrix has been updated, which constitutes a perturbation of the target dialogue model. Each word segment in the perturbed dialogue text conforms to the target dialogue attribute.
Illustratively, assume that a target dialogue model is used to generate dialogues about travel. An initial dialogue text is input into the model for forward propagation to obtain the model's dialogue text. Then, the history information matrix is updated to a certain degree to perturb the target dialogue model. In the perturbed dialogue text, each word segment should conform to the target dialogue attribute, i.e., be related to the topic of travel. For example, if the initial dialogue text is "I want to travel", the perturbed dialogue text may become "I want to travel, first visit the tower, and finally taste the local delicacies". Each word segment in this dialogue text is associated with travel, conforming to the attribute of the target dialogue model.
Through the above embodiment, the method acquires an initial dialogue text and a target dialogue attribute in a target dialogue scene, wherein the target dialogue attribute is used for indicating the dialogue style of the target dialogue scene; inputs the initial dialogue text into a target dialogue model for undisturbed forward propagation and outputs an undisturbed dialogue text, wherein the undisturbed dialogue text includes word segments that do not conform to the target dialogue attribute, the undisturbed dialogue text is composed of word segments output by the target dialogue model, the target dialogue model is used for determining output word segments step by step over time steps according to the input word segments and a history information matrix and for determining the corresponding dialogue text according to the output word segments, the target dialogue model includes a plurality of attention modules, and the history information matrix includes a key-value pair corresponding to each of the plurality of attention modules; respectively inputs the characterization vectors corresponding to the word segments in the undisturbed dialogue text into a target attribute classifier, respectively determines loss parameters of the target attribute classifier, and updates the history information matrix using the loss parameters, wherein the loss parameters are used for indicating the degree to which the word segments corresponding to the characterization vectors conform to the target dialogue attribute; and inputs the initial dialogue text into the target dialogue model for perturbed forward propagation and outputs a perturbed dialogue text, wherein the perturbed dialogue text represents the dialogue text output by the target dialogue model after the history information matrix has been updated using the loss parameters, and each word segment in the perturbed dialogue text conforms to the target dialogue attribute.
In addition, chat robots with specific attributes and styles can be customized according to users' preferences and needs, thereby enhancing users' immersion and participation and improving the interactive experience. Replies with comforting, encouraging, empathetic, and other attributes can also be generated according to the user's emotional state, providing emotional support and psychological relief. By realizing text generation with controllable attributes and controllable styles, the dialogue model can better meet different scenarios and user requirements, improving the value and breadth of application of the product.
As an alternative, the method further includes: inputting the initial dialogue text into the target dialogue model to generate t first characterization vectors, wherein the target dialogue model includes i layers of attention modules, the first characterization vectors represent the output results of the i-th layer attention module, the target dialogue model sequentially outputs t word segments over t time steps in an iterative loop manner, the (j+1)-th word segment is determined by the j-th word segment output by the target dialogue model at the j-th time step and the j-th history information matrix, which are input into the target dialogue model together at the (j+1)-th time step, the j-th history information matrix is used for representing the i key-value pairs corresponding to the j-th time step, the i key-value pairs are in one-to-one correspondence with the i layers of attention modules, the t first characterization vectors are used for determining the undisturbed dialogue text composed of the t word segments, i is an integer greater than or equal to 2, t is an integer greater than or equal to j+1, and j is an integer greater than or equal to 1; respectively inputting the t first characterization vectors into the target attribute classifier, determining t classification results, and updating a target history information matrix, wherein the target history information matrix represents the history information matrix corresponding to a target word segment output by the target dialogue model after the target word segment is input into the target dialogue model, and the target word segment includes word segments determined by the t classification results as not conforming to the target dialogue attribute; and inputting the initial dialogue text into the target dialogue model, and determining t second characterization vectors using the updated target history information matrix to generate the perturbed dialogue text, wherein the second characterization vectors are generated in the same manner as the first characterization vectors.
Alternatively, in this embodiment, the target dialogue model may be understood as generating a new word segment each time according to the input word segments and the history information matrix: the word segment output in the previous step and the updated history information matrix are input to produce the next word segment.
It should be noted that, in the target dialogue model, the initial dialogue text is input and t first characterization vectors are generated. These vectors represent the output results of the i-th layer attention module of the target dialogue model at different time steps. The target dialogue model adopts an iterative loop manner, outputting one word segment at each time step, for t word segments in total. Generating the (j+1)-th word segment depends on the j-th word segment output by the target dialogue model at the j-th time step, together with the j-th history information matrix. The j-th history information matrix represents the i key-value pairs at the j-th time step, which are in one-to-one correspondence with the i layers of attention modules. Meanwhile, the t first characterization vectors determine the undisturbed dialogue text composed of the t word segments. In this process, i is an integer greater than or equal to 2, representing the multi-layer attention modules; t is an integer greater than or equal to j+1, indicating the number of time steps; and j is an integer greater than or equal to 1, representing the current time step.
By this method, the target dialogue model can generate dialogue text with consistency and logic. Such models have been widely applied in the fields of natural language processing and dialogue systems, and the quality and effect of dialogue generation can be further improved by continuously optimizing the model's structure and parameters.
As an alternative, respectively inputting the t first characterization vectors into the pre-trained target attribute classifier, determining t classification results, and updating the target history information matrix includes: respectively inputting the t first characterization vectors into the target attribute classifier and determining t loss parameters, wherein the t classification results are in one-to-one correspondence with the t loss parameters; determining a target loss parameter corresponding to an n-th time step from the t loss parameters, wherein the target loss parameter represents a loss parameter that does not satisfy a preset loss condition, and n is a positive integer less than or equal to t; and updating the target history information matrix according to the target loss parameter.
Alternatively, in this embodiment, the one-to-one correspondence between the t classification results and the t loss parameters may be understood as follows: each classification result is determined by one loss parameter. The preset loss condition may be set as a preset threshold; when a loss parameter is greater than the preset threshold, the classification result is considered to indicate that the corresponding word segment does not conform to the target dialogue attribute, and vice versa.
It should be noted that the n-th time step may be understood as a time step corresponding to a loss parameter that does not satisfy the preset loss condition. For example, if the loss parameter obtained by inputting the characterization vector generated at the 1st time step into the target attribute classifier does not satisfy the preset loss condition, the 1st time step is an n-th time step.
In one exemplary embodiment, in the field of machine learning, the target attribute classifier is an important tool for classifying the input first characterization vectors, and may be applied through, but not limited to, the following steps:
S1, inputting a first characterization vector and determining a classification result:
For the given t first characterization vectors, they are input into the target attribute classifier for classification. The classifier calculates the corresponding classification result based on the input characterization vector. For example, in a face recognition system, the feature vector of a face image may be taken as input, and the classifier outputs the person class to which the face belongs. For a sentence, the characterization vector corresponding to each word in the sentence can be taken as input, and the classifier outputs whether each characterization vector conforms to the target dialogue style.
S2, determining t loss parameters:
after the classification result is obtained, t loss parameters need to be determined. The loss parameter is an indicator that measures whether the output text meets the target dialog style.
S3, determining a target loss parameter corresponding to the nth time step:
The target loss parameter refers to a loss parameter that does not satisfy the preset loss condition. Among the t loss parameters, the target loss parameter corresponding to the n-th time step needs to be determined. This can be achieved by setting a suitable threshold. For example, if the threshold of the error rate is set to 0.1, then when the error rate at a certain time step is greater than 0.1, the loss parameter corresponding to that time step is a target loss parameter.
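The selection in S3 can be sketched as a simple threshold filter (the 0.1 threshold follows the example above; the loss values are made up for illustration):

```python
def select_target_steps(losses, threshold=0.1):
    """Return the time steps (1-indexed) whose loss parameter fails
    the preset loss condition, i.e. exceeds the threshold."""
    return [n for n, loss in enumerate(losses, start=1) if loss > threshold]

steps = select_target_steps([0.05, 0.3, 0.08, 0.2])  # steps 2 and 4 exceed 0.1
```

Each returned step n is a candidate n-th time step whose history information matrix is then updated.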
S4, updating a target historical information matrix according to the target loss parameters:
the target history information matrix is an important data structure for storing history information. The target history information matrix may be updated based on the target loss parameters. The method of updating may be determined according to specific requirements.
As an alternative, updating the target history information matrix according to the target loss parameter includes: acquiring an initial tensor allowing training; determining the gradient of a target attribute classifier according to the target loss parameter, and training an initial tensor by utilizing the gradient until a target tensor is obtained, wherein the target tensor represents an updated value of a target historical information matrix; and determining the sum value of the target historical information matrix and the target tensor as an updated target historical information matrix.
Alternatively, in this embodiment, the initial tensor may be, but is not limited to, initialized to 0, and the tensor shape is [ batch_size, head_num, seq_len, hidden_size ]. batch_size is the batch size, head_num is the attention header number, seq_len is the sequence length, and hidden_size is the hidden dimension.
Optionally, in this embodiment, determining the gradient of the target attribute classifier according to the target loss parameter and training the initial tensor with the gradient until the target tensor is obtained may include, but is not limited to, the following: a trainable tensor T_t with the same shape as H_t is inserted (as a perturbation to H_t); when back-propagating with the loss of the attribute classifier, the model parameters of the dialogue model are frozen, so the gradient of the model parameters is not calculated; instead, the gradient of the trainable tensor T_t is calculated.
T_t is regarded as an update to the state H_t: the perturbed state H_t + T_t is used to generate a shifted probability distribution, so that the generated text is more likely to possess the desired attribute a. The update value T_t is initialized to 0 and then updated with the gradient of the attribute classifier model.
As an alternative, determining the gradient of the target attribute classifier according to the target loss parameter, and training the initial tensor by using the gradient to obtain the target tensor, including: acquiring a predetermined training step length, wherein the value of the training step length is positively correlated with the amplitude adjusted by the initial tensor round-by-round training; training the initial tensor wheel by wheel according to the training step length, updating a target historical information matrix by using the tensor after training, and recalculating a loss parameter and a gradient corresponding to the nth time step of each time after updating the target historical information matrix until the loss parameter corresponding to the nth time step meets a preset loss condition, and determining the tensor after training of the wheel as the target tensor, wherein the tensor after training of each time is jointly determined by the tensor before training of each time, the gradient corresponding to each time step and the updating step length.
Optionally, in this embodiment, the training step size represents the adjustment amplitude of the tensor during each training round. Round-by-round training may be understood as follows: when a certain word does not conform to the target dialogue style, the corresponding history information matrix is modified and the loss function is recalculated; when the recalculated loss function still does not satisfy the preset loss condition, the corresponding history information matrix is modified again and the loss function recalculated, until the recalculated loss function satisfies the preset loss condition.
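The round-by-round procedure — adjust the tensor by the step size, recompute the loss, stop once the preset loss condition holds — can be sketched generically. The function name and the quadratic stand-in loss in the test are illustrative assumptions; in the application the loss and gradient would come from the attribute classifier:

```python
import numpy as np

def train_tensor(T0, loss_and_grad, alpha, loss_threshold, max_rounds=100):
    """Round-by-round training: T_{k+1} = T_k - alpha * grad, where alpha is
    the training step size, stopping once the recomputed loss satisfies the
    preset loss condition (loss <= loss_threshold)."""
    T = T0.copy()
    for _ in range(max_rounds):
        loss, grad = loss_and_grad(T)   # recompute loss and gradient each round
        if loss <= loss_threshold:      # preset loss condition met: T is the target tensor
            break
        T = T - alpha * grad            # adjust by the step size
    return T
```

Each post-round tensor is jointly determined by the pre-round tensor, the round's gradient, and the step size, matching the description above.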
In one exemplary embodiment, the attribute classifier model described above measures the degree to which the generated text possesses the desired attribute a. Rewriting the attribute classifier model p(a|x) as p(a|H_t + T_t), the update process of T_t is:

T_t ← T_t + α∇_{T_t} log p(a|H_t + T_t)

where α is the update step size.
As an alternative, inputting the initial dialogue text into the target dialogue model for a perturbed forward transfer and outputting the perturbed dialogue text includes: inputting the initial dialogue text into the target dialogue model to generate t second characterization vectors, wherein the history information matrix corresponding to the nth time step has been updated; inputting the t second characterization vectors into the target attribute classifier respectively and redetermining t classification results, wherein the redetermined t classification results indicate that the corresponding word segments conform to the target dialogue attribute; and performing a sampling operation on the t second characterization vectors to obtain the disturbed dialogue text.
Optionally, in this embodiment, when updating the history information matrix corresponding to the nth time step is completed, it may be understood that the loss parameters corresponding to the nth time step all meet the preset loss condition, that is, the loss parameters corresponding to the nth time step are as small as possible.
It should be noted that, the t classification results determined again may be understood as that the corresponding history information matrix is modified, and the loss parameters are recalculated, so as to obtain the t classification results again.
As an alternative, the method further includes: respectively executing normalization operation on t first characterization vectors and t second characterization vectors to obtain first probability distribution and second probability distribution; determining a target divergence value according to the first probability distribution and the second probability distribution, wherein the target divergence value is used for measuring the degree of difference between the first probability distribution and the second probability distribution; and taking the value of the minimized target divergence value as a target, and updating the target historical information matrix.
Alternatively, in this embodiment, the normalization operation performed on the t first token vectors and the t second token vectors may include, but is not limited to, implementation by a softmax function; as shown in FIG. 3, the first probability distribution and the second probability distribution corresponding to the first token vectors and the second token vectors are obtained, respectively, through a linear transformation layer and an activation function.
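The normalization and the divergence measure used above can be sketched with a standard softmax/KL pair; this is generic numerical code, not the application's exact implementation:

```python
import numpy as np

def softmax(z):
    """Normalize a logits vector into a probability distribution."""
    e = np.exp(z - z.max())
    return e / e.sum()

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q): the target divergence value measuring the degree of
    difference between two probability distributions; zero iff p == q."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

Minimizing this divergence between the two distributions is what keeps the perturbed output close to the unperturbed one.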
In one exemplary embodiment, the history information matrix H_t of the dialogue model is perturbed so that the generated text better conforms to the desired attribute a. Since this can affect the fluency of the generated text, in order to maintain fluency, when H_t is updated, the probability distribution p_{t+1} of the undisturbed forward transfer and the probability distribution p̃_{t+1} of the disturbed forward transfer are both calculated, and the KL-divergence loss between them is minimized, which maintains the fluency of the reply generated after the disturbance. The KL divergence measures the difference between the two probability distributions; the smaller the KL divergence, the smaller the difference.
As an alternative, the method further includes: obtaining unlabeled dialogue corpus data; preprocessing dialogue corpus data to obtain target dialogue corpus, wherein the target dialogue corpus represents dialogue corpus between two objects; and pre-training the initial dialogue model by using the target dialogue corpus to obtain a target dialogue model, wherein the initial dialogue model and the target dialogue model adopt a bidirectional attention mechanism on input texts and adopt a unidirectional attention mechanism on output texts.
Optionally, in this embodiment, a social media platform may be used as the data source, and a large amount of open-domain dialogue corpus may be captured by means such as web crawlers. The collected raw dialogue corpus data is preprocessed and cleaned to improve data quality.
The preprocessing step includes: removing irrelevant information such as links, HTML tags, and advertisements; and removing duplicate, nonsensical, or low-quality conversations. Conversations among three or more persons are filtered out, and only conversations between two persons are retained as training data. Different dialogue texts are separated by the special marker [SEP].
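The cleaning and two-speaker filtering might be sketched as follows. This is a minimal illustration; the helper names, the regexes, and the exact [SEP] joining convention are assumptions, not the application's actual pipeline:

```python
import re

SEP = " [SEP] "

def clean_utterance(text):
    """Strip links and HTML tags, then trim whitespace."""
    text = re.sub(r"https?://\S+", "", text)   # remove links
    text = re.sub(r"<[^>]+>", "", text)        # remove HTML tags
    return text.strip()

def build_training_pairs(dialogues):
    """Keep only dialogues between exactly two speakers; join the context
    turns with [SEP] as X and take the final turn as the reply Y."""
    pairs = []
    for turns in dialogues:                    # turns: list of (speaker, utterance)
        speakers = {s for s, _ in turns}
        if len(speakers) != 2 or len(turns) < 2:
            continue                           # filter out 3+-person conversations
        utts = [clean_utterance(u) for _, u in turns]
        pairs.append((SEP.join(utts[:-1]), utts[-1]))
    return pairs
```

Deduplication and quality filtering would follow the same pattern on the resulting pairs.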
An example of training data is shown below, where X is the input text and Y is the output text.
"X = Is a hamster suitable as a pet? [SEP] I think it is very suitable; the one at my home is particularly smart and does not bite. [SEP] How do you feed it?"
"Y = Just buy some hamster food and a water bottle; it eats when it is hungry and drinks when it is thirsty."
The pre-training the initial dialogue model with the target dialogue corpus may include, but is not limited to, the following:
The dialogue model adopts a Transformer structure based on the self-attention mechanism; the Transformer has strong modeling capability and good scalability, and parallelizes computation well. A bidirectional attention mechanism is adopted on the input text, so that each word in the input text can attend to all other words; a unidirectional attention mechanism, i.e., left-to-right attention, is adopted on the output text, so that each word in the output text can attend only to the preceding words and not to the following words. Encoding the input text with a bidirectional language model helps the dialogue model better understand the semantic information of the input text. A cross-entropy loss function is used to measure the loss between the reply predicted by the dialogue model and the true reply.
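The mixed attention scheme — bidirectional over the input, left-to-right over the output — can be illustrated with a small mask-construction sketch. The function name and the 1 = may attend / 0 = masked convention are illustrative assumptions:

```python
def prefix_lm_mask(m, n):
    """Attention mask for m input tokens followed by n output tokens:
    input tokens attend bidirectionally to the whole input; output tokens
    attend to the full input and, causally, to earlier output tokens."""
    L = m + n
    mask = [[0] * L for _ in range(L)]
    for i in range(L):
        for j in range(m):
            mask[i][j] = 1            # every position sees the full input
    for i in range(m, L):
        for j in range(m, i + 1):
            mask[i][j] = 1            # outputs see themselves and earlier outputs only
    return mask
```

Row i of the mask lists which positions token i may attend to, matching the bidirectional/unidirectional split described above.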
As an alternative, acquiring the initial dialog text and the target dialog attribute in the target dialog scene includes: acquiring account data of a target account, wherein the target account represents an account participating in a target dialogue scene; and determining the target dialogue attribute according to the account data.
Alternatively, in this embodiment, the account data may be understood as data that is actively confirmed by the user and is actively provided, for example, the user clicks a button for selecting a dialogue attribute, or the like.
As an alternative, determining the target session attribute according to the account data includes at least one of: acquiring interaction data of a target account number, and determining a target dialogue attribute as a dialogue attribute related to the interaction data; acquiring a search history of a target account, and determining a target dialogue attribute as a dialogue attribute related to the search history; the emotion type of the initial dialogue text is obtained, and the target dialogue attribute is determined to be the same dialogue attribute as the emotion type of the initial dialogue text.
Optionally, in this embodiment, the interaction data may include, but is not limited to, interaction data such as clicking, long-pressing, etc. of the target account in the dialogue scene. The above search history may be understood as a search history of the user on the platform, including the type of user search, etc., for example, feminine products, etc. The above emotion type may be understood as an emotion type determined by the initial dialog text, such as a positive emotion or a negative emotion, etc.
The application is further illustrated by the following examples:
The application provides a style-controllable dialogue generation method based on an attribute discriminator. In contrast to approaches that directly train a dedicated dialogue model to satisfy particular attribute constraints, the application requires no modification or retraining of the pre-trained dialogue model. In the application, during the inference stage of the dialogue model, the attribute discriminator measures whether the generated text conforms to the expected attribute; the loss of the attribute discriminator is back-propagated, and the state of the dialogue model is perturbed and updated to increase the likelihood of attribute compliance, guiding the pre-trained dialogue model to generate reply text that satisfies the attribute constraint and thereby realizing style-controllable dialogue generation. The application can flexibly control the attribute constraints of the generated replies, and keeps the generated replies fluent while making them meet the attribute requirements.
In an exemplary embodiment, chat robots with specific attributes and styles, such as feminine, formal, or friendly, can be customized according to user preferences and needs, which enhances user immersion and participation and improves the interaction experience. In addition, replies with comforting, encouraging, empathetic, and other attributes can be generated according to the emotional state of the user, thereby providing emotional support and psychological relief. In general, by realizing text generation with controllable attributes and styles, the dialogue model can better meet different scenes and user requirements, improving the value and breadth of application of the product.
FIG. 5 is a schematic diagram of yet another alternative method for generating dialogue text according to an embodiment of the application; as shown in FIG. 5, the method includes, but is not limited to, the following:
S502, collecting open-domain dialogue corpus:
A social media platform is used as the data source, and a large amount of open-domain dialogue corpus is captured by means such as web crawlers. The collected raw dialogue corpus data is preprocessed and cleaned to improve data quality.
The preprocessing step includes: removing irrelevant information such as links, HTML tags, and advertisements; and removing duplicate, nonsensical, or low-quality conversations. Conversations among three or more persons are filtered out, and only conversations between two persons are retained as training data.
S504, pre-training a general dialogue model:
The dialogue model adopts a Transformer structure based on the self-attention mechanism; the Transformer has strong modeling capability and good scalability, and parallelizes computation well. A bidirectional attention mechanism is adopted on the input text, so that each word in the input text can attend to all other words; a unidirectional attention mechanism, i.e., left-to-right attention, is adopted on the output text, so that each word in the output text can attend only to the preceding words and not to the following words. Encoding the input text with a bidirectional language model helps the dialogue model better understand the semantic information of the input text. A cross-entropy loss function is used to measure the loss between the reply predicted by the dialogue model and the true reply. FIG. 6 is a schematic diagram of yet another alternative method of generating dialogue text according to an embodiment of the present application; the model structure is shown in FIG. 6, in which bos (i.e., begin of sentence) marks the beginning of a sentence and eos (i.e., end of sentence) marks the end of a sentence. Usually, the special symbol bos is added at the beginning of X and the special symbol eos is added at the end of X; similarly, bos is added at the beginning of Y and eos at the end of Y.
In one exemplary embodiment, the generic dialogue model with the Transformer structure is pre-trained on a massive (on the order of billions of examples) open-domain dialogue corpus. The pre-trained general dialogue model has good human-machine interaction and dialogue capability and can generate fluent replies that conform to the context. However, since the general dialogue model is an unconditional language model, the generated text content cannot be made to satisfy the attribute to be controlled.
S506, generating a style-controllable reply:
Given an input word sequence and an output word sequence X = {x_0, x_1, …, x_m}; Y = {y_0, y_1, …, y_n}, the language model computes a probability distribution P(Y|X) over the word sequence. According to the chain rule, the probability of the word sequence can be written as a product of conditional probabilities:

P(Y|X) = ∏_{t=0}^{n} p(y_t | y_0, …, y_{t-1}, X)
The probability distribution of the word sequence is modeled with a Transformer dialogue model, which generates the complete reply text word by word in an iterative loop based on the dialogue history; the Transformer is briefly introduced here using this iterative-loop view. Define a history information matrix H_t containing the past key-value pairs, H_t = [(K_t^(1), V_t^(1)), …, (K_t^(i), V_t^(i))], where (K_t^(i), V_t^(i)) is the key-value pair for all time steps from 0 to t of the i-th Transformer layer. A Transformer layer consists of at least a self-attention block and a feed-forward neural network block. "Key-value pair" refers to K and V in the self-attention mechanism. Specifically, K and V are each a matrix tensor of shape [batch_size, head_num, seq_len, hidden_size], where batch_size is the batch size, head_num is the number of attention heads, seq_len is the sequence length, and hidden_size is the hidden dimension. An efficient Transformer implementation, given the last input word y_t, generates the next word y_{t+1} based on the cached history state H_t. The iterative-loop interpretation of the Transformer can be expressed as the following mathematical expressions.
o_{t+1}, H_{t+1} = LM(y_t, H_t)
where o_{t+1} corresponds to the word segment in the undisturbed dialogue text;
y_{t+1} ~ p_{t+1} = softmax(W o_{t+1})
where W is a linear transformation matrix that maps the logits vector to a vector over the vocabulary. The next word y_{t+1} is sampled from the probability distribution p_{t+1}. By caching the history information matrix H_t, repeated forward-pass computation over the previous word sequence {y_0, y_1, …, y_{t-1}} is avoided, which effectively reduces inference time.
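As a rough illustration of the cached iterative loop o_{t+1}, H_{t+1} = LM(y_t, H_t), the toy sketch below appends each step's hidden state to a cache instead of re-encoding the whole prefix. The mean-pooling "attention" and all names here are stand-ins for a real Transformer, not the application's implementation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def lm_step(y_t, H_t, embed, W_out):
    """One cached decoding step: consume only the last token y_t and the
    cached history H_t; return logits o_{t+1} and the extended cache H_{t+1}."""
    h = embed[y_t]                     # toy hidden state for the new token
    H_next = H_t + [h]                 # append this step's "key/value" to the cache
    context = np.mean(H_next, axis=0)  # stand-in for self-attention over the cache
    o_next = context @ W_out           # logits over the vocabulary
    return o_next, H_next

def generate(y0, steps, embed, W_out):
    """Greedy iterative-loop generation from an initial token y0."""
    y, H, out = y0, [], []
    for _ in range(steps):
        o, H = lm_step(y, H, embed, W_out)
        y = int(np.argmax(softmax(o)))  # y_{t+1} drawn from p_{t+1} (greedy here)
        out.append(y)
    return out
```

The point of the cache is that each step does work proportional to one token, never re-running the forward pass over {y_0, …, y_{t-1}}.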
FIG. 7 is a schematic diagram of yet another alternative dialogue text generation method according to an embodiment of the application; the attribute-controlled reply generation shown in FIG. 7 uses a trained attribute classifier p(a|x). Here a denotes the desired attribute, and whether the generated text meets the desired attribute is measured by the attribute classifier p(a|x). For example, attribute a is an emotional tendency, and the reply text desired to be generated has a positive, rather than negative, emotional tendency. FIG. 8 is a schematic diagram of yet another alternative dialogue text generation method according to an embodiment of the present application; as shown in FIG. 8, the attribute classifier consists of a fully connected layer 802, a nonlinear activation function layer 804, and a fully connected layer 806. This attribute classifier is trained in advance on an emotion classification data set.
The attribute classifier is added after the Transformer dialogue model, and the output vector of the last Transformer layer is used as the input to the attribute classifier to calculate the attribute loss, which measures whether the generated text meets the expected attribute. The model parameters of the Transformer dialogue model are frozen, and the loss of the attribute classifier is back-propagated to update the state history information matrix H_t of the dialogue model rather than the model parameters: a trainable tensor T_t with the same shape as H_t is inserted, and when back-propagating with the loss of the attribute classifier, instead of the gradient of the frozen model parameters, the gradient of the trainable tensor T_t is calculated.
T_t is regarded as an update to the state H_t: the perturbed state H_t + T_t is used to generate a shifted probability distribution, so that the generated text is more likely to possess the desired attribute a. The update value T_t is initialized to 0 and then updated with the gradient of the attribute classifier model. The attribute classifier model measures the degree to which the generated text possesses the desired attribute a. Rewriting the attribute classifier model p(a|x) as p(a|H_t + T_t), the update process of T_t is:

T_t ← T_t + α∇_{T_t} log p(a|H_t + T_t)

where α is the update step size. Through this update process, once T_t has been calculated, the state H_t can be disturbed:
Disturbed state: H̃_t = H_t + T_t

Disturbed forward transfer: õ_{t+1}, H̃_{t+1} = LM(y_t, H̃_t), whose sampled words form the disturbed dialogue text;

Sampling: ỹ_{t+1} ~ p̃_{t+1} = softmax(W õ_{t+1})
state history information matrix H through disturbance dialogue model t To make the generated text more desirable to the property a, which may affect the fluency of the generated text. In order to maintain smoothness of text generation, update H t When calculating the probability distribution p of the undisturbed forward transmission t+1 And probability distribution with disturbance forward transferBy minimizing the loss of KL divergence, the smoothness of the recovery generated after disturbance is maintained.
The attribute classifier measures whether the generated text conforms to the expected attribute; its loss is back-propagated, disturbing the state history information matrix H_t of the dialogue model. Based on the disturbed history information matrix, the dialogue model performs the disturbed forward transfer again and can generate a reply text that conforms to the attribute constraint. In general, based on the attribute classifier, FIG. 9 is a schematic diagram of another alternative method for generating dialogue text; as shown in FIG. 9, the process for generating a style-controllable reply is:
S902, the dialogue model performs a normal, undisturbed forward transfer and generates a probability distribution over the vocabulary.
S904, the loss is calculated through the attribute classifier, measuring the degree to which the generated text conforms to the desired attribute.
S906, backward transfer is performed: the loss of the attribute classifier model is back-propagated and the gradient is calculated.
S908, the internal state of the dialogue model is updated, increasing the probability of conforming to the attribute.
S910, based on the updated internal state, the dialogue model performs a disturbed forward transfer, and a new word is generated by sampling from the newly obtained probability distribution.
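The loop S902-S910 can be caricatured with a linear attribute scorer standing in for the real classifier. The analytic sigmoid gradient, step size, round count, and all names below are illustrative assumptions, not the application's actual model:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def controlled_step(h, W_vocab, w_attr, alpha=0.5, rounds=10):
    """One style-controlled decoding step: perturb the hidden state h with a
    trainable offset T so a linear attribute scorer p(a|h+T) = sigmoid((h+T)·w)
    prefers the desired class, then re-sample from the shifted distribution."""
    T = np.zeros_like(h)                               # offset initialized to 0
    for _ in range(rounds):
        z = (h + T) @ w_attr
        p_attr = 1.0 / (1.0 + np.exp(-z))              # S902/S904: classifier score
        loss = -np.log(p_attr + 1e-12)                 # S904 attribute loss (monitoring)
        grad = -(1.0 - p_attr) * w_attr                # S906: d(loss)/dT, analytic
        T -= alpha * grad                              # S908: update the state offset
    p = softmax((h + T) @ W_vocab)                     # S910: disturbed distribution
    return int(np.argmax(p)), T
```

Each inner round raises the attribute score, so the re-sampled word is drawn from a distribution shifted toward the desired attribute.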
With this embodiment, no modification or retraining of the pre-trained dialogue model is required, nor does a large amount of dialogue training data with attribute and style labels need to be collected. In the inference stage of the dialogue model, the application measures whether the generated text conforms to the target attribute and style through the attribute discriminator, back-propagates the loss of the attribute discriminator, and perturbs and updates the state of the dialogue model, increasing the likelihood of conforming to the attribute constraint and guiding the pre-trained dialogue model to generate a reply text that satisfies the attribute constraint, thereby realizing style-controllable text generation. Compared with a conditional dialogue model, the method does not need to collect a large amount of dialogue training data with attribute and style labels, does not need to retrain or fine-tune the pre-trained general dialogue model, can flexibly control the attribute constraints of the generated replies, and keeps the generated replies fluent while making them meet the attribute requirements.
It will be appreciated that in the specific embodiments of the present application, related data such as user information is involved, and when the above embodiments of the present application are applied to specific products or technologies, user permissions or consents need to be obtained, and the collection, use and processing of related data need to comply with related laws and regulations and standards of related countries and regions.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
According to another aspect of the embodiment of the application, a device for generating the dialogue text is also provided. As shown in fig. 10, the apparatus includes:
an obtaining module 1002, configured to obtain an initial dialog text and a target dialog attribute in a target dialog scene, where the target dialog attribute is used to indicate a dialog style of the target dialog scene;
A first processing module 1004, configured to input the initial dialog text into a target dialog model for performing undisturbed forward transmission, and output an undisturbed dialog text, where the undisturbed dialog text includes a word segment that does not conform to the target dialog attribute, the undisturbed dialog text is composed of the word segment output by the target dialog model, the target dialog model is configured to determine the output word segment step by step according to a time step according to the input word segment and a history information matrix, and determine a corresponding dialog text according to the output word segment, and the target dialog model includes a plurality of attention modules, and the history information matrix includes key value pairs corresponding to each of the attention modules in the plurality of attention modules;
an updating module 1006, configured to input, respectively, a token vector corresponding to each word in the undisturbed dialogue text into a target attribute classifier, respectively determine a loss parameter of the target attribute classifier, and update the history information matrix using the loss parameter, where the loss parameter is used to represent a degree that the word corresponding to the token vector accords with the target dialogue attribute;
and a second processing module 1008, configured to input the initial dialog text into the target dialog model for performing a perturbed forward transmission, and output a perturbed dialog text, where the perturbed dialog text represents the dialog text output by the target dialog model after the history information matrix is updated with the loss parameter, and each word in the perturbed dialog text conforms to the target dialog attribute.
As an alternative, the device is further configured to: inputting the initial dialogue text into the target dialogue model to generate t first characterization vectors, wherein the target dialogue model comprises an i-layer attention module, the first characterization vectors represent output results of the i-layer attention module, the target dialogue model sequentially outputs t participles in t time steps in an iterative loop mode, j+1th participles are determined by inputting the target dialogue model together with a j-th participle and a j-th historical information matrix output by the target dialogue model in a j+1th time step, the j-th historical information matrix is used for representing i key value pairs corresponding to the j-th time step, the i key value pairs are in one-to-one correspondence with the i-layer attention module, the t first characterization vectors are used for determining the undisturbed dialogue text composed of the t participles, i is an integer greater than or equal to 2, t is an integer greater than or equal to j+1, and j is an integer greater than or equal to 1; respectively inputting the t first characterization vectors into the target attribute classifier, determining t classification results, and updating a target history information matrix, wherein the target history information matrix represents a history information matrix corresponding to target words output by the target dialogue model after the target dialogue model is input, and the target words comprise words which are determined by the t classification results and do not accord with the target dialogue attributes; and inputting the initial dialogue text into the target dialogue model, and determining t second characterization vectors by using the updated target history information matrix to generate the disturbed dialogue text, wherein the generation mode of the second characterization vectors is the same as that of the first characterization vectors.
As an alternative, the device is configured to input the t first token vectors into a pre-trained target attribute classifier respectively, determine t classification results, and update a target history information matrix in the following manner: respectively inputting the t first characterization vectors into the target attribute classifier, and determining t loss parameters, wherein the t classification results are in one-to-one correspondence with the t loss parameters; determining a target loss parameter corresponding to an nth time step from the t loss parameters, wherein the target loss parameter represents a loss parameter which does not meet a preset loss condition, and n is a positive integer less than or equal to t; and updating the target historical information matrix according to the target loss parameters.
As an alternative, the apparatus is configured to update the target history information matrix according to the target loss parameter by: acquiring an initial tensor allowing training; determining the gradient of the target attribute classifier according to the target loss parameter, and training the initial tensor by utilizing the gradient until a target tensor is obtained, wherein the target tensor represents an updated value of the target historical information matrix; and determining the sum value of the target historical information matrix and the target tensor as the updated target historical information matrix.
As an alternative, the device is configured to determine a gradient of the target attribute classifier according to the target loss parameter, and train the initial tensor by using the gradient to obtain a target tensor, by: acquiring a predetermined training step length, wherein the value of the training step length is positively correlated with the amplitude adjusted by the initial tensor round-by-round training; and training the initial tensor round by round according to the training step length, updating the target history information matrix by using the tensor after each round of training, and recalculating the loss parameter and gradient corresponding to the nth time step after each update of the target history information matrix, until the loss parameter corresponding to the nth time step meets the preset loss condition and the tensor after that round of training is determined as the target tensor, wherein the tensor after each round of training is jointly determined by the tensor before that round of training, the gradient corresponding to that round, and the update step length.
As an alternative, the device is configured to input the initial dialog text into the target dialog model for performing a perturbed forward transfer, and output the perturbed dialog text by: inputting the initial dialogue text into the target dialogue model to generate t second characterization vectors, wherein the history information matrix corresponding to the nth time step has been updated; respectively inputting the t second characterization vectors into the target attribute classifier, and redetermining t classification results, wherein the redetermined t classification results indicate that the corresponding word segments conform to the target dialogue attribute; and executing a sampling operation on the t second characterization vectors to obtain the disturbed dialogue text.
As an alternative, the device is further configured to: perform a normalization operation on the t first characterization vectors and the t second characterization vectors respectively to obtain a first probability distribution and a second probability distribution; determine a target divergence value according to the first probability distribution and the second probability distribution, wherein the target divergence value is used to measure the degree of difference between the first probability distribution and the second probability distribution; and update the target history information matrix with the goal of minimizing the target divergence value.
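A divergence of this kind behaves like the Kullback-Leibler divergence; a minimal sketch with made-up characterization vectors follows (the specific KL direction is an assumption):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def kl_divergence(p, q):
    """KL(p || q): measures how much distribution p diverges from q."""
    return float(np.sum(p * np.log(p / q)))

# Hypothetical first (unperturbed) and second (perturbed) characterization
# vectors for one time step; normalization turns them into distributions.
h_first = np.array([2.0, 0.5, -1.0])
h_second = np.array([1.8, 0.7, -0.9])

p_first, p_second = softmax(h_first), softmax(h_second)
target_divergence = kl_divergence(p_second, p_first)  # value to be minimized
```

Minimizing this value keeps the perturbed distribution close to the unperturbed one, so the perturbation steers the attribute without degrading fluency.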
As an alternative, the device is further configured to: obtain unlabeled dialogue corpus data; preprocess the dialogue corpus data to obtain a target dialogue corpus, wherein the target dialogue corpus represents dialogue corpus between two objects; and pre-train an initial dialogue model by using the target dialogue corpus to obtain the target dialogue model, wherein the initial dialogue model and the target dialogue model apply a bidirectional attention mechanism to the input text and a unidirectional attention mechanism to the output text.
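The mixed attention scheme (bidirectional over the input text, unidirectional over the output text) corresponds to a UniLM-style attention mask. A sketch of how such a mask could be built, with the function name and layout assumed:

```python
import numpy as np

def seq2seq_attention_mask(n_input, n_output):
    """Build a mask where input positions attend bidirectionally within the
    input, and output positions attend to the input plus earlier outputs.
    mask[i, j] = 1 means position i may attend to position j."""
    n = n_input + n_output
    mask = np.zeros((n, n), dtype=int)
    mask[:, :n_input] = 1                  # every position sees the full input
    for i in range(n_input, n):            # output positions additionally see
        mask[i, n_input:i + 1] = 1         # earlier outputs and themselves
    return mask

m = seq2seq_attention_mask(2, 3)           # 2 input tokens, 3 output tokens
```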
As an alternative, the apparatus is configured to obtain the initial dialogue text and the target dialogue attribute in the target dialogue scene by: acquiring account data of a target account, wherein the target account represents an account participating in the target dialogue scene; and determining the target dialogue attribute according to the account data.
As an alternative, the device is configured to determine the target dialogue attribute according to the account data in at least one of the following manners: acquiring interaction data of the target account, and determining the target dialogue attribute as a dialogue attribute related to the interaction data; acquiring a search history of the target account, and determining the target dialogue attribute as a dialogue attribute related to the search history; and acquiring an emotion type of the initial dialogue text, and determining the target dialogue attribute as a dialogue attribute identical to the emotion type of the initial dialogue text.
The specific manner in which the various modules perform their operations in the apparatus of the above embodiments has been described in detail in the embodiments of the method, and will not be described again here.
According to one aspect of the present application, a computer program product is provided, the computer program product comprising a computer program which, when executed by a processor, implements the method for generating dialogue text described above.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
Fig. 11 schematically shows a block diagram of a computer system of an electronic device for implementing an embodiment of the application.
It should be noted that the computer system 1100 of the electronic device shown in fig. 11 is only an example, and should not be construed as limiting the functions or scope of application of the embodiments of the present application.
As shown in fig. 11, the computer system 1100 includes a central processing unit 1101 (Central Processing Unit, CPU), which can execute various appropriate actions and processes according to a program stored in a read-only memory 1102 (Read-Only Memory, ROM) or a program loaded from a storage section 1108 into a random access memory 1103 (Random Access Memory, RAM). The random access memory 1103 also stores various programs and data necessary for system operation. The CPU 1101, the ROM 1102, and the RAM 1103 are connected to each other via a bus 1104. An input/output interface 1105 (I/O interface) is also connected to the bus 1104.
The following components are connected to the input/output interface 1105: an input section 1106 including a keyboard, a mouse, and the like; an output section 1107 including a cathode ray tube (Cathode Ray Tube, CRT), a liquid crystal display (Liquid Crystal Display, LCD), a speaker, and the like; a storage section 1108 including a hard disk or the like; and a communication section 1109 including a network interface card such as a local area network card or a modem. The communication section 1109 performs communication processing via a network such as the Internet. A drive 1110 is also connected to the input/output interface 1105 as needed. A removable medium 1111, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1110 as needed, so that a computer program read from it can be installed into the storage section 1108.
In particular, according to embodiments of the present application, the processes described in the various method flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program can be downloaded and installed from a network via the communication section 1109, and/or installed from the removable medium 1111. When the computer program is executed by the central processing unit 1101, the various functions defined in the system of the present application are performed.
According to still another aspect of the embodiments of the present application, an electronic device for implementing the above method for generating dialogue text is further provided; the electronic device may be the terminal device or the server shown in fig. 1. This embodiment is described by taking the case where the electronic device is a terminal device as an example. As shown in fig. 12, the electronic device includes a memory 1202 and a processor 1204, the memory 1202 storing a computer program, and the processor 1204 being configured to perform the steps of any of the method embodiments described above by means of the computer program.
Optionally, in this embodiment, the electronic device may be located in at least one of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute, by means of a computer program, the method in the embodiments of the present application.
Optionally, it will be appreciated by those of ordinary skill in the art that the configuration shown in fig. 12 is merely illustrative, and fig. 12 does not limit the configuration of the electronic device described above. For example, the electronic device may also include more or fewer components (such as a network interface) than shown in fig. 12, or have a different configuration from that shown in fig. 12.
The memory 1202 may be used to store software programs and modules, such as the program instructions/modules corresponding to the method and apparatus for generating dialogue text in the embodiments of the present application; the processor 1204 executes the software programs and modules stored in the memory 1202 to perform various functional applications and data processing, that is, to implement the method for generating dialogue text described above. The memory 1202 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1202 may further include memory located remotely from the processor 1204, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1202 may be used for storing information such as dialogue text, but is not limited thereto. As an example, as shown in fig. 12, the memory 1202 may include, but is not limited to, the acquisition module 1002, the first processing module 1004, the update module 1006, and the second processing module 1008 of the dialogue text generating apparatus. It may also include, but is not limited to, other module units of the dialogue text generating apparatus, which are not described again in this example.
Optionally, the transmission device 1206 is configured to receive or transmit data via a network. Specific examples of the network may include wired networks and wireless networks. In one example, the transmission device 1206 includes a network adapter (Network Interface Controller, NIC) that can be connected to other network devices and routers via a network cable so as to communicate with the Internet or a local area network. In another example, the transmission device 1206 is a radio frequency (Radio Frequency, RF) module for communicating wirelessly with the Internet.
In addition, the electronic device further includes: a display 1208 for displaying the dialog text; and a connection bus 1210 for connecting the respective module parts in the above-described electronic apparatus.
In other embodiments, the terminal device or the server may be a node in a distributed system, where the distributed system may be a blockchain system, and the blockchain system may be a distributed system formed by connecting a plurality of nodes through network communication. The nodes may form a peer-to-peer network, and any type of computing device, such as a server or a terminal, may become a node in the blockchain system by joining the peer-to-peer network.
According to one aspect of the present application, a computer-readable storage medium is provided, the storage medium storing computer instructions. A processor of an electronic device reads the computer instructions from the storage medium and executes them, causing the electronic device to perform the method for generating dialogue text provided in the various optional implementations described above.
Optionally, in this embodiment, the above computer-readable storage medium may be configured to store a computer program for executing the method in the embodiments of the present application.
Optionally, in this embodiment, it will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be completed by a program instructing hardware related to a terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, and the like.
The integrated units in the above embodiments may be stored in the above computer-readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes to the related art, or all or part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing one or more electronic devices to perform all or part of the steps of the methods described in the various embodiments of the present application.
In the foregoing embodiments of the present application, the description of each embodiment has its own emphasis; for a part not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. The apparatus embodiments described above are merely exemplary; for example, the division into units is merely a logical function division, and there may be another division manner in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be implemented through some interfaces, units, or modules, and may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing is merely a preferred embodiment of the present application, and it should be noted that several improvements and modifications may be made by those of ordinary skill in the art without departing from the principles of the present application; these improvements and modifications shall also be regarded as falling within the protection scope of the present application.

Claims (15)

1. A method for generating dialog text, comprising:
acquiring an initial dialogue text and a target dialogue attribute in a target dialogue scene, wherein the target dialogue attribute is used for indicating a dialogue style of the target dialogue scene;
inputting the initial dialogue text into a target dialogue model to perform undisturbed forward transfer, and outputting an undisturbed dialogue text, wherein the undisturbed dialogue text comprises word segments which do not accord with the target dialogue attribute, the undisturbed dialogue text consists of word segments output by the target dialogue model, the target dialogue model is used for determining output word segments step by step in time steps according to input word segments and a history information matrix, and for determining a corresponding dialogue text according to the output word segments, the target dialogue model comprises a plurality of attention modules, and the history information matrix comprises key-value pairs corresponding to each attention module in the plurality of attention modules;
inputting characterization vectors corresponding to the word segments in the undisturbed dialogue text into a target attribute classifier respectively, determining loss parameters of the target attribute classifier respectively, and updating the history information matrix by using the loss parameters, wherein the loss parameters are used to represent the degree to which the word segments corresponding to the characterization vectors accord with the target dialogue attribute;
and inputting the initial dialogue text into the target dialogue model to perform perturbed forward transfer, and outputting a perturbed dialogue text, wherein the perturbed dialogue text represents the dialogue text output by the target dialogue model after the history information matrix has been updated by using the loss parameters, and each word segment in the perturbed dialogue text accords with the target dialogue attribute.
2. The method according to claim 1, wherein the method further comprises:
inputting the initial dialogue text into the target dialogue model to generate t first characterization vectors, wherein the target dialogue model comprises i layers of attention modules, the first characterization vectors represent output results of the i layers of attention modules, the target dialogue model sequentially outputs t word segments in t time steps in an iterative loop manner, the (j+1)th word segment is determined by inputting, in the (j+1)th time step, the jth word segment output by the target dialogue model together with the jth history information matrix into the target dialogue model, the jth history information matrix is used to represent i key-value pairs corresponding to the jth time step, the i key-value pairs are in one-to-one correspondence with the i layers of attention modules, the t first characterization vectors are used to determine the undisturbed dialogue text composed of the t word segments, i is an integer greater than or equal to 2, t is an integer greater than or equal to j+1, and j is an integer greater than or equal to 1;
inputting the t first characterization vectors into the target attribute classifier respectively, determining t classification results, and updating a target history information matrix, wherein the target history information matrix represents the history information matrix corresponding to a target word segment after the target word segment output by the target dialogue model is input into the target dialogue model, and the target word segment comprises a word segment which is determined by the t classification results not to accord with the target dialogue attribute;
and inputting the initial dialogue text into the target dialogue model, and determining t second characterization vectors by using the updated target history information matrix to generate the perturbed dialogue text, wherein the second characterization vectors are generated in the same manner as the first characterization vectors.
3. The method of claim 2, wherein the inputting the t first characterization vectors into the target attribute classifier respectively, determining t classification results, and updating a target history information matrix comprises:
respectively inputting the t first characterization vectors into the target attribute classifier, and determining t loss parameters, wherein the t classification results are in one-to-one correspondence with the t loss parameters;
determining a target loss parameter corresponding to the nth time step from the t loss parameters, wherein the target loss parameter represents a loss parameter which does not meet a preset loss condition, and n is a positive integer less than or equal to t;
and updating the target historical information matrix according to the target loss parameters.
4. A method according to claim 3, wherein said updating said target history information matrix according to said target loss parameter comprises:
acquiring a trainable initial tensor;
determining the gradient of the target attribute classifier according to the target loss parameter, and training the initial tensor by utilizing the gradient until a target tensor is obtained, wherein the target tensor represents an updated value of the target historical information matrix;
and determining the sum of the target history information matrix and the target tensor as the updated target history information matrix.
5. The method of claim 4, wherein determining the gradient of the target attribute classifier based on the target loss parameter and training the initial tensor using the gradient to obtain a target tensor comprises:
acquiring a predetermined training step length, wherein the value of the training step length is positively correlated with the magnitude by which the initial tensor is adjusted in each round of training;
and training the initial tensor round by round according to the training step length, updating the target history information matrix by using the tensor obtained after each round of training, recalculating the loss parameter and the gradient corresponding to the nth time step in each round after the target history information matrix is updated, and, when the loss parameter corresponding to the nth time step meets the preset loss condition, determining the tensor trained in that round as the target tensor, wherein the tensor after each round of training is jointly determined by the tensor before that round of training, the gradient corresponding to that round, and the training step length.
6. The method according to claim 3, wherein the inputting the initial dialogue text into the target dialogue model to perform perturbed forward transfer and outputting the perturbed dialogue text comprises:
inputting the initial dialogue text into the target dialogue model to generate t second characterization vectors, wherein the history information matrix corresponding to the nth time step has been updated;
inputting the t second characterization vectors into the target attribute classifier respectively, and re-determining t classification results, wherein the re-determined t classification results indicate that the corresponding word segments accord with the target dialogue attribute;
and performing a sampling operation on the t second characterization vectors to obtain the perturbed dialogue text.
7. The method according to claim 2, wherein the method further comprises:
respectively executing normalization operation on the t first characterization vectors and the t second characterization vectors to obtain a first probability distribution and a second probability distribution;
determining a target divergence value according to the first probability distribution and the second probability distribution, wherein the target divergence value is used for measuring the difference degree between the first probability distribution and the second probability distribution;
and updating the target history information matrix with the goal of minimizing the target divergence value.
8. The method according to claim 1, wherein the method further comprises:
obtaining unlabeled dialogue corpus data;
preprocessing the dialogue corpus data to obtain target dialogue corpus, wherein the target dialogue corpus represents dialogue corpus between two objects;
and pre-training an initial dialogue model by using the target dialogue corpus to obtain the target dialogue model, wherein the initial dialogue model and the target dialogue model apply a bidirectional attention mechanism to the input text and a unidirectional attention mechanism to the output text.
9. The method of claim 1, wherein the obtaining the initial dialog text and the target dialog attributes in the target dialog scene comprises:
acquiring account data of a target account, wherein the target account represents an account participating in the target dialogue scene;
and determining the target dialogue attribute according to the account data.
10. The method of claim 9, wherein the determining the target dialogue attribute according to the account data comprises at least one of:
acquiring interaction data of the target account, and determining the target dialogue attribute as a dialogue attribute related to the interaction data;
acquiring a search history of the target account, and determining the target dialogue attribute as a dialogue attribute related to the search history;
and acquiring the emotion type of the initial dialogue text, and determining the target dialogue attribute as the dialogue attribute which is the same as the emotion type of the initial dialogue text.
11. A dialog text generation device, comprising:
an acquisition module, configured to acquire an initial dialogue text and a target dialogue attribute in a target dialogue scene, wherein the target dialogue attribute is used to indicate a dialogue style of the target dialogue scene;
a first processing module, configured to input the initial dialogue text into a target dialogue model to perform undisturbed forward transfer and output an undisturbed dialogue text, wherein the undisturbed dialogue text comprises word segments which do not accord with the target dialogue attribute, the undisturbed dialogue text consists of word segments output by the target dialogue model, the target dialogue model is used for determining output word segments step by step in time steps according to input word segments and a history information matrix, and for determining a corresponding dialogue text according to the output word segments, the target dialogue model comprises a plurality of attention modules, and the history information matrix comprises key-value pairs corresponding to each attention module in the plurality of attention modules;
an update module, configured to input characterization vectors corresponding to the word segments in the undisturbed dialogue text into a target attribute classifier respectively, determine loss parameters of the target attribute classifier respectively, and update the history information matrix by using the loss parameters, wherein the loss parameters are used to represent the degree to which the word segments corresponding to the characterization vectors accord with the target dialogue attribute;
and a second processing module, configured to input the initial dialogue text into the target dialogue model to perform perturbed forward transfer and output a perturbed dialogue text, wherein the perturbed dialogue text represents the dialogue text output by the target dialogue model after the history information matrix has been updated by using the loss parameters, and each word segment in the perturbed dialogue text accords with the target dialogue attribute.
12. The apparatus of claim 11, wherein the apparatus is further configured to:
inputting the initial dialogue text into the target dialogue model to generate t first characterization vectors, wherein the target dialogue model comprises i layers of attention modules, the first characterization vectors represent output results of the i layers of attention modules, the target dialogue model sequentially outputs t word segments in t time steps in an iterative loop manner, the (j+1)th word segment is determined by inputting, in the (j+1)th time step, the jth word segment output by the target dialogue model together with the jth history information matrix into the target dialogue model, the jth history information matrix is used to represent i key-value pairs corresponding to the jth time step, the i key-value pairs are in one-to-one correspondence with the i layers of attention modules, the t first characterization vectors are used to determine the undisturbed dialogue text composed of the t word segments, i is an integer greater than or equal to 2, t is an integer greater than or equal to j+1, and j is an integer greater than or equal to 1;
inputting the t first characterization vectors into the target attribute classifier respectively, determining t classification results, and updating a target history information matrix, wherein the target history information matrix represents the history information matrix corresponding to a target word segment after the target word segment output by the target dialogue model is input into the target dialogue model, and the target word segment comprises a word segment which is determined by the t classification results not to accord with the target dialogue attribute;
and inputting the initial dialogue text into the target dialogue model, and determining t second characterization vectors by using the updated target history information matrix to generate the perturbed dialogue text, wherein the second characterization vectors are generated in the same manner as the first characterization vectors.
13. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored computer program, wherein the computer program is executable by an electronic device to perform the method of any one of claims 1 to 10.
14. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method as claimed in any one of claims 1 to 10.
15. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method according to any of the claims 1 to 10 by means of the computer program.
CN202311289708.9A 2023-10-07 2023-10-07 Dialogue text generation method and device, storage medium and electronic equipment Pending CN117216223A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311289708.9A CN117216223A (en) 2023-10-07 2023-10-07 Dialogue text generation method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311289708.9A CN117216223A (en) 2023-10-07 2023-10-07 Dialogue text generation method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN117216223A true CN117216223A (en) 2023-12-12

Family

ID=89046153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311289708.9A Pending CN117216223A (en) 2023-10-07 2023-10-07 Dialogue text generation method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN117216223A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118094445A (en) * 2024-04-23 2024-05-28 科大讯飞股份有限公司 Man-machine interaction method, device, equipment and program product based on large model

Similar Documents

Publication Publication Date Title
CN112487182B (en) Training method of text processing model, text processing method and device
JP7194284B2 (en) Quantization model optimization method, device, information recommendation method, device, neural network model optimization method, device, electronic device, and computer program
CN111897933B (en) Emotion dialogue generation method and device and emotion dialogue model training method and device
CN111897941A (en) Dialog generation method, network training method, device, storage medium and equipment
CN110263324A (en) Text handling method, model training method and device
CN111966800B (en) Emotion dialogue generation method and device and emotion dialogue model training method and device
CN106409290B (en) A method of child's intelligent sound education based on image analysis
CN112271001B (en) Medical consultation dialogue system and method applying heterogeneous graph neural network
CN112214591B (en) Dialog prediction method and device
CN110347792A (en) Talk with generation method and device, storage medium, electronic equipment
CN112527966B (en) Network text emotion analysis method based on Bi-GRU neural network and self-attention mechanism
CN115186110B (en) Multi-modal knowledge graph completion method and system based on relationship-enhanced negative sampling
CN111259668B (en) Reading task processing method, model training device and computer equipment
CN111046178B (en) Text sequence generation method and system
CN112733043B (en) Comment recommendation method and device
CN116821457B (en) Intelligent consultation and public opinion processing system based on multi-mode large model
CN117216223A (en) Dialogue text generation method and device, storage medium and electronic equipment
CN112527993A (en) Cross-media hierarchical deep video question-answer reasoning framework
CN113761156A (en) Data processing method, device and medium for man-machine interaction conversation and electronic equipment
CN117033602A (en) Method for constructing multi-mode user mental perception question-answering model
CN113360618A (en) Intelligent robot dialogue method and system based on offline reinforcement learning
CN117437317A (en) Image generation method, apparatus, electronic device, storage medium, and program product
CN116956116A (en) Text processing method and device, storage medium and electronic equipment
CN114783601A (en) Physiological data analysis method and device, electronic equipment and storage medium
CN116882450B (en) Question-answering model editing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination