CN113987154A - Similar sentence generation model training method based on UniLM and contrast learning and related equipment - Google Patents


Info

Publication number
CN113987154A
Authority
CN
China
Prior art keywords
sentence
loss function
unilm
sample
model
Prior art date
Legal status
Pending
Application number
CN202111327839.2A
Other languages
Chinese (zh)
Inventor
黄勇其
王伟
于翠翠
张黔
Current Assignee
Runlian Software System Shenzhen Co Ltd
Original Assignee
Runlian Software System Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Runlian Software System Shenzhen Co Ltd filed Critical Runlian Software System Shenzhen Co Ltd
Priority to CN202111327839.2A priority Critical patent/CN113987154A/en
Publication of CN113987154A publication Critical patent/CN113987154A/en
Pending legal-status Critical Current

Classifications

    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3344 Query execution using natural language analysis
    • G06N3/08 Learning methods for neural networks


Abstract

The embodiment of the application belongs to the field of artificial intelligence, and relates to a similar sentence generation model training method based on UniLM and contrast learning, which comprises the steps of inputting a sample sentence into a similar sentence generation model, wherein a sentence coding layer is used for coding the sample sentence to obtain a dense vector, a multi-head self-attention structure is used for extracting key information in the dense vector, and the key information and the dense vector form a positive sample to calculate a contrast loss function; inputting the dense vector into a UniLM model to obtain an output sentence, and calculating a text alignment loss function between the sample sentence and the output sentence; and adding the contrast loss function and the text alignment loss function to obtain a total loss function, and calculating a final value of the total loss function based on a gradient descent method to obtain a trained similar sentence generation model. The method improves the accuracy of the trained model.

Description

Similar sentence generation model training method based on UniLM and contrast learning and related equipment
Technical Field
The application relates to the field of artificial intelligence, in particular to a similar sentence generation model training method and device based on UniLM and contrast learning, computer equipment and a storage medium.
Background
With the development of technology, natural language processing is more and more widely applied in fields such as finance, the Internet, education and medical treatment. In particular, in the field of FAQ question answering, answers are retrieved from an intelligent question answering system for the question sentences given by users: the related technology matches the user's question sentence against the FAQ standard question set, retrieves the question with the highest matching degree, and returns the corresponding answer. The traditional FAQ standard question set is usually expanded by manual labeling, which is time-consuming, labor-intensive, and adds a large amount of labor cost. In recent years, models based on generative adversarial networks (GAN) or variational autoencoders (VAE) have been used to generate similar sentences, but GAN faces the problems that the generated result is uncontrollable and the sentences may not conform to grammatical rules, while VAE faces the problem of posterior collapse, which depends on the distribution and number of samples. In addition, with the development of pre-trained language models, research has also been carried out on using the BERT model for similar sentence generation; it has been found that sentences generated by applying random dropout to the vectors produced by the self-supervised model have higher similarity to the original sentences, which is superior to conventional sentence expansion methods that cut and delete words from the original sentence.
Disclosure of Invention
Based on this, the present application provides a method, an apparatus, a computer device and a storage medium for training a similar sentence generation model based on UniLM and contrastive learning, so as to solve the technical problem in the prior art that important information is lost due to random dropout.
A similar sentence generation model training method based on UniLM and contrast learning, the method comprises the following steps:
inputting a sample sentence into a similar sentence generating model, wherein the similar sentence generating model comprises a sentence coding layer and a multi-head self-attention structure comprising a mask matrix, the sentence coding layer is used for coding the sample sentence to obtain a dense vector, the multi-head self-attention structure is used for extracting key information in the dense vector, and the key information and the dense vector form a positive sample to calculate a contrast loss function;
inputting the dense vector into a UniLM model to obtain an output sentence, and calculating a text alignment loss function between the sample sentence and the output sentence;
and adding the contrast loss function and the text alignment loss function to obtain a total loss function, and calculating a final value of the total loss function based on a gradient descent method to obtain a trained similar sentence generation model.
A similar sentence generation model training apparatus based on UniLM and contrast learning, the apparatus comprising:
the system comprises a first loss module, a second loss module and a third loss module, wherein the first loss module is used for inputting a sample sentence into a similar sentence generating model, the similar sentence generating model comprises a sentence coding layer and a multi-head self-attention structure comprising a mask matrix, the sentence coding layer is used for coding the sample sentence to obtain a dense vector, the multi-head self-attention structure is used for extracting key information in the dense vector, and the key information and the dense vector form a positive sample to calculate a contrast loss function;
the second loss module is used for inputting the dense vector into a UniLM model to obtain an output sentence, and calculating a text alignment loss function between the sample sentence and the output sentence;
and the total loss module is used for adding the comparison loss function and the text alignment loss function to obtain a total loss function, and calculating a final value of the total loss function based on a gradient descent method to obtain a trained similar sentence generation model.
A computer device comprising a memory and a processor, and computer readable instructions stored in the memory and executable on the processor, the processor implementing the steps of the above method for training a model of generating similar sentences based on UniLM and contrastive learning when executing the computer readable instructions.
A computer readable storage medium storing computer readable instructions which, when executed by a processor, implement the steps of the above method for training a model for generating similar sentences based on UniLM and contrastive learning.
According to the method and the device for training the similar sentence generation model based on UniLM and contrastive learning, the provided key information extractor can effectively screen out the information that is most valuable for the generation task, and an improved method of joint training of small-batch contrastive learning and the UniLM native one-way language model (Left-to-Right LM) is further provided, so that the quality of the generated samples is further improved. In addition, the model obtained by the training method can generate a plurality of similar sentences at one time, achieving the purpose of rapidly expanding the data set.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description are obviously only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive labor.
FIG. 1 is a schematic diagram of an application environment of a method for training a similar sentence generation model based on UniLM and contrast learning;
FIG. 2 is a schematic flow chart of a method for training a similar sentence generation model based on UniLM and contrast learning;
FIG. 3 is a diagram of a mask corresponding to a language model;
FIG. 4 is a diagram of a training network architecture;
FIG. 5 is a graph of a loss function;
FIG. 6 is a schematic diagram of a similar sentence generation model training apparatus based on UniLM and contrast learning;
FIG. 7 is a diagram of a computer device in one embodiment.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The similar sentence generation method based on UniLM and contrastive learning provided by the embodiment of the invention can be applied to the application environment shown in FIG. 1. The application environment may include a terminal 102, a server 104, and a network serving as the medium of the communication link between the terminal 102 and the server 104, wherein the network may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal 102 to interact with the server 104 over a network to receive or send messages, etc. The terminal 102 may have installed thereon various communication client applications, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal 102 may be various electronic devices having a display screen and supporting web browsing, including but not limited to a smart phone, a tablet computer, an e-book reader, an MP3 player (Moving Picture Experts Group Audio Layer III, mpeg compression standard Audio Layer 3), an MP4 player (Moving Picture Experts Group Audio Layer IV, mpeg compression standard Audio Layer 4), a laptop portable computer, a desktop computer, and the like.
The server 104 may be a server that provides various services, such as a background server that provides support for pages displayed on the terminal 102.
It should be noted that the method for generating similar sentences based on UniLM and contrastive learning provided in the embodiment of the present application is generally executed by the server/terminal, and accordingly, the device for generating similar sentences based on UniLM and contrastive learning is generally disposed in the server/terminal device.
The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
It should be understood that the number of terminals, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Wherein, the terminal 102 communicates with the server 104 through the network. The terminal 102 and the server 104 are connected through a network, the network may be a wired network or a wireless network, the terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers and portable wearable devices, and the server 104 may be implemented by an independent server or a server cluster formed by a plurality of servers.
In one embodiment, as shown in fig. 2, a method for training a similar sentence generation model based on UniLM and contrastive learning is provided, which is described by taking the application of the method to the server in fig. 1 as an example, and includes the following steps:
Step 202, inputting a sample sentence into a similar sentence generating model, wherein the similar sentence generating model comprises a sentence coding layer and a multi-head self-attention structure comprising a mask matrix, the sentence coding layer is used for coding the sample sentence to obtain a dense vector, the multi-head self-attention structure is used for extracting key information in the dense vector, and the key information and the dense vector form a positive sample to calculate a contrast loss function;
the application provides a similar sentence generating method based on UniLM and contrast learning, designs a key point location extractor for screening information with a high value for generating tasks, so as to solve the problem that important information may be lost by a dropout method, and further provides an improved small-batch contrast learning and UniLM native single language model (Left-t-rightLM) combined training method, so as to further improve the quality of generated samples.
UniLM is a unified pre-trained language model; it jointly optimizes a Transformer network through three language models with different objective functions: a bidirectional language model (Bidirectional LM), a unidirectional language model (Left-to-Right LM), and a sequence-to-sequence language model (Seq-to-Seq LM). The present application adopts a pre-trained UniLM language model and improves only the one-way language model part. UniLM comprises a sentence coding module, a number of Transformer blocks, and an optimization function layer (namely the optimization objectives of the three language models).
Denote the input sentence as X = {x_1, x_2, ..., x_n}, where x_i is the i-th word. X is input into the sentence coding layer, and the resulting sentence encoding is denoted X_vec (as shown in FIG. 2, X is used to calculate the cross-entropy loss function, and X_vec is used to calculate the contrastive learning loss function).
Obtaining the dense vector in this embodiment is a standard NLP operation: each word is mapped to a vector.
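For concreteness, a minimal sketch of this word-to-vector mapping is shown below; the vocabulary, embedding dimension and random initialisation are illustrative assumptions, while the patent itself uses the sentence coding layer of a pre-trained UniLM:

```python
import numpy as np

# Toy vocabulary and embedding table; in a real system the token embeddings of
# the pre-trained UniLM sentence coding layer would be used instead.
vocab = {"[CLS]": 0, "what": 1, "are": 2, "the": 3, "benefits": 4, "of": 5, "leeks": 6, "[SEP]": 7}
embedding_table = np.random.default_rng(0).normal(size=(len(vocab), 8))

def encode_sentence(tokens):
    """Map each token of the sample sentence to its dense vector, giving X_vec."""
    return embedding_table[[vocab[t] for t in tokens]]   # shape (len(tokens), 8)

X_vec = encode_sentence(["[CLS]", "what", "are", "the", "benefits", "of", "leeks", "[SEP]"])
```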
In one embodiment, the structure of the Transformer block in UniLM is substantially the same as that of BERT (a pre-trained language model) and adopts a multi-head self-attention structure, except that a mask matrix is added; formula (1) is calculated as follows:

$A_l = \mathrm{softmax}\!\left(\dfrac{QK^{\top}}{\sqrt{d_k}} + M\right)V$   (1)

where Q, K and V are the results of multiplying the input vector by three transformation matrices, and A_l is the output of the corresponding block. M is the mask matrix, which indicates whether a word can be seen by other words; for example, the mask corresponding to the one-way language model is shown in FIG. 3.

For a unidirectional LM (left-to-right or right-to-left), each word can only attend to the context in one direction when attention is computed. For a left-to-right LM, a word can only attend to the information on its left, and the words on its right need to be masked; the same holds for a right-to-left LM. The mask matrix obtained is therefore an upper or lower triangular matrix.
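The following is a minimal single-head NumPy sketch of formula (1) with a left-to-right (triangular) mask; the tensor shapes, the single head, and the use of a large negative constant for masked positions are illustrative assumptions, not the patented implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def masked_self_attention(X, Wq, Wk, Wv):
    """Single-head version of formula (1): A_l = softmax(Q K^T / sqrt(d_k) + M) V."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # multiply the input by three transformation matrices
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # scaled pairwise similarities
    n = scores.shape[0]
    M = np.triu(np.full((n, n), -1e9), k=1)   # left-to-right mask: a word cannot see words to its right
    return softmax(scores + M) @ V            # A_l, the output of the block

# Toy usage: 5 tokens with 8-dimensional vectors
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
A_l = masked_self_attention(X, Wq, Wk, Wv)    # shape (5, 8)
```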
In this embodiment, in order to solve the above technical problem, the existing UniLM model is modified by designing a key information extractor. The existing method of generating similar sentences with dropout applies random dropout to A_l to obtain a new A_l, which may discard important information. The present application therefore designs the following formulas (2), (3) and (4):

$k(Q_i, K) = \dfrac{Q_i K^{\top}}{\sqrt{d_k}}$   (2)

$\mathrm{select}(Q, K) = \mathrm{sorted}\big(k(Q_i, K)\big)[1:m]$   (3)

$\mathrm{new}A_l = \big(\mathrm{select}(Q, K) + M\big)V$   (4)

In formula (2), k(Q_i, K) calculates the similarity between the i-th component of the Q vector and all the K component vectors; d_k is the dimension of the vectors (e.g. 64, 256, 512, etc.), and l_k is the input sentence length. Formula (3) means that the calculation of formula (2) is carried out for all Q components, the components are sorted from large to small according to the obtained values, and the first m components are selected; m can be customized and represents how many important vectors are retained. Formula (4) takes the important sentence vectors selected through formulas (2) and (3), adds the mask M, and multiplies by V to obtain the new value newA_l, i.e. a new sentence encoding. Formula (2) measures the relation (which can be understood as similarity) between the vectors q and k: if q and k are closely related the obtained value is larger, otherwise it is smaller. Each q is computed against all k to obtain a series of values, which formula (3) sorts to keep the m largest, and the vectors corresponding to these values are retained, thereby achieving the purpose of extracting key information. Each Transformer block layer corresponds to a key information extractor, so L corresponding outputs are obtained through the L Transformer block layers and their key information extractors; these L outputs and the encoding X_vec of the input sentence form positive sample pairs for calculating the contrastive loss function, as shown in FIG. 4.
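A rough sketch of the key information extractor in formulas (2)-(4) follows; the scaled dot-product form of k(Q_i, K), the use of a summed similarity as the per-component ranking score, and a softmax normalisation before multiplying by V are assumptions made so the example runs end to end, not details taken from the patent:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def key_info_extractor(Q, K, V, M, m):
    """Keep the m query components most related to K (sketch of formulas (2)-(4))."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # formula (2): k(Q_i, K) for every component i
    importance = scores.sum(axis=1)              # assumed ranking score for each Q component
    keep = np.sort(np.argsort(-importance)[:m])  # formula (3): indices of the top-m components
    selected = scores[keep]                      # discard the rows of the other components
    weights = softmax(selected + M[keep])        # formula (4): add the corresponding mask rows
    return weights @ V                           # newA_l: a new sentence encoding of m key vectors

# Toy usage, reusing the shapes of the attention sketch above
rng = np.random.default_rng(1)
Q, K, V = (rng.normal(size=(5, 8)) for _ in range(3))
M = np.triu(np.full((5, 5), -1e9), k=1)
newA_l = key_info_extractor(Q, K, V, M, m=3)     # shape (3, 8)
```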
In the present embodiment, contrastive learning is a kind of self-supervised learning: it trains a representation learning model by constructing similar and dissimilar examples, so that similar examples end up closer in the projection space and dissimilar examples end up farther apart. That is, the purpose of contrastive learning is to pull similar sentences closer together and push unrelated sentences farther apart, so that the sentences generated by the trained model have a higher similarity to the original sentences.

The L vectors obtained from the key information extractors via formulas (2), (3) and (4) are denoted {z_1, z_2, ..., z_L} (i.e. the newA_l obtained in the step above). In the training stage, the input sentence encoding X_vec and {z_1, z_2, ..., z_L} form L positive sample pairs, i.e. {(X_vec, z_1), (X_vec, z_2), ..., (X_vec, z_L)}, while X_vec and the other samples of the same training batch form negative sample pairs. The following formulas (5), (6) and (7) are designed to calculate the contrastive loss function:

$l_i = -\log \dfrac{\exp\big(\mathrm{sim}(h_i, z_i)/\tau\big)}{\sum_{j}\exp\big(\mathrm{sim}(h_i, h_j^{-})/\tau\big)}$   (5)

$L_{cont} = \alpha_1 l_1 + \alpha_2 l_2 + \dots + \alpha_L l_L$   (6)

$\alpha_1 + \alpha_2 + \dots + \alpha_L = 1$   (7)
in the formula (5), the numerator calculates the similarity of the ith positive sample pair, the denominator calculates the similarity of all the negative sample pairs, and the cumulative sum is calculated. In the formula hiIndicating that the input sample, i.e. X vec,
Figure BDA0003347847120000071
representing vectors obtained by key-point decimators, i.e.
{z1,z2,...,zLZ iniAnd tau is the hyper-parameter to be trained. Formula (A), (B)6) Is a total contrast learning loss function, is composed of the losses of L positive samples, and gives a weight α to each lossi,αiSatisfies the relation of equation (7). Parameter alphaiThe network may be trained (e.g., a gradient descent algorithm) to determine a final value. The improved contrast loss function can comprehensively consider the influence of the vectors selected by the decimator on the quality of the finally generated samples, alphaiThe existence of (2) gives larger weight to the important vector, and smaller weight to the minor vector, so that the generated result is more reasonable; the generated result refers to a sentence decoded by the Left-to-Right LM module, that is, a process of decoding a stack of vectors into words.
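A hedged sketch of formulas (5)-(7) is given below; cosine similarity, pooled sentence-level vectors for x_vec and each z_i, and weights α that already satisfy formula (7) are assumptions chosen so the example runs, not details specified in the patent:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def contrastive_loss(x_vec, z_list, negatives, alpha, tau):
    """Weighted contrastive loss of formulas (5)-(7).

    x_vec     : pooled encoding of the input sentence (h_i in the text)
    z_list    : L vectors from the key information extractors (positive partners)
    negatives : encodings of the other sentences in the same mini-batch
    alpha     : weights alpha_1..alpha_L, assumed to already satisfy formula (7)
    tau       : temperature hyperparameter
    """
    denom = sum(np.exp(cosine(x_vec, h_neg) / tau) for h_neg in negatives)
    total = 0.0
    for a_i, z_i in zip(alpha, z_list):
        l_i = -np.log(np.exp(cosine(x_vec, z_i) / tau) / denom)   # formula (5)
        total += a_i * l_i                                        # formula (6)
    return total

# Toy usage: L = 3 positives, 4 in-batch negatives, uniform weights
rng = np.random.default_rng(2)
x_vec = rng.normal(size=8)
z_list = [rng.normal(size=8) for _ in range(3)]
negatives = [rng.normal(size=8) for _ in range(4)]
loss = contrastive_loss(x_vec, z_list, negatives, alpha=[1 / 3] * 3, tau=0.1)
```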
Step 204, inputting the dense vector into a UniLM model to obtain an output sentence, and calculating a text alignment loss function between the sample sentence and the output sentence;
as shown in fig. 5, the text alignment loss function adopts a cross entropy loss function commonly used in generative models, which is denoted as J, formula (8). The input sentence is the input sentence X in step 202, and the generated sentence is a sentence generated after X passes through UniLM, and the cross entropy loss function is obtained by calculating the input sentence and the generated sentence.
$J = -\sum_{t}\log p(y_t)$   (8)

where p(y_t) is obtained by mapping the dense vector at decoding step t into a probability value through the softmax function (the normalized exponential function), i.e. the probability of decoding the current dense vector into a certain word; the losses of all steps are summed to obtain the total cross-entropy loss function J.
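For illustration, formula (8) can be sketched as follows; the logits interface (dense decoder vectors already projected onto the vocabulary) is an assumption made for the sake of a self-contained example:

```python
import numpy as np

def cross_entropy_loss(logits, target_ids):
    """Formula (8): sum over decoding steps of -log softmax(logits_t)[target_t]."""
    logits = logits - logits.max(axis=-1, keepdims=True)                 # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -sum(log_probs[t, w] for t, w in enumerate(target_ids))       # total loss J

# Toy usage: 6 decoding steps over a 10-word vocabulary
rng = np.random.default_rng(3)
J = cross_entropy_loss(rng.normal(size=(6, 10)), target_ids=[1, 4, 2, 7, 0, 9])
```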
And step 206, adding the contrast loss function and the text alignment loss function to obtain a total loss function, and calculating a final value of the total loss function based on a gradient descent method to obtain a trained similar sentence generation model.
The training of the similar sentence generation model adopts a multi-task joint training mode, and the total loss function comprises the contrastive learning loss function and the cross-entropy loss function, namely:
Loss = L_cont + J
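As an illustration of this joint training, one training step might look as follows; this is a sketch under the assumption that the model returns both loss terms and that a standard stochastic-gradient optimizer (here PyTorch's SGD) stands in for the gradient descent method named in the text:

```python
import torch

def train_step(model, optimizer, batch):
    """One joint-training step minimising Loss = L_cont + J."""
    optimizer.zero_grad()
    l_cont, j = model(batch)     # the model is assumed to return both loss terms
    loss = l_cont + j            # total loss of the multi-task joint training
    loss.backward()              # back-propagate and ...
    optimizer.step()             # ... take one gradient-descent step
    return loss.item()

# Assumed usage, given some `model` built as described above:
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
# for batch in data_loader:
#     train_step(model, optimizer, batch)
```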
in another embodiment, the generated sentences may be obtained according to the trained model, the training samples are input into the model for training, and after the model is trained, L +1 generated sentences may be obtained, where L generated sentences are vectors obtained after conversion by the key information extractor, and then input into the sentences generated by the decoding module (i.e., Left-to-Right LM), and the other is a sentence generated without decoding by the key information extractor.
Example:
in the field of intelligent customer service data expansion or other intelligent question and answer fields, the problems of the user and the problems of the standard question bank are often required to be accurately matched so as to achieve the purpose of quickly returning the user request. However, the expansion of the conventional standard question bank is manually input, so that a large amount of labor cost is caused.
Such as the following training samples:
the benefits of more leeks
How much money is earned by the cloud
What married a prayer wheel
What the long fresh means
Can we marry
These data are input into the model as one training batch and the model is trained. For example, the sentence 'The benefits of more leeks' obtains L feature vectors through the key information extractors; the feature vector of this sentence and the L feature vectors form positive sample pairs, while it forms negative sample pairs with the other 4 sentences such as 'How much money is earned by the cloud', and training is then performed. After training is finished, if the user inputs the sentence 'The benefits of more leeks', L+1 sentences similar to it can be obtained, for example 'What are the benefits of eating more leeks', and the like.
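To make the pairing in this example concrete, the sketch below shows how positive and negative sample pairs could be assembled from the five-sentence batch; `encode` and `extract_keys` are hypothetical placeholders for the sentence coding layer and the key information extractors, not functions defined in the patent:

```python
batch = [
    "The benefits of more leeks",
    "How much money is earned by the cloud",
    "What married a prayer wheel",
    "What the long fresh means",
    "Can we marry",
]

def build_pairs(batch, encode, extract_keys):
    """For each sentence: L positive pairs with its extractor outputs,
    and negative pairs with every other sentence in the batch."""
    pairs = []
    for i, sentence in enumerate(batch):
        x_vec = encode(sentence)
        positives = [(x_vec, z) for z in extract_keys(sentence)]
        negatives = [(x_vec, encode(other))
                     for j, other in enumerate(batch) if j != i]
        pairs.append((positives, negatives))
    return pairs
```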
The key information extractor is designed to screen out the information that is most valuable for the generation task, and an improved method of joint training of small-batch contrastive learning and the UniLM native one-way language model (Left-to-Right LM) is further provided, so that the quality of the generated samples is further improved. In addition, the model obtained by this training method can generate a plurality of similar sentences at one time, achieving the purpose of rapidly expanding the data set.
It should be understood that, although the steps in the flowchart of fig. 2 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be performed in other orders. Moreover, at least some of the steps in fig. 2 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and their order of execution is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 6, a similar sentence generation model training device based on UniLM and contrastive learning is provided, which corresponds one-to-one with the similar sentence generation model training method based on UniLM and contrastive learning in the above embodiment. The similar sentence generation model training device based on UniLM and contrastive learning comprises:
a first loss module 602, configured to input a sample sentence into a similar sentence generating model, where the similar sentence generating model includes a sentence encoding layer and a multi-head self-attention structure including a mask matrix, the sentence encoding layer is configured to encode a sample sentence to obtain a dense vector, the multi-head self-attention structure is configured to extract key information in the dense vector, and form a positive sample with the key information and the dense vector to calculate a contrast loss function;
a second loss module 604, configured to input the dense vector into a UniLM model to obtain an output sentence, and calculate a text alignment loss function between the sample sentence and the output sentence;
and a total loss module 606, configured to add the comparison loss function and the text alignment loss function to obtain a total loss function, and calculate a final value of the total loss function based on a gradient descent method to obtain a trained similar sentence generation model.
The key information extractor provided by the device can effectively screen out the information that is most valuable for the generation task, and an improved method of joint training of small-batch contrastive learning and the UniLM native one-way language model (Left-to-Right LM) is also provided, so that the quality of the generated samples is further improved. In addition, the model obtained by this training method can generate a plurality of similar sentences at one time, achieving the purpose of rapidly expanding the data set.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer readable instructions, and a database. The internal memory provides an environment for the operating system and execution of computer-readable instructions in the non-volatile storage medium. The database of the computer device is used to store sample data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer readable instructions, when executed by a processor, implement a UniLM and contrast learning based method of training a model for generating similar sentences.
As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
In one embodiment, a computer readable storage medium is provided, on which computer readable instructions are stored; when executed by a processor, the computer readable instructions implement the steps of the method for training a similar sentence generation model based on UniLM and contrastive learning in the above embodiments, such as steps 202 to 206 shown in fig. 2, or implement the functions of the modules/units of the apparatus for training a similar sentence generation model based on UniLM and contrastive learning in the above embodiments, such as the functions of modules 602 to 606 shown in fig. 6.
It will be understood by those of ordinary skill in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware associated with computer readable instructions, which can be stored in a non-volatile computer readable storage medium, and when executed, can include processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory, among others. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for those skilled in the art, without departing from the spirit and scope of the present invention, several changes, modifications and equivalent substitutions of some technical features may be made, and these changes or substitutions do not make the essence of the same technical solution depart from the spirit and scope of the technical solution of the embodiments of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (6)

1. A similar sentence generation model training method based on UniLM and contrast learning is characterized by comprising the following steps:
inputting a sample sentence into a similar sentence generating model, wherein the similar sentence generating model comprises a sentence coding layer and a multi-head self-attention structure comprising a mask matrix, the sentence coding layer is used for coding the sample sentence to obtain a dense vector, the multi-head self-attention structure is used for extracting key information in the dense vector, and the key information and the dense vector form a positive sample to calculate a contrast loss function;
inputting the dense vector into a UniLM model to obtain an output sentence, and calculating a text alignment loss function between the sample sentence and the output sentence;
and adding the contrast loss function and the text alignment loss function to obtain a total loss function, and calculating a final value of the total loss function based on a gradient descent method to obtain a trained similar sentence generation model.
2. The method of claim 1, wherein the cross-entropy loss function J is:

$J = -\sum_{t}\log p(y_t)$

where p(y_t) is the probability value obtained by mapping the input vector z_i through the normalized exponential function, i.e. the probability of decoding the current dense vector into a certain word.
3. The method of claim 1, wherein the total Loss function Loss is:
Loss = L_cont + J
where L_cont is the contrast loss function and J is the cross-entropy loss function.
4. A similar sentence generation model training device based on UniLM and contrast learning is characterized by comprising the following components:
the system comprises a first loss module, a second loss module and a third loss module, wherein the first loss module is used for inputting a sample sentence into a similar sentence generating model, the similar sentence generating model comprises a sentence coding layer and a multi-head self-attention structure comprising a mask matrix, the sentence coding layer is used for coding the sample sentence to obtain a dense vector, the multi-head self-attention structure is used for extracting key information in the dense vector, and the key information and the dense vector form a positive sample to calculate a contrast loss function;
the second loss module is used for inputting the dense vector into a UniLM model to obtain an output sentence, and calculating a text alignment loss function between the sample sentence and the output sentence;
and the total loss module is used for adding the comparison loss function and the text alignment loss function to obtain a total loss function, and calculating a final value of the total loss function based on a gradient descent method to obtain a trained similar sentence generation model.
5. A computer device comprising a memory and a processor, the memory storing computer readable instructions, wherein the processor when executing the computer readable instructions implements the steps of the method of any one of claims 1 to 3.
6. A computer readable storage medium having computer readable instructions stored thereon, which when executed by a processor implement the steps of the method of any one of claims 1 to 3.
CN202111327839.2A 2021-11-10 2021-11-10 Similar sentence generation model training method based on UniLM and contrast learning and related equipment Pending CN113987154A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111327839.2A CN113987154A (en) 2021-11-10 2021-11-10 Similar sentence generation model training method based on UniLM and contrast learning and related equipment

Publications (1)

Publication Number Publication Date
CN113987154A true CN113987154A (en) 2022-01-28

Family

ID=79747770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111327839.2A Pending CN113987154A (en) 2021-11-10 2021-11-10 Similar sentence generation model training method based on UniLM and contrast learning and related equipment

Country Status (1)

Country Link
CN (1) CN113987154A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118114677A (en) * 2024-04-30 2024-05-31 杭州思锐信息技术股份有限公司 Automatic labeling optimization method and system for entity identification based on dense retrieval
CN118114677B (en) * 2024-04-30 2024-07-05 杭州思锐信息技术股份有限公司 Automatic labeling optimization method and system for entity identification based on dense retrieval


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination