CN116595385B - Composition generation model training method and device - Google Patents

Composition generation model training method and device

Info

Publication number
CN116595385B
CN116595385B
Authority
CN
China
Prior art keywords
question
training
data set
requirement
composition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310876691.0A
Other languages
Chinese (zh)
Other versions
CN116595385A (en)
Inventor
王芳
暴宇健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xumi Yuntu Space Technology Co Ltd
Original Assignee
Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority to CN202310876691.0A
Publication of CN116595385A
Application granted
Publication of CN116595385B
Active legal status
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The disclosure relates to the technical field of machine learning and provides a composition generation model training method and device. The method includes the following steps: acquiring a first training data set, and training a pre-training language model with a cross entropy loss function based on the question requirements in the first training data set and the compositions corresponding to those requirements; acquiring a question data set, generating compositions for the question requirements in the question data set multiple times with the trained pre-training language model, and constructing a second training data set and a third training data set from the question requirements in the question data set and the compositions the pre-training language model generated for them; training a question-text distance measurement model with a triplet loss function based on the second training data set; and, based on the third training data set, retraining the trained pre-training language model with a policy gradient function and taking the retrained pre-training language model as the composition generation model.

Description

Composition generation model training method and device
Technical Field
The disclosure relates to the technical field of machine learning, in particular to a composition generation model training method and device.
Background
In recent years, computer technology has developed rapidly, and many industries have been greatly supported and improved by artificial intelligence. In the field of article generation, more and more people prefer to generate compositions with a composition generation model. Although existing composition generation models have made great progress in language generation, the generated text still has problems with logic and coherence.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide a composition generation model training method, apparatus, electronic device, and computer-readable storage medium, so as to solve the problems in the prior art that compositions generated by a model have poor logic and incoherent semantics.
In a first aspect of the embodiments of the present disclosure, a composition generation model training method is provided, including: acquiring a first training data set, and training a pre-training language model with a cross entropy loss function based on the question requirements in the first training data set and the compositions corresponding to those requirements; acquiring a question data set, generating compositions for the question requirements in the question data set multiple times with the trained pre-training language model, and constructing a second training data set and a third training data set from the question requirements in the question data set and the compositions the pre-training language model generated for them; training a question-text distance measurement model with a triplet loss function based on the second training data set; and, based on the third training data set, retraining the trained pre-training language model with a policy gradient function and taking the retrained pre-training language model as the composition generation model.
In a second aspect of the embodiments of the present disclosure, a composition generation model training apparatus is provided, including: a first training module configured to acquire a first training data set and train a pre-training language model with a cross entropy loss function based on the question requirements in the first training data set and the compositions corresponding to those requirements; a construction module configured to acquire a question data set, generate compositions for the question requirements in the question data set multiple times with the trained pre-training language model, and construct a second training data set and a third training data set from the question requirements in the question data set and the compositions the pre-training language model generated for them; a second training module configured to train a question-text distance measurement model with a triplet loss function based on the second training data set; and a third training module configured to retrain the trained pre-training language model with a policy gradient function based on the third training data set and take the retrained pre-training language model as the composition generation model.
In a third aspect of the disclosed embodiments, an electronic device is provided, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.
In a fourth aspect of the disclosed embodiments, a computer-readable storage medium is provided, which stores a computer program which, when executed by a processor, implements the steps of the above-described method.
Compared with the prior art, the embodiments of the present disclosure have the following beneficial effects: a first training data set is acquired, and a pre-training language model is trained with a cross entropy loss function based on the question requirements in the first training data set and the compositions corresponding to those requirements; a question data set is acquired, compositions are generated for the question requirements in the question data set multiple times with the trained pre-training language model, and a second training data set and a third training data set are constructed from the question requirements in the question data set and the compositions the pre-training language model generated for them; a question-text distance measurement model is trained with a triplet loss function based on the second training data set; and, based on the third training data set, the trained pre-training language model is retrained with a policy gradient function and used as the composition generation model. With these technical means, the problems in the prior art that compositions generated by a model have poor logic and incoherent semantics can be solved, and the readability of the generated compositions is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required by the embodiments or by the description of the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present disclosure, and other drawings can be obtained from them by a person of ordinary skill in the art without inventive effort.
FIG. 1 is a schematic flow chart of a composition generation model training method provided in an embodiment of the disclosure;
FIG. 2 is a flow chart of a method of constructing a second training data set and a third training data set provided by an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a composition generation model training device provided in an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the disclosed embodiments. However, it will be apparent to one skilled in the art that the present disclosure may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present disclosure with unnecessary detail.
Fig. 1 is a schematic flow chart of a composition generation model training method provided in an embodiment of the disclosure. The composition generation model training method of fig. 1 may be performed by a computer or server, or software on a computer or server. As shown in fig. 1, the composition generation model training method includes:
s101, acquiring a first training data set, and training a pre-training language model by using a cross entropy loss function based on a question requirement in the first training data set and a composition corresponding to the question requirement;
s102, acquiring a question data set, generating corresponding compositions for the question requirements in the question data set for multiple times by using the trained pre-training language model, and constructing a second training data set and a third training data set according to the question requirements in the question data set and the compositions corresponding to the question requirements generated by the pre-training language model;
s103, training the question distance measurement model by using a triplet loss function based on the second training data set;
s104, based on the third training data set, retraining the trained pre-training language model by using the strategy gradient function, and taking the retrained pre-training language model as a composition generation model.
The first training data set is acquired, and the second training data set and the third training data set are constructed with the trained pre-training language model; the articles generated with the composition generation model do not infringe the copyright of others and comply with relevant laws and regulations.
Specifically, the first training data set, the second training data set and the third training data set each contain a large number of question requirements and a composition corresponding to each question requirement, while the question data set contains a large number of question requirements. A question requirement is the requirement for generating a composition; the pre-training language model must understand the question requirement, just as a person reads the examination question before writing a composition. The pre-training language model is trained with a cross entropy loss function; in this training, the question requirement is the training sample and the composition corresponding to the question requirement is the label of that sample. The question-text distance measurement model is trained with a triplet loss function, which belongs to unsupervised training. Retraining the trained pre-training language model with a policy gradient function belongs to reinforcement learning. The pre-training language model may be an OPT model (Open Pre-trained Transformer). The question-text distance measurement model is a BERT model with a dual-tower structure, in which one branch processes the question requirement and the other branch processes the composition corresponding to the question requirement; the dual-tower structure is a common structure and is not described further.
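As a concrete illustration of this first, supervised stage, the following is a minimal sketch of fine-tuning an OPT-style pre-training language model with the cross entropy loss, treating each question requirement as the prompt and its composition as the continuation. The Hugging Face transformers library, the checkpoint name and the hyperparameters are assumptions for illustration, not part of the disclosure.

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def collate(batch):
    # Each sample is (question_requirement, composition); the composition acts as the label.
    texts = [f"{question}\n{composition}" for question, composition in batch]
    enc = tokenizer(texts, return_tensors="pt", padding=True,
                    truncation=True, max_length=1024)
    enc["labels"] = enc["input_ids"].clone()
    enc["labels"][enc["attention_mask"] == 0] = -100  # ignore padding positions in the loss
    return enc

def train_one_epoch(first_training_set, batch_size=4):
    loader = DataLoader(first_training_set, batch_size=batch_size,
                        shuffle=True, collate_fn=collate)
    model.train()
    for batch in loader:
        loss = model(**batch).loss  # next-token cross entropy computed by the model
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```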
According to the technical solution provided by the embodiments of the present disclosure, a first training data set is acquired, and a pre-training language model is trained with a cross entropy loss function based on the question requirements in the first training data set and the compositions corresponding to those requirements; a question data set is acquired, compositions are generated for the question requirements in the question data set multiple times with the trained pre-training language model, and a second training data set and a third training data set are constructed from the question requirements in the question data set and the compositions the pre-training language model generated for them; a question-text distance measurement model is trained with a triplet loss function based on the second training data set; and, based on the third training data set, the trained pre-training language model is retrained with a policy gradient function and used as the composition generation model. With these technical means, the problems in the prior art that compositions generated by a model have poor logic and incoherent semantics can be solved, and the readability of the generated compositions is improved.
Fig. 2 is a flow chart of a method of constructing a second training data set and a third training data set provided in an embodiment of the present disclosure. As shown in fig. 2, includes:
s201, putting the composition corresponding to the question requirement in the question data set to a plurality of online platforms to obtain the reading time of the composition corresponding to the question requirement in the question data set by the user of the plurality of online platforms;
s202, constructing a second training data set from the question requirement in the question data set, the composition with the longest reading time corresponding to the question requirement and the composition with the shortest reading time corresponding to the question requirement;
and S203, constructing a third training data set by the question requirement in the question data set and the composition with the longest viewing duration corresponding to the question requirement.
That is, the second training data set includes: the method comprises the steps of acquiring a question requirement in a question data set, and a composition with the longest reading time corresponding to the question requirement and a composition with the shortest reading time corresponding to the question requirement; a third training data set comprising: question requirements in the question data set and compositions with longest viewing time corresponding to the question requirements.
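The construction of the two data sets can be sketched as follows; the per-composition viewing durations are assumed to have already been aggregated from the online platforms, and the field names are illustrative only.

```python
def build_second_and_third_sets(question_to_candidates):
    """question_to_candidates maps a question requirement to a list of
    (composition, average_viewing_duration_seconds) pairs collected online."""
    second_training_set, third_training_set = [], []
    for question, candidates in question_to_candidates.items():
        ranked = sorted(candidates, key=lambda pair: pair[1])  # ascending viewing duration
        shortest_viewed = ranked[0][0]
        longest_viewed = ranked[-1][0]
        # Triplet for the question-text distance measurement model.
        second_training_set.append(
            {"anchor": question, "positive": longest_viewed, "negative": shortest_viewed})
        # (question requirement, best composition) pair for policy-gradient retraining.
        third_training_set.append(
            {"question": question, "composition": longest_viewed})
    return second_training_set, third_training_set
```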
The triplet loss function L is:

L = max(d(a, p) - d(a, n) + margin, 0)

wherein a is the vector of a question requirement in the question data set, p is the vector of the composition with the longest viewing duration corresponding to that question requirement, n is the vector of the composition with the shortest viewing duration corresponding to that question requirement, d() is a distance measurement function, and margin is a preset margin value.
For example, if the trained pre-training language model generates a composition for a question requirement in the question data set three times in total, the three compositions corresponding to that question requirement are denoted A, B and C respectively. If A is the composition with the longest viewing duration corresponding to the question requirement, then p is the vector of A; if C is the composition with the shortest viewing duration, then n is the vector of C; B is not used and is discarded.
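A minimal sketch of this triplet training stage is given below, assuming a dual-tower encoder built from Hugging Face BERT checkpoints with [CLS] pooling and a Euclidean distance. The checkpoint name, pooling and optimizer are assumptions; only the loss itself follows the formula above.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-chinese")
question_tower = AutoModel.from_pretrained("bert-base-chinese")     # encodes question requirements
composition_tower = AutoModel.from_pretrained("bert-base-chinese")  # encodes compositions
optimizer = torch.optim.AdamW(
    list(question_tower.parameters()) + list(composition_tower.parameters()), lr=2e-5)

def encode(tower, texts):
    enc = tok(texts, return_tensors="pt", padding=True, truncation=True, max_length=512)
    return tower(**enc).last_hidden_state[:, 0]  # [CLS] vector as the text representation

def triplet_step(anchors, positives, negatives, margin=1.0):
    a = encode(question_tower, anchors)        # question requirements
    p = encode(composition_tower, positives)   # longest-viewed compositions
    n = encode(composition_tower, negatives)   # shortest-viewed compositions
    d_ap = F.pairwise_distance(a, p)
    d_an = F.pairwise_distance(a, n)
    loss = torch.clamp(d_ap - d_an + margin, min=0.0).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```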
Based on the third training data set, retraining the trained pre-training language model with the policy gradient function includes: taking a question requirement in the third training data set as the input of the trained composition generation model, taking the composition with the longest viewing duration corresponding to that question requirement as the output of the trained composition generation model, taking the question-text distance corresponding to the question requirement as the reward, and retraining the trained pre-training language model with the policy gradient function. The question requirement and the composition with the longest viewing duration corresponding to it are input into the trained question-text distance measurement model, which outputs the question-text distance corresponding to the question requirement.
The policy gradient function is as follows:

θ ← θ + α Σ_{n=1}^{N} R(y_n) ∇_θ log π_θ(y_n | x_n)

wherein θ is the model parameter of the trained pre-training language model, α is the step size of the pre-training language model, x_n is the nth question requirement in the third training data set, y_n is the composition with the longest viewing duration corresponding to the nth question requirement, π_θ(y_n | x_n) is the probability that the trained pre-training language model outputs y_n when its input is x_n, ∇_θ log π_θ(y_n | x_n) is the gradient of log π_θ(y_n | x_n) with respect to θ, and R(y_n) is the question-text distance corresponding to y_n. There are N question requirements in the third training data set, and N is the maximum value of n.
θ is the variable being updated; α determines the convergence rate of the pre-training language model; R(y_n) is output by the question-text distance measurement model, and π_θ(y_n | x_n) is output by the pre-training language model.
For example, if the trained pre-training language model generates a composition for a question requirement in the question data set three times in total, the three compositions corresponding to that question requirement are denoted A, B and C respectively, A is the composition with the longest viewing duration corresponding to the question requirement, and the question requirement is denoted D. A and D are input into the trained question-text distance measurement model, which outputs the question-text distance corresponding to the question requirement, denoted E. The trained pre-training language model is then retrained with the policy gradient function: D is the input of the trained composition generation model, A is its output, E is the reward, and the model parameters of the trained pre-training language model are updated.
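A single policy-gradient update of the kind described above might look like the following sketch, which reuses the fine-tuned model and tokenizer from the supervised stage. Treating the whole composition as one action and using the question-text distance E as the reward follows the description; the masking and optimizer details are assumptions.

```python
import torch

def policy_gradient_step(model, tokenizer, optimizer, question, composition, reward):
    """One REINFORCE-style update: maximize reward * log pi_theta(composition | question)."""
    prompt_ids = tokenizer(question, return_tensors="pt").input_ids
    target_ids = tokenizer(composition, return_tensors="pt", add_special_tokens=False).input_ids
    input_ids = torch.cat([prompt_ids, target_ids], dim=1)
    labels = input_ids.clone()
    labels[:, :prompt_ids.size(1)] = -100       # score only the composition tokens
    out = model(input_ids, labels=labels)
    num_target_tokens = (labels != -100).sum()
    log_prob = -out.loss * num_target_tokens    # total log-probability of the composition
    loss = -reward * log_prob                   # gradient ascent on reward-weighted log-likelihood
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```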
When training or retraining the pre-training language model, the pre-training language model is used to extract sentence-level representations and question-level representations of the question requirement and of the composition corresponding to the question requirement; through these sentence-level and question-level representations, the understanding of the pre-training language model of the question requirement and of the corresponding composition is enhanced, thereby completing the training or retraining of the pre-training language model.
A question requirement can be divided into sentences according to commas, periods, question marks and similar symbols. The question-level representation of a question requirement is a feature representation of the question requirement as a whole; the sentence-level representation is a feature representation of each sentence in the question requirement (a detail feature), i.e. a more fine-grained representation than the question-level one. The sentence-level and question-level representations of the composition corresponding to the question requirement are defined analogously. According to the embodiment of the present application, by attending to both the sentence-level and the question-level representations (that is, attending to detail features as well as the overall feature), the understanding of the pre-training language model of the question requirement and the corresponding composition is enhanced, so that the compositions it generates are more logical and fluent.
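One way to realize such sentence-level and question-level feature extraction is sketched below; the punctuation-based splitting follows the description, while the mean-pooling choices and the use of the language model's hidden states are assumptions.

```python
import re
import torch

def sentence_and_question_representations(model, tokenizer, text):
    """Return one vector per sentence (sentence-level) and one vector for the
    whole text (question-level), using the language model's hidden states."""
    sentences = [s.strip() for s in re.split(r"[，。？,.?]", text) if s.strip()]
    enc = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    hidden = model(**enc, output_hidden_states=True).hidden_states[-1]
    mask = enc["attention_mask"].unsqueeze(-1).float()
    sentence_reprs = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean-pooled per sentence
    question_repr = sentence_reprs.mean(dim=0)                     # whole-text feature
    return sentence_reprs, question_repr
```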
When training the pre-training language model, a dynamic temperature hyperparameter is provided according to the following method: the gradient of the temperature hyperparameter used by the current batch is calculated with a gradient reversal network from the loss value calculated for the current batch, where that loss value is the result of the cross entropy loss function in the current batch; if the current batch is the first batch, the temperature hyperparameter used by the current batch is set according to a user instruction. The temperature hyperparameter to be used by the next batch is then determined with a learnable temperature hyperparameter network from the temperature hyperparameter used by the current batch and its gradient. The training consists of multiple batches, and the temperature hyperparameter is used to control the quality and the diversity of the output of the pre-training language model.
The embodiment of the present application provides a dynamic temperature hyperparameter; the temperature hyperparameter used in the first batch is set according to a user instruction or according to experience from previous training. The model parameters of the pre-training language model are updated once per batch. Multi-batch training of a model is a common technique and is not described further; the embodiment of the present application dynamically adjusts the temperature hyperparameter across the batches.
The gradient reversal network GRL (Gradient Reversal Layer) is used to compute the gradient of the temperature hyperparameter in the backward pass: the loss value calculated for the current batch is input into the gradient reversal network, which outputs the gradient of the temperature hyperparameter used by the current batch. The learnable temperature hyperparameter network is a neural network: the temperature hyperparameter used by the current batch and its gradient (which characterizes that temperature hyperparameter) are input into it, and it outputs the temperature hyperparameter to be used by the next batch. This effect can be achieved by training the learnable temperature hyperparameter network with ordinary model training.
The output word distribution of the pre-training language model is divided by the temperature hyperparameter before the actual output is generated. The larger the temperature hyperparameter, the more diverse the generated compositions but the lower their quality; conversely, a smaller temperature hyperparameter yields higher quality but lower diversity.
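The dynamic temperature mechanism can be sketched as follows; the gradient reversal layer and the small two-layer temperature network are illustrative stand-ins, since the disclosure does not fix their exact architectures.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the sign of the gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

class TemperatureNet(nn.Module):
    """Learnable network mapping (current temperature, its gradient) to the next temperature."""
    def __init__(self, hidden=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1), nn.Softplus())
    def forward(self, temperature, temperature_grad):
        return self.net(torch.stack([temperature, temperature_grad], dim=-1)).squeeze(-1)

def dynamic_temperature_step(logits, labels, temperature, temperature_net):
    """Compute the batch loss on temperature-scaled logits, obtain the (reversed)
    temperature gradient, and predict the temperature for the next batch."""
    temperature = temperature.detach().clone().requires_grad_(True)
    scaled_logits = GradReverse.apply(logits / temperature)  # divide the word distribution by T
    loss = F.cross_entropy(scaled_logits.view(-1, scaled_logits.size(-1)), labels.view(-1))
    (temperature_grad,) = torch.autograd.grad(loss, temperature, retain_graph=True)
    # The model parameters themselves would be updated from a separate, unreversed loss.
    next_temperature = temperature_net(temperature.detach(), temperature_grad.detach())
    return loss, next_temperature
```

For the first batch, the temperature could be initialized from the user instruction, for example torch.tensor(1.0), and dynamic_temperature_step would then be called once per batch to obtain the temperature for the following batch.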
Any combination of the above optional solutions may be adopted to form an optional embodiment of the present application, which is not described here in detail.
The following are device embodiments of the present disclosure that may be used to perform method embodiments of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the method of the present disclosure.
Fig. 3 is a schematic diagram of a composition generation model training device provided in an embodiment of the present disclosure. As shown in fig. 3, the composition generation model training device includes:
a first training module 301 configured to acquire a first training data set, and to train a pre-training language model with a cross entropy loss function based on the question requirements in the first training data set and the compositions corresponding to those requirements;
a construction module 302 configured to acquire a question data set, to generate compositions for the question requirements in the question data set multiple times with the trained pre-training language model, and to construct a second training data set and a third training data set from the question requirements in the question data set and the compositions the pre-training language model generated for them;
a second training module 303 configured to train a question-text distance measurement model with a triplet loss function based on the second training data set;
a third training module 304 configured to retrain the trained pre-training language model with a policy gradient function based on the third training data set, and to take the retrained pre-training language model as the composition generation model.
Specifically, the first training data set, the second training data set and the third training data set each contain a large number of question requirements and a composition corresponding to each question requirement, while the question data set contains a large number of question requirements. A question requirement is the requirement for generating a composition; the pre-training language model must understand the question requirement, just as a person reads the examination question before writing a composition. The pre-training language model is trained with a cross entropy loss function; in this training, the question requirement is the training sample and the composition corresponding to the question requirement is the label of that sample. The question-text distance measurement model is trained with a triplet loss function, which belongs to unsupervised training. Retraining the trained pre-training language model with a policy gradient function belongs to reinforcement learning. The pre-training language model may be an OPT model (Open Pre-trained Transformer). The question-text distance measurement model is a BERT model with a dual-tower structure, in which one branch processes the question requirement and the other branch processes the composition corresponding to the question requirement; the dual-tower structure is a common structure and is not described further.
According to the technical solution provided by the embodiments of the present disclosure, a first training data set is acquired, and a pre-training language model is trained with a cross entropy loss function based on the question requirements in the first training data set and the compositions corresponding to those requirements; a question data set is acquired, compositions are generated for the question requirements in the question data set multiple times with the trained pre-training language model, and a second training data set and a third training data set are constructed from the question requirements in the question data set and the compositions the pre-training language model generated for them; a question-text distance measurement model is trained with a triplet loss function based on the second training data set; and, based on the third training data set, the trained pre-training language model is retrained with a policy gradient function and used as the composition generation model. With these technical means, the problems in the prior art that compositions generated by a model have poor logic and incoherent semantics can be solved, and the readability of the generated compositions is improved.
Optionally, the construction module 302 is further configured to publish the compositions corresponding to the question requirements in the question data set on a plurality of online platforms, so as to obtain the viewing duration that users of the plurality of online platforms spend on each composition corresponding to a question requirement in the question data set; to construct the second training data set from the question requirements in the question data set, the composition with the longest viewing duration corresponding to each question requirement, and the composition with the shortest viewing duration corresponding to each question requirement; and to construct the third training data set from the question requirements in the question data set and the composition with the longest viewing duration corresponding to each question requirement.
That is, the second training data set includes the question requirements in the question data set together with, for each question requirement, the composition with the longest viewing duration and the composition with the shortest viewing duration; the third training data set includes the question requirements in the question data set and, for each question requirement, the composition with the longest viewing duration.
The triplet loss function L is:

L = max(d(a, p) - d(a, n) + margin, 0)

wherein a is the vector of a question requirement in the question data set, p is the vector of the composition with the longest viewing duration corresponding to that question requirement, n is the vector of the composition with the shortest viewing duration corresponding to that question requirement, d() is a distance measurement function, and margin is a preset margin value.
For example, if the trained pre-training language model generates a composition for a question requirement in the question data set three times in total, the three compositions corresponding to that question requirement are denoted A, B and C respectively. If A is the composition with the longest viewing duration corresponding to the question requirement, then p is the vector of A; if C is the composition with the shortest viewing duration, then n is the vector of C; B is not used and is discarded.
Optionally, the third training module 304 is further configured to take a question requirement in the third training data set as the input of the trained composition generation model, to take the composition with the longest viewing duration corresponding to that question requirement as the output of the trained composition generation model, to take the question-text distance corresponding to the question requirement as the reward, and to retrain the trained pre-training language model with the policy gradient function; the question requirement and the composition with the longest viewing duration corresponding to it are input into the trained question-text distance measurement model, which outputs the question-text distance corresponding to the question requirement.
The policy gradient function is as follows:

θ ← θ + α Σ_{n=1}^{N} R(y_n) ∇_θ log π_θ(y_n | x_n)

wherein θ is the model parameter of the trained pre-training language model, α is the step size of the pre-training language model, x_n is the nth question requirement in the third training data set, y_n is the composition with the longest viewing duration corresponding to the nth question requirement, π_θ(y_n | x_n) is the probability that the trained pre-training language model outputs y_n when its input is x_n, ∇_θ log π_θ(y_n | x_n) is the gradient of log π_θ(y_n | x_n) with respect to θ, and R(y_n) is the question-text distance corresponding to y_n. There are N question requirements in the third training data set, and N is the maximum value of n.
θ is the variable being updated; α determines the convergence rate of the pre-training language model; R(y_n) is output by the question-text distance measurement model, and π_θ(y_n | x_n) is output by the pre-training language model.
For example, if the trained pre-training language model generates a composition for a question requirement in the question data set three times in total, the three compositions corresponding to that question requirement are denoted A, B and C respectively, A is the composition with the longest viewing duration corresponding to the question requirement, and the question requirement is denoted D. A and D are input into the trained question-text distance measurement model, which outputs the question-text distance corresponding to the question requirement, denoted E. The trained pre-training language model is then retrained with the policy gradient function: D is the input of the trained composition generation model, A is its output, E is the reward, and the model parameters of the trained pre-training language model are updated.
Optionally, the third training module 304 or the second training module 303 is further configured to extract sentence-level representations and question-level representations of the question requirement and of the composition corresponding to the question requirement with the pre-training language model; through these sentence-level and question-level representations, the understanding of the pre-training language model of the question requirement and of the corresponding composition is enhanced, thereby completing the training or retraining of the pre-training language model.
A question requirement can be divided into sentences according to commas, periods, question marks and similar symbols. The question-level representation of a question requirement is a feature representation of the question requirement as a whole; the sentence-level representation is a feature representation of each sentence in the question requirement (a detail feature), i.e. a more fine-grained representation than the question-level one. The sentence-level and question-level representations of the composition corresponding to the question requirement are defined analogously. According to the embodiment of the present application, by attending to both the sentence-level and the question-level representations (that is, attending to detail features as well as the overall feature), the understanding of the pre-training language model of the question requirement and the corresponding composition is enhanced, so that the compositions it generates are more logical and fluent.
Optionally, the second training module 303 is further configured to calculate, with a gradient reversal network, the gradient of the temperature hyperparameter used by the current batch from the loss value calculated for the current batch, where that loss value is the result of the cross entropy loss function in the current batch and where, if the current batch is the first batch, the temperature hyperparameter used by the current batch is set according to a user instruction; and to determine, with a learnable temperature hyperparameter network, the temperature hyperparameter to be used by the next batch from the temperature hyperparameter used by the current batch and its gradient. The training consists of multiple batches, and the temperature hyperparameter is used to control the quality and the diversity of the output of the pre-training language model.
The embodiment of the present application provides a dynamic temperature hyperparameter; the temperature hyperparameter used in the first batch is set according to a user instruction or according to experience from previous training. The model parameters of the pre-training language model are updated once per batch. Multi-batch training of a model is a common technique and is not described further; the embodiment of the present application dynamically adjusts the temperature hyperparameter across the batches.
The gradient reversal network GRL (Gradient Reversal Layer) is used to compute the gradient of the temperature hyperparameter in the backward pass: the loss value calculated for the current batch is input into the gradient reversal network, which outputs the gradient of the temperature hyperparameter used by the current batch. The learnable temperature hyperparameter network is a neural network: the temperature hyperparameter used by the current batch and its gradient (which characterizes that temperature hyperparameter) are input into it, and it outputs the temperature hyperparameter to be used by the next batch. This effect can be achieved by training the learnable temperature hyperparameter network with ordinary model training.
The output word distribution of the pre-training language model is divided by the temperature hyperparameter before the actual output is generated. The larger the temperature hyperparameter, the more diverse the generated compositions but the lower their quality; conversely, a smaller temperature hyperparameter yields higher quality but lower diversity.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present disclosure.
Fig. 4 is a schematic diagram of an electronic device 4 provided by an embodiment of the present disclosure. As shown in fig. 4, the electronic apparatus 4 of this embodiment includes: a processor 401, a memory 402 and a computer program 403 stored in the memory 402 and executable on the processor 401. The steps of the various method embodiments described above are implemented by processor 401 when executing computer program 403. Alternatively, the processor 401, when executing the computer program 403, performs the functions of the modules/units in the above-described apparatus embodiments.
The electronic device 4 may be a desktop computer, a notebook computer, a palm computer, a cloud server, or the like. The electronic device 4 may include, but is not limited to, a processor 401 and a memory 402. It will be appreciated by those skilled in the art that fig. 4 is merely an example of the electronic device 4 and is not limiting of the electronic device 4 and may include more or fewer components than shown, or different components.
The processor 401 may be a central processing unit (Central Processing Unit, CPU) or another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like.
The memory 402 may be an internal storage unit of the electronic device 4, for example, a hard disk or a memory of the electronic device 4. The memory 402 may also be an external storage device of the electronic device 4, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card) or the like, which are provided on the electronic device 4. Memory 402 may also include both internal storage units and external storage devices of electronic device 4. The memory 402 is used to store computer programs and other programs and data required by the electronic device.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present disclosure may implement all or part of the flow of the method of the above-described embodiments, or may be implemented by a computer program to instruct related hardware, and the computer program may be stored in a computer readable storage medium, where the computer program, when executed by a processor, may implement the steps of the method embodiments described above. The computer program may comprise computer program code, which may be in source code form, object code form, executable file or in some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer readable medium can be appropriately increased or decreased according to the requirements of the jurisdiction's jurisdiction and the patent practice, for example, in some jurisdictions, the computer readable medium does not include electrical carrier signals and telecommunication signals according to the jurisdiction and the patent practice.
The above embodiments are merely for illustrating the technical solution of the present disclosure, and are not limiting thereof; although the present disclosure has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the disclosure, and are intended to be included in the scope of the present disclosure.

Claims (8)

1. A composition generation model training method, comprising:
acquiring a first training data set, and training a pre-training language model by using a cross entropy loss function based on the question requirement in the first training data set and a composition corresponding to the question requirement;
acquiring a question data set, generating corresponding compositions for the question requirements in the question data set for multiple times by utilizing the trained pre-training language model, and constructing a second training data set and a third training data set according to the question requirements in the question data set and the multiple compositions corresponding to the question requirements generated by the pre-training language model;
training a question-text distance measurement model by using a triplet loss function based on the second training data set;
based on the third training data set, retraining the trained pre-training language model by using a policy gradient function, and taking the retrained pre-training language model as a composition generation model;
wherein constructing the second training data set and the third training data set according to the question requirements in the question data set and the plurality of compositions corresponding to the question requirements generated by the pre-training language model comprises: publishing the compositions corresponding to the question requirements in the question data set on a plurality of online platforms, so as to obtain the viewing duration that users of the plurality of online platforms spend on the compositions corresponding to the question requirements in the question data set; constructing the second training data set from the question requirements in the question data set, the composition with the longest viewing duration corresponding to each question requirement, and the composition with the shortest viewing duration corresponding to each question requirement; and constructing the third training data set from the question requirements in the question data set and the composition with the longest viewing duration corresponding to each question requirement;
wherein retraining the trained pre-training language model by using the policy gradient function based on the third training data set comprises: taking a question requirement in the third training data set as an input of the trained composition generation model, taking the composition with the longest viewing duration corresponding to the question requirement as an output of the trained composition generation model, taking the question-text distance corresponding to the question requirement as a reward, and retraining the trained pre-training language model by using the policy gradient function; and inputting the question requirement and the composition with the longest viewing duration corresponding to the question requirement into the trained question-text distance measurement model, which outputs the question-text distance corresponding to the question requirement.
2. The method of claim 1, wherein the triplet loss function comprises:

L = max(d(a, p) - d(a, n) + margin, 0)

wherein a is a vector of the question requirement in the question data set, p is a vector of the composition with the longest viewing duration corresponding to the question requirement, n is a vector of the composition with the shortest viewing duration corresponding to the question requirement, d() is a distance measurement function, and margin is a preset margin value.
3. The method of claim 1, wherein the policy gradient function is as follows:

θ ← θ + α Σ_{n=1}^{N} R(y_n) ∇_θ log π_θ(y_n | x_n)

wherein θ is a model parameter of the trained pre-training language model, α is the step size of the pre-training language model, x_n is the nth question requirement in the third training data set, y_n is the composition with the longest viewing duration corresponding to the nth question requirement, π_θ(y_n | x_n) is the probability that the trained pre-training language model outputs y_n when the input is x_n, ∇_θ log π_θ(y_n | x_n) is the gradient of log π_θ(y_n | x_n) with respect to θ, and R(y_n) is the question-text distance corresponding to y_n; the third training data set has N question requirements, and N is the maximum value of n.
4. The method according to claim 1, wherein the method further comprises:
upon said training or said retraining of said pre-trained language model:
extracting sentence-level representation and question-level representation of the question requirement and a composition corresponding to the question requirement by using the pre-training language model;
and enhancing, through the sentence-level representations and question-level representations of the question requirement and of the composition corresponding to the question requirement, the understanding of the pre-training language model of the question requirement and the composition corresponding to the question requirement, so as to complete the training or the retraining of the pre-training language model.
5. The method according to claim 1, wherein the method further comprises:
when the training is carried out on the pre-training language model, a dynamic temperature super-parameter is provided according to the following method:
calculating the gradient of the temperature super parameter adopted by the current batch by utilizing a gradient reverse network according to the calculated loss value of the current batch, wherein the calculated loss value of the current batch is the result of cross entropy loss function calculation in the current batch, and if the current batch is the first batch, the temperature super parameter adopted by the current batch is set according to a user instruction;
determining the temperature super parameter adopted by the next batch of the current batch by utilizing a learnable temperature super parameter network according to the temperature super parameter adopted by the current batch and the gradient thereof;
wherein the training comprises a plurality of batch training, the temperature super-parameters being used to control the quality and diversity of the output of the pre-trained language model.
6. A composition generation model training device, comprising:
the first training module is configured to acquire a first training data set, and train the pre-training language model by using a cross entropy loss function based on the question requirement in the first training data set and the composition corresponding to the question requirement;
the construction module is configured to acquire a question data set, generate corresponding compositions for the question requirements in the question data set for multiple times by utilizing the trained pre-training language model, and construct a second training data set and a third training data set according to the question requirements in the question data set and the compositions corresponding to the question requirements generated by the pre-training language model;
a second training module configured to train a question-text distance measurement model by using a triplet loss function based on the second training data set;
a third training module configured to retrain the trained pre-training language model by using a policy gradient function based on the third training data set, and to take the retrained pre-training language model as a composition generation model;
wherein the construction module is further configured to publish the compositions corresponding to the question requirements in the question data set on a plurality of online platforms, so as to obtain the viewing duration that users of the plurality of online platforms spend on the compositions corresponding to the question requirements in the question data set; to construct the second training data set from the question requirements in the question data set, the composition with the longest viewing duration corresponding to each question requirement, and the composition with the shortest viewing duration corresponding to each question requirement; and to construct the third training data set from the question requirements in the question data set and the composition with the longest viewing duration corresponding to each question requirement;
wherein the third training module is configured to take a question requirement in the third training data set as an input of the trained composition generation model, to take the composition with the longest viewing duration corresponding to the question requirement as an output of the trained composition generation model, to take the question-text distance corresponding to the question requirement as a reward, and to retrain the trained pre-training language model by using the policy gradient function; and to input the question requirement and the composition with the longest viewing duration corresponding to the question requirement into the trained question-text distance measurement model, which outputs the question-text distance corresponding to the question requirement.
7. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 5 when the computer program is executed.
8. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 5.
CN202310876691.0A 2023-07-18 2023-07-18 Composition generation model training method and device Active CN116595385B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310876691.0A CN116595385B (en) 2023-07-18 2023-07-18 Composition generation model training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310876691.0A CN116595385B (en) 2023-07-18 2023-07-18 Composition generation model training method and device

Publications (2)

Publication Number | Publication Date
CN116595385A (en) | 2023-08-15
CN116595385B (en) | 2023-10-03

Family

ID=87599542

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310876691.0A Active CN116595385B (en) 2023-07-18 2023-07-18 Composition generation model training method and device

Country Status (1)

Country Link
CN (1) CN116595385B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111105013A (en) * 2019-11-05 2020-05-05 中国科学院深圳先进技术研究院 Optimization method of countermeasure network architecture, image description generation method and system
WO2021223287A1 (en) * 2020-05-06 2021-11-11 首都师范大学 Focalgan-based short text automatic generation method, apparatus, and device, and storage medium
CN114239500A (en) * 2021-12-13 2022-03-25 黑盒科技(广州)有限公司 Controllable generation type composition retouching method
CN114492327A (en) * 2021-12-28 2022-05-13 中科曙光南京研究院有限公司 Intelligent writing method for official documents
CN114997143A (en) * 2022-08-04 2022-09-02 北京澜舟科技有限公司 Text generation model training method and system, text generation method and storage medium
CN115841144A (en) * 2022-12-28 2023-03-24 北京龙智数科科技服务有限公司 Training method and device for text retrieval model
CN116150621A (en) * 2023-02-18 2023-05-23 阳光保险集团股份有限公司 Training method, device and equipment for text model
CN116362351A (en) * 2023-05-29 2023-06-30 深圳须弥云图空间科技有限公司 Method and device for training pre-training language model by using noise disturbance


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HSK composition generation based on the LDA model; 徐艳华; 苗雨洁; 苗琳; 吕学强; 数据分析与知识发现 (Data Analysis and Knowledge Discovery), Issue 09, pp. 80-87 *

Also Published As

Publication number Publication date
CN116595385A (en) 2023-08-15

Similar Documents

Publication Publication Date Title
US20190377797A1 (en) Mathematical processing method, apparatus and device for text problem, and storage medium
CN107515855B (en) Microblog emotion analysis method and system combined with emoticons
US11699074B2 (en) Training sequence generation neural networks using quality scores
CN110555213B (en) Training method of text translation model, and text translation method and device
US20220230065A1 (en) Semi-supervised training of machine learning models using label guessing
WO2018195459A1 (en) Processing sequential data using recurrent neural networks
CN110263328B (en) Discipline capability type labeling method and device, storage medium and terminal equipment
US20220229997A1 (en) Dialogue processing apparatus, learning apparatus, dialogue processing method, learning method and program
CN116595159B (en) Mathematical question answering model training method and device
CN111985218A (en) Automatic judicial literature proofreading method based on generation of confrontation network
CN116610795B (en) Text retrieval method and device
CN110807517A (en) Neural network system for multi-task recognition
CN113010655B (en) Answer and interference item generation method and device for reading and understanding of machine
CN116595385B (en) Composition generation model training method and device
CN116595130A (en) Corpus expansion method and device under multiple tasks based on small language model
CN115525743B (en) Man-machine interaction method based on self-learning whitening network and electronic equipment
CN112765936B (en) Training method and device for operation based on language model
US20220318230A1 (en) Text to question-answer model system
Xu et al. Towards zero-shot persona dialogue generation with in-context learning
CN113553837A (en) Reading understanding model training method and device and text analysis method and device
Xia et al. Generating Questions Based on Semi-Automated and End-to-End Neural Network.
CN112434143A (en) Dialog method, storage medium and system based on hidden state constraint of GRU (generalized regression Unit)
CN116603249B (en) Training method of large language model applied to role playing reasoning game
CN110852112A (en) Word vector embedding method and device
Cheng et al. CatVRNN: Generating category texts via multi-task learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant