CN113094482B - Lightweight semantic intelligent service adaptation training evolution method and system - Google Patents

Lightweight semantic intelligent service adaptation training evolution method and system

Info

Publication number
CN113094482B
CN113094482B (application number CN202110334447.2A)
Authority
CN
China
Prior art keywords
model
text
training
restorer
forged
Prior art date
Legal status
Active
Application number
CN202110334447.2A
Other languages
Chinese (zh)
Other versions
CN113094482A (en)
Inventor
张玉清
郭智宸
周长兵
Current Assignee
China University of Geosciences Beijing
Original Assignee
China University of Geosciences Beijing
Priority date
Filing date
Publication date
Application filed by China University of Geosciences Beijing filed Critical China University of Geosciences Beijing
Priority to CN202110334447.2A priority Critical patent/CN113094482B/en
Publication of CN113094482A publication Critical patent/CN113094482A/en
Application granted granted Critical
Publication of CN113094482B publication Critical patent/CN113094482B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiment of the application discloses a lightweight semantic intelligent service adaptation training evolution method and system, wherein the method comprises the following steps: performing Masked Language Model (MLM) task training on a Generator model; inputting training text data into the Generator module to generate forged words; performing Replace Token Restore (RTR) task training on a Restorer module using the forged text; inputting the forged text as input, with the original text and the forged positions as prediction labels, into the Restorer module, so that the Restorer module restores the replaced words in the text and recognizes the forged words in the forged text; and monitoring and adjusting the recognition results of the Restorer module based on a Dynamic self-adaptation method. On the one hand, the semantic understanding capability of the pre-trained model is enhanced; on the other hand, the computation and parameter count of the model in the fine-tuning stage are reduced, so that the model can be fine-tuned and trained with distributed dynamic adaptation on edge devices.

Description

Lightweight semantic intelligent service adaptation training evolution method and system
Technical Field
The embodiment of the application relates to the technical field of artificial intelligence, and in particular to a lightweight semantic service model training evolution method, system, device, and readable storage medium.
Background
Semantic services in the current computer science field are mainly provided by the traditional LSTM (long short-term memory network) and by pre-trained models based on the Transformer structure. LSTM adds various activation functions to the recurrent neural network structure to improve model performance, while pre-trained models based on the Transformer structure stack large numbers of encoders and decoders built on the self-attention mechanism to achieve semantic understanding. Since the advent of the Transformer language model architecture, semantic services commonly used in the computer science field have been provided by models based on the Transformer architecture. Such models are typically pre-trained with a Masked Language Model (MLM) task; the pre-trained model is then applied to a specific task and adapted to its requirements through a second, fine-tuning stage.
Compared with the traditional RNN model, the Transformer alleviates the problems of vanishing gradients and difficult parallel computation, and its Multi-Head Attention mechanism can effectively observe the correlation between every pair of minimal text units, rather than only assuming a correlation between the current text segment and the previous n segments. Training and applying a language model with a Transformer structure must, however, meet several requirements:
(1) Training requires powerful computing resources; a GPU or TPU is generally used.
(2) The model needs to be pre-trained on the MLM task with a general corpus so that it acquires a certain semantic understanding capability.
(3) The model can only understand a single language; for example, a model trained on English cannot provide Chinese semantic understanding.
Both designs have drawbacks: the former is less accurate and difficult to parallelize, while the latter requires substantial computing power for pre-training and for fine-tuning on downstream tasks. Neither approach is practical in an edge computing environment where computing power is limited.
Disclosure of Invention
Therefore, the embodiment of the application provides a lightweight semantic intelligent service adaptation training evolution method and system, which reconstruct the model with a Dynamic self-adaptation method on top of a Replace Token Restore (RTR) pre-training task. On the one hand, this enhances the semantic understanding capability of the pre-trained model; on the other hand, it reduces the computation and parameter count of the model in the fine-tuning stage, so that the model can be fine-tuned and trained with distributed dynamic adaptation on edge devices.
In order to achieve the above object, the embodiment of the present application provides the following technical solutions:
according to a first aspect of an embodiment of the present application, there is provided a lightweight semantic service model training evolution method, the method including:
inputting a general corpus into a Generator model to perform Masked Language Model (MLM) task training on the Generator model;
inputting training text data into a Generator module, wherein the Generator module is used for randomly replacing each word in each text with a forged word at the corresponding position;
acquiring triplet data generated by the Generator module, wherein the triplet data comprises the original text, the forged text, and the forged positions; the forged positions are the positions of the replaced words in the original text, encoded one-hot with replaced positions set to 1 and non-replaced positions set to 0;
performing Replace Token Restore (RTR) task training on a Restorer module using the forged text;
inputting the forged text as input, with the original text and the forged positions as prediction labels, into the Restorer module, so that the Restorer module restores the replaced words in the text and recognizes the forged words in the forged text;
and monitoring and adjusting the recognition results of the Restorer module based on a Dynamic self-adaptation method.
Optionally, the Restorer module employs a stack of multiple Encoder Layers, and its output layer uses a multi-output connection of a Masked Language Model and a Replace Token Restorer Model, where the Masked Language Model is used to restore the replaced text and the Replace Token Restorer Model is used to distinguish the locations of the forged text; the dual output layer at the bottom of the model is the output layer of the RTR task, and the model comprises a Word Embedding layer and a Position Embedding position-encoding layer.
Optionally, each Encoder Layer in the Restorer module outputs its result to a corresponding Decision module; the Decision module performs a cross-entropy calculation on the currently output result and the prediction target's true value to obtain the current loss, and performs a KL divergence calculation on the output result and the prediction target's true value to evaluate the distribution gap between the current result and the true value.
Optionally, monitoring and adjusting the recognition results of the Restorer module based on the Dynamic self-adaptation method includes:
enabling Dynamic self-adjustment, performing fine-tuning training on a downstream task, and providing the corresponding lightweight semantic service; when the training round is greater than round N and the score of layer C is smaller than the score threshold D, layers [0, C-1] are sufficient to fit the current training task, so layer C and all later layers of the model are deleted and the current layer's output is passed directly to the Project projection layer, thereby realizing dynamic self-adaptation.
Optionally, the Restorer module is further configured to maintain a self-measurement table that stores a Score for each Encoder Layer; the initial value of the Score is 0, and the Score is computed by the Decision module of each layer according to the following formula:
where Epochs is the number of rounds of current training.
According to a second aspect of an embodiment of the present application, there is provided a lightweight semantic service model training system, the system comprising:
the Generator training unit, which is used for inputting the general corpus into a Generator model to perform Masked Language Model (MLM) task training on the Generator model;
the input text unit, which is used for inputting training text data into a Generator module, the Generator module being used for randomly replacing each word in each text with a forged word at the corresponding position;
the data acquisition unit, which is used for acquiring the triplet data generated by the Generator module, wherein the triplet data comprises the original text, the forged text, and the forged positions; the forged positions are the positions of the replaced words in the original text, encoded one-hot with replaced positions set to 1 and non-replaced positions set to 0;
the Restorer training unit, which is used for performing Replace Token Restore (RTR) task training on the Restorer module using the forged text;
the restoration and replacement marking unit, which is used for inputting the forged text as input, with the original text and the forged positions as prediction labels, into the Restorer module, so that the Restorer module restores the replaced words in the text and recognizes the forged words in the forged text;
and the Dynamic self-adaptation unit, which is used for monitoring and adjusting the recognition results of the Restorer module based on the Dynamic self-adaptation method.
Optionally, the Restorer module employs a stack of multiple Encoder Layers, and its output layer uses a multi-output connection of a Masked Language Model and a Replace Token Restorer Model, where the Masked Language Model is used to restore the replaced text and the Replace Token Restorer Model is used to distinguish the locations of the forged text; the dual output layer at the bottom of the model is the output layer of the RTR task and comprises a Masked Language Model output layer and a Replace Token Restorer Model output layer.
Optionally, each Encoder Layer in the Restorer module outputs its result to a corresponding Decision module; the Decision module performs a cross-entropy calculation on the currently output result and the prediction target's true value to obtain the current loss, and performs a KL divergence calculation on the output result and the prediction target's true value to evaluate the distribution gap between the current result and the true value.
In summary, the embodiment of the application provides a lightweight semantic intelligent service adaptation training evolution method and system, which input a general corpus into a Generator model to perform Masked Language Model (MLM) task training on the Generator model; input training text data into the Generator module, which randomly replaces each word in each text with a forged word at the corresponding position; acquire the triplet data generated by the Generator module, the triplet data comprising the original text, the forged text, and the forged positions, where the forged positions are the positions of the replaced words in the original text, encoded one-hot with replaced positions set to 1 and non-replaced positions set to 0; perform Replace Token Restore (RTR) task training on the Restorer module using the forged text; input the forged text as input, with the original text and the forged positions as prediction labels, into the Restorer module, so that the Restorer module restores the replaced words in the text and recognizes the forged words in the forged text; and monitor and adjust the recognition results of the Restorer module based on a Dynamic self-adaptation method. On the one hand, the semantic understanding capability of the pre-trained model is enhanced; on the other hand, the computation and parameter count of the model in the fine-tuning stage are reduced, so that the model can be fine-tuned and trained with distributed dynamic adaptation on edge devices.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely exemplary and that other drawings can be derived from them without inventive effort.
The structures, proportions, sizes, and the like shown in this specification are used only for illustration and description and are not intended to limit the scope of the application, which is defined by the claims; any structural modification, change in proportion, or adjustment of size that does not affect the efficacy or purpose of the application shall fall within its scope.
FIG. 1 is a schematic flow chart of a training evolution method of a lightweight semantic service model according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the overall structure of a model according to an embodiment of the present application;
FIG. 3 is a schematic view of a Generator structure according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a Restorer structure according to an embodiment of the present application;
fig. 5 is a block diagram of a lightweight semantic service model training system according to an embodiment of the present application.
Detailed Description
Other aspects and advantages of the present application will become apparent to those skilled in the art from the following detailed description, which illustrates the application by way of certain specific embodiments, but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
Fig. 1 shows a lightweight semantic service model training evolution method provided by an embodiment of the present application, where the method includes:
Step 101: inputting a general corpus into a Generator model to perform Masked Language Model (MLM) task training on the Generator model;
Step 102: inputting training text data into a Generator module, wherein the Generator module is used for randomly replacing each word in each text with a forged word at the corresponding position;
Step 103: acquiring the triplet data generated by the Generator module, wherein the triplet data comprises the original text, the forged text, and the forged positions; the forged positions are the positions of the replaced words in the original text, encoded one-hot with replaced positions set to 1 and non-replaced positions set to 0;
Step 104: performing Replace Token Restore (RTR) task training on the Restorer module using the forged text;
Step 105: inputting the forged text as input, with the original text and the forged positions as prediction labels, into the Restorer module, so that the Restorer module restores the replaced words in the text and recognizes the forged words in the forged text;
Step 106: monitoring and adjusting the recognition results of the Restorer module based on a Dynamic self-adaptation method.
In one possible implementation, the Restorer module employs a stack of multiple Encoder Layers, and its output layer uses a multi-output connection of a Masked Language Model and a Replace Token Restorer Model, where the Masked Language Model is used to restore the replaced text and the Replace Token Restorer Model is used to distinguish the locations of the forged text; the dual output layer at the bottom of the model is the output layer of the RTR task and comprises a Masked Language Model output layer and a Replace Token Restorer Model output layer.
In one possible implementation, each Encoder Layer in the Restorer module outputs its result to a corresponding Decision module; the Decision module performs a cross-entropy calculation on the currently output result and the prediction target's true value to obtain the current loss, and performs a KL divergence calculation on the output result and the prediction target's true value to evaluate the distribution gap between the current result and the true value.
In one possible implementation, monitoring and adjusting the recognition results of the Restorer module based on the Dynamic self-adaptation method includes: enabling Dynamic self-adjustment, performing fine-tuning training on a downstream task, and providing the corresponding lightweight semantic service; when the training round is greater than round N and the score of layer C is smaller than the score threshold D, layers [0, C-1] are sufficient to fit the current training task, so layer C and all later layers of the model are deleted and the current layer's output is passed directly to the Project projection layer, thereby realizing dynamic self-adaptation.
In one possible implementation, the Restorer module is further configured to maintain a self-measurement table that stores a Score for each Encoder Layer; the initial value of the Score is 0, and the Score is computed by the Decision module of each layer; formula (1) is as follows:
where Epochs is the number of rounds of current training.
FIG. 2 shows an overall system diagram of the lightweight network model provided by an embodiment of the present application. The structure of the lightweight network model is mainly divided into a Generator module and a Restorer module: the Generator module generates forged data, and the Restorer module recognizes and restores the forged data. The Restorer module acquires strong semantic understanding capability through the RTR pre-training task, while the Generator module is only used to assist the training of the Restorer module. The specific model operation flow is as follows:
Step 1: The training text data is fed into the Generator module, which, processing article by article, randomly replaces each word in each text with a plausible word for that position with a probability of 5% (with a certain probability the replacement is the same as the original word).
Step 2: The Generator module expands the training data into triplets (original text, forged text, forged position), where the forged positions are the positions of the replaced words in the original text, encoded one-hot with replaced positions set to 1 and non-replaced positions set to 0.
Step 3: The new training data is sent to the Restorer module, with the forged text as input and the original text and forged positions as prediction labels, so that the Restorer module restores the replaced words in the text and judges which positions in the forged text are forged.
The Generator module provided by the embodiment of the application adopts a stack of multiple Encoder Layers, and its specific structure is shown in FIG. 3: it is pre-trained with the traditional Masked Language Model (MLM) task, and the dual output layers at the bottom of the model (the Masked Language Model output layer and the Replace Token Restorer Model output layer) are the output layers of the RTR task. The Generator module used in the embodiment of the application has 3 Encoder layers in total, with about 3.2 million parameters.
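For concreteness, the following is a minimal sketch of such a Generator in PyTorch-style code; the vocabulary size, hidden size, and number of attention heads are illustrative assumptions, since the description above only fixes the 3-layer Encoder stack and the approximate parameter count.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """3-layer Encoder stack with an MLM output head (cf. FIG. 3).
    Dimensions below are placeholders, not values from the patent."""
    def __init__(self, vocab_size=21128, hidden_size=128, nhead=4, max_len=512):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, hidden_size)   # Word Embedding layer
        self.pos_emb = nn.Embedding(max_len, hidden_size)        # Position Embedding layer
        layer = nn.TransformerEncoderLayer(hidden_size, nhead,
                                           dim_feedforward=4 * hidden_size,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=3)  # 3 Encoder Layers
        self.mlm_head = nn.Linear(hidden_size, vocab_size)          # MLM output layer

    def forward(self, token_ids):
        pos = torch.arange(token_ids.size(1), device=token_ids.device)
        hidden = self.encoder(self.word_emb(token_ids) + self.pos_emb(pos))
        return self.mlm_head(hidden)  # per-position vocabulary logits
```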
It should be noted that, in order to implement distributed training, the Generator module is not trained jointly with the Restorer module but is trained alone; text forging is performed after its training is completed, and the forged text is stored separately in the flash memory of the device.
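The forging procedure of Steps 1 and 2 above can then be sketched as follows; this is an illustrative sketch assuming the Generator above, and the function name, the mask-token handling, and the greedy argmax choice of replacement words are assumptions rather than details given in the patent.

```python
import random
import torch

REPLACE_PROB = 0.05  # each word is considered for replacement with 5% probability

def forge_text(token_ids, generator, mask_token_id):
    """Produce one (original text, forged text, forged position) triplet.

    token_ids: 1-D LongTensor holding the original token ids of one text.
    Returns the original ids, the forged ids, and the one-hot position label
    (1 = replaced position, 0 = not replaced)."""
    forged = token_ids.clone()
    positions = torch.zeros_like(token_ids)
    for i in range(len(token_ids)):
        if random.random() < REPLACE_PROB:
            masked = forged.clone()
            masked[i] = mask_token_id                    # mask the chosen position
            with torch.no_grad():
                logits = generator(masked.unsqueeze(0))  # (1, seq_len, vocab)
            forged[i] = logits[0, i].argmax()            # plausible forged word
            positions[i] = 1
            # the chosen word may coincide with the original word, matching the
            # "same word with a certain probability" remark in Step 1
    return token_ids, forged, positions
```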
The Restorer module provided by the embodiment of the application adopts a stack of multiple Encoder Layers, and its output layer uses a multi-output connection: the Masked Language Model is used to restore the replaced text, and the Replace Token Restorer Model is used to judge the positions of the forged text. The Restorer module structure is shown in FIG. 4:
the dual output layer at the bottom of the model (the Masked Language Model output layer and the Replace Token Restorer Model output layer) is the output layer of the RTR task. The Restorer model provided by the embodiment of the application uses 12 Encoder Layers, with about 8.12 million parameters in total.
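A structural sketch of the Restorer under the same illustrative assumptions as the Generator sketch above; per-layer hidden states are kept so that each layer's Decision module (described next) can score them.

```python
import torch
import torch.nn as nn

class Restorer(nn.Module):
    """12 Encoder Layers with the RTR dual output layer (cf. FIG. 4):
    an MLM head that restores replaced words and an RTR head that marks
    forged positions. Dimensions are placeholders, not patent values."""
    def __init__(self, vocab_size=21128, hidden_size=256, nhead=4, max_len=512):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, hidden_size)
        self.pos_emb = nn.Embedding(max_len, hidden_size)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(hidden_size, nhead,
                                       dim_feedforward=4 * hidden_size,
                                       batch_first=True)
            for _ in range(12))
        self.mlm_head = nn.Linear(hidden_size, vocab_size)  # restores replaced words
        self.rtr_head = nn.Linear(hidden_size, 2)            # forged / not forged

    def forward(self, forged_ids):
        pos = torch.arange(forged_ids.size(1), device=forged_ids.device)
        hidden = self.word_emb(forged_ids) + self.pos_emb(pos)
        per_layer = []          # one entry per Encoder Layer, for the Decision modules
        for layer in self.layers:
            hidden = layer(hidden)
            per_layer.append(hidden)
        return self.mlm_head(hidden), self.rtr_head(hidden), per_layer
```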
The Decision module provided by the embodiment of the application is described below. In the fine-tuning stage, each Encoder Layer in the Restorer module outputs its result to a corresponding Decision module; this module performs a cross-entropy calculation on the currently output result and the prediction target's true value to obtain the current loss, and performs a KL divergence calculation on the output result and the prediction target's true value to evaluate the distribution gap between the current result and the true value.
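The two measurements can be sketched as follows; treating the prediction target as a one-hot distribution for the KL term is an assumption made here for illustration, since the description only names the two calculations.

```python
import torch.nn.functional as F

def decision_module(layer_logits, target_ids):
    """Score one Encoder Layer's output against the prediction target.

    layer_logits: (batch, seq_len, num_classes) logits derived from this layer.
    target_ids:   (batch, seq_len) ground-truth class indices.
    Returns the cross-entropy loss and the KL divergence between the predicted
    distribution and the target distribution."""
    loss = F.cross_entropy(layer_logits.transpose(1, 2), target_ids)
    log_probs = F.log_softmax(layer_logits, dim=-1)
    target_dist = F.one_hot(target_ids, layer_logits.size(-1)).float()
    kl_gap = F.kl_div(log_probs, target_dist, reduction='batchmean')
    return loss, kl_gap
```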
The Dynamic self-adaptation method provided by the embodiment of the application is introduced below. By dynamically monitoring the calculation results of the model, it dynamically adjusts the model scale and the training progress, so that the balance between model scale and model performance can be adjusted and controlled dynamically for different application scenarios and training data.
The Restorer module maintains a self-measurement table that stores a Score for each Encoder Layer; the initial value is 0, and the Score is computed by the Decision module of each layer, as shown in formula (1).
Two hyperparameters are set: N, a round threshold, and D, a score threshold. When the training round is greater than N and the score of layer C is less than D, layers [0, C-1] are considered sufficient to fit the current training task; layer C and all later layers of the model are then deleted, and the output of the remaining layers is passed directly to the Project projection layer, achieving dynamic self-adaptation.
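A minimal sketch of this pruning rule, assuming the Restorer sketch above; the score-table layout and the function signature are illustrative assumptions, as the description only defines the hyperparameters N and D and the truncation behaviour.

```python
def dynamic_self_adaptation(restorer, scores, epoch, n_rounds, score_threshold):
    """Delete Encoder Layers above the first layer C whose score falls below
    the threshold D, once more than N training rounds have elapsed."""
    if epoch <= n_rounds:
        return restorer                              # not enough training rounds yet
    for c, score in enumerate(scores):
        if score < score_threshold:                  # layers [0, C-1] already fit the task
            restorer.layers = restorer.layers[:c]    # drop layer C and all later layers
            # the last remaining layer now feeds the Project projection layer directly
            break
    return restorer
```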
In one possible implementation, the lightweight network model provided by the embodiment of the application is deployed as follows:
1. Masked Language Model (MLM) task training is performed on the Generator model using the general corpus.
2. The Generator module is deployed on the device and used to forge text from the general corpus.
3. RTR task training is performed on the Restorer module using the forged text; Dynamic self-adjustment is not used at this stage.
4. The trained Restorer module is deployed on the device, Dynamic self-adjustment is enabled, and fine-tuning training is performed for the specific downstream task to provide the corresponding lightweight semantic service (such as text classification or natural language reading comprehension); at this stage the dual output layer corresponding to the RTR task can be discarded, as sketched below.
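The following sketch illustrates step 4, reusing the Restorer sketch above; the wrapper class, the classification head, and the choice of pooling the first position are hypothetical details added for illustration only.

```python
import torch
import torch.nn as nn

class DownstreamClassifier(nn.Module):
    """Hypothetical fine-tuning wrapper: the RTR dual output layer of the
    trained Restorer is discarded and a small Project/classification head is
    attached for a downstream task such as text classification."""
    def __init__(self, restorer, num_classes):
        super().__init__()
        del restorer.mlm_head, restorer.rtr_head       # abandon the RTR-stage heads
        self.restorer = restorer
        hidden_size = restorer.word_emb.embedding_dim
        self.project = nn.Linear(hidden_size, num_classes)  # Project projection layer

    def forward(self, token_ids):
        pos = torch.arange(token_ids.size(1), device=token_ids.device)
        hidden = self.restorer.word_emb(token_ids) + self.restorer.pos_emb(pos)
        for layer in self.restorer.layers:   # this stack shrinks under Dynamic self-adjustment
            hidden = layer(hidden)
        return self.project(hidden[:, 0])    # classify from the first position
```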
It should be noted that, since the Generator's text forging and the Restorer's RTR task training are not truly joint training, the model can be deployed in a distributed scheme, and one Generator can provide data for multiple Restorers.
According to the embodiment of the application, the pre-trained model is split so that only part of the whole model is used in the fine-tuning and prediction stages, which effectively reduces the model's parameters and computation; the model can be trained in a distributed manner, and one part of the model (the Generator module) can be reused as a single instance, reducing the computation of training multiple models simultaneously.
The embodiment of the application also provides a new RTR pre-training task, which enhances the model's discriminative semantic understanding capability and effectively ensures that the model still performs well with a smaller number of parameters.
The embodiment of the application also provides a Dynamic self-adjustment method, which dynamically monitors the model's calculation results to dynamically adjust the model scale and the training progress, so that the balance between model scale and model performance can be adjusted and controlled dynamically for different application scenarios and training data.
In summary, the embodiments of the present application provide a lightweight semantic service model training evolution method, system, device, and readable storage medium, which input a general corpus into a Generator model to perform Masked Language Model (MLM) task training on the Generator model; input training text data into the Generator module, which randomly replaces each word in each text with a forged word at the corresponding position; acquire the triplet data generated by the Generator module, the triplet data comprising the original text, the forged text, and the forged positions, where the forged positions are the positions of the replaced words in the original text, encoded one-hot with replaced positions set to 1 and non-replaced positions set to 0; perform Replace Token Restore (RTR) task training on the Restorer module using the forged text; input the forged text as input, with the original text and the forged positions as prediction labels, into the Restorer module, so that the Restorer module restores the replaced words in the text and recognizes the forged words in the forged text; and monitor and adjust the recognition results of the Restorer module based on a Dynamic self-adaptation method. On the one hand, the semantic understanding capability of the pre-trained model is enhanced; on the other hand, the computation and parameter count of the model in the fine-tuning stage are reduced, so that the model can be fine-tuned and trained with distributed dynamic adaptation on edge devices.
Based on the same technical concept, the embodiment of the application also provides a lightweight semantic service model training system, as shown in FIG. 5, wherein the system comprises:
a Generator training unit 501, configured to input a general corpus into a Generator model to perform Masked Language Model (MLM) task training on the Generator model;
an input text unit 502, configured to input training text data into a Generator module, the Generator module being used for randomly replacing each word in each text with a forged word at the corresponding position;
a data acquisition unit 503, configured to acquire the triplet data generated by the Generator module, wherein the triplet data comprises the original text, the forged text, and the forged positions; the forged positions are the positions of the replaced words in the original text, encoded one-hot with replaced positions set to 1 and non-replaced positions set to 0;
a Restorer training unit 504, configured to perform Replace Token Restore (RTR) task training on the Restorer module using the forged text;
a restoration and replacement marking unit 505, configured to input the forged text as input, with the original text and the forged positions as prediction labels, into the Restorer module, so that the Restorer module restores the replaced words in the text and recognizes the forged words in the forged text;
and a Dynamic self-adaptation unit 506, configured to monitor and adjust the recognition results of the Restorer module based on a Dynamic self-adaptation method.
In one possible implementation, the Restorer module employs a stack of multiple Encoder Layers, and its output layer uses a multi-output connection of a Masked Language Model and a Replace Token Restorer Model, where the Masked Language Model is used to restore the replaced text and the Replace Token Restorer Model is used to distinguish the locations of the forged text; the dual output layer at the bottom of the model is the output layer of the RTR task and comprises a Masked Language Model output layer and a Replace Token Restorer Model output layer.
In one possible implementation, each Encoder Layer in the Restorer module outputs its result to a corresponding Decision module; the Decision module performs a cross-entropy calculation on the currently output result and the prediction target's true value to obtain the current loss, and performs a KL divergence calculation on the output result and the prediction target's true value to evaluate the distribution gap between the current result and the true value.
Based on the same technical concept, the embodiment of the application also provides a device, the device comprising: a data acquisition apparatus, a processor, and a memory; the data acquisition apparatus is used for acquiring data; the memory is used for storing one or more program instructions; and the processor is configured to execute the one or more program instructions to perform the above method.
Based on the same technical concept, the embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium contains one or more program instructions, and the one or more program instructions are used for executing the method.
The method embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. For related details, refer to the description of the method embodiments.
It should be noted that although the operations of the method of the present application are depicted in the drawings in a particular order, this does not require or imply that the operations be performed in that particular order or that all illustrated operations be performed to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform.
Although the application provides the method operation steps described in the embodiments or flowcharts, more or fewer operation steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only order of execution. When executed by an actual apparatus or client product, the methods illustrated in the embodiments or figures may be executed sequentially or in parallel (for example, in a parallel-processor or multi-threaded environment, or even in a distributed data processing environment). The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises the recited element is not excluded.
The units, devices or modules etc. set forth in the above embodiments may be implemented in particular by a computer chip or entity or by a product having a certain function. For convenience of description, the above devices are described as being functionally divided into various modules, respectively. Of course, when implementing the present application, the functions of each module may be implemented in the same or multiple pieces of software and/or hardware, or a module implementing the same function may be implemented by multiple sub-modules or a combination of sub-units. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
Those skilled in the art will also appreciate that, in addition to implementing the controller in a pure computer readable program code, it is well possible to implement the same functionality by logically programming the method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, etc. Such a controller can be regarded as a hardware component, and means for implementing various functions included therein can also be regarded as a structure within the hardware component. Or even means for achieving the various functions may be regarded as either software modules implementing the methods or structures within hardware components.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
From the above description of embodiments, it will be apparent to those skilled in the art that the present application may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a mobile terminal, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.
Various embodiments in this specification are described in a progressive manner, and identical or similar parts are all provided for each embodiment, each embodiment focusing on differences from other embodiments. The application is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The foregoing description of the embodiments illustrates the general principles of the application and is not intended to limit the scope of the application or to limit the application to the particular embodiments disclosed; any modifications, equivalents, improvements, and the like that fall within the spirit and principles of the application are intended to be included within the scope of the application.

Claims (7)

1. A lightweight semantic service model training evolution method, the method comprising:
inputting a general corpus into a Generator model to perform Masked Language Model (MLM) task training on the Generator model;
inputting training text data into a Generator module, wherein the Generator module is used for randomly replacing each word in each text with a forged word at the corresponding position;
acquiring triplet data generated by the Generator module, wherein the triplet data comprises an original text, a forged text, and forged positions; the forged positions are the positions of the replaced words in the original text, encoded one-hot with replaced positions set to 1 and non-replaced positions set to 0;
performing Replace Token Restore (RTR) task training on a Restorer module using the forged text;
inputting the forged text as input, with the original text and the forged positions as prediction labels, into the Restorer module, so that the Restorer module restores the replaced words in the text and recognizes the forged words in the forged text;
monitoring and adjusting the recognition results of the Restorer module based on a Dynamic self-adaptation method;
wherein monitoring and adjusting the recognition results of the Restorer module based on the Dynamic self-adaptation method comprises:
enabling Dynamic self-adjustment, performing fine-tuning training on a downstream task, and providing the corresponding lightweight semantic service; when the training round is greater than round N and the score of layer C is smaller than the score threshold D, layers [0, C-1] are sufficient to fit the current training task, so layer C and all later layers of the model are deleted and the current layer's output is passed directly to the Project projection layer, thereby realizing dynamic self-adaptation;
the Restorer module employs a stack of multiple Encoder Layers, and its output layer uses a multi-output connection of a Masked Language Model and a Replace Token Restorer Model, wherein the Masked Language Model is used to restore the replaced text and the Replace Token Restorer Model is used to judge the positions of the forged text; the dual output layer at the bottom of the model is the output layer of the RTR task and comprises a Masked Language Model output layer and a Replace Token Restorer Model output layer.
2. The method of claim 1, wherein each Encoder Layer in the Restorer module outputs its result to a corresponding Decision module; the Decision module performs a cross-entropy calculation on the currently output result and the prediction target's true value to obtain the current loss, and performs a KL divergence calculation on the output result and the prediction target's true value to evaluate the distribution gap between the current result and the true value.
3. The method of claim 2, wherein the Restorer module is further configured to maintain a self-measurement table that stores a Score for each Encoder Layer; the initial value of the Score is 0, and the Score is computed by the Decision module of each layer according to the following formula:
where Epochs is the number of rounds of current training.
4. A lightweight semantic service model training system employing the method of any one of claims 1 to 3, the system comprising:
the Generator training unit, which is used for inputting the general corpus into a Generator model to perform Masked Language Model (MLM) task training on the Generator model;
the input text unit, which is used for inputting training text data into a Generator module, the Generator module being used for randomly replacing each word in each text with a forged word at the corresponding position;
the data acquisition unit, which is used for acquiring the triplet data generated by the Generator module, wherein the triplet data comprises the original text, the forged text, and the forged positions; the forged positions are the positions of the replaced words in the original text, encoded one-hot with replaced positions set to 1 and non-replaced positions set to 0;
the Restorer training unit, which is used for performing Replace Token Restore (RTR) task training on the Restorer module using the forged text;
the restoration and replacement marking unit, which is used for inputting the forged text as input, with the original text and the forged positions as prediction labels, into the Restorer module, so that the Restorer module restores the replaced words in the text and recognizes the forged words in the forged text;
and the Dynamic self-adaptation unit, which is used for monitoring and adjusting the recognition results of the Restorer module based on the Dynamic self-adaptation method.
5. The system of claim 4, wherein the Restorer module employs a stack of multiple Encoder Layers, and its output layer uses a multi-output connection of a Masked Language Model and a Replace Token Restorer Model, where the Masked Language Model is used to restore the replaced text and the Replace Token Restorer Model is used to distinguish the locations of the forged text; the dual output layer at the bottom of the model is the output layer of the RTR task and comprises a Masked Language Model output layer and a Replace Token Restorer Model output layer.
6. The system of claim 4, wherein each Encoder Layer in the Restorer module outputs its result to a corresponding Decision module; the Decision module performs a cross-entropy calculation on the currently output result and the prediction target's true value to obtain the current loss, and performs a KL divergence calculation on the output result and the prediction target's true value to evaluate the distribution gap between the current result and the true value.
7. A computer readable storage medium, characterized in that the computer storage medium contains one or more program instructions for performing the method according to any of claims 1-3.
CN202110334447.2A 2021-03-29 2021-03-29 Lightweight semantic intelligent service adaptation training evolution method and system Active CN113094482B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110334447.2A CN113094482B (en) 2021-03-29 2021-03-29 Lightweight semantic intelligent service adaptation training evolution method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110334447.2A CN113094482B (en) 2021-03-29 2021-03-29 Lightweight semantic intelligent service adaptation training evolution method and system

Publications (2)

Publication Number Publication Date
CN113094482A CN113094482A (en) 2021-07-09
CN113094482B true CN113094482B (en) 2023-10-17

Family

ID=76670767

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110334447.2A Active CN113094482B (en) 2021-03-29 2021-03-29 Lightweight semantic intelligent service adaptation training evolution method and system

Country Status (1)

Country Link
CN (1) CN113094482B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115374713B (en) * 2022-10-25 2022-12-27 成都新希望金融信息有限公司 Training method of GPS (global positioning system) authenticity identification model

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111125360A (en) * 2019-12-19 2020-05-08 网易(杭州)网络有限公司 Emotion analysis method and device in game field and model training method and device thereof
CN111950540A (en) * 2020-07-24 2020-11-17 浙江师范大学 Knowledge point extraction method, system, device and medium based on deep learning
CN112069795A (en) * 2020-08-28 2020-12-11 平安科技(深圳)有限公司 Corpus detection method, apparatus, device and medium based on mask language model
CN112380840A (en) * 2020-11-19 2021-02-19 平安科技(深圳)有限公司 Text error correction method, device, equipment and medium
CN112417888A (en) * 2020-11-26 2021-02-26 江苏网谱数据科技有限公司 Method for analyzing sparse semantic relationship by combining BilSTM-CRF algorithm and R-BERT algorithm
CN112507101A (en) * 2020-12-18 2021-03-16 北京百度网讯科技有限公司 Method and device for establishing pre-training language model
CN112507695A (en) * 2020-12-01 2021-03-16 平安科技(深圳)有限公司 Text error correction model establishing method, device, medium and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114514540A (en) * 2019-09-25 2022-05-17 谷歌有限责任公司 Contrast pre-training of language tasks

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111125360A (en) * 2019-12-19 2020-05-08 网易(杭州)网络有限公司 Emotion analysis method and device in game field and model training method and device thereof
CN111950540A (en) * 2020-07-24 2020-11-17 浙江师范大学 Knowledge point extraction method, system, device and medium based on deep learning
CN112069795A (en) * 2020-08-28 2020-12-11 平安科技(深圳)有限公司 Corpus detection method, apparatus, device and medium based on mask language model
CN112380840A (en) * 2020-11-19 2021-02-19 平安科技(深圳)有限公司 Text error correction method, device, equipment and medium
CN112417888A (en) * 2020-11-26 2021-02-26 江苏网谱数据科技有限公司 Method for analyzing sparse semantic relationship by combining BilSTM-CRF algorithm and R-BERT algorithm
CN112507695A (en) * 2020-12-01 2021-03-16 平安科技(深圳)有限公司 Text error correction model establishing method, device, medium and electronic equipment
CN112507101A (en) * 2020-12-18 2021-03-16 北京百度网讯科技有限公司 Method and device for establishing pre-training language model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BAE: BERT-based Adversarial Examples for Text Classification; Siddhant Garg et al.; https://arxiv.org/abs/2004.01970; pp. 1-8 *
A survey of pre-training models for natural language processing (自然语言处理预训练模型的研究综述); 余同瑞 et al.; Computer Engineering and Applications, Vol. 56, No. 23; pp. 12-22 *

Also Published As

Publication number Publication date
CN113094482A (en) 2021-07-09

Similar Documents

Publication Publication Date Title
Liu et al. Implicit discourse relation classification via multi-task neural networks
CN111859960B (en) Semantic matching method, device, computer equipment and medium based on knowledge distillation
Chavan et al. Vision transformer slimming: Multi-dimension searching in continuous optimization space
Li et al. Q-vit: Accurate and fully quantized low-bit vision transformer
CN111226222A (en) Depth context based syntax error correction using artificial neural networks
CN110059324A (en) Neural network machine interpretation method and device based on the supervision of interdependent information
CN113094482B (en) Lightweight semantic intelligent service adaptation training evolution method and system
CN114298121A (en) Multi-mode-based text generation method, model training method and device
CN113657421A (en) Convolutional neural network compression method and device and image classification method and device
Rajaee et al. How does fine-tuning affect the geometry of embedding space: A case study on isotropy
Tran et al. WaveTransformer: A novel architecture for audio captioning based on learning temporal and time-frequency information
US20200167655A1 (en) Method and apparatus for re-configuring neural network
Qi et al. Learning low resource consumption cnn through pruning and quantization
CN113516196B (en) Named entity recognition data enhancement method, named entity recognition data enhancement device, electronic equipment and named entity recognition data enhancement medium
KR20210057996A (en) Multi-task learning classifier learning apparatus and the method thereof
Huang et al. Not all image regions matter: Masked vector quantization for autoregressive image generation
CN116150010A (en) Test case classification method based on ship feature labels
CN113537560A (en) Method, system, electronic device and storage medium for predicting user insurance application will
CN115082761A (en) Model generation apparatus and method
Wen et al. Fast and robust compression of deep convolutional neural networks
Horvath et al. Maestro: Uncovering Low-Rank Structures via Trainable Decomposition
Chu et al. Mixed-precision quantized neural network with progressively decreasing bitwidth for image classification and object detection
CN113673501A (en) OCR classification method, system, electronic device and storage medium
Agarwal et al. Self-supervised representation learning across sequential and tabular features using transformers
Cao et al. Efficient-vqgan: Towards high-resolution image generation with efficient vision transformers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant