CN113094482A - Lightweight semantic intelligent service adaptation training evolution method and system - Google Patents

Lightweight semantic intelligent service adaptation training evolution method and system

Info

Publication number
CN113094482A
Authority
CN
China
Prior art keywords
text
model
module
forged
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110334447.2A
Other languages
Chinese (zh)
Other versions
CN113094482B (en)
Inventor
张玉清
郭智宸
周长兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Geosciences Beijing
Original Assignee
China University of Geosciences Beijing
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Geosciences Beijing filed Critical China University of Geosciences Beijing
Priority to CN202110334447.2A priority Critical patent/CN113094482B/en
Publication of CN113094482A publication Critical patent/CN113094482A/en
Application granted granted Critical
Publication of CN113094482B publication Critical patent/CN113094482B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/332 - Query formulation
    • G06F16/3329 - Natural language query formulation or dialogue systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing
    • G06F16/334 - Query execution
    • G06F16/3344 - Query execution using natural language analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the application discloses a lightweight semantic intelligent service adaptation training evolution method and system. The method comprises the following steps: performing masked language model (MLM) task training on the Generator model; inputting training text data into the Generator module to generate forged words; performing Replace Token Restore (RTR) task training on the Restorer module using the forged text; feeding the forged text as input, with the original text and the forged positions as prediction labels, into the Restorer module, so that the Restorer module restores the replaced words in the text and identifies the forged words in the forged text; and monitoring and adjusting the recognition results of the Restorer module based on a Dynamic self-adaptation method. On the one hand, the semantic understanding capability of the pre-trained model is enhanced; on the other hand, the computation and parameter count of the model in the fine-tuning stage are reduced, so that the model can undergo fine-tuning and distributed dynamic adaptive training on edge devices.

Description

Lightweight semantic intelligent service adaptation training evolution method and system
Technical Field
The embodiment of the application relates to the technical field of artificial intelligence, in particular to a lightweight semantic service model training evolution method, system, equipment and readable storage medium.
Background
Semantic services in the field of computer science are currently provided mainly by traditional LSTM (long short-term memory) networks and by pre-trained models based on the Transformer structure. The LSTM adds various gating and activation functions to a recurrent neural network structure to improve model performance, while a pre-trained model based on the Transformer structure stacks large numbers of encoders and decoders built from the self-attention mechanism to achieve semantic comprehension. Since the birth of the Transformer language model structure, semantic services commonly used in computer science have been provided by models based on the Transformer structure. Pre-training of such a model is usually carried out with the Masked Language Model (MLM) method; the pre-trained model is then applied to a specific task, and the requirements of that task are met through secondary fine-tuning.
Compared with the traditional RNN model, the Transformer alleviates the problems of vanishing gradients and difficulty in parallel computation, and its Multi-Head Attention mechanism can effectively observe the associations between all minimal text units in the text, rather than only assuming that the current text passage is related to the previous n passages. Training and applying a language model of the Transformer structure requires the following conditions to be met:
(1) Training normally relies on powerful computing resources; a GPU or a TPU is generally used.
(2) The model needs to be pre-trained using a universal corpus and an MLM task, so that it acquires a certain semantic understanding capability.
(3) A model can only understand a single kind of language; for example, a model trained on English cannot perform Chinese semantic comprehension.
However, both design schemes have defects: the former has poor accuracy and is difficult to parallelize, while the latter requires strong computing power for pre-training and for fine-tuning on downstream tasks. Neither approach is practical for edge computing environments with limited computing power.
Disclosure of Invention
Therefore, the embodiment of the application provides a lightweight semantic intelligent service adaptive training evolution method and system. Through a Replace Token Restore (RTR) pre-training task and model reconstruction with a Dynamic self-adaptation method, on the one hand the semantic understanding capability of the pre-trained model is enhanced, and on the other hand the computation and parameter count of the model in the fine-tuning stage are reduced, so that the model can undergo fine-tuning and distributed dynamic adaptive training on edge devices.
In order to achieve the above object, the embodiments of the present application provide the following technical solutions:
according to a first aspect of embodiments of the present application, there is provided a lightweight semantic service model training evolution method, including:
inputting a universal corpus into the Generator model to carry out masked language model (MLM) task training on the Generator model;
inputting training text data into the Generator module, wherein the Generator module is used for randomly replacing each word in each text data with a forged word at the corresponding position;
acquiring triple data generated by the Generator module, wherein the triple data comprises the original text, the forged text and the forged positions; a forged position is the position of a replaced word in the original text, represented with one-hot coding, where replaced positions are set to 1 and non-replaced positions are set to 0;
carrying out Replace Token Restore (RTR) task training on the Restorer module using the forged text;
inputting the forged text as input, with the original text and the forged positions as prediction labels, into the Restorer module, so that the Restorer module restores the replaced words in the text and identifies the forged words in the forged text;
and monitoring and adjusting the recognition result of the Restorer module based on a Dynamic self-adaptation method.
Optionally, the Restorer module adopts a stack of multiple Encoder Layers; the output layer uses multiple output connections, a Masked Language Model output and a Replace Token Restore Model output, where the Masked Language Model is used to restore the replaced text and the Replace Token Restore Model is used to distinguish the forged text positions; the dual output layers at the bottom of the model are the output layers of the RTR task, and the model further comprises a Word Embedding layer and a Position Embedding (position encoding) layer.
Optionally, each Encoder Layer in the Restorer module outputs its result to a corresponding Decision module; the Decision module performs a cross-entropy calculation between the currently output result and the prediction target's true value to obtain the current loss, and performs a KL divergence calculation between the output result and the true value to evaluate the distribution difference between the current result and the true value.
Optionally, the monitoring and adjusting the recognition result of the Restorer module based on the Dynamic self-adaptation method includes:
starting Dynamic self-adaptation, performing fine-tuning training on a downstream task, and providing the corresponding lightweight semantic service; when the number of training rounds exceeds N and the score of layer C is less than the score threshold D, layers [0, C-1] are considered sufficient to fit the current training task, layer C and all subsequent layers of the model are deleted, and the output of the current layer is passed directly to the Project (projection) layer, realizing dynamic self-adaptation.
Optionally, the Restorer module is further configured to maintain a score table in which the Score of each Encoder Layer is stored, with an initial value of 0; the calculation of the Score is determined by each layer's Decision module, with the formula as follows:
[Formula (1), published as an image (BDA0002997565450000031) in the original document, is not reproduced here.]
where Epochs is the number of rounds currently trained.
According to a second aspect of embodiments of the present application, there is provided a lightweight semantic service model training system, the system including:
the Generator training unit is used for inputting the universal corpus into the Generator model so as to carry out masked language model (MLM) task training on the Generator model;
the input text unit is used for inputting the training text data into the Generator module, wherein the Generator module is used for randomly replacing each word in each text data with a forged word at the corresponding position;
the data acquisition unit is used for acquiring the triple data generated by the Generator module, the triple data comprising the original text, the forged text and the forged positions; a forged position is the position of a replaced word in the original text, represented with one-hot coding, where replaced positions are set to 1 and non-replaced positions are set to 0;
the Restorer training unit is used for carrying out Replace Token Restore (RTR) task training on the Restorer module using the forged text;
the restore replacement marking unit is used for inputting the forged text as input, with the original text and the forged positions as prediction labels, into the Restorer module, so that the Restorer module restores the replaced words in the text and identifies the forged words in the forged text;
and the Dynamic self-adaptation unit is used for monitoring and adjusting the identification result of the Restorer module based on a Dynamic self-adaptation method.
Optionally, the Restorer module adopts a stack of multiple Encoder Layers; the output layer uses multiple output connections, a Masked Language Model output and a Replace Token Restore Model output, where the Masked Language Model is used to restore the replaced text and the Replace Token Restore Model is used to distinguish the forged text positions; and the dual output layers at the bottom of the model are the output layers of the RTR task, comprising a Masked Language Model output layer and a Replace Token Restore Model output layer.
Optionally, each Encoder Layer in the Restorer module outputs its result to a corresponding Decision module; the Decision module performs a cross-entropy calculation between the currently output result and the prediction target's true value to obtain the current loss, and performs a KL divergence calculation between the output result and the true value to evaluate the distribution difference between the current result and the true value.
In summary, the embodiment of the application provides a lightweight semantic intelligent service adaptation training evolution method and system, wherein a universal corpus is input into the Generator model to perform masked language model (MLM) task training on the Generator model; training text data is input into the Generator module, which randomly replaces each word in each text data with a forged word at the corresponding position; the triple data generated by the Generator module is acquired, comprising the original text, the forged text and the forged positions, where a forged position is the position of a replaced word in the original text, represented with one-hot coding (replaced positions set to 1, non-replaced positions set to 0); Replace Token Restore (RTR) task training is carried out on the Restorer module using the forged text; the forged text is fed as input, with the original text and the forged positions as prediction labels, into the Restorer module, so that the Restorer module restores the replaced words in the text and identifies the forged words in the forged text; and the recognition results of the Restorer module are monitored and adjusted based on a Dynamic self-adaptation method. On the one hand, the semantic understanding capability of the pre-trained model is enhanced; on the other hand, the computation and parameter count of the model in the fine-tuning stage are reduced, so that the model can undergo fine-tuning and distributed dynamic adaptive training on edge devices.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be apparent that the drawings in the following description are merely exemplary, and that other embodiments can be derived from the drawings provided by those of ordinary skill in the art without inventive effort.
The structures, ratios, sizes, and the like shown in the present specification are only used to match the contents disclosed in the specification, so that those skilled in the art can understand and read the present invention; they do not limit the conditions under which the present invention can be implemented and carry no technical significance in themselves. Any structural modification, change in ratio relationship, or adjustment of size that does not affect the functions and purposes achievable by the present invention shall still fall within the scope of the present invention.
Fig. 1 is a schematic flow chart of a training evolution method of a lightweight semantic service model provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of an overall structure of a model provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of the Generator structure provided in the embodiments of the present application;
FIG. 4 is a schematic diagram of a Restorer structure provided in an embodiment of the present application;
fig. 5 is a block diagram of a lightweight semantic service model training system provided in an embodiment of the present application.
Detailed Description
The present invention is described in terms of particular embodiments; other advantages and features of the invention will become apparent to those skilled in the art from the following disclosure. It is to be understood that the described embodiments are merely exemplary of the invention and are not intended to limit the invention to the particular embodiments disclosed. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 shows a lightweight semantic service model training evolution method provided in an embodiment of the present application, where the method includes:
Step 101: inputting a universal corpus into the Generator model to carry out masked language model (MLM) task training on the Generator model;
Step 102: inputting training text data into the Generator module, wherein the Generator module is used for randomly replacing each word in each text data with a forged word at the corresponding position;
Step 103: acquiring triple data generated by the Generator module, wherein the triple data comprises the original text, the forged text and the forged positions; a forged position is the position of a replaced word in the original text, represented with one-hot coding, where replaced positions are set to 1 and non-replaced positions are set to 0;
Step 104: carrying out Replace Token Restore (RTR) task training on the Restorer module using the forged text;
Step 105: inputting the forged text as input, with the original text and the forged positions as prediction labels, into the Restorer module, so that the Restorer module restores the replaced words in the text and identifies the forged words in the forged text;
Step 106: monitoring and adjusting the recognition result of the Restorer module based on a Dynamic self-adaptation method.
In a possible implementation manner, the Restorer module adopts a stack of multiple Encoder Layers; the output layer uses multiple output connections, with a Masked Language Model output used to restore the replaced text and a Replace Token Restore Model output used to judge the positions of the forged text; and the dual output layers at the bottom of the model are the output layers of the RTR task, comprising a Masked Language Model output layer and a Replace Token Restore Model output layer.
In a possible implementation manner, each Encoder Layer in the Restorer module outputs its result to a corresponding Decision module; the Decision module performs a cross-entropy calculation between the currently output result and the prediction target's true value to obtain the current loss, and performs a KL divergence calculation between the output result and the true value to evaluate the distribution difference between the current result and the true value.
In a possible embodiment, the monitoring and adjusting the recognition result of the Restorer module based on the Dynamic self-adaptation method includes: starting Dynamic self-adaptation, performing fine-tuning training on a downstream task, and providing the corresponding lightweight semantic service; when the number of training rounds exceeds N and the score of layer C is less than the score threshold D, layers [0, C-1] are considered sufficient to fit the current training task, layer C and all subsequent layers of the model are deleted, and the output of the current layer is passed directly to the Project (projection) layer, realizing dynamic self-adaptation.
In a possible implementation, the Restorer module is further configured to maintain a score table in which the Score of each Encoder Layer is stored, with an initial value of 0; the calculation of the Score is determined by each layer's Decision module, as shown in formula (1) below:
[Formula (1), published as an image (BDA0002997565450000071) in the original document, is not reproduced here.]
where Epochs is the number of rounds currently trained.
FIG. 2 is a diagram illustrating the overall system of the lightweight network model provided by an embodiment of the present application. The lightweight network model is mainly divided into a Generator module and a Restorer module: the Generator module is used to generate forged data, and the Restorer module is used to identify and restore the forged data. The Restorer module gains strong semantic understanding capability through the RTR pre-training task, while the Generator module is only used to assist the training of the Restorer module. The specific model operation flow is as follows:
Step 1: the training text data is sent to the Generator module; processing article by article, the Generator module randomly replaces each word in each text, with a probability of 5%, by a plausible word for that position (with some probability the replacement word is the same as the original word).
Step 2: the training data is expanded by the Generator module into a triple (original text, forged text, forged position), where the forged position refers to the position of the replaced word in the original text, represented with one-hot coding: replaced positions are set to 1 and non-replaced positions to 0.
Step 3: the new training data is sent to the Restorer module, with the forged text as input and the original text and forged positions as prediction labels, so that the Restorer module restores the replaced words in the text and judges which positions in the forged text are forged. A minimal sketch of this data pipeline follows.
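As an illustration of Steps 1-3, the following is a minimal sketch of how the (original text, forged text, forged position) triple could be built from token IDs. It is not the patented implementation: the generator_mlm callable, the tensor shapes and all names are assumptions introduced only for illustration.

import torch

def forge_text(token_ids, generator_mlm, replace_prob=0.05):
    # Build one (original, forged, forged_position) triple from a tokenized article.
    # token_ids:     LongTensor of shape (seq_len,) holding the original token IDs.
    # generator_mlm: hypothetical callable that, given the token IDs and a boolean mask
    #                of selected positions, returns one plausible replacement token ID
    #                per selected position (e.g. sampled from a small MLM Generator).
    original = token_ids.clone()
    forged = token_ids.clone()

    # Select roughly 5% of the positions for replacement, article by article.
    replace_mask = torch.rand(token_ids.shape) < replace_prob
    if replace_mask.any():
        # With some probability the Generator proposes the very word it replaces.
        forged[replace_mask] = generator_mlm(token_ids, replace_mask)

    # One-hot style label: 1 at replaced positions, 0 at non-replaced positions.
    forged_position = replace_mask.long()
    return original, forged, forged_position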
The Generator module provided by the embodiment of the application adopts a stack of multiple Encoder Layers, with the specific structure shown in FIG. 3: it is pre-trained with the traditional Masked Language Model (MLM) task, and the dual output layers (the Masked Language Model output layer and the Replace Token Restore Model output layer) at the bottom of the model are the output layers of the RTR task. The Generator module used in the embodiment of the application has a 3-layer Encoder structure, with about 3.2 million parameters in total. A minimal sketch of such a Generator follows.
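For reference, the sketch below illustrates a Generator of the scale described (three stacked Encoder Layers with a masked-language-model output head). It is an assumption-laden illustration, not the structure of FIG. 3: the hidden size, head count and maximum length are invented values chosen only to keep the model small, and the RTR-related output layers are omitted.

import torch
import torch.nn as nn

class Generator(nn.Module):
    # Small masked-language-model generator: 3 stacked Encoder Layers + MLM head.
    def __init__(self, vocab_size, hidden=128, heads=4, layers=3, max_len=512):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, hidden)   # Word Embedding layer
        self.pos_emb = nn.Embedding(max_len, hidden)        # Position Embedding layer
        encoder_layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=layers)
        self.mlm_head = nn.Linear(hidden, vocab_size)        # predicts the masked tokens

    def forward(self, token_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.word_emb(token_ids) + self.pos_emb(positions)
        return self.mlm_head(self.encoder(x))  # (batch, seq_len, vocab_size)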
It should be noted that, in order to implement distributed training, the Generator module is not trained jointly with the Restorer module but is trained separately; text forgery is performed after its training is completed, and the forged text is stored separately in the flash memory of the device.
The Restorer module provided by the embodiment of the application adopts a stack of multiple Encoder Layers; the output layer uses multiple output connections, with a Masked Language Model output used to restore the replaced text and a Replace Token Restore Model output used to judge the positions of the forged text. The structure of the Restorer module is shown in FIG. 4:
The dual output layers (the Masked Language Model output layer and the Replace Token Restore Model output layer) at the bottom of the model are the output layers of the RTR task. The Restorer model provided by the embodiment of the application adopts 12 Encoder Layers, with about 8.12 million parameters in total. A minimal architectural sketch follows.
The Decision module provided by the embodiment of the present application is described below. In the fine-tuning stage, each Encoder Layer in the Restorer module outputs its result to a corresponding Decision module; the module performs a cross-entropy calculation between the currently output result and the prediction target's true value to obtain the current loss, and performs a KL divergence calculation between the output result and the true value to evaluate the distribution difference between the current result and the true value. A sketch of this computation is given below.
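The following sketch assumes each layer's hidden states have already been projected to prediction logits; the exact weighting of the two terms is not specified by the application, and the light label smoothing is added here only to keep the KL divergence well defined.

import torch.nn.functional as F

def decision(layer_logits, target_ids, smoothing=0.1):
    # layer_logits: (batch, seq_len, num_classes) output derived from one Encoder Layer.
    # target_ids:   (batch, seq_len) true values of the prediction target.
    num_classes = layer_logits.size(-1)

    # Cross-entropy between the layer's current output and the true values -> current loss.
    loss = F.cross_entropy(layer_logits.transpose(1, 2), target_ids)

    # KL divergence between the predicted distribution and a (lightly smoothed) true
    # distribution, evaluating how far the current output is from the true values.
    log_probs = F.log_softmax(layer_logits, dim=-1)
    true_probs = F.one_hot(target_ids, num_classes).float()
    true_probs = true_probs * (1.0 - smoothing) + smoothing / num_classes
    kl = F.kl_div(log_probs, true_probs, reduction="batchmean")
    return loss, kl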
The Dynamic self-adaptation method provided by the embodiment of the application is introduced below. By dynamically monitoring the model's calculation results, Dynamic self-adaptation realizes dynamic adjustment of the model scale and the training progress, so that the model can dynamically balance model scale against model performance according to different application scenarios and training data.
The Restorer module maintains a score table in which the Score of each Encoder Layer is stored, with an initial value of 0; the calculation of the Score is determined by each layer's Decision module, with the specific formula shown in formula (1) above.
Two hyper-parameters are set: N, the number of training rounds, and D, the score threshold. When the number of training rounds exceeds N and the score of layer C is less than D, layers [0, C-1] are considered sufficient to fit the current training task; layer C and all subsequent layers of the model are then deleted, and the output of the current layer is passed directly to the Project (projection) layer, achieving dynamic self-adaptation. A minimal sketch of this rule follows.
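The sketch below shows one way the pruning rule just described could be applied. The score table is assumed to be a plain list of per-layer Scores maintained by the Decision modules; the criterion (training round greater than N and the score of layer C below D) follows the text, while the data structures, default values and names are assumptions.

def dynamic_self_adaptation(restorer, score_table, epoch, N=3, D=0.05):
    # Prune Encoder Layers once the shallower layers already fit the current task.
    # score_table: per-layer Scores maintained by the Decision modules (initialised to 0).
    # N: minimum number of training rounds before pruning is allowed (placeholder default).
    # D: score threshold below which a layer is judged unnecessary (placeholder default).
    if epoch <= N:
        return restorer  # too early to judge

    for c, score in enumerate(score_table):
        if score < D:
            # Layers [0, c-1] suffice; delete layer c and everything after it, and let the
            # last remaining layer feed the Project (projection) layer directly.
            restorer.layers = restorer.layers[:c]
            del score_table[c:]
            break
    return restorer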
In a possible implementation manner, a specific deployment manner of the lightweight network model provided by the embodiment of the present application is as follows:
1. and performing MLM task training on the Generator model by using the universal linguistic data.
2. And deploying the Generator module on the equipment, and performing text counterfeiting on the universal language material by using the Generator module.
3. The Restorer module is RTR task trained using fake text, and Dynamic self-adaptation is not used at this stage.
4. The trained Restorer module is deployed on equipment, Dynamic self-adaptation is started, fine-tuning training is carried out on specific downstream tasks, corresponding lightweight semantic services (such as text classification and natural language reading and understanding) are provided, and the double output layers corresponding to the RTR task can be abandoned at this stage.
It should be noted that, since the process of the Generator performing text forgery and the process of the Restorer performing RTR task training are not truly joint training, the model deployment allows a distributed scheme to be adopted, and one Generator can provide data for multiple Restorers.
According to the embodiment of the application, the pre-trained model is split, and only part of the whole model needs to be used in the fine-tuning and prediction stages, which effectively reduces the parameters and computation of the model, enables distributed training of the model, and allows part of the model (the Generator module) to be reused as a single instance, reducing the computation when multiple models are trained simultaneously.
The embodiment of the application also provides a new RTR pre-training task, which enhances the semantic understanding capability of the model in terms of discrimination, so that the model still performs well with fewer parameters.
The embodiment of the application also provides a Dynamic self-adaptation method, which realizes dynamic adjustment of the model scale and the training progress by dynamically monitoring the model's calculation results, so that the model can dynamically balance model scale against model performance according to different application scenarios and training data.
In summary, the embodiment of the present application provides a lightweight semantic service model training evolution method, system, device and readable storage medium, wherein a universal corpus is input into the Generator model to perform masked language model (MLM) task training on the Generator model; training text data is input into the Generator module, which randomly replaces each word in each text data with a forged word at the corresponding position; the triple data generated by the Generator module is acquired, comprising the original text, the forged text and the forged positions, where a forged position is the position of a replaced word in the original text, represented with one-hot coding (replaced positions set to 1, non-replaced positions set to 0); Replace Token Restore (RTR) task training is carried out on the Restorer module using the forged text; the forged text is fed as input, with the original text and the forged positions as prediction labels, into the Restorer module, so that the Restorer module restores the replaced words in the text and identifies the forged words in the forged text; and the recognition results of the Restorer module are monitored and adjusted based on a Dynamic self-adaptation method. On the one hand, the semantic understanding capability of the pre-trained model is enhanced; on the other hand, the computation and parameter count of the model in the fine-tuning stage are reduced, so that the model can undergo fine-tuning and distributed dynamic adaptive training on edge devices.
Based on the same technical concept, an embodiment of the present application further provides a lightweight semantic service model training system, as shown in fig. 5, the system includes:
a Generator training unit 501, configured to input a universal corpus into the Generator model, so as to perform masked language model (MLM) task training on the Generator model;
an input text unit 502, configured to input the training text data into the Generator module, the Generator module being used to randomly replace each word in each text data with a forged word at the corresponding position;
a data obtaining unit 503, configured to obtain the triple data generated by the Generator module, the triple data comprising the original text, the forged text and the forged positions, where a forged position is the position of a replaced word in the original text, represented with one-hot coding (replaced positions set to 1, non-replaced positions set to 0);
a Restorer training unit 504, configured to perform Replace Token Restore (RTR) task training on the Restorer module using the forged text;
a restore replacement marking unit 505, configured to input the forged text as input, with the original text and the forged positions as prediction labels, into the Restorer module, so that the Restorer module restores the replaced words in the text and identifies the forged words in the forged text;
and the Dynamic self-adaptation unit 506 is used for monitoring and adjusting the identification result of the Restorer module based on a Dynamic self-adaptation method.
In a possible implementation manner, the Restorer module adopts a stack of multiple Encoder Layers; the output layer uses multiple output connections, with a Masked Language Model output used to restore the replaced text and a Replace Token Restore Model output used to judge the positions of the forged text; and the dual output layers at the bottom of the model are the output layers of the RTR task, comprising a Masked Language Model output layer and a Replace Token Restore Model output layer.
In a possible implementation manner, each Encoder Layer in the Restorer module outputs its result to a corresponding Decision module; the Decision module performs a cross-entropy calculation between the currently output result and the prediction target's true value to obtain the current loss, and performs a KL divergence calculation between the output result and the true value to evaluate the distribution difference between the current result and the true value.
Based on the same technical concept, an embodiment of the present application further provides an apparatus, including: the device comprises a data acquisition device, a processor and a memory; the data acquisition device is used for acquiring data; the memory is to store one or more program instructions; the processor is configured to execute one or more program instructions to perform the method.
Based on the same technical concept, the embodiment of the present application also provides a computer-readable storage medium, wherein the computer-readable storage medium contains one or more program instructions, and the one or more program instructions are used for executing the method.
In the present specification, each embodiment of the method is described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. Reference is made to the description of the method embodiments.
It is noted that while the operations of the methods of the present invention are depicted in the drawings in a particular order, this is not a requirement or suggestion that the operations must be performed in this particular order or that all of the illustrated operations must be performed to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
Although the present application provides method steps as in embodiments or flowcharts, additional or fewer steps may be included based on conventional or non-inventive approaches. The order of steps recited in the embodiments is merely one manner of performing the steps in a multitude of orders and does not represent the only order of execution. When an apparatus or client product in practice executes, it may execute sequentially or in parallel (e.g., in a parallel processor or multithreaded processing environment, or even in a distributed data processing environment) according to the embodiments or methods shown in the figures. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises the recited elements is not excluded.
The units, devices, modules, etc. set forth in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, in implementing the present application, the functions of each module may be implemented in one or more software and/or hardware, or a module implementing the same function may be implemented by a combination of a plurality of sub-modules or sub-units, and the like. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer readable program code, the same functionality can be implemented by logically programming method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be considered as a hardware component, and the means included therein for performing the various functions may also be considered as a structure within the hardware component. Or even means for performing the functions may be regarded as being both a software module for performing the method and a structure within a hardware component.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, or the like, and includes several instructions for enabling a computer device (which may be a personal computer, a mobile terminal, a server, or a network device) to execute the method according to the embodiments or some parts of the embodiments of the present application.
The embodiments in the present specification are described in a progressive manner, and the same or similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The above-mentioned embodiments are further described in detail for the purpose of illustrating the invention, and it should be understood that the above-mentioned embodiments are only illustrative of the present invention and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements, etc. made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A lightweight semantic service model training evolution method, the method comprising:
inputting a universal corpus into the Generator model to carry out masked language model (MLM) task training on the Generator model;
inputting training text data into the Generator module, wherein the Generator module is used for randomly replacing each word in each text data with a forged word at the corresponding position;
acquiring triple data generated by the Generator module, wherein the triple data comprises the original text, the forged text and the forged positions; a forged position is the position of a replaced word in the original text, represented with one-hot coding, where replaced positions are set to 1 and non-replaced positions are set to 0;
carrying out Replace Token Restore (RTR) task training on the Restorer module using the forged text;
inputting the forged text as input, with the original text and the forged positions as prediction labels, into the Restorer module, so that the Restorer module restores the replaced words in the text and identifies the forged words in the forged text;
and monitoring and adjusting the recognition result of the Restorer module based on a Dynamic self-adaptation method.
2. The method of claim 1, wherein the Restorer module adopts a stack of multiple Encoder Layers; the output layer uses multiple output connections, a Masked Language Model output for restoring the replaced text and a Replace Token Restore Model output for discriminating the forged text positions; and the dual output layers at the bottom of the model are the output layers of the RTR task, comprising a Masked Language Model output layer and a Replace Token Restore Model output layer.
3. The method of claim 1, wherein each Encoder Layer in the Restorer module outputs its result to a corresponding Decision module, and the Decision module performs a cross-entropy calculation between the currently output result and the prediction target's true value to obtain the current loss, and performs a KL divergence calculation between the output result and the true value to evaluate the distribution difference between the current result and the true value.
4. The method of claim 1, wherein the monitoring and adjusting the recognition result of the Restorer module based on a Dynamic self-adaptation method comprises:
starting Dynamic self-adaptation, performing fine-tuning training on a downstream task, and providing the corresponding lightweight semantic service; when the number of training rounds exceeds N and the score of layer C is less than the score threshold D, layers [0, C-1] are considered sufficient to fit the current training task, layer C and all subsequent layers of the model are deleted, and the output of the current layer is passed directly to the Project (projection) layer, realizing dynamic self-adaptation.
5. The method of claim 4, wherein the Restorer module is further configured to maintain a score table in which the Score of each Encoder Layer is stored, with an initial value of 0; the calculation of the Score is determined by each layer's Decision module, with the formula as follows:
[Formula (1), published as an image (FDA0002997565440000021) in the original document, is not reproduced here.]
where Epochs is the number of rounds currently trained.
6. A lightweight semantic service model training system, the system comprising:
the Generator training unit is used for inputting the universal corpus into the Generator model so as to carry out masked language model (MLM) task training on the Generator model;
the input text unit is used for inputting the training text data into the Generator module, wherein the Generator module is used for randomly replacing each word in each text data with a forged word at the corresponding position;
the data acquisition unit is used for acquiring the triple data generated by the Generator module, the triple data comprising the original text, the forged text and the forged positions; a forged position is the position of a replaced word in the original text, represented with one-hot coding, where replaced positions are set to 1 and non-replaced positions are set to 0;
the Restorer training unit is used for carrying out Replace Token Restore (RTR) task training on the Restorer module using the forged text;
the restore replacement marking unit is used for inputting the forged text as input, with the original text and the forged positions as prediction labels, into the Restorer module, so that the Restorer module restores the replaced words in the text and identifies the forged words in the forged text;
and the Dynamic self-adaptation unit is used for monitoring and adjusting the identification result of the Restorer module based on a Dynamic self-adaptation method.
7. The system of claim 6, wherein the Restorer module adopts a stack of multiple Encoder Layers; the output layer uses multiple output connections, a Masked Language Model output for restoring the replaced text and a Replace Token Restore Model output for discriminating the forged text positions; and the dual output layers at the bottom of the model are the output layers of the RTR task, comprising a Masked Language Model output layer and a Replace Token Restore Model output layer.
8. The system of claim 6, wherein each Encoder Layer in the Restorer module outputs its result to a corresponding Decision module, and the Decision module performs a cross-entropy calculation between the currently output result and the prediction target's true value to obtain the current loss, and performs a KL divergence calculation between the output result and the true value to evaluate the distribution difference between the current result and the true value.
9. An apparatus, characterized in that the apparatus comprises: the device comprises a data acquisition device, a processor and a memory;
the data acquisition device is used for acquiring data; the memory is to store one or more program instructions; the processor, configured to execute one or more program instructions to perform the method of any of claims 1-5.
10. A computer-readable storage medium having one or more program instructions embodied therein for performing the method of any of claims 1-5.
CN202110334447.2A 2021-03-29 2021-03-29 Lightweight semantic intelligent service adaptation training evolution method and system Active CN113094482B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110334447.2A CN113094482B (en) 2021-03-29 2021-03-29 Lightweight semantic intelligent service adaptation training evolution method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110334447.2A CN113094482B (en) 2021-03-29 2021-03-29 Lightweight semantic intelligent service adaptation training evolution method and system

Publications (2)

Publication Number Publication Date
CN113094482A true CN113094482A (en) 2021-07-09
CN113094482B CN113094482B (en) 2023-10-17

Family

ID=76670767

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110334447.2A Active CN113094482B (en) 2021-03-29 2021-03-29 Lightweight semantic intelligent service adaptation training evolution method and system

Country Status (1)

Country Link
CN (1) CN113094482B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115374713A (en) * 2022-10-25 2022-11-22 成都新希望金融信息有限公司 Training method of GPS (global positioning system) authenticity identification model

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111125360A (en) * 2019-12-19 2020-05-08 网易(杭州)网络有限公司 Emotion analysis method and device in game field and model training method and device thereof
CN111950540A (en) * 2020-07-24 2020-11-17 浙江师范大学 Knowledge point extraction method, system, device and medium based on deep learning
CN112069795A (en) * 2020-08-28 2020-12-11 平安科技(深圳)有限公司 Corpus detection method, apparatus, device and medium based on mask language model
CN112380840A (en) * 2020-11-19 2021-02-19 平安科技(深圳)有限公司 Text error correction method, device, equipment and medium
CN112417888A (en) * 2020-11-26 2021-02-26 江苏网谱数据科技有限公司 Method for analyzing sparse semantic relationship by combining BilSTM-CRF algorithm and R-BERT algorithm
CN112507101A (en) * 2020-12-18 2021-03-16 北京百度网讯科技有限公司 Method and device for establishing pre-training language model
CN112507695A (en) * 2020-12-01 2021-03-16 平安科技(深圳)有限公司 Text error correction model establishing method, device, medium and electronic equipment
US20210089724A1 (en) * 2019-09-25 2021-03-25 Google Llc Contrastive Pre-Training for Language Tasks

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210089724A1 (en) * 2019-09-25 2021-03-25 Google Llc Contrastive Pre-Training for Language Tasks
CN111125360A (en) * 2019-12-19 2020-05-08 网易(杭州)网络有限公司 Emotion analysis method and device in game field and model training method and device thereof
CN111950540A (en) * 2020-07-24 2020-11-17 浙江师范大学 Knowledge point extraction method, system, device and medium based on deep learning
CN112069795A (en) * 2020-08-28 2020-12-11 平安科技(深圳)有限公司 Corpus detection method, apparatus, device and medium based on mask language model
CN112380840A (en) * 2020-11-19 2021-02-19 平安科技(深圳)有限公司 Text error correction method, device, equipment and medium
CN112417888A (en) * 2020-11-26 2021-02-26 江苏网谱数据科技有限公司 Method for analyzing sparse semantic relationship by combining BilSTM-CRF algorithm and R-BERT algorithm
CN112507695A (en) * 2020-12-01 2021-03-16 平安科技(深圳)有限公司 Text error correction model establishing method, device, medium and electronic equipment
CN112507101A (en) * 2020-12-18 2021-03-16 北京百度网讯科技有限公司 Method and device for establishing pre-training language model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SIDDHANT GARG et al.: "BAE: BERT-based Adversarial Examples for Text Classification", https://arxiv.org/abs/2004.01970, pages 1-8 *
余同瑞 (Yu Tongrui) et al.: "A Survey of Pre-trained Models for Natural Language Processing" (自然语言处理预训练模型的研究综述), Computer Engineering and Applications (计算机工程与应用), vol. 56, no. 23, pages 12-22 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115374713A (en) * 2022-10-25 2022-11-22 成都新希望金融信息有限公司 Training method of GPS (global positioning system) authenticity identification model
CN115374713B (en) * 2022-10-25 2022-12-27 成都新希望金融信息有限公司 Training method of GPS (global positioning system) authenticity identification model

Also Published As

Publication number Publication date
CN113094482B (en) 2023-10-17

Similar Documents

Publication Publication Date Title
Rajasegaran et al. Self-supervised knowledge distillation for few-shot learning
US11620515B2 (en) Multi-task knowledge distillation for language model
Liu et al. Implicit discourse relation classification via multi-task neural networks
Jalal et al. Learning temporal clusters using capsule routing for speech emotion recognition
Loussaief et al. Deep learning vs. bag of features in machine learning for image classification
KR20200031154A (en) In-depth context-based grammatical error correction using artificial neural networks
Zhang et al. Compact representation for image classification: To choose or to compress?
Keren et al. Convolutional neural networks with data augmentation for classifying speakers' native language
CN112863529A (en) Speaker voice conversion method based on counterstudy and related equipment
Nugroho et al. Evaluation of feature selection using wrapper for numeric dataset with random forest algorithm
CN113094482A (en) Lightweight semantic intelligent service adaptation training evolution method and system
Ibrahim et al. Covariance pooling layer for text classification
Wang et al. Fusion network for face-based age estimation
Oza Ensemble data mining methods
CN115082761A (en) Model generation apparatus and method
Li et al. Group-level emotion recognition based on faces, scenes, skeletons features
Agarwal et al. Self-supervised representation learning across sequential and tabular features using transformers
CN113849624A (en) Word slot extraction device and method for multi-turn conversation
CN113537560A (en) Method, system, electronic device and storage medium for predicting user insurance application will
Ghosh et al. Technical Domain Identification using word2vec and BiLSTM
Shen et al. Feature-guided neural model training for supervised document representation learning
Chan et al. Predicting US elections with social media and neural networks
Hayashi et al. Three-MLP Ensemble Re-RX algorithm and recent classifiers for credit-risk evaluation
Dwivedi et al. Predictive Technique To Improve Classification On Continuous System Deployment
Liu et al. Multi-task feature selection for advancing performance of image segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant