CN114911927A - Training method and system of neural language model and storage medium - Google Patents

Training method and system of neural language model and storage medium

Info

Publication number
CN114911927A
Authority
CN
China
Prior art keywords
data set
training
model
trained
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210355946.4A
Other languages
Chinese (zh)
Inventor
李鹏宇 (Li Pengyu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202210355946.4A
Publication of CN114911927A
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology

Abstract

The present application relates to the field of data acquisition and model training, and in particular to a training method, system, and storage medium for a neural language model. The method comprises the following steps: acquiring a task data set to be trained; performing parameter configuration on a pre-training model corresponding to the task data set to be trained; acquiring a high-value auxiliary data set; fusing the task data set to be trained with the high-value auxiliary data set to obtain a first data set, and training the first data set through the configured pre-training model; and saving the trained model for use by downstream tasks. The method and system reduce the dependence of model training on manual work, are simple to operate, can process and exploit hundreds or even thousands of data sets, and can use existing data sets to improve the current task model, thereby raising the utilization value of the existing data sets and greatly improving the efficiency of data acquisition and data mining.

Description

Training method, system and storage medium of neural language model
Technical Field
The present application relates to the field of data acquisition and model training technologies, and more particularly, to a method, a system, and a storage medium for training a neural language model.
Background
Neural language models are used to process natural language. As natural language processing technology is applied more and more widely, a practical problem arises: as projects progress, more and more text data sets accumulate, yet older data sets are rarely reused. This significantly reduces the utilization of the accumulated text data sets and may even lead to data being lost or discarded.
Therefore, now that data resources have become the "oil" of the artificial intelligence industry, it is essential to manage and exploit these text data sets well.
Disclosure of Invention
In view of the above technical problems, the invention aims to provide a model training method and system that allow old data sets to participate in new training tasks, so that the accumulated data sets are fully utilized and the value of the old data sets is mined.
The invention provides a training method of a neural language model in a first aspect, which comprises the following steps:
acquiring a task data set to be trained;
performing parameter configuration on a pre-training model corresponding to the task data set to be trained;
extracting a high-value auxiliary data set;
fusing the task data set to be trained and the high-value auxiliary data set to obtain a first data set, and training the first data set through a configured pre-training model;
and saving the trained model for application of downstream tasks.
In some embodiments of the present invention, the performing parameter configuration on the pre-training model corresponding to the task data set to be trained includes:
determining a neural language model to which a pre-training model corresponding to the task data set to be trained belongs;
and performing parameter configuration according to the attribute of the neural language model, wherein the parameter configuration at least comprises the initialization parameter configuration of an encoder in the neural language model.
In some embodiments of the invention, the extracting of the high-value auxiliary data set comprises:
checking the data sets stored in a pre-established database;
determining, from the stored data sets, auxiliary samples related to the task data set to be trained;
and extracting a high-value auxiliary data set from the auxiliary samples based on relevance weights.
In some embodiments of the present invention, the fusing the task data set to be trained and the high-value auxiliary data set to obtain a first data set, and training the first data set through a configured pre-training model, includes:
fusing the task data set to be trained and the high-value auxiliary data set to obtain a first data set serving as a training sample;
training the training sample through a neural language model to which a configured pre-training model belongs;
supervising the neural language model during the training process;
and when the training reaches the preset times, stopping training.
In some embodiments of the invention, the supervising of the neural language model during the training process comprises:
supervising the label output by the encoder output layer of the neural language model;
determining whether the output label matches the task corresponding to the task data set to be trained;
and, if not, updating the encoder based on the loss function.
In some embodiments of the invention, the saving the trained model for application by a downstream task includes:
storing the trained model according to a preset format, and taking the stored model as a target neural language model;
determining a corresponding target text in a downstream task;
and inputting the target text into the target neural language model to obtain a classification result corresponding to the target text.
In some embodiments of the invention, the preset format includes at least the source of the model, the name of the model, and the save date.
In a second aspect, the present invention provides a system for training a neural language model, the system comprising:
the acquisition module is used for acquiring a task data set to be trained;
the configuration module is used for carrying out parameter configuration on a pre-training model corresponding to the task data set to be trained;
an extraction module for extracting a high-value auxiliary data set;
the training module is used for fusing the task data set to be trained and the high-value auxiliary data set to obtain a first data set, and training the first data set through a configured pre-training model;
and the storage module is used for storing the trained model for the application of the downstream task.
A third aspect of the present invention provides an electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor executing the computer program to implement the steps of:
acquiring a task data set to be trained;
performing parameter configuration on a pre-training model corresponding to the task data set to be trained;
extracting a high-value auxiliary data set;
fusing the task data set to be trained and the high-value auxiliary data set to obtain a first data set, and training the first data set through a configured pre-training model;
and saving the trained model for application of downstream tasks.
A fourth aspect of the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring a task data set to be trained;
performing parameter configuration on a pre-training model corresponding to the task data set to be trained;
extracting a high-value auxiliary data set;
fusing the task data set to be trained and the high-value auxiliary data set to obtain a first data set, and training the first data set through a configured pre-training model;
and saving the trained model for application of downstream tasks.
The technical scheme provided in the embodiment of the application at least has the following technical effects or advantages:
according to the training method of the neural language model, a task data set to be trained is obtained, a pre-training model corresponding to the task data set to be trained is subjected to parameter configuration, a high-value auxiliary data set is extracted, the task data set to be trained and the high-value auxiliary data set are fused to obtain a first data set, the first data set is trained through the configured pre-training model, the trained model is stored for downstream tasks to use, the existing data set can be used for improving a current task model, the utilization value of the existing data set is improved, and the data acquisition and data mining efficiency is greatly improved;
the training system of the neural language model provided in the embodiments of the application reduces the dependence of model training on manual work, is simple to operate, can process and exploit hundreds or even thousands of data sets, and can use existing data sets to improve the current task model, thereby raising the utilization value of the existing data sets and greatly improving the efficiency of data acquisition and data mining.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a diagram illustrating the steps of a method for training a neural language model according to an exemplary embodiment of the present application;
FIG. 2 illustrates a schematic diagram of a training supervision process according to an exemplary embodiment of the present application;
FIG. 3 is a diagram illustrating a training system architecture for a neural language model in accordance with an exemplary embodiment of the present application;
FIG. 4 illustrates a schematic structural diagram of a computer device provided by an exemplary embodiment of the present application;
FIG. 5 illustrates a schematic diagram of a storage medium provided by an exemplary embodiment of the present application.
Detailed Description
Hereinafter, embodiments of the present application will be described with reference to the accompanying drawings. It should be understood that the description is intended to be exemplary only, and is not intended to limit the scope of the present application. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present application. It will be apparent to one skilled in the art that the present application may be practiced without one or more of these details. In other instances, well-known features of the art have not been described in order to avoid obscuring the present application.
It should be noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the application. As used herein, the singular is intended to include the plural unless the context clearly dictates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Exemplary embodiments according to the present application will now be described in more detail with reference to the accompanying drawings. These exemplary embodiments may, however, be embodied in many different forms and should not be construed as limited to only the embodiments set forth herein. The figures are not drawn to scale, wherein certain details may be exaggerated and omitted for clarity. The shapes of various regions, layers, and relative sizes and positional relationships therebetween shown in the drawings are merely exemplary, and deviations may occur in practice due to manufacturing tolerances or technical limitations, and a person skilled in the art may additionally design regions/layers having different shapes, sizes, relative positions, as actually required.
Several examples are given below in conjunction with the description of figures 1-5 to describe exemplary embodiments according to the present application. It should be noted that the following application scenarios are merely illustrated for the convenience of understanding the spirit and principles of the present application, and the embodiments of the present application are not limited in this respect. Rather, embodiments of the present application may be applied to any scenario where applicable.
Neural Language Models (NLMs) are a class of language models designed to overcome the curse of dimensionality by using distributed representations of words to model natural language sequences. Unlike class-based n-gram models, a neural language model can recognize that two words are similar without losing the ability to encode each word as distinct from the others, because it shares statistical strength between one word (and its context) and other similar words. However, as natural language processing technology is applied in more and more projects, more and more text data sets accumulate while older ones go unused; this significantly reduces the utilization of the text data sets and can even lead to data loss. Therefore, managing and exploiting text data sets well is essential now that data resources have become the "oil" of the artificial intelligence industry.
However, the methods proposed by academia, including multi-task learning, auxiliary learning, and domain adaptation, are mostly built around only a single-digit number of data sets and are therefore of limited practical use at scale.
Accordingly, in some exemplary embodiments of the present application, there is provided a method for training a neural language model, as shown in fig. 1, the method including: s1, acquiring a task data set to be trained; s2, performing parameter configuration on a pre-training model corresponding to the task data set to be trained; s3, extracting a high-value auxiliary data set; s4, fusing the task data set to be trained and the high-value auxiliary data set to obtain a first data set, and training the first data set through a configured pre-training model; and S5, saving the trained model for application of the downstream task. Each step is described in detail below.
Firstly, a data set of a task to be trained is obtained.
In a specific implementation, the data set of the task to be trained may be submitted by an end user. For example, if a WEB client user wants to run a pre-training task, the user uploads the data set required by that pre-training model; this data set serves both as the current task data set and as the data set of the task to be trained. It is understood that the task may be initial training, or further training of a downstream model with a new data set.
And secondly, performing parameter configuration on a pre-training model corresponding to the task data set to be trained.
In a specific implementation, performing parameter configuration on the pre-training model corresponding to the task data set to be trained includes: determining the neural language model to which the pre-training model corresponding to the task data set to be trained belongs; and performing parameter configuration according to the attributes of that neural language model, the configuration comprising at least the initialization parameters of the encoder in the neural language model. It should be noted that if the user does not specify the type of pre-training model, the encoder's initialization parameters may be set by random initialization. In addition, the main configuration parameters include the number of times the training samples are traversed during training, the encoder learning rate, the correlation-weight learning rate, and so on, as shown in Table 1; an illustrative configuration sketch follows the table.
Table 1. Main configuration parameters (reproduced as an image in the original publication)
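As a rough illustration of the configuration Table 1 describes, the sketch below collects the parameters named in the text: the traversal count, the encoder learning rate, the correlation-weight learning rate, and the Rate_corra proportion of the stock data sets. The field names and default values are assumptions for illustration, not values taken from the patent.

    from dataclasses import dataclass

    @dataclass
    class TrainingConfig:
        """Hypothetical container for the main configuration parameters of Table 1."""
        num_epochs: int = 3                 # times the training samples are traversed
        encoder_lr: float = 2e-5            # encoder learning rate
        relevance_weight_lr: float = 1e-3   # correlation-weight learning rate
        rate_corra: float = 0.1             # proportion of the stock data sets used as auxiliary data
        random_init_encoder: bool = False   # random initialization when no pre-training model is specified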
And thirdly, extracting a high-value auxiliary data set.
In a specific implementation, extracting the high-value auxiliary data set includes: checking the data sets stored in a pre-established database; determining, from the stored data sets, auxiliary samples related to the task data set to be trained; and extracting a high-value auxiliary data set from those auxiliary samples based on relevance weights. Specifically, the user may decide, based on domain knowledge, whether a given data set is relevant to the target task and adjust the high-value data set list accordingly. Alternatively, the high-value auxiliary data set may be obtained from the Rate_corra parameter in Table 1, i.e., the proportion of the stock data sets participating in training, combined with the relevance weights; if the proportion is a floating-point number, the resulting sample count is rounded. The extracted high-value auxiliary data sets are sorted by relevance weight and can be selected either manually or automatically by the system; a minimal sketch of this selection is given below.
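The following sketch illustrates the selection just described, assuming each stored sample carries a relevance weight and that Rate_corra is the fraction of the stock data to keep; all function and variable names are illustrative assumptions rather than identifiers from the patent.

    def extract_high_value_auxiliary_set(stock_samples, relevance_weights, rate_corra):
        """Pick the top fraction of the stock samples by relevance weight (illustrative sketch)."""
        # Sort the stored samples by their relevance weight, highest first.
        ranked = sorted(zip(stock_samples, relevance_weights), key=lambda pair: pair[1], reverse=True)
        # Rate_corra may be a floating-point proportion, so the selected count is rounded.
        count = int(round(len(ranked) * rate_corra))
        return [sample for sample, _ in ranked[:count]]

    # Usage: keep half of the stock data, ranked by relevance, as the high-value auxiliary set.
    auxiliary = extract_high_value_auxiliary_set(
        stock_samples=["doc_a", "doc_b", "doc_c", "doc_d"],
        relevance_weights=[0.9, 0.2, 0.7, 0.4],
        rate_corra=0.5,
    )  # -> ["doc_a", "doc_c"]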
And fourthly, fusing the task data set to be trained and the high-value auxiliary data set to obtain a first data set, and training the first data set through a configured pre-training model.
In a specific implementation, fusing the task data set to be trained with the high-value auxiliary data set to obtain a first data set, and training the first data set through the configured pre-training model, includes: fusing the task data set to be trained with the high-value auxiliary data set to obtain a first data set that serves as the training sample; training on the training sample through the neural language model to which the configured pre-training model belongs; supervising the neural language model during the training process; and stopping training when the preset number of traversals is reached.
In one specific implementation, supervising the neural language model during training includes: supervising the label output by the encoder output layer of the neural language model; determining whether the output label matches the task corresponding to the task data set to be trained; and, if not, updating the encoder based on the loss function. During supervision, as shown in Fig. 2, the classification results produced by Net0 through NetK are observed, i.e., the recognition results (labels) for the text in the data set. Referring to the schematic label list in Table 2, each text in the data set is assigned a corresponding label. Specifically, each line of a data file is a JSON string describing one sample; for example, in text-classification test data a sample may be the greeting "你好" ("hello"), and its JSON record lists the tokens (the individual characters) together with a label such as "greeting". In this way the encoder gives different representations to the corresponding text, supporting tasks including but not limited to part-of-speech tagging and intent classification.
Table 2. Schematic list of labels (reproduced as an image in the original publication)
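Concretely, one line of such a data file and the label check performed during supervision might look as follows; the field names (tokens, label) and the greeting example are assumptions based on the description above, not the literal format used by the patent.

    import json

    # One line of a data file: a JSON string holding one sample's tokens and its label.
    line = '{"tokens": ["你", "好"], "label": "greeting"}'
    sample = json.loads(line)

    # The label emitted by the encoder output layer is checked against the current task;
    # a mismatch triggers an update of the encoder through the loss function.
    task_labels = {"greeting", "farewell"}

    def label_matches_task(predicted_label, task_labels):
        """Return True when the output label belongs to the current task (illustrative check)."""
        return predicted_label in task_labels

    needs_encoder_update = not label_matches_task("greeting", task_labels)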
As an alternative embodiment, the fused first data set may also be trained on the basis of the high-value auxiliary data set and the task data set to be trained separately. The task corresponding to the task data set to be trained serves as the main task, while the high-value auxiliary data set is used by an auxiliary task split off from the main task. As shown in Fig. 2, Net0 through NetK can then each be regarded as one auxiliary task, each producing one text representation result; during training, the number of data sets participating in training is fine-tuned according to these results, so that learning is progressively refined from the parts to the whole.
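One common way to realize this arrangement is a shared encoder with one lightweight output head per task (Net0 through NetK). The sketch below assumes PyTorch, which the patent does not name, and uses a simple bag-of-embeddings module purely as a stand-in for the configured pre-training model.

    import torch
    import torch.nn as nn

    class SharedEncoderMultiHead(nn.Module):
        """Shared text encoder with one classification head per (main or auxiliary) task."""
        def __init__(self, vocab_size, hidden_dim, num_labels_per_task):
            super().__init__()
            # Stand-in encoder; in practice this would be the configured pre-training model.
            self.encoder = nn.EmbeddingBag(vocab_size, hidden_dim)
            # Net0 ... NetK: one output layer per task, all sharing the same encoder.
            self.heads = nn.ModuleList(
                nn.Linear(hidden_dim, n_labels) for n_labels in num_labels_per_task
            )

        def forward(self, token_ids, task_index):
            text_repr = self.encoder(token_ids)        # one text representation result
            return self.heads[task_index](text_repr)   # logits for the selected task

    model = SharedEncoderMultiHead(vocab_size=21128, hidden_dim=128, num_labels_per_task=[3, 2, 5])
    logits = model(torch.tensor([[101, 872, 1962, 102]]), task_index=0)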
And fifthly, storing the trained model for application of downstream tasks.
In a specific implementation, saving the trained model for application by a downstream task includes: storing the trained model according to a preset format and taking the stored model as the target neural language model; determining the corresponding target text in the downstream task; and inputting the target text into the target neural language model to obtain the classification result corresponding to the target text. The preset format includes at least the source of the model, the model name, and the save date, as detailed in Table 3.
Table 3. Model saving information (reproduced as an image in the original publication)
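A rough sketch of such a save step, persisting the weights together with the bookkeeping fields named above (source, model name, save date), is shown below. The directory layout and field names are assumptions, and torch.save stands in for whatever serialization the deep learning framework actually provides.

    import json
    from datetime import date
    from pathlib import Path

    import torch

    def save_trained_model(model, source, name, out_dir="saved_models"):
        """Persist model weights plus the metadata fields required by the preset format (sketch)."""
        target = Path(out_dir) / name
        target.mkdir(parents=True, exist_ok=True)
        torch.save(model.state_dict(), target / "pytorch_model.bin")
        metadata = {
            "source": source,                      # where the model came from
            "name": name,                          # model name
            "saved_on": date.today().isoformat(),  # save date
        }
        (target / "metadata.json").write_text(json.dumps(metadata, ensure_ascii=False, indent=2))
        return target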
According to the training method of the neural language model described above, the task data set to be trained is first acquired; the pre-training model corresponding to it is then configured; a high-value auxiliary data set is extracted and fused with the task data set to be trained to obtain a first data set; the first data set is trained through the configured pre-training model; and the trained model is saved for downstream application. In this way existing data sets can be used to improve the current task model, their utilization value is raised, and the efficiency of data acquisition and data mining is greatly improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
In some exemplary embodiments of the present application, there is also provided a training system of a neural language model, as shown in fig. 3, the system including: the acquisition module is used for acquiring a task data set to be trained; the configuration module is used for carrying out parameter configuration on a pre-training model corresponding to the task data set to be trained; an extraction module for extracting a high-value auxiliary data set; the training module is used for fusing the task data set to be trained and the high-value auxiliary data set to obtain a first data set, and training the first data set through a configured pre-training model; and the storage module is used for storing the trained model for the application of the downstream task.
Preferably, as shown in Fig. 3, the training system of the neural language model further includes a distributed file system for storing the trained model data: because pre-trained model data are large, the system uses a distributed file system to spread the storage pressure, and the distributed file system also stores the downstream task models. Preferably, the training system further comprises a distributed computing framework, a deep learning framework, and a structured database. The distributed computing framework undertakes the computation involved in data processing, statistics, and related steps; the deep learning framework supports loading the relevant data when the training module trains; and the structured database stores the description information of the data sets and supports retrieval and analysis of pre-training models, so as to support the extraction module in extracting the high-value auxiliary data set. As an alternative implementation, the training module of the system further includes a fine-tuning module for optimizing the model parameters based on the current task data and all stock data sets; its function covers data set loading and preprocessing, after which the training module performs the model training. It is understood that the preprocessing of all data sets, such as converting text into token sequences and converting class labels, relies on the support of the distributed computing framework in the system and is not described again here.
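As an illustration of that preprocessing step, a single-machine sketch of turning raw text into token sequences and mapping class labels to integer ids is given below; character-level tokenization and the label-to-id mapping are assumptions for illustration only.

    def preprocess(samples, label_to_id, max_len=128):
        """Convert raw (text, label) pairs into token sequences and integer class labels (sketch)."""
        processed = []
        for text, label in samples:
            tokens = list(text)[:max_len]   # character-level tokenization as a stand-in
            processed.append({"tokens": tokens, "label_id": label_to_id[label]})
        return processed

    batch = preprocess([("你好", "greeting")], label_to_id={"greeting": 0, "farewell": 1})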
In a specific embodiment, the system further includes a front-end interface and an API. When the acquisition module obtains the data set of the task to be trained, the user may submit it through the front-end interface: for example, a user opens the front-end page of the training system, requests a pre-training task, and uploads the data set required by that pre-training model; this data set serves both as the current task data set and as the data set of the task to be trained. It is understood that the task may be initial training, or further training of a downstream model with a new data set.
The training system of the neural language model can collect a large amount of model data, which may be entered by users in advance, for example according to the model data field description shown in Table 4. In Table 4, int denotes an integer field, string denotes a character-string field, and ID is the model number; the remaining fields are explained in their corresponding descriptions. Optionally, the task types a model is good at, such as "text generation", "text classification", or "dialog", may also be recorded as descriptive fields.
Table 4. Model data field description (reproduced as an image in the original publication)
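A hypothetical record of the structured database that holds these model descriptions, using the field types mentioned in the text (an integer ID plus string descriptors), could be declared as below; every field name other than the ID is an assumption for illustration.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class ModelRecord:
        """Hypothetical row describing one stored model in the structured database."""
        id: int                     # int: model number
        name: str                   # string: model name
        source: str                 # string: where the model came from
        task_types: List[str] = field(default_factory=list)  # e.g. "text classification", "dialog"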
It should be noted that, for downstream application, reference may be made to https://huggingface.co/bert-base-chinese/tree/main: a downstream user can load and use the model trained and saved in this embodiment in the popular manner. At present, industry does not provide a model training system that is simple to operate and can process and exploit hundreds or even thousands of data sets. The training system of the neural language model provided by the present application reduces the dependence of model training on manual work, is simple to operate, can process and exploit hundreds or even thousands of data sets, and lets old data sets participate in the newest training task, thereby greatly improving the efficiency of data acquisition and data mining.
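As an example of the popular loading manner referred to above, a downstream user could load a checkpoint stored in the Hugging Face layout with the transformers library; the checkpoint name below is the public bert-base-chinese repository cited in the description, and substituting the path of a model saved by this system is an assumption.

    from transformers import AutoModel, AutoTokenizer

    # Load the public checkpoint referenced in the description (or, by assumption,
    # the local directory produced by this system's save step).
    tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
    model = AutoModel.from_pretrained("bert-base-chinese")

    inputs = tokenizer("你好", return_tensors="pt")
    outputs = model(**inputs)   # contextual representations for downstream use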
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
It is further emphasized that the system provided in the embodiments of the present application may be based on artificial intelligence techniques for obtaining and processing relevant data. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result. The artificial intelligence base technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Reference is now made to fig. 4, which is a schematic diagram illustrating a computer device provided in some embodiments of the present application. As shown in fig. 4, the computer device 2 includes: the system comprises a processor 200, a memory 201, a bus 202 and a communication interface 203, wherein the processor 200, the communication interface 203 and the memory 201 are connected through the bus 202; the memory 201 stores a computer program that can be executed on the processor 200, and the processor 200 executes the training method of the neural language model provided in any of the foregoing embodiments when executing the computer program.
The Memory 201 may include a high-speed Random Access Memory (RAM) and may further include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 203 (which may be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network, and the like can be used.
Bus 202 may be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The memory 201 is used for storing a program, and the processor 200 executes the program after receiving an execution instruction, and the training method of the neural language model disclosed in any embodiment of the present application may be applied to the processor 200, or implemented by the processor 200.
The processor 200 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or by instructions in the form of software in the processor 200. The processor 200 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in RAM, flash memory, ROM, PROM, EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory 201, and the processor 200 reads the information in the memory 201 and completes the steps of the method in combination with its hardware.
Referring to fig. 5, the computer readable storage medium shown in fig. 5 is an optical disc 30, and a computer program (i.e., a program product) is stored on the optical disc 30, and when the computer program is executed by a processor, the computer program performs the method for training the neural language model provided in any of the foregoing embodiments.
In addition, examples of the computer-readable storage medium may also include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, or other optical and magnetic storage media, which are not described in detail herein.
The computer-readable storage medium provided by the above embodiments of the present application and the training method of the neural language model provided by the embodiments of the present application are based on the same inventive concept, and the storage medium has the same beneficial effects as the method adopted, run, or implemented by the application program stored on it.
The present application further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the method for training a neural language model provided in any of the foregoing embodiments includes: acquiring a task data set to be trained; performing parameter configuration on a pre-training model corresponding to the task data set to be trained; extracting a high-value auxiliary data set; fusing the task data set to be trained and the high-value auxiliary data set to obtain a first data set, and training the first data set through a configured pre-training model; and saving the trained model for application of downstream tasks.
It should be noted that: the algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may be used with the teachings herein. The required structure for constructing such a device will be apparent from the description above. In addition, this application is not directed to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein, and any descriptions of specific languages are provided above to disclose the best modes of the present application. In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the application and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification, and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except that at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in the creation apparatus of a virtual machine according to embodiments of the present application. The present application may also be embodied as an apparatus or device program for carrying out a part or all of the methods described herein. A program implementing the present application may be stored on a computer-readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for training a neural language model, the method comprising:
acquiring a task data set to be trained;
performing parameter configuration on a pre-training model corresponding to the task data set to be trained;
extracting a high-value auxiliary data set;
fusing the task data set to be trained and the high-value auxiliary data set to obtain a first data set, and training the first data set through a configured pre-training model;
and saving the trained model for application of downstream tasks.
2. The method for training the neural language model according to claim 1, wherein the performing parameter configuration on the pre-training model corresponding to the task data set to be trained comprises:
determining a neural language model to which a pre-training model corresponding to the task data set to be trained belongs;
and performing parameter configuration according to the attribute of the neural language model, wherein the parameter configuration at least comprises the initialization parameter configuration of an encoder in the neural language model.
3. The method for training a neural language model according to claim 2, wherein the extracting of the high-value auxiliary data set comprises:
checking the data sets stored in a pre-established database;
determining, from the stored data sets, auxiliary samples related to the task data set to be trained;
and extracting a high-value auxiliary data set from the auxiliary samples based on relevance weights.
4. The method for training a neural language model according to claim 3, wherein the fusing the task data set to be trained and the high-value auxiliary data set to obtain a first data set, and training the first data set through a configured pre-training model, comprises:
fusing the task data set to be trained and the high-value auxiliary data set to obtain a first data set serving as a training sample;
training the training sample through a neural language model to which a configured pre-training model belongs;
supervising the neural language model during the training process;
and when the training reaches the preset times, stopping training.
5. The method for training the neural language model according to claim 4, wherein the supervising of the neural language model during the training process comprises:
supervising the label output by the encoder output layer of the neural language model;
determining whether the output label matches the task corresponding to the task data set to be trained;
and, if not, updating the encoder based on the loss function.
6. The method for training the neural language model according to claim 4, wherein the step of saving the trained model for application by a downstream task comprises:
storing the trained model according to a preset format, and taking the stored model as a target neural language model;
determining a corresponding target text in a downstream task;
and inputting the target text into the target neural language model to obtain a classification result corresponding to the target text.
7. The method for training a neural language model according to claim 6, wherein the preset format includes at least a source of the model, a name of the model, and a save date.
8. A system for training a neural language model, the system comprising:
the acquisition module is used for acquiring a task data set to be trained;
the configuration module is used for carrying out parameter configuration on a pre-training model corresponding to the task data set to be trained;
an extraction module for extracting a high-value auxiliary data set;
the training module is used for fusing the task data set to be trained and the high-value auxiliary data set to obtain a first data set, and training the first data set through a configured pre-training model;
and the storage module is used for storing the trained model for the application of the downstream task.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the method according to any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202210355946.4A 2022-04-06 2022-04-06 Training method and system of neural language model and storage medium Pending CN114911927A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210355946.4A CN114911927A (en) 2022-04-06 2022-04-06 Training method and system of neural language model and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210355946.4A CN114911927A (en) 2022-04-06 2022-04-06 Training method and system of neural language model and storage medium

Publications (1)

Publication Number Publication Date
CN114911927A 2022-08-16

Family

ID=82762589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210355946.4A Pending CN114911927A (en) 2022-04-06 2022-04-06 Training method and system of neural language model and storage medium

Country Status (1)

Country Link
CN (1) CN114911927A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination