CN114611625A - Language model training and data processing method, apparatus, device, medium, and product - Google Patents


Info

Publication number: CN114611625A
Authority: CN (China)
Prior art keywords: data, word, expansion, training data, language model
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202210290842.XA
Other languages: Chinese (zh)
Inventor: 朱泽润 (Zhu Zerun)
Current Assignee (may be inaccurate; Google has not performed a legal analysis): Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210290842.XA
Publication of CN114611625A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/237: Lexical tools
    • G06F40/247: Thesauruses; Synonyms
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The present disclosure provides a language model training and data processing method, apparatus, device, medium, and product, relating to the field of artificial intelligence, and in particular to natural language processing, deep learning, and knowledge graphs. The specific implementation scheme is as follows: acquire labeled first training data and unlabeled second training data, both of which are text data; perform data expansion processing on the second training data to obtain expansion data corresponding to the second training data; calculate a first loss value by using the label of the first training data as comparison data for the first training data in the language model to be trained; calculate a second loss value by using the expansion data as comparison data for the corresponding second training data in the language model; and if the sum of the first loss value and the second loss value satisfies the loss condition, determine that training of the language model is finished and obtain the target model parameters of the language model. This technical solution improves the precision of the language model.

Description

Language model training and data processing method, apparatus, device, medium, and product
Technical Field
The present disclosure relates to the fields of natural language processing, deep learning, and knowledge graphs within artificial intelligence, and in particular to a language model training and data processing method, apparatus, device, medium, and product.
Background
Natural Language Processing (NLP) studies the language problems of human-computer interaction, with the primary aim of enabling computers to understand natural language. In general, an NLP model can be employed to convert natural language into feature vectors or feature matrices that a computer can understand. In practical applications, the NLP model needs to be obtained through training, and the training data generally needs to carry labels, where a label is a record of the real content corresponding to the training data; the language model is then trained on the labeled data. However, an NLP model obtained through this label-only training method is often not accurate enough, resulting in low model precision.
Disclosure of Invention
The present disclosure provides a language model training method, a data processing method, and a corresponding apparatus, device, medium, and product for improving the precision of natural language models.
According to a first aspect of the present disclosure, there is provided a language model training method, including:
acquiring labeled first training data and unlabeled second training data; the first training data and the second training data are text data;
performing data expansion processing on the second training data to obtain expanded data corresponding to the second training data;
calculating to obtain a first loss value by taking the label of the first training data as comparison data of the first training data in the language model to be trained;
calculating to obtain a second loss value by taking the expansion data as comparison data of the corresponding second training data in the language model;
and if the sum of the first loss value and the second loss value is determined to meet the loss condition, determining that the training of the language model is finished, and obtaining target model parameters of the language model.
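The steps of the first aspect can be sketched as toy loss functions (a minimal pure-Python illustration; the probability outputs and the specific loss forms are placeholders, since the disclosure does not fix particular loss functions):

```python
import math

def supervised_loss(pred_probs, label_index):
    # First loss value: the label of the labeled first training data
    # serves as the comparison data (cross-entropy is assumed here).
    return -math.log(pred_probs[label_index])

def expansion_loss(pred_second, pred_expansion):
    # Second loss value: the prediction for the expansion data serves as
    # the comparison data for the unlabeled second training data (squared
    # difference of the two prediction distributions is assumed here).
    return sum((a - b) ** 2 for a, b in zip(pred_second, pred_expansion))

# Training ends when the sum of the two loss values meets the loss condition.
```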
According to a second aspect of the present disclosure, there is provided a data processing method including:
receiving text data to be processed sent by user equipment; the data type of the text data to be processed is the same as that of the first training data or the second training data; the text data to be processed may be obtained from any one of text data, image data, voice data, and video data;
inputting the text data to be processed into a language model corresponding to a target model parameter to obtain a language processing result of the language model on the text data to be processed; the target model parameters are obtained based on the training of the language model training method in the first aspect;
and sending the language processing result to the user equipment, wherein the language processing result is displayed by the user equipment.
According to a third aspect of the present disclosure, there is provided a language model training apparatus comprising:
the data acquisition unit is used for acquiring labeled first training data and unlabeled second training data; the first training data and the second training data are text data;
the data expansion unit is used for performing data expansion processing on the second training data to obtain expansion data corresponding to the second training data;
the first processing unit is used for calculating to obtain a first loss value by taking the label of the first training data as comparison data of the first training data on a language model to be trained;
the second processing unit is used for calculating to obtain a second loss value by taking the expansion data as comparison data of corresponding second training data in the language model;
and the target determining unit is used for determining that the training of the language model is finished and obtaining target model parameters of the language model if the sum of the first loss value and the second loss value is determined to meet a loss condition.
According to a fourth aspect of the present disclosure, there is provided a data processing apparatus comprising:
the data receiving unit is used for receiving text data to be processed sent by user equipment; the data type of the text data to be processed is the same as that of the first training data or the second training data;
the result acquisition unit is used for inputting the text data to be processed into a language model corresponding to the target model parameters and acquiring a language processing result of the language model on the text data to be processed; the target model parameters are obtained based on the training of the language model training method of the first aspect;
and the result sending unit is used for sending the language processing result to the user equipment, and the language processing result is displayed by the user equipment.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the first or second aspects.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of the first or second aspects.
According to a seventh aspect of the present disclosure, there is provided a computer program product comprising a computer program stored in a readable storage medium, from which at least one processor of an electronic device can read the computer program; execution of the computer program by the at least one processor causes the electronic device to perform the method of the first aspect or the second aspect.
The technology of the present disclosure solves the problem that an NLP model obtained through label-only training is not accurate enough, resulting in low model precision: labeled first training data and unlabeled second training data are employed, and the language model to be trained is trained on both kinds of data simultaneously. The language model can therefore be trained more accurately, and the language model corresponding to the obtained target model parameters can produce high-accuracy processing results for both the labeled first training data and the unlabeled second training data, improving the training precision of the language model.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a diagram of an application scenario of a language model training method and a data processing method provided according to a first embodiment of the present disclosure;
FIG. 2 is a flowchart of a language model training method according to a second embodiment of the present disclosure;
FIG. 3 is a flowchart of a language model training method according to a third embodiment of the present disclosure;
FIG. 4 is a flowchart of a language model training method according to a fourth embodiment of the present disclosure;
FIG. 5 is a flowchart of a language model training method according to a fifth embodiment of the present disclosure;
FIG. 6 is a flowchart of a method for training a language model according to a sixth embodiment of the present disclosure;
fig. 7 is a flowchart of a data processing method according to a seventh embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of a language model training apparatus according to an eighth embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of a data processing apparatus according to a ninth embodiment of the present disclosure;
FIG. 10 is a block diagram of an electronic device for implementing a language model training method or a data processing method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of embodiments of the present disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The technical solution of the present disclosure can be applied to natural language processing scenarios: labeled first training data and unlabeled second training data both participate in training the language model to be trained, so as to obtain accurate model parameters and improve model training precision.
In the related art, when a natural language model is trained, training data with labels is usually adopted to train the language model to be trained, so as to obtain an accurate model training result. And calculating the loss of the language model through a processing result of the natural language model, and if the obtained loss value is less than a loss threshold value, indicating that the training reaches a target. At this time, it can be determined that the language model training is finished, and corresponding model parameters are obtained. However, in practical applications, when a natural language model obtained by training is used, it is found that the difference between the processing result of the language model and the actual result is large, and the processing accuracy of the language model is not high.
Investigating the above problems, the inventors found that when a language model is trained with labeled training data, the training data is usually labeled manually. Manual labeling is inefficient and therefore time-consuming, resulting in a long labeling period. In addition, the labeled data is only a portion of the original data, so the labeled data can differ substantially from the original data; training the model with labeled data alone can cause problems such as information loss and poor generalization, making the language model's processing results inaccurate and its precision poor in use.
To solve the above problem, the present disclosure employs labeled first training data and unlabeled second training data, and trains the language model to be trained on both kinds of data simultaneously. Specifically, data expansion processing may be performed on the second training data to obtain the corresponding expansion data. The first loss value is then calculated by using the label of the first training data as the comparison data of the first training data in the language model, and the second loss value is calculated by using the expansion data as the comparison data of the second training data in the language model. Because the two loss values together serve as the basis for judging the loss condition, the language model can be trained more accurately, and the language model corresponding to the obtained target model parameters can produce high-accuracy processing results for both the labeled first training data and the unlabeled second training data, improving the training precision of the language model.
The present disclosure provides a language model training and data processing method, apparatus, device, medium, and product, which can be applied to natural language processing, deep learning, and knowledge graphs within artificial intelligence, so as to improve the model precision of the language model.
The technical solution of the present disclosure will be described in detail with reference to the accompanying drawings.
As shown in fig. 1, an application scenario diagram of a language model training method and a data processing method provided for a first embodiment of the present disclosure may include an electronic device 1, for example, a cloud server in fig. 1. The electronic device 1 may be configured with the language model training apparatus of the present disclosure, and the model training apparatus may execute a training method of a language model, and may train the language model by the language model training apparatus to obtain target model parameters of the language model.
Referring to fig. 1, the model training apparatus in the present disclosure may start from initial data, perform data cleaning to generate training data, and divide the training data into first training data and second training data. Data expansion may be performed on the second training data to obtain expansion data. The first training data, the second training data, and the expansion data are input into the language model for prediction to obtain prediction results. The prediction results are used for loss calculation and for judging the loss condition. When the judgment result is that the loss condition is satisfied, the target model parameters of the language model are obtained; when it is not satisfied, the language model is updated and the prediction step of the language model is executed again.
The electronic device 1 may also establish a wired or wireless communication connection with at least one first user equipment 2 (one shown in the figure). The electronic device 1 may perform the initial data acquisition to the first user device 2. The initial data may be text data to be processed, which the first user equipment sends to the electronic device, and may include, for example, a voice signal, search text, text to be translated, and the like. The electronic device 1 acquires labeled first training data and unlabeled second training data by collecting initial data.
In some embodiments, referring to fig. 1, the electronic device 1 may further establish a wired or wireless communication connection with the labeling electronic device 3, after the labeling user labels the first training data through the labeling electronic device 3, the labeled first training data is sent to the electronic device 1, and the electronic device 1 acquires the labeled first training data to train the language model by using the labeled first training data and the unlabeled second training data.
In practical applications, referring to fig. 1, the electronic device 1 may also have a wired or wireless communication connection with the second user equipment 4. The second user device 4 may send the pending text data to the electronic device 1. The electronic device 1 may process the text data to be processed based on the language model corresponding to the target model parameter, and obtain a corresponding language processing result. The language processing result is then sent to the second user equipment 4. The second user equipment 4 may present the received language processing result.
The electronic device 1, the first user device 2, the annotating electronic device 3, and the second user device 4 in fig. 1 are only schematic, and the specific types and numbers of the devices are not limited in this application.
As shown in fig. 2, a flowchart of a language model training method provided for a second embodiment of the present disclosure, an execution subject of the method may be a language model training apparatus, and the language model training apparatus may be located in an electronic device, and the method may include the following steps:
201: and acquiring labeled first training data and unlabeled second training data.
The first training data and the second training data are text data. The first training data may be labeled training data and may include first text data and a label for the first text data. The second training data may be unlabeled training data and may include second text data.
The text data may be obtained by image, voice signal, or video conversion. For example, text recognition is performed on the collected voice signals, and corresponding text data is obtained.
202: and performing data expansion processing on the second training data to obtain expanded data corresponding to the second training data.
The expansion data may be obtained by performing data expansion processing on the second training data. The expansion data may be text data, for example a word or sentence having the same semantic meaning as the second training data.
203: and calculating to obtain a first loss value by taking the label of the first training data as comparison data of the first training data in the language model to be trained.
The first training data is input to the language model as input data, and a first prediction result of the language model for the first training data can be obtained. The data type of the first prediction result may be the same as that of the label of the first training data, so the first prediction result may be directly compared with the label to obtain the first loss value. In practical applications, the first training data may include a plurality of items; in order to train the language model accurately, the amount of first training data may need to exceed a certain amount.
The first prediction result of each item of first training data may undergo loss calculation against its corresponding label to obtain the loss values corresponding to the plurality of items of first training data, and these loss values may be weighted to obtain the first loss value.
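This weighting can be sketched as follows (the equal-weight default, a plain mean, is an assumption, since the weighting scheme is left unspecified):

```python
def aggregate_first_loss(per_example_losses, weights=None):
    # Weight the per-example losses of the labeled first training data
    # into a single first loss value. Equal weights (a plain mean) are
    # assumed when no weighting is given.
    n = len(per_example_losses)
    if weights is None:
        weights = [1.0 / n] * n
    return sum(w * l for w, l in zip(weights, per_example_losses))
```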
204: and calculating to obtain a second loss value by taking the expansion data as comparison data of the corresponding second training data in the language model.
The second training data has no label, so when it is directly input into the language model, its loss value cannot be directly calculated. The second loss value may instead be obtained by calculating the difference between the language model's prediction result for the expansion data and its second prediction result for the second training data.
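One common choice for such a difference calculation between two prediction distributions is the KL divergence (an assumption for illustration; the disclosure does not name a specific divergence):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    # Difference between the prediction distribution for the second
    # training data (p) and that for its expansion data (q); eps guards
    # against zero probabilities.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
```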
205: and if the sum of the first loss value and the second loss value meets the loss condition, determining that the training of the language model is finished, and obtaining the target model parameters of the language model.
The sum of the first loss value and the second loss value may be obtained by performing a weighted calculation on the first loss value and the second loss value. The sum of the first loss value and the second loss value may be represented using a target loss value. The sum of the first loss value and the second loss value satisfying the loss condition may include that the target loss value satisfies the loss condition. Specifically, the target loss value may be smaller than the loss threshold.
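A minimal sketch of this loss condition follows (the equal default weights and the strict threshold form are assumptions; the disclosure only states that a weighted calculation is used and that the target loss value may be smaller than the loss threshold):

```python
def target_loss(first_loss, second_loss, w1=1.0, w2=1.0):
    # Weighted sum of the two loss values; equal weights are assumed.
    return w1 * first_loss + w2 * second_loss

def loss_condition_met(first_loss, second_loss, loss_threshold):
    # The loss condition of step 205: target loss below the threshold.
    return target_loss(first_loss, second_loss) < loss_threshold
```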
In the process of training the language model, each time the language model is trained, the language model has model parameters, and the language model corresponding to the model parameters can be used for predicting the results of the first training data, the second training data and the extension data. When it is determined that the target loss value satisfies the loss condition, it may be determined that the training of the language model is finished, and the model parameters of the language model during the training may be the target model parameters.
Optionally, after the target model parameters are obtained, data verification may be performed on the language model corresponding to the target model parameters, and after the data verification is successful, online application may be performed on the language model.
Data verification of the language model corresponding to the target model parameters may specifically mean performing prediction verification on that language model with verification data, computing a verification index from the obtained verification prediction results, and, once the index data is obtained, determining that verification succeeds if the index data meets the verification threshold and fails otherwise. After a failure, the first training data and the second training data may be updated and the language model retrained.
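A sketch of this verification step, assuming accuracy as the verification index (hypothetical; the disclosure does not fix a particular index):

```python
def verify_model(verification_predictions, verification_labels, index_threshold=0.9):
    # Accuracy stands in for the unspecified verification index;
    # verification succeeds when the index meets the threshold.
    correct = sum(p == y for p, y in zip(verification_predictions, verification_labels))
    index_value = correct / len(verification_labels)
    return index_value >= index_threshold
```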
In this embodiment, labeled first training data and unlabeled second training data are used, and the language model to be trained is trained on both kinds of data simultaneously. Specifically, data expansion processing may be performed on the second training data to obtain the corresponding expansion data. The first loss value is then calculated by using the label of the first training data as the comparison data of the first training data in the language model, and the second loss value is calculated by using the expansion data as the comparison data of the second training data in the language model. Because the two loss values together serve as the basis for judging the loss condition, the language model can be trained more accurately, and the language model corresponding to the obtained target model parameters can produce high-accuracy processing results for both the labeled first training data and the unlabeled second training data, improving the training precision of the language model.
To give the reader a deeper understanding of the principles underlying the present disclosure, the embodiment shown in fig. 2 is further refined below in conjunction with fig. 3.
As an example, as shown in fig. 3, the step 202: performing data expansion processing on the second training data to obtain expansion data corresponding to the second training data, which may include the following steps:
301: and performing word segmentation on the second training data to obtain at least one initial word corresponding to the second training data.
302: and performing word expansion processing on at least one initial word by using a word expansion strategy to obtain expansion data corresponding to the second training data.
Optionally, performing word segmentation on the second training data to obtain at least one initial word corresponding to the second training data may include performing word segmentation with a word segmentation algorithm. The word segmentation algorithm may include at least one of a dictionary-based algorithm, an n-gram model-based algorithm, and a character-based algorithm; the embodiments of the present disclosure do not specifically limit the type of word segmentation algorithm, and the detailed steps of word segmentation may refer to descriptions in the related art, which are not repeated here.
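As one concrete instance of a dictionary-based word segmentation algorithm, forward maximum matching can be sketched as follows (the toy dictionary is illustrative only; real systems use large lexicons):

```python
def forward_max_match(text, dictionary, max_word_len=4):
    # Scan left to right; at each position take the longest dictionary
    # word that matches, falling back to a single character.
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words
```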
The word expansion strategy may refer to a strategy of searching for similar words using word-meaning similarity. Any one of the initial words may determine a corresponding expansion word, and the expansion data may include the expansion words respectively corresponding to the at least one initial word.
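This lookup can be sketched as a mapping from each initial word to its similar words (`similar_word_map` is a hypothetical stand-in for whatever similarity resource, such as a thesaurus or embedding neighbours, the strategy actually uses):

```python
def expand_initial_words(initial_words, similar_word_map):
    # Map each initial word to its expansion words; words with no known
    # similar words get an empty expansion list.
    return {w: similar_word_map.get(w, []) for w in initial_words}
```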
In the embodiment of the disclosure, when the second training data is expanded, word segmentation may be performed on it to obtain at least one initial word, so that the second training data is expanded using the at least one initial word to obtain the corresponding expansion data. This word segmentation approach allows the second training data to be expanded effectively and accurately.
In one possible design, the word expansion strategy may include at least one of a business expansion strategy and a knowledge expansion strategy.
As an alternative embodiment, the word expansion strategy includes a business expansion strategy. The above step 302: performing word expansion processing on at least one initial word by using a word expansion strategy to obtain expansion data corresponding to the second training data, includes:
performing word expansion processing on the at least one initial word by using the business expansion strategy to obtain a first expansion word;
and determining the first expansion word as the expansion data of the second training data.
In the embodiment of the disclosure, word expansion processing may thus be performed on the at least one initial word using the business expansion strategy. This expands the at least one initial word in the business direction and can obtain accurate business-related expansion data.
As yet another alternative, the word expansion strategy includes a knowledge expansion strategy. The above step 302: performing word expansion processing on at least one initial word by using a word expansion strategy to obtain expansion data corresponding to the second training data, includes:
performing word expansion processing on the at least one initial word by using the knowledge expansion strategy to obtain a second expansion word;
and determining the second expansion word as the expansion data of the second training data.
In the embodiment of the disclosure, word expansion processing may thus be performed on the at least one initial word using the knowledge expansion strategy. This expands the at least one initial word in the knowledge direction and can obtain accurate knowledge-related expansion data.
As yet another alternative, the word expansion strategy includes both a business expansion strategy and a knowledge expansion strategy. The above step 302: performing word expansion processing on at least one initial word by using a word expansion strategy to obtain expansion data corresponding to the second training data, includes:
performing word expansion processing on the at least one initial word by using the business expansion strategy to obtain a first expansion word;
performing word expansion processing on the at least one initial word by using the knowledge expansion strategy to obtain a second expansion word;
and determining the first expansion word and the second expansion word as the expansion data of the second training data.
The business expansion strategy uses a set of business words established according to the business background or business knowledge to which the language model belongs; these business words can be used to expand the initial words in the business direction.
A knowledge extension strategy may refer to an expansion strategy that utilizes a knowledge graph, that is, a graph used to perform word expansion processing on at least one initial word. An existing knowledge graph can be used, so that knowledge-based expansion of words can be realized.
In the embodiment of the disclosure, a service expansion strategy and a knowledge expansion strategy can be used respectively to perform word expansion processing on at least one initial word to obtain a first expansion word and a second expansion word, and both the first expansion word and the second expansion word are then determined to be expansion data of the second training data. By acquiring the first expansion words and the second expansion words, the at least one initial word can be expanded in both the business aspect and the knowledge aspect, so that more comprehensive word expansion is realized and more accurate and comprehensive expansion data can be obtained.
As an optional implementation manner, performing a word expansion process on at least one initial word by using a service expansion policy to obtain a first expanded word, including:
and determining at least one candidate word corresponding to the business expansion strategy.
And for any initial word, determining a first word matched with the initial word from the at least one candidate word so as to determine that the first word respectively corresponding to the at least one initial word is a first expansion word.
The at least one candidate word may be a word related to the application scenario of the language model or to application knowledge, and may be obtained by summarizing the business.
In the embodiment of the disclosure, by using at least one candidate word in the service expansion strategy, word expansion can be performed on any initial word to obtain the first word corresponding to each initial word, realizing expansion of the at least one initial word in the service direction, so that the first expansion word is more closely combined with the relevant service scenario and the effectiveness of word expansion is improved.
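As a purely illustrative sketch (the candidate vocabulary and the matching rule below are assumptions, not the patented implementation), the service expansion of step 302 can be pictured as looking up each initial word in a table of service candidate words:

```python
# Hypothetical service candidate table: maps a word to service words that
# expand it in the service direction. All entries are illustrative assumptions.
SERVICE_CANDIDATES = {
    "phone": ["smartphone", "handset"],
    "buy": ["purchase", "order"],
}

def expand_with_service_policy(initial_words):
    """Collect the first expansion words for every initial word that matches."""
    expanded = []
    for word in initial_words:
        expanded.extend(SERVICE_CANDIDATES.get(word, []))
    return expanded

first_expansion_words = expand_with_service_policy(["buy", "phone", "today"])
print(first_expansion_words)
```

The expansion words returned for the segmented second training data would then be recorded as its expansion data.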
In one possible design, for any initial word, determining a first word from the at least one candidate word that matches the initial word includes:
dividing at least one candidate word into a candidate entity word and a candidate non-entity word;
if any initial word is determined to be the entity word, determining a first word matched with the initial word from the candidate entity words;
and if any initial word is determined to be a non-entity word, determining a first word matched with the initial word from the candidate non-entity words.
An entity word may refer to a word having a specific meaning; for example, "Company A" is an entity word. A non-entity word may refer to a word that does not carry an entity meaning; for example, phrases such as "I want to know" and "please ask" are non-entity words.
Alternatively, dividing at least one candidate word into a candidate entity word and a candidate non-entity word may mean that at least one candidate word is divided into a candidate entity word and a candidate non-entity word according to a division rule of the entity word and the non-entity word.
In order to improve the recognition efficiency of entity words, the candidate entity words can be classified by type (for example, brand words, commodity words and attribute words) to obtain candidate words corresponding to each word type; word matching is then performed against each word list to obtain the first word corresponding to each initial word.
In one possible design, tasks such as word segmentation and entity word recognition can be implemented by training a model with a network structure that combines ERNIE (Enhanced Language Representation with Informative Entities), LSTM (Long Short-Term Memory) and CRF (Conditional Random Fields).
In the embodiment of the disclosure, at least one candidate word may be divided into a candidate entity word and a candidate non-entity word, so that when any initial word is an entity word, a first word is determined from the candidate entity word, and when any initial word is a non-entity word, the first word is determined from the candidate non-entity word. By means of division according to the entity attributes of the words, the words can be matched more quickly and accurately, and the obtaining efficiency and accuracy of the first words are improved.
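A minimal sketch of this entity/non-entity partitioned matching follows; the `is_entity` test and both candidate pools are illustrative assumptions (a real system would use the ERNIE+LSTM+CRF recognizer mentioned above):

```python
CANDIDATE_ENTITY_WORDS = {"Company A", "Product B", "Brand C"}
CANDIDATE_NON_ENTITY_WORDS = {"please ask", "I want to know", "how about"}

def is_entity(word):
    # Assumption: treat capitalized words as entity words; a real system
    # would use an entity-recognition model instead.
    return word[:1].isupper()

def match_first_word(initial_word):
    """Match an initial word only against the pool sharing its entity attribute."""
    pool = CANDIDATE_ENTITY_WORDS if is_entity(initial_word) else CANDIDATE_NON_ENTITY_WORDS
    tokens = set(initial_word.lower().split())
    for candidate in sorted(pool):
        if tokens & set(candidate.lower().split()):  # toy token-overlap matching rule
            return candidate
    return None

print(match_first_word("Company A"))   # searched only among entity candidates
print(match_first_word("please ask"))  # searched only among non-entity candidates
```

Partitioning first means each lookup scans only the pool that shares the word's entity attribute, which is the efficiency gain the text describes.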
As another alternative, performing a word expansion process on at least one initial word by using a knowledge expansion strategy to obtain a second expanded word may include:
determining a knowledge graph matched with the data content of the second training data; the knowledge graph comprises: nodes formed by the knowledge keywords and edges formed by the association relations among the knowledge keywords;
and performing word expansion processing on at least one initial word by using the knowledge graph to obtain a second expanded word.
The knowledge graph may be a network structure formed by nodes and edges: a node may be a knowledge keyword, and an edge may refer to the association relation between the two knowledge keywords it connects. The knowledge graph may be generated in advance and read directly when words are expanded.

Through the knowledge graph, the knowledge content of the initial words can be expanded, information on more dimensions of the words can be added, and the deeper meaning of the initial words can be mined, which improves the accuracy and comprehensiveness of the second expansion words. The second expansion word may have the same meaning as the second training data and the same granularity: when the second training data is a word, the second expansion word may be a word; when the second training data is a phrase, the second expansion word may be a phrase.

For ease of understanding, suppose the knowledge graph records that the relation between "Company A" and "Product B" is "develops", and that "Company A" has the alias "A". If the second training data is "How is Company A's Product B?", then after knowledge-graph expansion processing, the obtained second expansion word may be "How is Product B developed by Company A?" or "How is Product B developed by A?".
In embodiments of the present disclosure, the knowledge extension policy may include a knowledge graph. A knowledge graph matching the data content of the second training data may be determined, and word expansion processing may be performed on at least one initial word by using the nodes formed by the knowledge keywords in the knowledge graph and the edges formed by the association relations between the knowledge keywords, so as to obtain the second expansion word. Since the obtained second expansion word is derived from the knowledge graph, expansion of the initial words in the knowledge direction can be realized, so that the second expansion word is more closely combined with knowledge and the effectiveness of word expansion is improved.
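The "Company A develops Product B" example above can be sketched as follows; the two-tuple graph encoding and the rewrite rule are illustrative assumptions:

```python
# Knowledge graph: nodes are knowledge keywords, edges carry the association
# relation between the two keywords they connect.
GRAPH_EDGES = {
    ("Company A", "Product B"): "develops",
}

def expand_with_graph(sentence):
    """Produce second-expansion rephrasings of the sentence using graph relations."""
    results = []
    for (head, tail), relation in GRAPH_EDGES.items():
        if relation == "develops" and f"{head}'s {tail}" in sentence:
            results.append(
                sentence.replace(f"{head}'s {tail}", f"{tail} developed by {head}")
            )
    return results

print(expand_with_graph("How is Company A's Product B?"))
```

A production system would hold a pre-generated graph and richer rewrite templates; the point here is only that edges supply the relational text used to expand the initial words.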
As shown in fig. 4, a flowchart of a language model training method provided for a fourth embodiment of the present disclosure, an execution subject of the method may be a language model training apparatus, and the language model training apparatus may be located in an electronic device, and the method may include the following steps:
401: labeled first training data and unlabeled second training data are obtained.
402: and performing data expansion processing on the second training data to obtain expanded data corresponding to the second training data.
403: and respectively inputting the first training data, the second training data and the extension data into the language model to obtain a first prediction result corresponding to the first training data, a second prediction result corresponding to the second training data and an extension prediction result corresponding to the extension data.
The language model may perform model processing tasks, which may include, for example, classification tasks, recognition tasks, question-and-answer tasks, translation tasks, and the like. The model processing task may be determined according to the specific application scenario of the language model; the present disclosure does not specifically limit the processing task of the language model.
The language model may perform task processing on the first training data, the second training data, and the extension data, respectively, to obtain a first prediction result of the first training data, a second prediction result of the second training data, and an extension prediction result of the extension data.
404: and performing loss calculation based on the label of the first training data and the first prediction result to obtain a first loss value.
Alternatively, the error between the label of the first training data and the first prediction result may be calculated to obtain an error value, and the first loss value may be calculated from that error value. In practical applications, when there are a plurality of pieces of first training data, the error values corresponding to the plurality of pieces of first training data may be added to obtain the first loss value.
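A minimal numeric sketch of step 404 follows; the choice of cross-entropy as the per-example error is an assumption (the text only requires an error between label and prediction, summed over the labeled examples):

```python
import math

def first_loss(labels, predictions):
    """Sum the per-example error between gold labels and predicted probabilities.

    labels: gold class indices; predictions: per-example probability lists.
    """
    total = 0.0
    for gold, probs in zip(labels, predictions):
        total += -math.log(probs[gold])  # error value for one labeled example
    return total

loss = first_loss([0, 1], [[0.9, 0.1], [0.2, 0.8]])
print(round(loss, 4))  # → 0.3285
```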
405: and performing loss calculation based on a second prediction result of the second training data and an extension prediction result corresponding to the extension data to obtain a second loss value.
Alternatively, the result difference between the second prediction result and the extension prediction result may be calculated, and the second loss value determined from this difference. In practical applications, when there are a plurality of pieces of second training data, the loss values corresponding to the plurality of pieces of second training data may be added to obtain the second loss value.
406: and if the sum of the first loss value and the second loss value meets the loss condition, determining that the training of the language model is finished, and obtaining the target model parameters of the language model.
It should be noted that, some steps in this embodiment are the same as those in the foregoing embodiment, and for simplicity of description, detailed description is omitted here.
In the embodiment of the disclosure, the language model can produce the first prediction result, the second prediction result and the extension prediction result for the first training data, the second training data and the extension data respectively, realizing effective prediction on the data. Performing loss calculation based on the label of the first training data and the first prediction result yields an accurate first loss value; performing loss calculation based on the second prediction result and the extension prediction result corresponding to the extension data yields the second loss value. By comparing the prediction result of the first training data with its label, and comparing the prediction result of the second training data with the prediction result of the extension data, accurate supervision of both the labeled and the unlabeled data can be achieved, improving the model precision of the language model.
As an example, the step 404: performing loss calculation based on a second prediction result of the second training data and an extension prediction result corresponding to the extension data to obtain a second loss value, which may include the following steps:
and performing loss calculation on the result difference between the second prediction result and the extended prediction result by adopting a relative loss function to obtain a second loss value.
Alternatively, the relative loss function may include the KL divergence (Kullback-Leibler divergence, also called relative entropy) function. In practical applications, the relative loss function may also include a cross-entropy function, and the like.
The second prediction result and the extended prediction result may be input to the KL divergence function, and a second loss value may be calculated.
In the embodiment of the present disclosure, a relative loss function may be adopted to perform loss calculation on the result difference between the second prediction result and the extended prediction result, so as to obtain a second loss value, and improve the data accuracy of the second loss value.
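A pure-Python sketch of this KL-divergence consistency loss follows; frameworks provide it as a built-in operation, and the toy distributions below are assumptions:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) between two discrete probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

second_prediction = [0.7, 0.3]     # model output on the unlabeled sample
extension_prediction = [0.6, 0.4]  # model output on its extension data
second_loss = kl_divergence(second_prediction, extension_prediction)
print(round(second_loss, 4))  # small value: the two predictions agree closely
```

Driving this divergence toward zero pushes the model to predict consistently on a sample and on its expansion, which is the supervision signal the unlabeled data contributes.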
As shown in fig. 5, a flowchart of a language model training method provided in a fifth embodiment of the present disclosure, an execution subject of the method may be a language model training apparatus, and the language model training apparatus may be located in an electronic device, and the method may include the following steps:
501: labeled first training data and unlabeled second training data are obtained.
502: and performing data expansion processing on the second training data to obtain expanded data corresponding to the second training data.
503: and calculating to obtain a first loss value by taking the label of the first training data as comparison data of the first training data in the language model to be trained.
504: and calculating to obtain a second loss value by taking the extension data as comparison data of the corresponding second training data in the language model.
505: and if the sum of the first loss value and the second loss value meets the loss condition, determining that the training of the language model is finished, and obtaining the target model parameters of the language model.
506: and if the sum of the first loss value and the second loss value is determined not to meet the loss condition, updating the language model, and returning to the step 503 to continue the execution.
In the language model training process, the language model has model parameters. Inputting the first training data, the second training data and the extension data into the language model may refer to inputting them into the language model corresponding to the current model parameters, and obtaining the first prediction result, the second prediction result and the extension prediction result through model prediction.
After the language model predicts the training data to obtain a prediction result, the determination of the loss condition may be performed. If the sum of the first loss value and the second loss value is determined to meet the loss condition, it can be determined that the training of the language model is finished, and the current model parameter of the language model is obtained and is the target model parameter.
If it is determined that the sum of the first loss value and the second loss value does not meet the loss condition, the language model can be updated, that is, the model parameters of the language model are updated, and execution returns to the step of taking the label of the first training data as comparison data of the first training data in the language model to be trained and calculating the first loss value.
In the embodiment of the disclosure, after the first loss value and the second loss value are obtained, if their sum meets the loss condition, it is determined that the training of the language model is finished and the target model parameters of the language model are obtained. If the sum does not meet the loss condition, the language model is updated and execution returns to the step of taking the label of the first training data as comparison data of the first training data in the language model to be trained and calculating the first loss value. Through this judgment of the loss condition, the language model can be continuously updated and iterated until a language model meeting the loss condition is obtained, which improves the training precision and accuracy of the model.
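The loop of steps 503-506 can be sketched end-to-end as follows; the scalar "model", both loss stand-ins and the update rule are toy assumptions that only illustrate the stop-on-loss-condition control flow:

```python
def train(param, loss_threshold=0.01, max_iters=100):
    """Iterate until the sum of both loss values meets the loss condition."""
    for _ in range(max_iters):
        first_loss = (param - 2.0) ** 2          # supervised loss stand-in (step 503)
        second_loss = 0.5 * (param - 2.0) ** 2   # consistency loss stand-in (step 504)
        if first_loss + second_loss <= loss_threshold:
            return param                         # target model parameter (step 505)
        param -= 0.1 * 2 * (param - 2.0)         # update the model and retry (step 506)
    return param

target_param = train(0.0)
print(round(target_param, 2))  # converges near the optimum at 2.0
```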
In some embodiments, updating the language model includes:
determining at least one computing node corresponding to the current language model;
selecting at least one computing node based on a node selection strategy to obtain a target computing node;
adjusting parameters of the target computing node based on the parameter adjusting strategy to obtain target parameters corresponding to the target computing node;
and determining the target computing node provided with the target parameters as the updated language model.
The language model may include at least one compute node. The language model may be, for example, a neural network model, and the compute nodes may be neurons in the neural network model.
The target computing node may be selected from at least one computing node of the language model based on a node selection policy. The target parameter may be a parameter adjusted for the selected target computing node based on a parameter adjustment policy.
The updated language model can continue to be used for prediction processing of the first training data, the second training data and the extension data to obtain the first prediction result, the second prediction result and the extension prediction result, so that the subsequent steps of loss calculation and loss-condition judgment are executed. This realizes iteration of the language model until the loss condition is met, at which point the training of the language model is determined to be finished.
Optionally, the node selection policy may include Dropout (random node dropping). Through node selection, the update of the language model does not depend on a fixed node connection relationship, which makes the computing network of the language model more robust.
In the embodiment of the disclosure, at least one computing node corresponding to the current language model is obtained, and a target computing node can be selected from the at least one computing node under the node selection strategy, so that the language model is updated and each node of the model produces a reasonably stable output for different input data. Meanwhile, the parameters of the target computing node can be adjusted by using the parameter adjustment strategy, so that they can be adjusted quickly and accurately; the target computing node provided with the obtained target parameters constitutes the updated language model, so the language model can be updated accurately and the model training precision is improved.
In one possible design, selecting at least one computing node based on a node selection policy to obtain a target computing node includes:
and taking the preset target probability value as the selection probability of the computing node selected as the target computing node, and randomly selecting at least one computing node according to the selection probability to obtain the target computing node.
In the embodiment of the present disclosure, a preset target probability value may be used as a selection probability that a computing node is selected as a target node, so as to randomly select at least one computing node according to the selection probability to obtain the target computing node. The probabilistic selection of at least one computing node can be realized by setting the target probability value, the selectivity of each computing node is controlled, and the selection probability and efficiency of the computing nodes are improved.
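A sketch of this probabilistic selection follows; the node IDs, the preset probability value and the seeding are assumptions:

```python
import random

def select_target_nodes(nodes, target_probability, seed=None):
    """Independently select each computing node with the preset target probability."""
    rng = random.Random(seed)
    return [node for node in nodes if rng.random() < target_probability]

nodes = list(range(10))
print(select_target_nodes(nodes, target_probability=0.5, seed=0))
```

Setting `target_probability` controls how selectable each computing node is, in the spirit of Dropout-style random selection.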
In another possible design, adjusting a parameter of a target computing node based on a parameter adjustment policy to obtain a target parameter corresponding to the target computing node includes:
and adjusting parameters of the target computing node by taking a network searching parameter adjusting algorithm as a parameter adjusting strategy to obtain target parameters corresponding to the target computing node.
Optionally, the parameter adjustment strategy may include a gradient descent algorithm in addition to the grid-search parameter tuning algorithm; in practical applications, the two can be combined for parameter adjustment. The grid-search algorithm may set tuning parameters such as the learning rate, batch_size, warmup (learning-rate warmup) and the optimizer, and perform an automatic grid search over them to obtain accurate target parameters. The model can thus be accurately tuned, improving the effectiveness and accuracy of parameter adjustment of the language model.
In the embodiment of the disclosure, the parameters of the target computing node can be adjusted by using the grid-search parameter tuning algorithm as the parameter adjustment strategy to obtain the target parameters of the target computing node, which improves the efficiency and accuracy of parameter adjustment.
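A hedged sketch of the automatic grid search over tuning parameters follows; the grid values and the validation `score` stand-in are assumptions:

```python
import itertools

GRID = {
    "learning_rate": [1e-5, 5e-5],
    "batch_size": [16, 32],
    "warmup_steps": [0, 100],
}

def score(config):
    # Stand-in for training the model with this configuration and measuring
    # validation quality; higher is better.
    return -abs(config["learning_rate"] - 5e-5) - abs(config["batch_size"] - 32) / 1000

def grid_search(grid):
    """Exhaustively evaluate every parameter combination and keep the best."""
    keys = list(grid)
    candidates = (dict(zip(keys, values)) for values in itertools.product(*grid.values()))
    return max(candidates, key=score)

best = grid_search(GRID)
print(best["learning_rate"], best["batch_size"])
```

In practice the score of each grid point would come from a training-plus-validation run, and the gradient descent algorithm would handle the continuous model parameters within each run.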
To give the reader a deeper understanding of the principles underlying the present disclosure, the embodiment illustrated in fig. 5 will now be further refined in conjunction with fig. 6 below.
In order to obtain accurate first training data and second training data, as shown in fig. 6, in the above embodiment, obtaining the labeled first training data and the unlabeled second training data may include the following steps:
601: and cleaning the original data based on a data cleaning strategy to obtain training data.
The raw data may be text data. By cleaning the raw data, the training data can be guaranteed to meet the usage requirements of the data, and invalid training data can be avoided.
The raw data may be obtained by document reading, voice signal conversion, image conversion, or video conversion.
602: the training data is divided into first data and second data.
Alternatively, the training data may be randomly divided into the first data and the second data.
Of course, in some embodiments, the data amount of the first data and the second data in the training data may be predefined to complete the acquisition of the first data and the second data according to the predetermined data amount.
603: and performing labeling processing on the first data to obtain labeled first training data.
604: and determining the second data as unlabeled second training data.
In the embodiment of the disclosure, the original data can be cleaned based on a data cleaning strategy to obtain training data. Dividing the training data into first data and second data to perform labeling processing on the first data to obtain labeled first training data, and determining that the second data is unlabeled second training data. By cleaning the original data, the data validity of the training data can be guaranteed. In addition, through the processing of the original data, the training data can be updated in time, and the validity and the reliability of the data are ensured.
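Steps 601-604 can be sketched as follows; the cleaning rule, the split ratio and the seed are assumptions, and labeling of the first data would then happen externally as described below:

```python
import random

def clean(text):
    """Toy data cleaning: trim, lowercase, collapse whitespace."""
    return " ".join(text.strip().lower().split())

def split_training_data(raw_texts, labeled_ratio=0.5, seed=0):
    """Clean the raw data, then randomly divide it into first data
    (to be labeled) and second data (kept unlabeled)."""
    cleaned = [clean(t) for t in raw_texts if clean(t)]  # drop invalid/empty items
    rng = random.Random(seed)
    rng.shuffle(cleaned)
    n_labeled = int(len(cleaned) * labeled_ratio)
    return cleaned[:n_labeled], cleaned[n_labeled:]

first_data, second_data = split_training_data(["  Hello  World ", "", "FOO bar", "baz"])
print(len(first_data), len(second_data))
```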
As an embodiment, performing labeling processing on the first data to obtain labeled first training data includes:
and sending the first data to the labeling electronic equipment.
And receiving a label of the first data sent by the labeling electronic equipment to obtain the first data with the label as labeled first training data.
Alternatively, the annotation electronic device can be an electronic device that participates in annotation of data. After receiving the first data, the annotation electronic device may display the first data for the user participating in the annotation, and detect a tag set for the first data by the user participating in the annotation. Then, the labeling electronic device may send the label of the first data to the electronic device to provide the labeled first training data for the electronic device.
In the embodiment of the disclosure, the first data can be marked by sending the first data to the marking electronic device, so that the marking timeliness of the data is improved.
In certain embodiments, the data cleansing policy comprises: at least one of a spelling conversion policy, a symbol removal policy, a format consistency policy, and a data removal policy;
the spelling conversion strategy comprises the steps that characters with different spelling modes in the training data are spelled according to the same mode;
the symbol clearing strategy comprises clearing target symbols in the training data;
the format consistency strategy comprises setting the character format in the training data according to a target format;
the data clearing strategy comprises deleting invalid data in the training data.
The data cleaning strategy can be used for eliminating data noise in the original data to obtain training data corresponding to the original data. Of course, in some embodiments, if any original data cannot be data cleaned according to the data cleaning policy, the original data may be directly determined to be invalid data by using the data cleaning policy, and the invalid data may be directly deleted.
Spelling characters with different spelling modes in the same way may refer to normalizing the spelling of characters that appear in both upper case and lower case within a character string: characters spelled in upper case can be converted to lower case, or characters spelled in lower case can be converted to upper case.
The target symbol may be a special character such as a horizontal line or an underline, that is, a symbol that carries no meaning for the text expression.
The character format may refer to formats that frequently appear in text, such as time, numbers, spaces, and half-width/full-width characters, which can be uniformly set according to a target format.
In the embodiment of the disclosure, the data cleaning policy may include at least one of a spelling conversion policy, a symbol clearing policy, a format consistency policy, and a data clearing policy, so that data is effectively cleaned by using the at least one data cleaning policy, the validity and stability of the data are ensured, and model training accuracy is further improved.
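The four strategies can be sketched together as one cleaning pass; the concrete rules, such as lowercase normalization and treating underscores and horizontal lines as target symbols, are illustrative assumptions:

```python
import re

def apply_cleaning(text):
    text = text.lower()                       # spelling conversion strategy
    text = re.sub(r"[_\-]+", " ", text)       # symbol clearing strategy
    text = re.sub(r"\s+", " ", text).strip()  # format consistency strategy
    return text or None                       # data clearing: drop empty results

print(apply_cleaning("Hello___WORLD --- test"))  # → hello world test
print(apply_cleaning("___"))                     # → None (invalid data removed)
```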
As shown in fig. 7, a flowchart of a data processing method according to a seventh embodiment of the present disclosure is provided, and the method may be configured as a data processing apparatus, and the data processing apparatus may be located in an electronic device. The data processing method may comprise the following steps:
701: receiving text data to be processed sent by user equipment; the data type of the text data to be processed is the same as the data type of the first training data or the second training data.
The text data to be processed may be text data. The text data to be processed can be obtained by file reading, image recognition, voice recognition and video processing.
702: inputting the text data to be processed into a language model corresponding to the target model parameter to obtain a language processing result of the language model on the text data to be processed; the target model parameters are obtained by training based on the language model training method provided by the above embodiment.
703: and sending the language processing result to the user equipment, wherein the language processing result is displayed by the user equipment.
Transmitting the language processing result to the user device may include converting the language processing result into a voice signal and transmitting the voice signal to the user device. The speech signal may be played by the user device.
In the embodiment of the disclosure, when text data to be processed sent by user equipment is received, the text data to be processed can be input into the language model corresponding to the target model parameter. And obtaining a language processing result of the text data to be processed by the language model, and feeding back the language processing result to the user equipment. The language processing result is presented by the user equipment. By displaying the language processing result, the text data to be processed sent by the user equipment can be effectively processed in the language, and an accurate language processing result can be obtained.
In practical application, the technical scheme of the disclosure can be applied to various application fields. Such as search domains, smart question and answer domains, service domains, financial domains, emotion recognition domains, voice chatting, and the like. The model processing tasks performed by the language model in the solution of the present disclosure may include, for example, question answering in natural language, translation in natural language, content recommendation, question answering, and the like. The specific tasks performed by the language model in this disclosure are not overly limited. During the training process, the first training data and the second training data may be derived from raw data. The raw data may be collected from text data provided by the user. The acquisition can be carried out in real time or off-line. The original data can be derived from signals such as images, videos and voices, and corresponding text data can be extracted through algorithms such as image recognition, content extraction and voice recognition.
For ease of understanding, the question-and-answer field is taken as an example to introduce the technical scheme of the disclosure in detail. In the question-and-answer field, the text data to be processed and the training data may be text data, and the language model may perform a question-and-answer task on the input data. Assuming the input text data to be processed or training data is a "question", an "answer" matched with that question can be obtained through the processing of the language model. For example, assuming the text data input to the language model is "How is the weather today?", the language model recognizes the question and maps it to the answer "The weather is good today; it is sunny and not raining". The "answer" may then be converted from text to speech and output to the user. Through the processing of the language model, the input text data can be mapped to a result; on the basis of the technical scheme of the disclosure, the obtained language model has higher precision and the obtained result is more accurate.
As shown in fig. 8, a schematic structural diagram of a language model training apparatus provided for an eighth embodiment of the present disclosure may be configured with a method for training a language model, and the language model training apparatus may be located in an electronic device. The language model training device may comprise the following elements:
a data acquisition unit 801, configured to acquire labeled first training data and unlabeled second training data, wherein the first training data and the second training data are text data;
a data expansion unit 802, configured to perform data expansion processing on the second training data to obtain expansion data corresponding to the second training data;
a first processing unit 803, configured to calculate a first loss value by taking the label of the first training data as comparison data for the first training data in the language model to be trained;
a second processing unit 804, configured to calculate a second loss value by taking the expansion data as comparison data for the corresponding second training data in the language model;
and a target determination unit 805, configured to determine, if the sum of the first loss value and the second loss value satisfies a loss condition, that training of the language model is complete, and to obtain the target model parameters of the language model.
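As a non-authoritative illustration, the cooperation of units 801 to 805 can be sketched as follows. All names (`first_loss`, `second_loss`, `training_step`, `predict`) and the simple 0/1 and absolute-difference losses are assumptions made for the sketch; the disclosure does not fix the concrete loss functions here.

```python
def first_loss(label, prediction):
    # Supervised loss: compares the prediction for labeled first
    # training data against its label (a 0/1 mismatch, for brevity).
    return 0.0 if label == prediction else 1.0

def second_loss(pred_original, pred_expanded):
    # Consistency loss: compares the prediction for unlabeled second
    # training data against the prediction for its expansion data.
    return abs(pred_original - pred_expanded)

def training_step(labeled, unlabeled, expanded, predict, threshold=0.5):
    """Return (total_loss, done) for one pass over the data."""
    l1 = sum(first_loss(y, predict(x)) for x, y in labeled)
    l2 = sum(second_loss(predict(x), predict(x_exp))
             for x, x_exp in zip(unlabeled, expanded))
    total = l1 + l2
    # Loss condition: training is complete when the sum of the two
    # loss values falls under the threshold.
    return total, total <= threshold
```

Note that the first loss supervises labeled data against its labels, while the second loss only asks that the model behave consistently on unlabeled data and its expansion data, which is how the unlabeled data contributes to training.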
In one embodiment, the data expansion unit includes:
a first expansion module, configured to segment the second training data to obtain at least one initial word corresponding to the second training data;
and a second expansion module, configured to perform word expansion processing on the at least one initial word by using a word expansion strategy to obtain the expansion data corresponding to the second training data.
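A minimal sketch of the two modules, under the assumption of a whitespace tokenizer and a synonym-table strategy (both are stand-ins; a real system would use a proper word segmenter and one of the expansion strategies described below):

```python
def segment(text):
    # Stand-in word segmentation; a real system would use a proper
    # segmenter, especially for Chinese text.
    return text.split()

def expand(words, strategy):
    # Replace each initial word by its expansion if the strategy
    # provides one, keeping the word otherwise.
    return [strategy.get(w, w) for w in words]

def expansion_data(text, strategy):
    # Segment, expand, and reassemble the expansion data.
    return " ".join(expand(segment(text), strategy))
```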
In some embodiments, the word expansion strategy includes a service expansion strategy, and the second expansion module may include:
a first expansion submodule, configured to perform word expansion processing on the at least one initial word by using the service expansion strategy to obtain a first expansion word;
and a first determining submodule, configured to determine the first expansion word as the expansion data of the second training data.
In some embodiments, the word expansion strategy includes a knowledge expansion strategy, and the second expansion module may include:
a second expansion submodule, configured to perform word expansion processing on the at least one initial word by using the knowledge expansion strategy to obtain a second expansion word;
and a second determining submodule, configured to determine the second expansion word as the expansion data of the second training data.
In some embodiments, the word expansion strategy includes a service expansion strategy and a knowledge expansion strategy, and the second expansion module may include:
a first expansion submodule, configured to perform word expansion processing on the at least one initial word by using the service expansion strategy to obtain a first expansion word;
a second expansion submodule, configured to perform word expansion processing on the at least one initial word by using the knowledge expansion strategy to obtain a second expansion word;
and a third determining submodule, configured to determine at least one of the first expansion word and the second expansion word as the expansion data of the second training data.
In a possible implementation, the first expansion submodule is specifically configured to:
determine at least one candidate word corresponding to the service expansion strategy;
and, for any initial word, determine a first word matched with the initial word from the at least one candidate word, so as to determine the first word corresponding to each of the at least one initial word as a first expansion word.
In some embodiments, the first expansion submodule is specifically configured to:
divide the at least one candidate word into candidate entity words and candidate non-entity words;
if any initial word is determined to be an entity word, determine a first word matched with the initial word from the candidate entity words;
and, if any initial word is determined to be a non-entity word, determine a first word matched with the initial word from the candidate non-entity words.
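The entity-aware matching can be sketched as follows; the candidate pools and the example words are invented for illustration:

```python
# Candidate words partitioned into an entity pool and a non-entity
# pool; an initial word is matched only against the pool of its kind.
CANDIDATE_ENTITIES = {"Beijing": "Shanghai", "Baidu": "the company"}
CANDIDATE_NON_ENTITIES = {"good": "fine", "query": "search"}

def match_candidate(word, is_entity):
    # Return the first word matched with the initial word, or None
    # if the chosen pool has no match.
    pool = CANDIDATE_ENTITIES if is_entity else CANDIDATE_NON_ENTITIES
    return pool.get(word)
```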
In another embodiment, the second expansion submodule is specifically configured to:
determine a knowledge graph matched with the data content of the second training data, the knowledge graph comprising nodes formed by knowledge keywords and edges formed by the association relations among the knowledge keywords;
and perform word expansion processing on the at least one initial word by using the knowledge graph to obtain a second expansion word.
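A minimal sketch of knowledge-graph expansion, with an invented toy graph whose nodes are knowledge keywords and whose edges are association relations:

```python
# Toy knowledge graph as an adjacency mapping: each keyword node maps
# to the set of keywords it is associated with.
KNOWLEDGE_GRAPH = {
    "weather": {"temperature", "rain"},
    "rain": {"umbrella", "weather"},
}

def knowledge_expand(initial_words):
    expanded = []
    for w in initial_words:
        # The neighbours of w in the graph become second expansion
        # words; words absent from the graph contribute nothing.
        expanded.extend(sorted(KNOWLEDGE_GRAPH.get(w, ())))
    return expanded
```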
In certain embodiments, the apparatus further comprises:
a model prediction unit, configured to input the first training data, the second training data, and the expansion data into the language model respectively, to obtain a first prediction result corresponding to the first training data, a second prediction result corresponding to the second training data, and an expansion prediction result corresponding to the expansion data;
the first processing unit comprising:
a first processing module, configured to perform loss calculation based on the label of the first training data and the first prediction result to obtain the first loss value;
and the second processing unit comprising:
a second processing module, configured to perform loss calculation based on the second prediction result of the second training data and the expansion prediction result corresponding to the expansion data to obtain the second loss value.
In an optional implementation, the second processing module includes:
a loss calculation submodule, configured to perform loss calculation on the result difference between the second prediction result and the expansion prediction result by using a relative loss function to obtain the second loss value.
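The disclosure names only a "relative loss function" over the difference between the two prediction results; the KL divergence below is one common choice for comparing two prediction distributions, assumed here purely for illustration:

```python
import math

def relative_loss(p, q, eps=1e-12):
    # KL(p || q) between the prediction distribution for the second
    # training data (p) and for its expansion data (q); eps guards
    # against log(0) for zero-probability entries.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
```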
In still another embodiment, the apparatus further includes:
a second training unit, configured to, if it is determined that the sum of the first loss value and the second loss value does not satisfy the loss condition, update the language model and return to the step of calculating the first loss value by taking the label of the first training data as comparison data for the first training data in the language model to be trained, so as to continue execution.
In some embodiments, the second training unit includes:
a node determining module, configured to determine at least one computing node corresponding to the current language model;
a node selection module, configured to select from the at least one computing node based on a node selection strategy to obtain a target computing node;
a parameter adjustment module, configured to adjust the parameters of the target computing node based on a parameter adjustment strategy to obtain target parameters corresponding to the target computing node;
and a model determining module, configured to determine the target computing node provided with the target parameters as the updated language model.
In one possible design, the node selection module includes:
a probability selection submodule, configured to take a preset target probability value as the selection probability that a computing node is selected as a target computing node, and to randomly select from the at least one computing node according to the selection probability to obtain the target computing node.
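A sketch of this probability-based node selection; each computing node is independently kept as a target node with the preset target probability (the seeded `random.Random` is only for reproducibility of the sketch):

```python
import random

def select_target_nodes(nodes, target_probability, rng=None):
    # Each computing node is independently selected as a target node
    # with the preset target probability value.
    rng = rng or random.Random(0)
    return [n for n in nodes if rng.random() < target_probability]
```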
In certain embodiments, the parameter adjustment module includes:
a search adjustment submodule, configured to adjust the parameters of the target computing node by using a grid search parameter adjustment algorithm as the parameter adjustment strategy, to obtain the target parameters corresponding to the target computing node.
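Grid search over a small parameter grid can be sketched as follows; the grid contents and the loss function in the usage are illustrative assumptions:

```python
import itertools

def grid_search(loss_fn, grid):
    # Exhaustively try every combination of values in the parameter
    # grid and keep the combination with the lowest loss.
    names = sorted(grid)
    best_params, best_loss = None, float("inf")
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        loss = loss_fn(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss
```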
In one possible design, the data acquisition unit includes:
a data cleaning module, configured to clean raw data based on a data cleaning strategy to obtain training data;
a data dividing module, configured to divide the training data into first data and second data;
a data labeling module, configured to label the first data to obtain the labeled first training data;
and a data determining module, configured to determine the second data as the unlabeled second training data.
In certain embodiments, the data labeling module includes:
a data sending submodule, configured to send the first data to a labeling electronic device;
and a data receiving submodule, configured to receive the label of the first data sent by the labeling electronic device, so as to obtain the first data with the label as the labeled first training data.
In certain embodiments, the data cleaning strategy includes at least one of a spelling conversion strategy, a symbol removal strategy, a format consistency strategy, and a data removal strategy, wherein:
the spelling conversion strategy comprises converting characters with different spellings in the training data to a uniform spelling;
the symbol removal strategy comprises removing target symbols from the training data;
the format consistency strategy comprises setting the character format in the training data according to a target format;
and the data removal strategy comprises deleting invalid data from the training data.
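The four cleaning strategies can be sketched together as one function; the concrete choices (string replacement for spelling conversion, a regex character class for symbol removal, `str.strip` as the target format, and the empty string as invalid data) are assumptions for illustration:

```python
import re

def clean(text,
          spelling_map=None,       # spelling conversion strategy
          target_symbols="",       # symbol removal strategy
          target_format=str.strip, # format consistency strategy
          invalid=("",)):          # data removal strategy
    # Unify different spellings of the same word.
    for old, new in (spelling_map or {}).items():
        text = text.replace(old, new)
    # Remove the target symbols, if any were given.
    if target_symbols:
        text = re.sub(f"[{re.escape(target_symbols)}]", "", text)
    # Impose the target character format.
    text = target_format(text)
    # Drop invalid data entirely.
    return None if text in invalid else text
```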
As shown in fig. 9, which is a schematic structural diagram of a data processing apparatus provided in a ninth embodiment of the present disclosure, the apparatus may be configured to perform the data processing method and may be located in an electronic device. The data processing apparatus 900 may comprise the following units:
a data receiving unit 901, configured to receive text data to be processed sent by a user equipment, the data type of the text data to be processed being the same as that of the first training data or the second training data;
a result acquisition unit 902, configured to input the text data to be processed into the language model corresponding to the target model parameters, to obtain a language processing result of the language model for the text data to be processed, the target model parameters being obtained by training based on the language model training method provided in the above embodiments;
and a result sending unit 903, configured to send the language processing result to the user equipment for display by the user equipment.
It should be noted that the language model in this embodiment is not a natural language processing model targeted at a specific user and does not reflect the personal information of any specific user. It should also be noted that the data in this embodiment are all drawn from public data sets.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of the personal information of users involved all comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
According to an embodiment of the present disclosure, the present disclosure also provides a computer program product comprising: a computer program, stored in a readable storage medium, from which at least one processor of the electronic device can read the computer program, the at least one processor executing the computer program causing the electronic device to perform the solution provided by any of the embodiments described above.
FIG. 10 illustrates a schematic block diagram of an example electronic device 1000 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 10, the device 1000 comprises a computing unit 1001, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 1002 or a computer program loaded from a storage unit 1008 into a random access memory (RAM) 1003. The RAM 1003 can also store various programs and data necessary for the operation of the device 1000. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.
A number of components in the device 1000 are connected to the I/O interface 1005, including: an input unit 1006, such as a keyboard or a mouse; an output unit 1007, such as various types of displays and speakers; a storage unit 1008, such as a magnetic disk or an optical disk; and a communication unit 1009, such as a network card, a modem, or a wireless communication transceiver. The communication unit 1009 allows the device 1000 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The computing unit 1001 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 1001 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 1001 executes the methods and processes described above, such as the language model training method or the data processing method. For example, in some embodiments, the language model training method or the data processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1000 via the ROM 1002 and/or the communication unit 1009. When the computer program is loaded into the RAM 1003 and executed by the computing unit 1001, one or more steps of the language model training method or the data processing method described above may be performed. Alternatively, in other embodiments, the computing unit 1001 may be configured to perform the language model training method or the data processing method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/acts specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the defects of difficult management and weak service extensibility found in traditional physical hosts and virtual private server (VPS) services. The server may also be a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (39)

1. A method of language model training comprising:
acquiring labeled first training data and unlabeled second training data; the first training data and the second training data are text data;
performing data expansion processing on the second training data to obtain expanded data corresponding to the second training data;
calculating to obtain a first loss value by taking the label of the first training data as comparison data of the first training data in the language model to be trained;
calculating to obtain a second loss value by taking the extension data as comparison data of corresponding second training data in the language model;
and if the sum of the first loss value and the second loss value meets the loss condition, finishing the training of the language model and obtaining the target model parameters of the language model.
2. The method according to claim 1, wherein the performing data expansion processing on the second training data to obtain expanded data corresponding to the second training data includes:
performing word segmentation on the second training data to obtain at least one initial word corresponding to the second training data;
and performing word expansion processing on at least one initial word by using a word expansion strategy to obtain expansion data corresponding to the second training data.
3. The method of claim 2, wherein the term expansion policy comprises: a service expansion strategy;
performing word expansion processing on at least one initial word by using a word expansion strategy to obtain expansion data corresponding to the second training data, including:
performing word expansion processing on at least one initial word by using the service expansion strategy to obtain a first expansion word;
determining that the first expansion word is expansion data of the second training data.
4. The method of claim 2, wherein the term expansion policy comprises: a knowledge extension strategy;
performing word expansion processing on at least one initial word by using a word expansion strategy to obtain expansion data corresponding to the second training data, including:
performing word expansion processing on at least one initial word by using the knowledge expansion strategy to obtain a second expansion word;
determining the second expansion word as expansion data of the second training data.
5. The method of claim 2, wherein the term expansion policy comprises: a service expansion strategy and a knowledge expansion strategy;
performing word expansion processing on at least one initial word by using a word expansion strategy to obtain expansion data corresponding to the second training data, including:
performing word expansion processing on at least one initial word by using the service expansion strategy to obtain a first expansion word;
performing word expansion processing on at least one initial word by using the knowledge expansion strategy to obtain a second expansion word;
determining that the first expansion word and the second expansion word are expansion data of the second training data.
6. The method according to claim 3 or 5, wherein said performing, by using the service expansion policy, a word expansion process on at least one of the initial words to obtain a first expanded word comprises:
determining at least one candidate word corresponding to the service expansion strategy;
and for any initial word, determining a first word matched with the initial word from at least one candidate word so as to determine that the first word corresponding to at least one initial word is the first expansion word.
7. The method of claim 6, wherein said determining, for any initial word, a first word from at least one of said candidate words that matches said initial word comprises:
dividing at least one candidate word into a candidate entity word and a candidate non-entity word;
if any initial word is determined to be an entity word, determining a first word matched with the initial word from the candidate entity words;
and if any initial word is determined to be a non-entity word, determining a first word matched with the initial word from the candidate non-entity words.
8. The method according to claim 4 or 5, wherein said performing a word expansion process on at least one of the initial words by using the knowledge expansion strategy to obtain a second expanded word comprises:
determining a knowledge-graph matching the data content of the second training data; the knowledge-graph comprises: nodes formed by the knowledge keywords and edges formed by the incidence relation among the knowledge keywords;
and performing word expansion processing on at least one initial word by using the knowledge graph to obtain a second expanded word.
9. The method of any of claims 1-8, further comprising:
inputting the first training data, the second training data and the extension data into the language model respectively to obtain a first prediction result corresponding to the first training data, a second prediction result corresponding to the second training data and an extension prediction result corresponding to the extension data;
the calculating and obtaining a first loss value by taking the label of the first training data as comparison data of the first training data in the language model to be trained comprises the following steps:
performing loss calculation based on the label of the first training data and a first prediction result to obtain the first loss value;
the calculating to obtain a second loss value by taking the extension data as comparison data of the corresponding second training data in the language model includes:
and performing loss calculation based on a second prediction result of the second training data and an extension prediction result corresponding to the extension data to obtain the second loss value.
10. The method of claim 9, wherein the performing a loss calculation based on the second prediction result of the second training data and the extension prediction result corresponding to the extension data to obtain the second loss value comprises:
and performing loss calculation on the result difference between the second prediction result and the extended prediction result by adopting a relative loss function to obtain the second loss value.
11. The method of any of claims 1-10, further comprising:
and if the sum of the first loss value and the second loss value is determined not to meet the loss condition, updating the language model, returning to the language model to be trained by taking the label of the first training data as comparison data of the first training data in the language model to be trained, and calculating to obtain a first loss value to continue execution.
12. The method of claim 11, wherein said updating the language model comprises:
determining at least one computing node corresponding to the current language model;
selecting at least one computing node based on a node selection strategy to obtain a target computing node;
adjusting parameters of the target computing node based on a parameter adjusting strategy to obtain target parameters corresponding to the target computing node;
and determining the target computing node provided with the target parameters as an updated language model.
13. The method of claim 12, wherein said selecting at least one of said computing nodes based on a node selection policy to obtain a target computing node comprises:
and taking a preset target probability value as the selection probability of the computing node selected as the target computing node, and randomly selecting at least one computing node according to the selection probability to obtain the target computing node.
14. The method according to claim 12 or 13, wherein the adjusting the parameter of the target computing node based on the parameter adjusting policy to obtain the target parameter corresponding to the target computing node comprises:
and adjusting the parameters of the target computing node by taking a grid search parameter adjustment algorithm as the parameter adjusting strategy to obtain the target parameters corresponding to the target computing node.
15. The method of any one of claims 1-14, wherein the obtaining labeled first training data and unlabeled second training data comprises:
cleaning the original data based on a data cleaning strategy to obtain training data;
dividing the training data into first data and second data;
labeling the first data to obtain labeled first training data;
determining the second data as the unlabeled second training data.
16. The method of claim 15, wherein said labeling said first data to obtain said labeled first training data comprises:
sending the first data to the labeling electronic equipment;
receiving a label of the first data sent by the labeling electronic device to obtain the first data with the label as the labeled first training data.
17. The method of claim 15 or 16, wherein the data cleansing policy comprises: at least one of a spelling conversion policy, a symbol removal policy, a format consistency policy, and a data removal policy;
the spelling conversion strategy comprises converting characters with different spellings in the training data to a uniform spelling;
the symbol removal strategy comprises removing target symbols in the training data;
the format consistency strategy comprises setting a character format in the training data according to a target format;
the data cleaning strategy comprises deleting invalid data in the training data.
18. A method of data processing, comprising:
receiving text data to be processed sent by user equipment; the data type of the text data to be processed is the same as that of the first training data or the second training data;
inputting the text data to be processed into a language model corresponding to a target model parameter to obtain a language processing result of the language model on the text data to be processed; the target model parameters are obtained by training based on the language model training method of any one of claims 1-17;
and sending the language processing result to the user equipment, wherein the language processing result is displayed by the user equipment.
19. A language model training device comprising:
the data acquisition unit is used for acquiring labeled first training data and unlabeled second training data; the first training data and the second training data are text data;
the data expansion unit is used for performing data expansion processing on the second training data to obtain expansion data corresponding to the second training data;
the first processing unit is used for calculating to obtain a first loss value by taking the label of the first training data as comparison data of the first training data in a language model to be trained;
the second processing unit is used for calculating to obtain a second loss value by taking the extension data as comparison data of corresponding second training data in the language model;
and the target determining unit is used for determining that the training of the language model is finished and obtaining target model parameters of the language model if the sum of the first loss value and the second loss value is determined to meet a loss condition.
20. The apparatus of claim 19, wherein the data expansion unit comprises:
the first expansion module is used for segmenting the second training data to obtain at least one initial word corresponding to the second training data;
and the second expansion module is used for performing word expansion processing on at least one initial word by using a word expansion strategy to obtain expansion data corresponding to the second training data.
21. The apparatus of claim 20, wherein the word expansion policy comprises: a service expansion strategy;
the second expansion module includes:
the first expansion submodule is used for performing word expansion processing on at least one initial word by using the service expansion strategy to obtain a first expansion word;
a first determining submodule, configured to determine that the first expansion word is expansion data of the second training data.
22. The apparatus of claim 20, wherein the word expansion policy comprises: a knowledge extension strategy;
the second expansion module comprises:
the second expansion submodule is used for carrying out word expansion processing on at least one initial word by utilizing the knowledge expansion strategy to obtain a second expansion word;
a second determining submodule, configured to determine that the second expansion word is expansion data of the second training data.
23. The apparatus of claim 20, wherein the word expansion policy comprises: a service expansion strategy and a knowledge expansion strategy;
the second expansion module includes:
the first expansion submodule is used for performing word expansion processing on at least one initial word by using the service expansion strategy to obtain a first expansion word;
the second expansion submodule is used for carrying out word expansion processing on at least one initial word by utilizing the knowledge expansion strategy to obtain a second expansion word;
a third determining submodule, configured to determine that the first expansion word and the second expansion word are expansion data of the second training data.
24. The apparatus according to claim 21 or 23, wherein the first expansion submodule is specifically configured to:
determining at least one candidate word corresponding to the service expansion strategy;
and for any initial word, determining a first word matched with the initial word from at least one candidate word so as to determine that the first word corresponding to at least one initial word is the first expansion word.
25. The apparatus according to claim 24, wherein the first expansion submodule is specifically configured to:
dividing at least one candidate word into a candidate entity word and a candidate non-entity word;
if any initial word is determined to be an entity word, determining a first word matched with the initial word from the candidate entity words;
and if any initial word is determined to be a non-entity word, determining a first word matched with the initial word from the candidate non-entity words.
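Claim 25 restricts matching so that entity words are only compared against candidate entity words and non-entity words against candidate non-entity words. A sketch of that routing follows; the prefix-based matcher is a hypothetical stand-in for the claim's unspecified matching rule:

```python
def match_candidate(initial_word: str, is_entity: bool,
                    candidate_entities: list[str],
                    candidate_non_entities: list[str]):
    # Route the initial word to the candidate pool of the same kind (claim 25).
    pool = candidate_entities if is_entity else candidate_non_entities
    # Hypothetical matching rule: first distinct candidate sharing a two-character prefix.
    for cand in pool:
        if cand != initial_word and cand.startswith(initial_word[:2]):
            return cand
    return None
```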
26. The apparatus according to claim 22 or 23, wherein the second expansion submodule is specifically configured to:
determining a knowledge graph matching the data content of the second training data; the knowledge graph comprises: nodes formed by knowledge keywords and edges formed by association relations among the knowledge keywords;
and performing word expansion processing on at least one initial word by using the knowledge graph to obtain a second expanded word.
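The knowledge graph of claim 26 — keywords as nodes, associations as edges — yields second expansion words by collecting the keywords adjacent to each initial word. A minimal adjacency-set sketch (the graph contents are illustrative):

```python
def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    # Nodes are knowledge keywords; each edge records an association between two of them.
    graph: dict[str, set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def expand_with_graph(initial_words: list[str],
                      graph: dict[str, set[str]]) -> list[str]:
    # Second expansion words: keywords associated in the graph with any initial word.
    expanded: set[str] = set()
    for word in initial_words:
        expanded |= graph.get(word, set())
    return sorted(expanded - set(initial_words))
```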
27. The apparatus of any of claims 19-26, further comprising:
a model prediction unit, configured to input the first training data, the second training data, and the extension data into the language model, respectively, to obtain a first prediction result corresponding to the first training data, a second prediction result corresponding to the second training data, and an extension prediction result corresponding to the extension data;
the first processing unit includes:
the first processing module is used for performing loss calculation based on the label of the first training data and a first prediction result to obtain a first loss value;
the second processing unit includes:
and the second processing module is used for performing loss calculation based on a second prediction result of the second training data and an extension prediction result corresponding to the extension data to obtain the second loss value.
28. The apparatus of claim 27, wherein the second processing module comprises:
and the loss calculation submodule is used for calculating the loss of the result difference between the second prediction result and the extended prediction result by adopting a relative loss function to obtain the second loss value.
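The "relative loss function" of claim 28 measures the difference between the prediction on the original unlabeled sample and on its expansion. One plausible reading — an assumption, since the claim does not name a function — is a KL divergence between the two prediction distributions:

```python
import math

def kl_divergence(p: list[float], q: list[float], eps: float = 1e-12) -> float:
    # Relative loss between the prediction distribution on the original sample (p)
    # and on its expansion (q); zero when the two distributions agree.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
```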
29. The apparatus of any of claims 19-28, further comprising:
and the second training unit is used for updating the language model if it is determined that the sum of the first loss value and the second loss value does not meet the loss condition, and returning to the step of calculating a first loss value by taking the label of the first training data as comparison data of the first training data in the language model to be trained, so as to continue execution.
30. The apparatus of claim 29, wherein the second training unit comprises:
the node determining module is used for determining at least one computing node corresponding to the current language model;
the node selection module is used for selecting at least one computing node based on a node selection strategy to obtain a target computing node;
the parameter adjusting module is used for adjusting the parameters of the target computing node based on a parameter adjusting strategy to obtain target parameters corresponding to the target computing node;
and the model determining module is used for determining the target computing node provided with the target parameters as the updated language model.
31. The apparatus of claim 30, wherein the node selection module comprises:
and the probability selection submodule is used for presetting a target probability value as the selection probability of a computing node being selected as the target computing node, and randomly selecting from the at least one computing node according to the selection probability to obtain the target computing node.
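Claim 31's random selection — each computing node kept with a preset target probability — resembles dropout-style sampling. A minimal sketch, with the node representation left abstract:

```python
import random

def select_target_nodes(nodes: list, target_probability: float, seed=None) -> list:
    # Each computing node is independently selected as a target computing node
    # with the preset target probability value.
    rng = random.Random(seed)
    return [n for n in nodes if rng.random() < target_probability]
```

With probability 1.0 every node is selected; with 0.0 none are, which the edge-case checks below exercise.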
32. The apparatus of claim 30 or 31, wherein the parameter adjustment module comprises:
and the search adjustment submodule is used for adjusting the parameters of the target computing node by taking a network search parameter adjustment algorithm as the parameter adjustment strategy to obtain the target parameters corresponding to the target computing node.
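Reading claim 32's "network search parameter adjustment algorithm" as an exhaustive grid search — an interpretive assumption, since the translation is ambiguous — the adjustment tries every parameter combination and keeps the one with the lowest loss:

```python
from itertools import product

def grid_search(param_grid: dict, score_fn) -> dict:
    # Try every combination of candidate parameter values and return the
    # combination that minimizes score_fn (the target parameters).
    best_params, best_score = None, float("inf")
    for combo in product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), combo))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params
```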
33. The apparatus of any one of claims 19-32, wherein the data acquisition unit comprises:
the data cleaning module is used for cleaning the original data based on a data cleaning strategy to obtain training data;
the data dividing module is used for dividing the training data into first data and second data;
the data labeling module is used for labeling the first data to obtain the labeled first training data;
and the data determining module is used for determining that the second data is the unlabeled second training data.
34. The apparatus of claim 33, wherein the data annotation module comprises:
the data sending submodule is used for sending the first data to the labeling electronic equipment;
and the data receiving submodule is used for receiving the label of the first data sent by the labeling electronic equipment so as to obtain the first data with the label as the labeled first training data.
35. The apparatus of claim 33 or 34, wherein the data cleansing policy comprises: at least one of a spelling conversion policy, a symbol removal policy, a format consistency policy, and a data removal policy;
the spelling conversion strategy comprises converting characters that are spelled in different ways in the training data to a single consistent spelling;
the symbol removal strategy comprises removing target symbols in the training data;
the format consistency strategy comprises setting a character format in the training data according to a target format;
the data clearing strategy comprises deleting invalid data in the training data.
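The four cleaning strategies of claim 35 compose naturally into a pipeline. The sketch below is a hypothetical composition — lowercasing for spelling conversion, a small symbol set for symbol removal, whitespace normalization for format consistency, and empty-after-cleaning as the invalidity test are all assumptions:

```python
import re

def clean(text: str) -> str:
    text = text.lower()                       # spelling conversion: one uniform case
    text = re.sub(r"[#*@]", "", text)         # symbol removal: drop target symbols
    text = re.sub(r"\s+", " ", text).strip()  # format consistency: normalize whitespace
    return text

def remove_invalid(samples: list[str]) -> list[str]:
    # Data clearing: delete entries that are invalid (here: empty) after cleaning.
    return [s for s in (clean(x) for x in samples) if s]
```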
36. A data processing apparatus comprising:
the data receiving unit is used for receiving text data to be processed sent by user equipment; the data type of the text data to be processed is the same as that of the first training data or the second training data;
the result acquisition unit is used for inputting the text data to be processed into a language model corresponding to the target model parameters and acquiring a language processing result of the language model on the text data to be processed; the target model parameters are obtained by training with the language model training method of any one of claims 1-17;
and the result sending unit is used for sending the language processing result to the user equipment, and the language processing result is displayed by the user equipment.
37. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-17 or 18.
38. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any of claims 1-17 or 18.
39. A computer program product comprising a computer program which, when executed by a processor, carries out the steps of the method of any one of claims 1 to 17 or 18.
CN202210290842.XA 2022-03-23 2022-03-23 Language model training method, language model training device, language model data processing method, language model data processing device, language model data processing equipment, language model data processing medium and language model data processing product Pending CN114611625A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210290842.XA CN114611625A (en) 2022-03-23 2022-03-23 Language model training method, language model training device, language model data processing method, language model data processing device, language model data processing equipment, language model data processing medium and language model data processing product


Publications (1)

Publication Number Publication Date
CN114611625A true CN114611625A (en) 2022-06-10

Family

ID=81865259

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210290842.XA Pending CN114611625A (en) 2022-03-23 2022-03-23 Language model training method, language model training device, language model data processing method, language model data processing device, language model data processing equipment, language model data processing medium and language model data processing product

Country Status (1)

Country Link
CN (1) CN114611625A (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200364408A1 (en) * 2017-10-25 2020-11-19 Google Llc Natural Language Processing with an N-Gram Machine
US20200125944A1 (en) * 2018-10-18 2020-04-23 Microsoft Technology Licensing, Llc Minimization of computational demands in model agnostic cross-lingual transfer with neural task representations as weak supervision
CN112199479A (en) * 2020-09-15 2021-01-08 北京捷通华声科技股份有限公司 Method, device and equipment for optimizing language semantic understanding model and storage medium
CN112651238A (en) * 2020-12-28 2021-04-13 深圳壹账通智能科技有限公司 Training corpus expansion method and device and intention recognition model training method and device
CN112966712A (en) * 2021-02-01 2021-06-15 北京三快在线科技有限公司 Language model training method and device, electronic equipment and computer readable medium
CN113807098A (en) * 2021-08-26 2021-12-17 北京百度网讯科技有限公司 Model training method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIAAO CHEN ET AL.: "MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification", arXiv, 25 April 2020 (2020-04-25), pages 1-11 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115630630A (en) * 2022-10-25 2023-01-20 北京百度网讯科技有限公司 Language model processing method, service processing method, device, equipment and medium
CN115630630B (en) * 2022-10-25 2024-02-13 北京百度网讯科技有限公司 Language model processing method, service processing method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN110019732B (en) Intelligent question answering method and related device
CN113313022B (en) Training method of character recognition model and method for recognizing characters in image
EP4064277B1 (en) Method and apparatus for training speech recognition model, device and storage medium
CN114549874A (en) Training method of multi-target image-text matching model, image-text retrieval method and device
CN113657274B (en) Table generation method and device, electronic equipment and storage medium
CN113053367A (en) Speech recognition method, model training method and device for speech recognition
US20230114673A1 (en) Method for recognizing token, electronic device and storage medium
CN114861889A (en) Deep learning model training method, target object detection method and device
CN114861637B (en) Spelling error correction model generation method and device, and spelling error correction method and device
CN112559747A (en) Event classification processing method and device, electronic equipment and storage medium
CN115470313A (en) Information retrieval and model training method, device, equipment and storage medium
CN112818091A (en) Object query method, device, medium and equipment based on keyword extraction
CN114611625A (en) Language model training method, language model training device, language model data processing method, language model data processing device, language model data processing equipment, language model data processing medium and language model data processing product
CN113963197A (en) Image recognition method and device, electronic equipment and readable storage medium
CN115186738B (en) Model training method, device and storage medium
CN116401345A (en) Intelligent question-answering method, device, storage medium and equipment
CN114611521B (en) Entity identification method, device, equipment and storage medium
CN116049370A (en) Information query method and training method and device of information generation model
CN111104806A (en) Construction method and device of neural machine translation model, and translation method and device
CN115858776A (en) Variant text classification recognition method, system, storage medium and electronic equipment
CN112560425B (en) Template generation method and device, electronic equipment and storage medium
CN112784600B (en) Information ordering method, device, electronic equipment and storage medium
CN114647727A (en) Model training method, device and equipment applied to entity information recognition
CN114416981A (en) Long text classification method, device, equipment and storage medium
CN114416941A (en) Generation method and device of dialogue knowledge point determination model fusing knowledge graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination