CN114706984A - Training method and device of text processing model


Publication number
CN114706984A
Authority
CN
China
Prior art keywords
text
target
processing model
sample
text processing
Prior art date
Legal status
Pending
Application number
CN202210396669.1A
Other languages
Chinese (zh)
Inventor
吴通通
赵薇
柳景明
李旭
Current Assignee
Beijing Feixiang Xingxing Technology Co ltd
Original Assignee
Beijing Feixiang Xingxing Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Feixiang Xingxing Technology Co., Ltd.
Priority to CN202210396669.1A
Publication of CN114706984A


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35: Clustering; Classification
    • G06F 16/353: Clustering; Classification into predefined classes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods


Abstract

The present specification provides a method and an apparatus for training a text processing model. The training method includes: reconstructing the combined sample texts in an initial sample set to obtain a target sample set, wherein a combined sample text comprises at least two types of sub-texts; coding target sample data contained in the target sample set based on a coding unit in a text processing model to obtain a coding vector; updating the coding vector, and classifying the updated coding vector based on a classification unit in the text processing model to obtain a prediction category corresponding to the target sample data; and updating the text processing model into a target text processing model according to the prediction category and the target category corresponding to the target sample data. The prediction accuracy of the text processing model is thereby improved.

Description

Training method and device of text processing model
Technical Field
This specification relates to the technical field of artificial intelligence, and in particular to a training method for a text processing model. The present specification also relates to a training apparatus for a text processing model, a text processing method, a text processing apparatus, a computing device, and a computer-readable storage medium.
Background
With the development of internet technology, chapter labels can be assigned to questions in a teaching environment, thereby classifying the questions, after which the questions and their chapter labels are stored together in a teaching system. In the prior art, teachers usually have to label chapters for large numbers of questions manually, which consumes a great deal of manpower, so a simpler and more convenient way of labeling questions with chapters is needed.
Disclosure of Invention
In view of this, embodiments of the present specification provide a method for training a text processing model. The present specification also relates to a training apparatus for a text processing model, a text processing method, a text processing apparatus, a computing device, and a computer-readable storage medium, so as to solve the technical defects in the prior art.
According to a first aspect of embodiments of the present specification, there is provided a method for training a text processing model, including:
reconstructing a combined sample text in an initial sample set to obtain a target sample set, wherein the combined sample text comprises at least two types of sub-texts;
coding target sample data contained in the target sample set based on a coding unit in a text processing model to obtain a coding vector;
updating the coding vector, and classifying the updated coding vector based on a classification unit in the text processing model to obtain a prediction category corresponding to the target sample data;
and updating the text processing model into a target text processing model according to the prediction category and the target category corresponding to the target sample data.
Optionally, the reconstructing the combined sample text in the initial sample set to obtain the target sample set includes:
obtaining an initial sample set;
selecting a sample text containing at least two types of sub-texts in the initial sample set as a combined sample text, and selecting a sample text containing one type of sub-text as a single sample text;
reconstructing the combined sample text to obtain an intermediate sample text;
constructing the target sample set based on the intermediate sample text and the single sample text.
Optionally, the reconstructing the combined sample text to obtain an intermediate sample text includes:
extracting non-question stem sub-texts and question stem sub-texts from the combined sample text;
determining a reconstruction value corresponding to the non-question stem sub-text;
and under the condition that the reconstruction value is larger than a preset reconstruction threshold value, taking the question stem sub-text as an intermediate sample text.
Optionally, the method further comprises:
determining an answer reconstruction value corresponding to an answer sub-text in the non-question stem sub-text;
determining an analysis reconstruction value corresponding to an analysis sub-text in the non-question stem sub-text under the condition that the answer reconstruction value is smaller than the preset reconstruction threshold;
and under the condition that the analysis reconstruction value is larger than the preset reconstruction threshold value, constructing an intermediate sample text from the question stem sub-text and the answer sub-text.
Optionally, the updating the coding vector includes:
selecting vector elements to be processed in the coding vectors according to a preset selection strategy;
converting the vector elements to be processed to obtain target vector elements;
constructing a target code vector based on the target vector element and unselected vector elements of the code vector.
Optionally, the updating the text processing model to a target text processing model according to the prediction category and a target category corresponding to the target sample data includes:
calculating a first loss value of the text processing model according to the prediction category and a target category corresponding to the target sample data;
optimizing the text processing model according to the first loss value until a target text processing model meeting a training stopping condition is obtained;
wherein the training stop condition comprises an iteration number condition and/or a first loss value comparison condition.
Optionally, after obtaining the prediction category corresponding to the target sample data, the method further includes:
inputting the target sample data to the text processing model for prediction for the second time, and acquiring a reference prediction category corresponding to the target sample data;
calculating a difference value of the target sample data according to the prediction category and the reference prediction category;
calculating a second loss value based on a preset weight factor and the difference value;
and optimizing the text processing model according to the second loss value until a target text processing model meeting the training stopping condition is obtained.
According to a second aspect of embodiments herein, there is provided a training apparatus for a text processing model, including:
the acquisition module is configured to reconstruct a combined sample text in an initial sample set to obtain a target sample set, wherein the combined sample text comprises at least two types of sub-texts;
the encoding module is configured to perform encoding processing on target sample data contained in the target sample set based on an encoding unit in a text processing model to obtain an encoding vector;
the classification module is configured to update the coding vector, and classify the updated coding vector based on a classification unit in the text processing model to obtain a prediction category corresponding to the target sample data;
and the training module is configured to update the text processing model to a target text processing model according to the prediction category and a target category corresponding to the target sample data.
According to a third aspect of embodiments herein, there is provided a text processing method including:
acquiring text data uploaded by a user;
inputting the text data into a text processing model trained by the above training method of a text processing model for processing, to obtain the category of the text data;
and feeding back the text data category to the user as chapter information of the corresponding text data.
According to a fourth aspect of embodiments herein, there is provided a text processing apparatus including:
the text acquisition module is configured to acquire text data uploaded by a user;
the text processing module is configured to input the text data into a text processing model trained by the above training method of a text processing model for processing, to obtain a text data category;
and the information feedback module is configured to feed back the text data category to the user as chapter information of the corresponding text data.
According to a fifth aspect of embodiments herein, there is provided a computing device comprising:
a memory and a processor;
the memory is to store computer-executable instructions, and the processor is to execute the computer-executable instructions to:
according to a sixth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the method.
In the training method of the text processing model provided by the present specification, a target sample set is obtained by reconstructing the combined sample texts in an initial sample set, and target sample data contained in the target sample set is encoded based on an encoding unit in the text processing model to obtain an encoding vector; the encoding vector is updated, and the updated encoding vector is classified based on a classification unit in the text processing model to obtain a prediction category corresponding to the target sample data; and the text processing model is updated into a target text processing model according to the prediction category and the target category corresponding to the target sample data. In this way, the category of the target sample data is determined automatically, the prediction accuracy of the trained text processing model is effectively guaranteed, its prediction capability is effectively improved, and the category prediction task is completed accurately and efficiently.
Drawings
FIG. 1 is a flowchart of a method for training a text processing model according to an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of a text processing model in a training method of the text processing model according to an embodiment of the present specification;
FIG. 3 is a schematic structural diagram of an apparatus for training a text processing model according to an embodiment of the present disclosure;
fig. 4 is a flowchart of a text processing method provided in an embodiment of the present specification;
fig. 5 is a flowchart of a process applied to a chapter division scenario according to an embodiment of the present specification;
fig. 6 is a schematic structural diagram of a text processing apparatus according to an embodiment of the present disclosure;
fig. 7 is a block diagram of a computing device according to an embodiment of the present disclosure.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. This description may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, as those skilled in the art will be able to make and use the present disclosure without departing from the spirit and scope of the present disclosure.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second and, similarly, a second may also be referred to as a first without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "when", "upon", or "in response to a determination", depending on the context.
First, the noun terms referred to in one or more embodiments of the present specification are explained.
Dropout: in the neural network training process, the outputs of some intermediate neurons are randomly discarded with a certain probability, which effectively prevents the model from overfitting.
RDrop: Regularized Dropout, a method that applies two different dropouts to the same input during the model's forward pass and performs a consistency optimization on the two resulting outputs.
KL divergence: a metric of the difference between two probability distributions; the closer the two distributions are, the smaller the value.
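For reference, the standard definition of the KL divergence between two discrete distributions P and Q is (general background, not specific to this specification):

$$ D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)} $$

RDrop-style consistency losses commonly use the symmetric form (D_KL(P‖Q) + D_KL(Q‖P)) / 2.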
The Bert model: a popular pre-trained deep learning model in the field of natural language processing; using it for pre-training can greatly shorten model training time and improve model accuracy.
Word2vec: a kind of word embedding that maps each word to a vector so that a machine can process the text; its purpose is to convert words into vectors.
Topic chapter: in subjects such as Chinese and mathematics, chapters are divided by grade and unit, for example the chapter "Triangles" in the fourth-grade second-volume book; different chapters contain different knowledge points and different questions.
In the present specification, a method for training a text processing model is provided, and the present specification relates to a device for training a text processing model, a method for processing a text, a text processing device, a computing apparatus, and a computer-readable storage medium, which are described in detail in the following embodiments one by one.
In practical applications, to meet the needs of daily teaching activities, questions usually need to be entered into a question bank system. Before a question is entered, a teacher must manually determine its chapter label and assign it to the corresponding chapter, so that questions stored in the bank can later be retrieved by chapter.
However, because of cognitive differences among teachers, the chapter labels assigned to the same question are often inconsistent. During labeling, a teacher must browse massive numbers of questions and chapter entries, which consumes a great deal of effort, so the labeling results are often not accurate enough. In the prior art, deep learning models such as a Bert classification model are used for chapter division, but their prediction accuracy is low.
In view of this, the present specification provides a training method for a text processing model, in which a target sample set is obtained by reconstructing the combined sample texts in an initial sample set, and target sample data contained in the target sample set is encoded based on an encoding unit in the text processing model to obtain an encoding vector; the encoding vector is updated, and the updated encoding vector is classified based on a classification unit in the text processing model to obtain a prediction category corresponding to the target sample data; and the text processing model is updated into a target text processing model according to the prediction category and the target category corresponding to the target sample data. In this way, the category of the target sample data is determined automatically, the prediction accuracy of the trained text processing model is effectively guaranteed, its prediction capability is effectively improved, and the category prediction task is completed accurately and efficiently.
Fig. 1 is a flowchart illustrating a method for training a text processing model according to an embodiment of the present disclosure, which specifically includes the following steps:
step S102, reconstructing the combined sample text in the initial sample set to obtain a target sample set, wherein the combined sample text comprises at least two types of sub-texts.
Specifically, the initial sample set is the set of samples input into the text processing model when the text processing model is trained. In the field of text classification, for example, the initial sample set contains a plurality of initial sample texts, where an initial sample may consist of text content and a classification label; in the field of image classification, the initial sample set contains a plurality of image samples, each comprising an image and its classification label. The target sample set is the sample set composed of the target samples obtained by processing the initial samples in the initial sample set. A combined sample text is a sample text combined from at least two types of sub-texts. It should be noted that sample expansion may be performed on the combined sample texts: reconstruction processing is applied before a combined sample text is input into the text processing model to obtain the target samples, where reconstruction processing means expanding the number of initial samples in the initial sample set, usually by means of Dropout.
In practical applications, the initial samples used in different classification fields may be constructed according to actual requirements, and the method for expanding the samples is not limited to the aforementioned Dropout manner, which is not limited in this embodiment.
Based on this, after the initial sample set is obtained, the initial samples it contains are screened, and those containing at least two types of sub-texts are selected as combined sample texts. Each combined sample text is then reconstructed; specifically, Dropout is performed on the combined sample text to obtain at least two reconstructed sample texts corresponding to it, so that they can subsequently be input into the text processing model for training.
Furthermore, in order to train a text processing model meeting the use requirement and improve the performance of the text processing model, a large number of samples need to be prepared. Therefore, the initial sample set input into the text processing model can be sample-expanded, and the initial samples which can be expanded in the initial sample set are selected for reconstruction processing, which is specifically realized as follows:
obtaining an initial sample set; selecting a sample text containing at least two types of sub-texts in the initial sample set as a combined sample text, and selecting a sample text containing one type of sub-text as a single sample text; reconstructing the combined sample text to obtain an intermediate sample text; and constructing the target sample set based on the intermediate sample text and the single sample text.
Specifically, the initial sample set includes two types of sample texts, namely combined sample texts and single sample texts, where a combined sample text is a sample text containing at least two types of sub-texts and a single sample text is a sample text containing one type of sub-text. For example, a combined sample may consist of a question stem text, an answer text, and an analysis text; of a question stem text and an answer text; or of a question stem text and an analysis text. A sample text containing only the question stem text is a single sample text. An intermediate sample text is the sample text obtained after a combined sample text has been reconstructed.
Based on this, after the initial sample set is obtained, it is divided according to the number of sub-text types contained in each initial sample, so that the initial samples are divided into combined sample texts and single sample texts. Since a combined sample text contains at least two types of sub-texts, it can be sample-expanded according to those sub-text types to obtain at least two sample texts corresponding to it. After the combined sample texts contained in the initial sample set are reconstructed, intermediate sample texts are obtained, and the target sample set is formed from the intermediate sample texts together with the unprocessed single sample texts in the initial sample set, to be input into the text processing model for model training.
In summary, reconstructing the combined sample texts contained in the initial sample set yields at least two reconstructed sample texts per combined sample text, which expands the number of samples in the initial sample set; performing model training with the target sample set obtained after this sample expansion improves the performance of the text processing model. A sketch of this construction is shown below.
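The following Python sketch shows one way the target sample set could be assembled. The dict-based sample structure and helper names are illustrative assumptions, not taken from this specification; the 0.3 threshold follows the worked example given further below, where the reconstruction-value logic is detailed.

```python
import random

# A minimal sketch of target-set construction, assuming each sample is a dict
# with a "stem" and optional "answer"/"analysis" sub-texts.
RECON_THRESHOLD = 0.3  # preset reconstruction threshold (example value)

def reconstruct(sample: dict) -> dict:
    """Build an intermediate sample: keep the stem, randomly drop the rest."""
    intermediate = {"stem": sample["stem"]}          # the stem is always kept
    for field in ("answer", "analysis"):
        if field in sample:
            value = random.random()                  # reconstruction value
            if value <= RECON_THRESHOLD:             # at/below threshold: keep
                intermediate[field] = sample[field]
            # above threshold: "discard" (does not participate in training)
    return intermediate

def build_target_set(initial_set: list[dict]) -> list[dict]:
    target_set = []
    for sample in initial_set:
        if "answer" in sample or "analysis" in sample:   # combined sample text
            target_set.append(sample)                    # original is kept
            target_set.append(reconstruct(sample))       # plus an intermediate
        else:                                            # single sample text
            target_set.append(sample)
    return target_set

samples = [{"stem": "stem A", "answer": "answer B", "analysis": "analysis C"},
           {"stem": "stem D"}]
print(build_target_set(samples))
```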
Further, when reconstructing the combined sample text, since the combined sample text is composed of at least two types of sub-texts, the reconstruction processing can be performed on the at least two types of sub-texts in the combined sample text, which is specifically implemented as follows:
extracting non-question stem sub-texts and question stem sub-texts from the combined sample text; determining a reconstruction value corresponding to the non-question stem sub-text; and under the condition that the reconstruction value is larger than a preset reconstruction threshold value, taking the question stem sub-text as an intermediate sample text.
Specifically, the question stem sub-text is the text in the combined sample text whose content is the question stem, the question stem being the sentence or sentence components that pose the question; the non-question stem sub-text is the text content in the combined sample text other than the question stem text. In a scenario where the question is a multiple-choice question, the question stem sub-text is the question part and the option part of the multiple-choice question, and the non-question stem sub-text is the answer corresponding to the question and the analysis explaining that answer. The reconstruction value is a value randomly assigned to the non-question stem sub-text within a certain numerical range; the preset reconstruction threshold is a preset parameter value against which the reconstruction value of the non-question stem sub-text is compared, so as to determine whether the non-question stem sub-text needs to be reconstructed.
Based on this, a combined sample text is determined and its non-question stem sub-texts are extracted. Each non-question stem sub-text is assigned a value, i.e., its corresponding reconstruction value is determined, and that value is compared with the preset reconstruction threshold. When the reconstruction value corresponding to the non-question stem sub-text is greater than the preset reconstruction threshold, the question stem sub-text in the combined sample text is taken as an intermediate sample text, giving an expansion sample corresponding to the combined sample text; that is, the combined sample text is reconstructed into an intermediate sample text while the combined sample text itself is retained.
In summary, extracting the non-question stem sub-texts from the combined sample text allows an intermediate sample text to be determined based on the combined sample text, expanding the sample data. This provides a richer sample set for training the text processing model, improving its performance and thus its prediction accuracy.
Further, when the combined sample text is reconstructed, since the non-question stem sub-text in the combined sample text consists of the answer sub-text and the analysis sub-text, each type of sub-text in the combined sample text can be reconstructed, which is specifically implemented as follows:
determining an answer reconstruction value corresponding to an answer sub-text in the non-question stem sub-text; determining an analysis reconstruction value corresponding to an analysis sub-text in the non-question stem sub-text under the condition that the answer reconstruction value is smaller than the preset reconstruction threshold value; and under the condition that the analysis reconstruction value is larger than the preset reconstruction threshold value, constructing an intermediate sample text from the question stem sub-text and the answer sub-text.
Specifically, the answer sub-text is the answer content corresponding to the question stem sub-text in the combined sample text, and correspondingly the answer reconstruction value is a value randomly assigned to the answer sub-text within a certain numerical range; the analysis sub-text is the analysis content corresponding to the answer sub-text in the combined sample text, i.e., a description of the process of analyzing the question stem to arrive at its answer; the analysis reconstruction value is a value randomly assigned to the analysis sub-text within a certain numerical range.
Based on this, when the answer reconstruction value corresponding to the answer sub-text contained in the combined sample text and the analysis reconstruction value corresponding to the analysis sub-text are both greater than the preset reconstruction threshold, the question stem sub-text of the combined sample text alone forms the intermediate sample text; when the answer reconstruction value is greater than the preset reconstruction threshold and the analysis reconstruction value is smaller than it, the question stem sub-text and the analysis sub-text form the intermediate sample text; and when the answer reconstruction value is smaller than the preset reconstruction threshold and the analysis reconstruction value is greater than it, the question stem sub-text and the answer sub-text form the intermediate sample text.
For example, an initial sample set containing a plurality of initial samples is obtained. As shown in fig. 2, an initial sample is a question comprising a question stem text, an answer text, and an analysis text, and the question is the input of the text processing model. The initial sample set comprises two types of initial samples: combined sample texts and single sample texts. A combined sample text comprises a question stem text and a non-question stem text, where the non-question stem text comprises an answer text and an analysis text. A sample containing only the question stem text is a single sample text; a sample containing the stem text, the answer text, and the analysis text, or the stem text and the answer text, or the stem text and the analysis text, is a combined sample text. The combined sample texts can therefore be reconstructed to obtain the corresponding intermediate sample texts.
The input Dropout step shown in FIG. 2 is then performed, i.e., Dropout is applied to the inputs of the Bert model. A combined sample text containing a stem text, an answer text, and an analysis text is reconstructed: a reconstruction value of 0.4 is assigned to the answer text, a reconstruction value of 0.2 is assigned to the analysis text, and the preset reconstruction threshold is 0.3. Since the reconstruction value 0.4 of the answer text is greater than the threshold 0.3, the answer text is "discarded"; since the reconstruction value 0.2 of the analysis text is less than 0.3, the analysis text is kept, so the intermediate sample text corresponding to the combined sample text consists of the stem text and the analysis text. (Had the analysis text's value also exceeded the threshold, both would be discarded and the intermediate sample text would be the stem text alone.) It should be noted that "discarded" means not participating in the calculation. Each combined sample text contained in the initial sample set is reconstructed in this way to obtain a plurality of intermediate samples, and the intermediate samples, the single sample texts, and the combined sample texts form the target sample set.
In summary, reconstructing the answer sub-text and the analysis sub-text in each combined sample text yields at least two reconstructed sample texts corresponding to the combined sample text, expanding the number of samples in the initial sample set; performing model training with the target sample set obtained after sample expansion improves the performance of the text processing model and thus its prediction accuracy.
And step S104, carrying out coding processing on target sample data contained in the target sample set based on a coding unit in the text processing model to obtain a coding vector.
Specifically, after the initial sample set is reconstructed into the target sample set, the target sample data contained in the target sample set needs to be converted into a vector representation, so the target sample data is encoded by the encoding unit in the text processing model. The encoding unit is the processing module in the text processing model that converts text content from words or characters into a vector representation; encoding refers to this conversion process, and in this embodiment a Bert model is used for encoding. The target sample data are the samples corresponding to the initial samples in the initial sample set, which are input into the text processing model for model training. It should be noted that methods for converting text content into a vector representation also include Word2vec and the like, which is not limited in this embodiment.
Based on this, each target sample data contained in the target sample set is encoded by the encoding unit in the text processing model: the text content of the target sample data is determined and divided into a plurality of word units; the Bert model is loaded and each word unit is processed with the relevant functions in the Bert model to obtain its vector representation; and the vector representations of the word units contained in the target sample data are assembled into the encoding vector corresponding to the target sample data. A sketch with a concrete library follows the example below.
For example, as shown in fig. 2, the target sample data "stem A, answer B, analysis C" is encoded by the Bert model: "stem A", "answer B", and "analysis C" are treated as three word units, and the vector representation of each word unit is calculated, giving the encoding vector [101, 364, 294] for stem A, [451, 309, 192] for answer B, and [562, 384, 671] for analysis C; the encoding vectors of the word units together constitute the encoding vector corresponding to "stem A, answer B, analysis C", namely the matrix [[101, 364, 294], [451, 309, 192], [562, 384, 671]].
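A sketch of this encoding step using the Hugging Face transformers library follows. The library choice, checkpoint name, and input formatting are assumptions for illustration (the vectors in the example above are toy values); the specification itself only requires a Bert model for encoding.

```python
# Sketch of the encoding step with Hugging Face transformers (an assumed
# realization; checkpoint name and input formatting are illustrative).
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")

# Concatenated sub-texts of one target sample ("stem A, answer B, analysis C").
text = "stem A answer B analysis C"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = encoder(**inputs)

encoding = outputs.last_hidden_state   # per-token vectors: (1, seq_len, 768)
print(encoding.shape)
```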
And step S106, updating the coding vector, and classifying the updated coding vector based on a classification unit in the text processing model to obtain a prediction category corresponding to the target sample data.
Specifically, after the encoding vector of each target sample data is determined, the target sample data corresponding to each encoding vector can be classified according to the encoding vector, so as to predict its category. The classification unit is the module in the text processing model used to classify the encoding vectors corresponding to the target sample data in the target sample set; the prediction category is the category prediction result obtained by classifying an encoding vector through the classification unit.
Based on this, after the encoding vector corresponding to each target sample data is obtained, the encoding vector is updated, and the updated encoding vector is then classified by the classification unit in the text processing model to obtain the prediction category of each encoding vector, that is, the prediction category of each target sample data, so that parameter tuning/optimization can subsequently be performed on the text processing model based on a predetermined loss function.
Further, in order to improve the performance of the text processing model, before the prediction category of the coding vector is determined by the classification unit in the text processing model, the coding vector may be updated, and a part of vector elements included in the coding vector are processed to construct a target coding vector, which is specifically implemented as follows:
selecting vector elements to be processed in the coding vectors according to a preset selection strategy; converting the vector elements to be processed to obtain target vector elements; constructing a target code vector based on the target vector element and unselected vector elements of the code vector.
Specifically, the preset selection policy is a preset selection rule for selecting vector elements in an encoding vector; the vector elements to be processed are the vector elements selected in the encoding vector according to the preset selection policy; and the conversion processing is the processing operation performed on the vector elements to be processed, including but not limited to setting their values to 0. Its purpose is to eliminate the influence of the vector elements to be processed on the encoding vector, so that they do not participate in the computation, thereby preventing overfitting. A target vector element is the vector element obtained by processing a vector element to be processed, and the target encoding vector is the encoding vector obtained after the elements to be processed have been converted, which is used for category prediction.
Based on this, the vector elements to be processed in each encoding vector are selected according to the preset selection policy, determining which elements of each encoding vector need to be converted; each vector element to be processed is then converted, i.e., its value is set to 0, and the resulting target vector elements, together with the vector elements that were not selected for processing, form the target encoding vector corresponding to each encoding vector.
Following the above example, Dropout is performed on the output of the Bert model as shown in FIG. 2. The encoding vector corresponding to the combined sample text "stem A, answer B, analysis C" is [[101, 364, 294], [451, 309, 192], [562, 384, 671]], which contains 9 vector elements. Elements are randomly selected from these 9 at a ratio of 11%; after the vector element 309 is selected, its value is set to 0, this 0 being the target vector element, and a target encoding vector is constructed from the target vector element 0 and the other, unselected vector elements of the encoding vector, giving [[101, 364, 294], [451, 0, 192], [562, 384, 671]]. Classifying this target encoding vector yields the category of the corresponding combined sample text: the target encoding vector is predicted as class X, i.e., the combined sample text "stem A, answer B, analysis C" is predicted as chapter X.
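The element-zeroing step above can be sketched as follows. The fixed 11% rate and the toy matrix follow the example; in practice a framework Dropout layer, which also rescales the kept elements, would typically be used.

```python
import numpy as np

# Sketch of the output-side Dropout from the example: randomly zero ~11% of
# the encoding vector's elements (rescaling of kept elements omitted).
rng = np.random.default_rng(seed=0)

encoding = np.array([[101, 364, 294],
                     [451, 309, 192],
                     [562, 384, 671]], dtype=float)

keep_mask = rng.random(encoding.shape) >= 0.11  # each element kept w.p. 0.89
target_encoding = encoding * keep_mask          # selected elements become 0
print(target_encoding)
```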
In conclusion, by randomly "discarding" some vector elements of the encoding vector, the influence of those elements on the encoding vector is eliminated, which can effectively prevent the text processing model from overfitting and improves its prediction accuracy.
And step S108, updating the text processing model into a target text processing model according to the prediction category and the target category corresponding to the target sample data.
Specifically, on the basis of determining the prediction category corresponding to the target sample data, the text processing model can be trained using the target category corresponding to the target sample data together with its prediction category, so that the model learns, within the target sample data, the characteristics of the corresponding target category; through continuous iteration and optimization of the text processing model, the target text processing model corresponding to it is obtained. The training stop condition may be a number of training iterations or a loss-value comparison, and in practical applications it may be set as required, which is not limited here.
In practical applications, after the prediction category corresponding to the target sample data is obtained, a loss function can be selected or designed according to the prediction category and the target category of the target sample data; the parameters of the text processing model are adjusted accordingly and the model is updated until the training stop condition is reached, at which point the resulting model is the trained target text processing model.
Further, when the text processing model is trained with the target sample data, since the text processing model is a supervised model, its training can only be completed through continuous optimization and parameter adjustment, which is specifically implemented as follows:
calculating a first loss value of the text processing model according to the prediction category and a target category corresponding to the target sample data; optimizing the text processing model according to the first loss value until a target text processing model meeting a training stopping condition is obtained; wherein the training stop condition comprises an iteration number condition and/or a first loss value comparison condition.
Specifically, the first loss value is a loss value obtained by comparing a target class and a prediction class corresponding to a target sample and then calculating through a loss function, where the loss value represents a difference degree between the prediction class and the target class, and in practical application, the loss function may be a cross entropy function, a maximum entropy function, or the like, which is not limited in this embodiment.
Based on the above, after the prediction category and the target category corresponding to the target sample data are determined, the first loss value of the text processing model can be calculated based on the prediction category and the target category corresponding to the target sample data, so as to obtain the difference between the prediction category and the target category, and the text processing model is subjected to parameter adjustment/optimization according to the first loss value, so that the difference between the prediction category and the target category is reduced, and the prediction of the target sample data by the text processing model is closer to the target category of the target sample data. And continuously adjusting parameters of the text processing model until the iteration number reaches a preset iteration number threshold value or the first loss value is within a target range, determining that the training of the text processing model can be stopped, and determining the obtained text processing model as the trained target text processing model. By continuous tuning and training, the generalization capability of the text processing model is effectively improved.
In practical applications, following the above example, after the combined sample text "stem A, answer B, analysis C" is predicted as chapter X, a first loss value is calculated using a cross-entropy loss function (or another loss function) as shown in FIG. 2, based on the target category of the combined sample text, chapter Y, and the text processing model is optimized according to the calculated first loss value until a target text processing model satisfying the training stop condition is obtained. A sketch of this step follows.
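A minimal PyTorch sketch of one first-loss optimization step is shown below. The stand-in classifier, dimensions, and label index are assumptions; the specification only calls for a loss such as cross entropy and iterative parameter adjustment.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of one first-loss optimization step (PyTorch; the classifier
# is a stand-in for the text processing model's classification unit).
torch.manual_seed(0)
num_chapters = 5
classifier = torch.nn.Linear(768, num_chapters)
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)

encoding = torch.randn(1, 768)            # stands in for the Bert encoding
target = torch.tensor([2])                # target category (e.g. chapter Y)

logits = classifier(encoding)             # prediction over chapter classes
loss1 = F.cross_entropy(logits, target)   # first loss value

optimizer.zero_grad()
loss1.backward()
optimizer.step()                          # one parameter-adjustment step
```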
Further, in order to optimize the text processing model and improve the prediction accuracy of the text processing model, the text processing model may be optimized according to a difference between two prediction results of the same target sample by the text processing model, and the specific implementation is as follows:
inputting the target sample data to the text processing model for the second time for prediction, and acquiring a reference prediction category corresponding to the target sample data; calculating a difference value of the target sample data according to the prediction category and the reference prediction category; calculating a second loss value based on a preset weight factor and the difference value; and optimizing the text processing model according to the second loss value until a target text processing model meeting the training stopping condition is obtained.
Specifically, the reference prediction category refers to a prediction result obtained by performing first prediction on target sample data based on a text processing model and then performing second prediction, and is used for comparing the prediction result with the prediction category; the preset weight factor refers to a preset hyper-parameter and is used for calculating a loss value and optimizing a text processing model; the second loss value is a loss value obtained by calculation according to a preset weight factor and a difference value of target sample data, and is used for optimizing the text processing model and improving the prediction accuracy of the text processing model.
Based on this, after the prediction category corresponding to the target sample data is determined by the text processing model, the same target sample data is input into the text processing model a second time to obtain the reference prediction category corresponding to the target sample data. The difference value of the target sample data, i.e., the difference between the reference prediction category and the prediction category, can then be calculated, for example as the KL divergence between the two. A second loss value is then computed using the preset weight factor, and the parameters of the text processing model are adjusted according to the second loss value, thereby optimizing the model and improving its accuracy, i.e., its prediction accuracy.
Following the above example, RDrop is performed on the combined sample text. After "stem A, answer B, analysis C" is first predicted as chapter X, the text processing model predicts the same combined sample text a second time, this time predicting it as chapter Z. As shown in fig. 2, the KL divergence between the first and second prediction results is calculated, a second loss value for the combined sample text is computed based on the preset weight factor of 0.1, and the text processing model is optimized according to this second loss value. The text processing model is thereby updated and its prediction accuracy improved; a sketch follows.
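The second-loss computation can be sketched as below. The toy network is an assumption; the two stochastic forward passes, the symmetric KL divergence, and the 0.1 weight factor follow the description above.

```python
import torch
import torch.nn.functional as F

# Sketch of the RDrop-style second loss: two forward passes over the same
# input differ because of Dropout; their symmetric KL divergence, scaled by
# the preset weight factor 0.1, gives the second loss (toy network).
torch.manual_seed(0)
model = torch.nn.Sequential(
    torch.nn.Linear(768, 256),
    torch.nn.ReLU(),
    torch.nn.Dropout(p=0.1),     # randomness source for the two predictions
    torch.nn.Linear(256, 5),
)

x = torch.randn(1, 768)          # encoding of the same combined sample text
log_p = F.log_softmax(model(x), dim=-1)   # first prediction (chapter X)
log_q = F.log_softmax(model(x), dim=-1)   # second prediction (chapter Z)

kl = 0.5 * (F.kl_div(log_p, log_q, log_target=True, reduction="batchmean")
            + F.kl_div(log_q, log_p, log_target=True, reduction="batchmean"))
loss2 = 0.1 * kl                 # second loss with preset weight factor 0.1
print(loss2.item())
```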
To sum up, in the training method of the text processing model provided in this specification, a target sample set is obtained by reconstructing the combined sample texts in an initial sample set, and target sample data contained in the target sample set is encoded based on an encoding unit in the text processing model to obtain an encoding vector; the encoding vector is updated, and the updated encoding vector is classified based on a classification unit in the text processing model to obtain a prediction category corresponding to the target sample data; and the text processing model is updated into a target text processing model according to the prediction category and the target category corresponding to the target sample data. In this way, the category of the target sample data is determined automatically, the prediction accuracy of the trained text processing model is effectively guaranteed, its prediction capability is effectively improved, and the category prediction task is completed accurately and efficiently.
Corresponding to the above method embodiment, the present specification further provides an embodiment of a training apparatus for a text processing model, and fig. 3 shows a schematic structural diagram of the training apparatus for a text processing model provided in an embodiment of the present specification. As shown in fig. 3, the apparatus includes:
an obtaining module 302, configured to perform reconstruction processing on a combined sample text in an initial sample set to obtain a target sample set, where the combined sample text includes at least two types of sub-texts;
the encoding module 304 is configured to perform encoding processing on target sample data included in the target sample set based on an encoding unit in a text processing model to obtain an encoding vector;
a classification module 306, configured to update the coding vector, and perform classification processing on the updated coding vector based on a classification unit in the text processing model to obtain a prediction category corresponding to the target sample data;
a training module 308 configured to update the text processing model to a target text processing model according to the prediction category and a target category corresponding to the target sample data.
In an optional embodiment, the obtaining module 302 is further configured to:
obtaining an initial sample set; selecting a sample text containing at least two types of sub-texts in the initial sample set as a combined sample text, and selecting a sample text containing one type of sub-text as a single sample text; reconstructing the combined sample text to obtain an intermediate sample text; constructing the target sample set based on the intermediate sample text and the single sample text.
In an optional embodiment, the obtaining module 302 is further configured to:
extracting non-question stem sub-texts and question stem sub-texts from the combined sample text; determining a reconstruction value corresponding to the non-question stem sub-text; and under the condition that the reconstruction value is larger than a preset reconstruction threshold value, taking the question stem sub-text as an intermediate sample text.
In an optional embodiment, the obtaining module 302 is further configured to:
determining an answer reconstruction value corresponding to an answer sub-text in the non-question stem sub-text; determining an analysis reconstruction value corresponding to an analysis sub-text in the non-question stem sub-text under the condition that the answer reconstruction value is smaller than the preset reconstruction threshold; and under the condition that the analysis reconstruction value is larger than the preset reconstruction threshold value, constructing an intermediate sample text from the question stem sub-text and the answer sub-text.
In an optional embodiment, the classification module 306 is further configured to:
selecting vector elements to be processed in the coding vectors according to a preset selection strategy; converting the vector elements to be processed to obtain target vector elements; constructing a target code vector based on the target vector element and unselected vector elements of the code vector.
In an optional embodiment, the training module 308 is further configured to:
calculating a first loss value of the text processing model according to the prediction category and a target category corresponding to the target sample data; optimizing the text processing model according to the first loss value until a target text processing model meeting a training stopping condition is obtained; wherein the training stop condition comprises an iteration number condition and/or a first loss value comparison condition.
In an optional embodiment, the training module 308 is further configured to:
inputting the target sample data to the text processing model for prediction for the second time, and acquiring a reference prediction category corresponding to the target sample data; calculating a difference value of the target sample data according to the prediction category and the reference prediction category; calculating a second loss value based on a preset weight factor and the difference value; and optimizing the text processing model according to the second loss value until a target text processing model meeting the training stopping condition is obtained.
The training device of the text processing model provided by the present specification obtains a target sample set by reconstructing the combined sample texts in an initial sample set, and obtains an encoding vector by encoding target sample data contained in the target sample set based on an encoding unit in the text processing model; the encoding vector is updated, and the updated encoding vector is classified based on a classification unit in the text processing model to obtain a prediction category corresponding to the target sample data; and the text processing model is updated into a target text processing model according to the prediction category and the target category corresponding to the target sample data. In this way, the category of the target sample data is determined automatically, the prediction accuracy of the trained text processing model is effectively guaranteed, its prediction capability is effectively improved, and the category prediction task is completed accurately and efficiently.
The above is a schematic scheme of a training apparatus for a text processing model according to this embodiment. It should be noted that the technical solution of the training apparatus for the text processing model and the technical solution of the training method for the text processing model belong to the same concept, and details that are not described in detail in the technical solution of the training apparatus for the text processing model can be referred to the description of the technical solution of the training method for the text processing model.
Fig. 4 is a flowchart illustrating a text processing method according to an embodiment of the present specification, which specifically includes the following steps:
step S402, acquiring text data uploaded by a user.
Step S404, inputting the text data into a text processing model trained by the above training method of a text processing model for processing, to obtain the text data category.
Step S406, the text data category is fed back to the user as chapter information of the corresponding text data.
In practical applications, when the question entered by the user is "the sum of the interior angles of a triangle equals 180 degrees", the question can be input into the text processing model for processing, and the corresponding chapter prediction is obtained from the prediction result: the question is predicted to belong to the chapter "fourth grade, second volume | Triangles | theorem". When the question "the sum of the interior angles of a triangle equals 180 degrees" is stored, it is stored as a question under the corresponding chapter, and the stored information is fed back to the user to inform the user that the question has been filed under the chapter "fourth grade, second volume | Triangles | theorem". A sketch of this flow is given below.
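The serving flow of steps S402 to S406 might look like the sketch below. The chapter labels and the tiny encoder/classifier are stand-ins, as the specification does not prescribe a serving interface.

```python
import torch

# Sketch of the text processing flow (S402-S406) with stand-in components;
# the chapter labels and model pieces below are illustrative, not the patent's.
CHAPTER_LABELS = ["Fractions", "Triangles", "Statistics"]

torch.manual_seed(0)
encoder = torch.nn.Linear(16, 8)                       # encoding unit stand-in
classifier = torch.nn.Linear(8, len(CHAPTER_LABELS))   # classification unit

def predict_chapter(features: torch.Tensor) -> str:
    """Return the chapter label fed back to the user for uploaded text."""
    with torch.no_grad():
        logits = classifier(encoder(features))
    return CHAPTER_LABELS[logits.argmax().item()]

uploaded = torch.randn(16)   # stands in for the encoded uploaded question
print(predict_chapter(uploaded))
```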
In conclusion, using the text processing model obtained by the above training method to divide topics into chapters effectively improves the speed and accuracy of chapter division, delivers a faster response, reduces the consumption of human resources, and improves the user experience.
The following further describes the training method of the text processing model, with reference to Fig. 5, by taking its application in a text processing process as an example. Fig. 5 shows the processing flow applied to a chapter division scenario according to an embodiment of the present specification, which specifically includes the following steps:
step S502, an initial sample set is constructed.
An initial sample set is constructed from the question stem, the answer, and the analysis. The initial samples in the initial sample set comprise: samples containing only the question stem, samples containing the question stem and the answer, samples containing the question stem and the analysis, and samples containing the question stem, the answer, and the analysis. Each sample further carries the target chapter information corresponding to that sample.
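Purely for illustration, such an initial sample set might be represented as follows; all field names, texts, and chapter labels are hypothetical:

```python
initial_samples = [
    # stem only (single sample text)
    {"stem": "The sum of the interior angles of a triangle equals 180 degrees.",
     "answer": None, "analysis": None, "chapter": "Grade 4 | Triangles | Theorems"},
    # stem + answer (combined sample text)
    {"stem": "What is the sum of the interior angles of a triangle?",
     "answer": "180 degrees", "analysis": None, "chapter": "Grade 4 | Triangles | Theorems"},
    # stem + analysis (combined sample text)
    {"stem": "What is the sum of the interior angles of a triangle?",
     "answer": None, "analysis": "Tear off the three corners and line them up.",
     "chapter": "Grade 4 | Triangles | Theorems"},
    # stem + answer + analysis (combined sample text)
    {"stem": "What is the sum of the interior angles of a triangle?",
     "answer": "180 degrees", "analysis": "Tear off the three corners and line them up.",
     "chapter": "Grade 4 | Triangles | Theorems"},
]
```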
Step S504, the combined sample text contained in the initial sample set is reconstructed, and an intermediate sample set is obtained.
An initial sample that contains at least two types of sub-texts is a combined sample text; correspondingly, an initial sample containing only the question stem is a single sample text. Reconstructing a combined sample text comprises: drawing a reconstruction value for the answer and for the analysis in the combined sample text, comparing each reconstruction value with a preset reconstruction threshold, and, when a reconstruction value is greater than the preset reconstruction threshold, performing Dropout on (i.e., removing) the answer or analysis corresponding to that reconstruction value, thereby obtaining the intermediate sample corresponding to the combined sample text. Reconstructing every combined sample text yields the intermediate sample set.
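One plausible reading of this reconstruction step in Python (the threshold value is an assumption, and `initial_samples` refers to the sketch above):

```python
import random

RECON_THRESHOLD = 0.5  # preset reconstruction threshold (hypothetical value)

def reconstruct(sample: dict) -> dict:
    """Draw a reconstruction value for each non-stem sub-text and drop the
    sub-text (Dropout) when the value exceeds the threshold."""
    out = dict(sample)
    for field in ("answer", "analysis"):
        if out.get(field) is not None and random.random() > RECON_THRESHOLD:
            out[field] = None  # this sub-text is removed from the combined sample
    return out

combined = [s for s in initial_samples if s["answer"] or s["analysis"]]
intermediate_samples = [reconstruct(s) for s in combined]
```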
Step S506, a target sample set is constructed based on the single sample text and the intermediate sample set contained in the initial sample set.
Step S508, performing encoding processing on the target sample data contained in the target sample set to convert the target sample data into coding vectors.
The target sample texts contained in the target sample set are input into the coding unit of the text processing model for encoding, and a coding vector corresponding to each target sample text is obtained; that is, the text content of each target sample text is converted into a vector representation.
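As an illustration, the coding unit could be a BERT-style pretrained encoder from the HuggingFace `transformers` library; the checkpoint name below is an example, not something specified by this application:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
encoder = AutoModel.from_pretrained("bert-base-chinese")

def encode(text: str) -> torch.Tensor:
    # Tokenize the target sample text and take the [CLS] hidden state
    # as its coding vector.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    outputs = encoder(**inputs)
    return outputs.last_hidden_state[:, 0]
```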
Step S510, performing category prediction on the coding vector through the classification unit to obtain the prediction category of the coding vector.
The coding vector is input into the classification unit of the text processing model for prediction, and the predicted chapter information corresponding to the coding vector is obtained.
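The classification unit might, for instance, be a single linear layer over the coding vector; this is a hypothetical sketch, and the dropout applied to the coding vector loosely corresponds to the update step described earlier:

```python
import torch.nn as nn

class ChapterClassifier(nn.Module):
    def __init__(self, hidden_size: int = 768, num_chapters: int = 100):
        super().__init__()
        self.dropout = nn.Dropout(0.1)              # perturbs the coding vector
        self.linear = nn.Linear(hidden_size, num_chapters)

    def forward(self, coding_vector):
        # Returns unnormalized scores (logits) over the chapter categories.
        return self.linear(self.dropout(coding_vector))
```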
Step S512, training the text processing model according to the prediction category and the target category corresponding to the target sample data until a target text processing model meeting the training stop condition is obtained.
A loss value is calculated according to the predicted chapter information and the target chapter information, and the text processing model is optimized; a text processing model meeting the training stop condition is then obtained through continuous iteration and optimization.
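Putting the pieces together, the optimization loop might look as follows; `encode_batch`, `train_loader`, the learning rate, and the epoch count are all assumptions rather than details disclosed here (in practice the encoder would typically be fine-tuned jointly with the classifier):

```python
import torch
import torch.nn.functional as F

classifier = ChapterClassifier()
optimizer = torch.optim.AdamW(classifier.parameters(), lr=2e-5)

for epoch in range(10):  # iteration-count stopping condition (example value)
    for texts, chapter_ids in train_loader:
        logits = classifier(encode_batch(texts))      # predicted chapter scores
        loss = F.cross_entropy(logits, chapter_ids)   # predicted vs. target chapter
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```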
Step S514, receiving the topic text input by the user.
After training of the text processing model is completed, the model can be reused. The true-or-false question input by the user is received: "the sum of any two sides of a triangle is greater than the third side".
Step S516, inputting the topic text into the text processing model to obtain the topic chapter information.
Through the prediction of the text processing model, the chapter information of the topic "the sum of any two sides of a triangle is greater than the third side" is determined to be "Triangle Theorems".
Step S518, updating the user input interface based on the topic chapter information, and displaying a topic classification interface to the user.
The topic "the sum of any two sides of a triangle is greater than the third side" is processed by the text processing model to obtain the corresponding chapter information "Triangle Theorems"; the processing result is then fed back to the user, and a topic classification interface is displayed to the user.
In conclusion, training the text processing model in the above manner effectively guarantees the prediction accuracy of the trained text processing model; performing model training with abundant samples in the preparation stage effectively improves the processing capability of the text processing model, and the chapter division task is completed accurately and efficiently.
Corresponding to the above method embodiment, this specification further provides a text processing apparatus embodiment, and fig. 6 shows a schematic structural diagram of a text processing apparatus provided in an embodiment of this specification. As shown in fig. 6, the apparatus includes:
a text acquisition module 602 configured to acquire text data uploaded by a user;
a text processing module 604 configured to input the text data into a text processing model trained by the above training method for processing, so as to obtain a text data category;
an information feedback module 606 configured to feed back the text data category as chapter information of the corresponding text data to a user.
In conclusion, using the text processing model obtained by the above training method to divide topics into chapters effectively improves the speed and accuracy of chapter division, delivers a faster response, reduces the consumption of human resources, and improves the user experience.
The above is a schematic scheme of a text processing apparatus of the present embodiment. It should be noted that the technical solution of the text processing apparatus and the technical solution of the text processing method belong to the same concept, and details that are not described in detail in the technical solution of the text processing apparatus can be referred to the description of the technical solution of the text processing method.
Fig. 7 illustrates a block diagram of a computing device 700 provided according to an embodiment of the present description. The components of the computing device 700 include, but are not limited to, memory 710 and a processor 720. Processor 720 is coupled to memory 710 via bus 730, and database 750 is used to store data.
Computing device 700 also includes access device 740, which enables computing device 700 to communicate via one or more networks 760. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. Access device 740 may include one or more of any type of network interface, wired or wireless, e.g., a Network Interface Card (NIC), such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 700, as well as other components not shown in FIG. 7, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 7 is for purposes of example only and is not limiting as to the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 700 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 700 may also be a mobile or stationary server.
The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the method.
An embodiment of the present specification further provides a computer readable storage medium storing computer instructions, which when executed by a processor, are used for implementing the training method of the text processing model or the steps of the text processing method as described above.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium and the technical solution of the above method belong to the same concept, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the above method.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of acts or combinations, but those skilled in the art should understand that the present disclosure is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present disclosure. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for this description.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the specification and its practical application, to thereby enable others skilled in the art to best understand the specification and utilize the specification. The specification is limited only by the claims and their full scope and equivalents.

Claims (12)

1. A method for training a text processing model, comprising:
reconstructing combined sample texts in the initial sample set to obtain a target sample set, wherein the combined sample texts comprise at least two types of sub-texts;
coding target sample data contained in the target sample set based on a coding unit in a text processing model to obtain a coding vector;
updating the coding vector, and classifying the updated coding vector based on a classification unit in the text processing model to obtain a prediction category corresponding to the target sample data;
and updating the text processing model into a target text processing model according to the prediction type and the target type corresponding to the target sample data.
2. The method according to claim 1, wherein the reconstructing the combined sample text in the initial sample set to obtain the target sample set comprises:
obtaining an initial sample set;
selecting a sample text containing at least two types of sub texts in the initial sample set as a combined sample text, and selecting a sample text containing one type of sub texts as a single sample text;
reconstructing the combined sample text to obtain an intermediate sample text;
constructing the target sample set based on the intermediate sample text and the single sample text.
3. The method according to claim 2, wherein the reconstructing the combined sample text to obtain an intermediate sample text comprises:
extracting non-question stem sub-texts and question stem sub-texts from the combined sample text;
determining a reconstruction value corresponding to the non-question stem sub-text;
and under the condition that the reconstruction value is greater than a preset reconstruction threshold, taking the question stem sub-text as an intermediate sample text.
4. The method of claim 3, further comprising:
determining an answer reconstruction value corresponding to the answer sub-text in the non-question stem sub-text;
determining an analysis reconstruction value corresponding to the analysis sub-text in the non-question stem sub-text under the condition that the answer reconstruction value is smaller than the preset reconstruction threshold;
and under the condition that the analysis reconstruction value is greater than the preset reconstruction threshold, constructing an intermediate sample text according to the question stem sub-text and the answer sub-text.
5. The method of claim 1, wherein the updating the code vector comprises:
selecting vector elements to be processed in the coding vectors according to a preset selection strategy;
converting the vector elements to be processed to obtain target vector elements;
constructing a target code vector based on the target vector element and unselected vector elements of the code vector.
6. The method according to claim 1, wherein the updating the text processing model into a target text processing model according to the prediction category and the target category corresponding to the target sample data comprises:
calculating a first loss value of the text processing model according to the prediction category and a target category corresponding to the target sample data;
optimizing the text processing model according to the first loss value until a target text processing model meeting a training stopping condition is obtained;
wherein the training stop condition comprises an iteration number condition and/or a first loss value comparison condition.
7. The method of claim 1, wherein after obtaining the prediction class corresponding to the target sample data, further comprising:
inputting the target sample data to the text processing model for prediction for the second time, and acquiring a reference prediction category corresponding to the target sample data;
calculating a difference value of the target sample data according to the prediction category and the reference prediction category;
calculating a second loss value based on a preset weight factor and the difference value;
and optimizing the text processing model according to the second loss value until a target text processing model meeting the training stopping condition is obtained.
8. An apparatus for training a text processing model, comprising:
the acquisition module is configured to reconstruct combined sample texts in an initial sample set to obtain a target sample set, wherein the combined sample texts comprise at least two types of sub-texts;
the encoding module is configured to perform encoding processing on target sample data contained in the target sample set based on an encoding unit in a text processing model to obtain an encoding vector;
the classification module is configured to update the coding vector, and classify the updated coding vector based on a classification unit in the text processing model to obtain a prediction category corresponding to the target sample data;
and the training module is configured to update the text processing model to a target text processing model according to the prediction type and a target type corresponding to the target sample data.
9. A method of text processing, comprising:
acquiring text data uploaded by a user;
inputting the text data into a text processing model trained by the method according to any one of claims 1 to 7 for processing, to obtain a text data category;
and feeding back the text data category as chapter information of the corresponding text data to a user.
10. A text processing apparatus, comprising:
the text acquisition module is configured to acquire text data uploaded by a user;
a text processing module configured to input the text data into a text processing model trained by the method according to any one of claims 1 to 7 for processing, and obtain a text data category;
and the information feedback module is configured to feed back the text data category as chapter information of the corresponding text data to a user.
11. A computing device comprising a memory and a processor; the memory is for storing computer-executable instructions, and the processor is for executing the computer-executable instructions to implement the steps of the method of any one of claims 1 to 7 or 9.
12. A computer-readable storage medium storing computer instructions, which when executed by a processor, perform the steps of the method of any one of claims 1 to 7 or 9.
CN202210396669.1A 2022-04-15 2022-04-15 Training method and device of text processing model Pending CN114706984A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210396669.1A CN114706984A (en) 2022-04-15 2022-04-15 Training method and device of text processing model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210396669.1A CN114706984A (en) 2022-04-15 2022-04-15 Training method and device of text processing model

Publications (1)

Publication Number Publication Date
CN114706984A true CN114706984A (en) 2022-07-05

Family

ID=82173940

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210396669.1A Pending CN114706984A (en) 2022-04-15 2022-04-15 Training method and device of text processing model

Country Status (1)

Country Link
CN (1) CN114706984A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117786104A (en) * 2023-11-17 2024-03-29 中信建投证券股份有限公司 Model training method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109740657B (en) Training method and device of neural network model for image data classification
CN110609899B (en) Specific target emotion classification method based on improved BERT model
CN113127624B (en) Question-answer model training method and device
CN111464881B (en) Full-convolution video description generation method based on self-optimization mechanism
CN113159187B (en) Classification model training method and device and target text determining method and device
CN113988086A (en) Conversation processing method and device
CN114239861A (en) Model compression method and system based on multi-teacher combined guidance quantification
CN113988079A (en) Low-data-oriented dynamic enhanced multi-hop text reading recognition processing method
CN114706984A (en) Training method and device of text processing model
CN111522926A (en) Text matching method, device, server and storage medium
CN115269836A (en) Intention identification method and device
CN112989843B (en) Intention recognition method, device, computing equipment and storage medium
CN114780723A (en) Portrait generation method, system and medium based on guide network text classification
Qayyum et al. Ios mobile application for food and location image prediction using convolutional neural networks
CN113806564A (en) Multi-mode informativeness tweet detection method and system
CN112464106A (en) Object recommendation method and device
CN113538079A (en) Recommendation model training method and device, and recommendation method and device
CN115757723A (en) Text processing method and device
CN115858783A (en) Training method and device of theme recognition model
CN114358313A (en) Data processing method and device
CN109800804B (en) Method and system for realizing multi-emotion autonomous conversion of image
CN114139545A (en) Information extraction method and device
CN113886560A (en) Recommendation method and device for court trial problems
CN116186529A (en) Training method and device for semantic understanding model
CN113792121A (en) Reading understanding model training method and device and reading understanding method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination