CN111428130A - Method and device for enhancing text data in knowledge distillation process

Method and device for enhancing text data in knowledge distillation process

Info

Publication number
CN111428130A
Authority
CN
China
Prior art keywords
text data
current text
submodule
threshold parameter
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010151299.6A
Other languages
Chinese (zh)
Other versions
CN111428130B (en)
Inventor
姜姗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unisound Intelligent Technology Co Ltd filed Critical Unisound Intelligent Technology Co Ltd
Priority to CN202010151299.6A priority Critical patent/CN111428130B/en
Publication of CN111428130A publication Critical patent/CN111428130A/en
Application granted granted Critical
Publication of CN111428130B publication Critical patent/CN111428130B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method and a device for enhancing text data in a knowledge distillation process. The method comprises the following steps: acquiring a first preset number of current text data; judging the current text data to obtain a judgment result; performing enhancement processing on the current text data according to the judgment result; and outputting the current text data after the enhancement processing. Acquiring the first preset number of current text data guarantees that the requirement of knowledge distillation is met, and judging the current text data and enhancing it according to the judgment result yields more text data, so that the training model can obtain a large amount of training data. This solves the problems in the prior art that, because the training model cannot obtain enough training data, the learning capacity of the model is reduced and overfitting occurs in the training process.

Description

Method and device for enhancing text data in knowledge distillation process
Technical Field
The invention relates to the technical field of data processing, in particular to a method and a device for enhancing text data in a knowledge distillation process.
Background
Knowledge distillation is a common model compression method that is being applied more and more widely. In a teacher-student framework, the feature knowledge learned by a complex teacher network with strong learning capacity is migrated to a simple student network with weak learning capacity in order to improve the precision of the student network. However, this approach only sends a fixed quantity of text data from the teacher network to the student network, so the data available for training the model between the teacher end and the student end is limited. Because a large amount of data needs to be pushed as the knowledge carrier during distillation, the teacher network cannot meet the requirement of knowledge distillation; and because the training model cannot obtain enough training data, the learning capacity of the model is reduced and overfitting occurs in the training process.
Disclosure of Invention
Aiming at the problems described above, the present invention acquires a preset number of current text data in the knowledge distillation process to ensure that the requirement of knowledge distillation can be met, then judges the current text data, performs enhancement processing on the current text data according to the judgment result, and finally outputs the enhanced current text data, thereby achieving enhancement of the text data in the knowledge distillation process.
A method of enhancing textual data in a knowledge distillation process, comprising the steps of:
acquiring a first preset number of current text data;
judging the current text data to obtain a judgment result;
performing enhancement processing on the current text data according to the judgment result;
and outputting the current text data after enhancement processing.
Preferably, the acquiring the first preset number of current text data includes:
receiving first text data which is far greater than the first preset number and is sent by a teacher end;
carrying out duplication checking processing on the first text data;
confirming the first text data after the duplication checking processing as second text data;
compressing a first preset number of second text data; and acquiring compressed second text data, and determining the compressed second text data as the current text data.
Preferably, the determining the current text data to obtain a determination result includes:
decompressing the current text data to obtain a first preset number of current text data;
acquiring the text content of each current text data in the first preset number of current text data;
setting a first word sequence in the text content of each current text data as {W_1, ..., W_n}, wherein W_1 is the first word in each text content and W_n is the last word in each text content;
calculating a random number X_i for each word in the first word sequence, wherein X_i takes a value in the range (0, 1);
setting a first threshold parameter P_mask ∈ [0, 1] and a second threshold parameter P_POS ∈ [0, 1];
determining the magnitude relation between the random number X_i and the first threshold parameter and the second threshold parameter, and obtaining the judgment result.
Preferably, the enhancing the current text data according to the determination result includes:
when the random number X_i is less than the first threshold parameter, replacing the corresponding word W_i with [MASK];
when the random number X_i is greater than or equal to the first threshold parameter and less than the sum of the first threshold parameter and the second threshold parameter, replacing the word W_i with a word having the same part of speech;
when the random number X_i is greater than or equal to the sum of the first threshold parameter and the second threshold parameter, leaving the word W_i unchanged;
saving the modified first word sequence;
iterating each modified word sequence for N times to obtain N enhanced word sequences;
calculating, by using a language model, the perplexity of each modified word sequence and of the N enhanced word sequences corresponding to it, and arranging the calculated perplexities in ascending order;
selecting the word sequence with the minimum perplexity as a second word sequence;
replacing the first word sequence in the current text data with the second word sequence.
Preferably, the outputting the current text data after the enhancement processing includes:
after the first word sequences in the current text data are completely replaced, performing secondary compression on the current text data;
and sending the current text data after secondary compression to a student end.
An apparatus for enhancing textual data during knowledge distillation, the apparatus comprising:
the acquisition module is used for acquiring a first preset number of current text data;
the judging module is used for judging the current text data to obtain a judging result;
the enhancement processing module is used for enhancing the current text data according to the judgment result;
and the output module is used for outputting the current text data after the enhancement processing.
Preferably, the obtaining module includes:
the receiving submodule is used for receiving first text data which is far greater than the first preset number and is sent by a teacher end;
the duplication checking sub-module is used for carrying out duplication checking processing on the first text data;
the confirming submodule is used for confirming the first text data after the duplication checking processing as second text data;
the compression submodule is used for compressing the second text data with a first preset number; and acquiring compressed second text data, and determining the compressed second text data as the current text data.
Preferably, the determination module includes:
the decompression submodule is used for decompressing the current text data to obtain a first preset number of current text data;
the obtaining submodule is used for obtaining the text content of each current text data in the first preset number of current text data;
a first setting submodule, configured to set a first word sequence in the text content of each current text data as {W_1, ..., W_n}, wherein W_1 is the first word in each text content and W_n is the last word in each text content;
a first calculation submodule, configured to calculate a random number X_i for each word in the first word sequence, wherein X_i takes a value in the range (0, 1);
a second setting submodule, configured to set a first threshold parameter P_mask ∈ [0, 1] and a second threshold parameter P_POS ∈ [0, 1];
a judgment submodule, configured to determine the magnitude relation between the random number X_i and the first threshold parameter and the second threshold parameter, and obtain the judgment result.
Preferably, the enhancement processing module includes:
a first replacement submodule, configured to replace the corresponding word W_i with [MASK] when the judgment submodule determines that the random number X_i is less than the first threshold parameter;
a second replacement submodule, configured to replace the word W_i with a word having the same part of speech when the judgment submodule determines that the random number X_i is greater than or equal to the first threshold parameter and less than the sum of the first threshold parameter and the second threshold parameter;
a holding submodule, configured to leave the word W_i unchanged when the judgment submodule determines that the random number X_i is greater than or equal to the sum of the first threshold parameter and the second threshold parameter;
the storage submodule is used for storing the changed first word sequence;
the iteration submodule is used for iterating each modified word sequence for N times to obtain N enhanced word sequences;
the second calculation submodule is used for calculating, by using the language model, the perplexity of each modified word sequence and of the N enhanced word sequences corresponding to it, and arranging the calculated perplexities in ascending order;
the selecting submodule is used for selecting the word sequence with the minimum perplexity as a second word sequence;
a third replacement submodule, configured to replace the first word sequence in the current text data with the second word sequence.
Preferably, the output module includes:
the secondary compression submodule is used for carrying out secondary compression on the current text data after the first word sequence in the current text data is completely replaced;
and the sending submodule is used for sending the current text data after the secondary compression to a student end.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
FIG. 1 is a flow chart of the operation of a method for enhancing text data in a knowledge distillation process provided by the present invention;
FIG. 2 is another workflow diagram of a method of enhancing textual data during knowledge distillation provided by the present invention;
FIG. 3 is a block diagram of an apparatus for enhancing textual data during knowledge distillation provided by the present invention;
FIG. 4 is another block diagram of an apparatus for enhancing text data in a knowledge distillation process according to the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Knowledge distillation is a common model compression method that is being applied more and more widely. In a teacher-student framework, the feature knowledge learned by a complex teacher network with strong learning capacity is migrated to a simple student network with weak learning capacity in order to improve the precision of the student network. However, this approach only sends a fixed quantity of text data from the teacher network to the student network, so the data available for training the model between the teacher end and the student end is limited. Because a large amount of data needs to be pushed as the knowledge carrier during distillation, the teacher network cannot meet the requirement of knowledge distillation; and because the training model cannot obtain enough training data, the learning capacity of the model is reduced and overfitting occurs in the training process. The data enhancement methods in the prior art make the training model obtain a large amount of training data by adding noise or by synonym replacement, but these methods have the following disadvantages: 1. the noise-adding method can greatly destroy the readability of the text and even damage the text data, causing data loss and property loss; 2. the synonym replacement method can only expand data with the same semantic meaning and contributes little to data diversity. In order to solve the above problems, this embodiment discloses a method for enhancing text data in a knowledge distillation process: a preset number of current text data are acquired in the knowledge distillation process to ensure that the requirement of knowledge distillation can be met, the current text data are then judged, enhancement processing is performed on the current text data according to the judgment result, and finally the enhanced current text data are output.
A method of enhancing textual data in a knowledge distillation process, as shown in FIG. 1, comprising the steps of:
s101, acquiring a first preset number of current text data;
step S102, judging the current text data to obtain a judgment result;
step S103, performing enhancement processing on the current text data according to the judgment result;
and step S104, outputting the current text data after the enhancement processing.
In this embodiment, the first preset number of current text data may be a number of text data that satisfies the requirement of knowledge distillation, and the enhancement processing is to obtain new text data corresponding to the current text data in a different manner.
The working principle of the technical scheme is as follows: the method comprises the steps of obtaining a first preset number of current text data, judging the current text data to obtain a judgment result, performing enhancement processing on the current text data according to the judgment result, and finally outputting the current text data after the enhancement processing.
The beneficial effects of the above technical scheme are as follows: acquiring the first preset number of current text data guarantees that the requirement of knowledge distillation is met, and judging the current text data and enhancing it according to the judgment result yields more text data, so that the training model can obtain a large amount of training data. This solves the problems in the prior art that, because the training model cannot obtain enough training data, the learning capacity of the model is reduced and overfitting occurs in the training process.
In one embodiment, as shown in FIG. 2, obtaining a first preset number of current text data includes:
step S201, receiving first text data which is far greater than the first preset number and is sent by a teacher end;
step S202, carrying out duplicate checking processing on the first text data;
step S203, confirming the first text data after the duplication checking processing as second text data;
s204, compressing the second text data with the first preset number; and acquiring the compressed second text data, and determining the compressed second text data as the current text data.
The beneficial effects of the above technical scheme are as follows: removing the repeated first text data guarantees the quality of the text data, and compressing the second text data makes it unnecessary to transmit all of the second text data at one time and allows it to be selectively encrypted, thereby improving security.
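For illustration only, a minimal Python sketch of steps S201 to S204 might look as follows; the function name, the use of a set for duplicate checking, and gzip for compression are assumptions of this sketch, not details prescribed by the embodiment:

    import gzip
    import json

    def acquire_current_text_data(first_text_data, first_preset_number):
        """Deduplicate the received first text data, keep a first preset
        number of items as second text data, and compress them."""
        seen = set()
        second_text_data = []
        for text in first_text_data:                 # duplicate checking
            key = text.strip()
            if key and key not in seen:
                seen.add(key)
                second_text_data.append(key)
        second_text_data = second_text_data[:first_preset_number]
        # compression: the compressed blob serves as the current text data
        payload = json.dumps(second_text_data, ensure_ascii=False).encode("utf-8")
        return gzip.compress(payload)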
In one embodiment, determining the current text data to obtain a determination result includes:
decompressing the current text data to obtain a first preset number of current text data;
acquiring text content of each current text data in a first preset number of current text data;
setting a first word sequence in the text content of each current text data as {W_1, ..., W_n}, wherein W_1 is the first word in each text content and W_n is the last word in each text content;
calculating a random number X_i for each word in the first word sequence, wherein X_i takes a value in the range (0, 1);
setting a first threshold parameter P_mask ∈ [0, 1] and a second threshold parameter P_POS ∈ [0, 1];
determining the magnitude relation between the random number X_i and the first threshold parameter and the second threshold parameter, and obtaining the judgment result.
The beneficial effects of the above technical scheme are as follows: the judgment result is used to judge the word sequence in each text content so that the word sequence can be conveniently enhanced, and setting the two threshold parameters gives the random value calculated for each word a more precise reference interval, so that the calculation result is more accurate.
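A minimal sketch of this judgment step is given below; the function name, the placeholder threshold values 0.15, and the decision labels are assumptions of this illustration, with X_i drawn uniformly from [0, 1):

    import random

    def judge_word_sequence(word_sequence, p_mask=0.15, p_pos=0.15):
        """For each word W_i draw X_i uniformly from [0, 1) and compare it
        with the threshold parameters P_mask and P_POS; return one decision
        per word as the judgment result."""
        decisions = []
        for word in word_sequence:
            x_i = random.random()                      # random number X_i
            if x_i < p_mask:
                decisions.append((word, x_i, "MASK"))  # replace with [MASK]
            elif x_i < p_mask + p_pos:
                decisions.append((word, x_i, "POS"))   # replace with a same-POS word
            else:
                decisions.append((word, x_i, "KEEP"))  # leave unchanged
        return decisions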
In one embodiment, the enhancement processing of the current text data according to the determination result includes:
when the random number XiWhen the parameter is less than the first threshold value parameter, X is addediReplacement by [ MASK ]];
When the random number XiIs greater than or equal to firstWhen the threshold parameter is less than the sum of the first threshold parameter and the second threshold parameter, the random value X is calculatediReplacing the words with the same parts of speech;
when the random number XiWhen the sum of the first threshold parameter and the second threshold parameter is larger than or equal to the sum of the first threshold parameter and the second threshold parameter, no change is needed;
saving the modified first word sequence;
iterating each modified word sequence for N times to obtain N enhanced word sequences;
calculating, by using a language model, the perplexity of each modified word sequence and of the N enhanced word sequences corresponding to it, and arranging the calculated perplexities in ascending order;
selecting the word sequence with the minimum perplexity as a second word sequence;
replacing the first word sequence in the current text data with the second word sequence.
The beneficial effects of the above technical scheme are as follows: using [MASK] to cover words with a random probability makes the noise proportion in the data controllable, which solves the problems in the prior art that the noise-adding method can greatly destroy the readability of the text and even damage the text data, causing data loss and property loss, and which keeps the text data intact. At the same time, replacing words with words of the same part of speech makes the text data more diverse; compared with the prior-art replacement of words with the same semantic meaning, more content can be replaced, the training model obtains more training data, and the learning capacity of the training model is further improved.
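Continuing the sketch above, the enhancement step can consume those per-word decisions; the part-of-speech lexicon and every name below are hypothetical placeholders for whatever part-of-speech resource an implementation actually uses:

    import random

    # Hypothetical part-of-speech lexicon: POS tag -> candidate words.
    POS_LEXICON = {
        "NN": ["model", "network", "corpus", "teacher", "student"],
        "VB": ["train", "learn", "replace", "compress", "send"],
    }

    def apply_decisions(word_sequence, pos_tags, decisions):
        """Apply the three mutually exclusive rules produced by the judgment
        step: MASK -> '[MASK]', POS -> a random word with the same part of
        speech, KEEP -> leave the word unchanged."""
        enhanced = []
        for word, tag, (_, _, action) in zip(word_sequence, pos_tags, decisions):
            if action == "MASK":
                enhanced.append("[MASK]")
            elif action == "POS":
                candidates = [w for w in POS_LEXICON.get(tag, []) if w != word]
                enhanced.append(random.choice(candidates) if candidates else word)
            else:
                enhanced.append(word)
        return enhanced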
In one embodiment, outputting the current text data after enhancement processing includes:
after the first word sequences in the current text data are completely replaced, performing secondary compression on the current text data;
and sending the current text data after secondary compression to the student end.
The beneficial effects of the above technical scheme are as follows: the compressed version is sent to the student end, so that the student end can receive the current text data at one time; the scale of the text data is enlarged, and the student end can learn the knowledge content of the teacher end more fully.
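As an illustrative sketch only (the transport, the length-prefixed framing, and all names are assumptions, not part of the disclosure), the secondary compression and one-shot transmission to the student end might look like:

    import gzip
    import json
    import socket

    def send_to_student_end(current_text_data, host, port):
        """Perform secondary compression of the fully replaced current text
        data and send it to the student end in a single transmission."""
        payload = gzip.compress(
            json.dumps(current_text_data, ensure_ascii=False).encode("utf-8"))
        with socket.create_connection((host, port)) as conn:
            conn.sendall(len(payload).to_bytes(8, "big"))  # length prefix
            conn.sendall(payload)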
In one embodiment, the method comprises the following steps:
1. For a piece of data W_1, ..., W_n in the standard data set, calculate a random value X_i for each word W_i;
2. set the threshold hyperparameters P_mask ∈ [0, 1] and P_POS ∈ [0, 1];
3. when X_i < P_mask, replace W_i with [MASK]; when P_mask ≤ X_i < P_mask + P_POS, replace W_i with a word having the same part of speech; when X_i ≥ P_mask + P_POS, W_i remains unchanged. The two replacement operations are mutually exclusive and never act on the same word simultaneously;
4. iterate each piece of data N_iter times to generate N_iter enhanced corpus entries; calculate the perplexity of the enhanced corpus with a pre-trained language model, sort the entries in ascending order of perplexity, select the entry with the lowest perplexity, remove duplicates, and add it to the original data set. A minimal sketch of this procedure is given below.
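The following end-to-end sketch of the numbered procedure above is illustrative only; unigram_perplexity is a toy stand-in for the pre-trained language model of step 4 (perplexity as the exponential of the average negative log-probability), and enhance_fn is assumed to wrap the per-word replacement rules of step 3:

    import math

    def unigram_perplexity(word_sequence, unigram_probs, eps=1e-8):
        """Toy language model score: exp of the average negative
        log-probability of the words in the sequence."""
        nll = [-math.log(unigram_probs.get(w, eps)) for w in word_sequence]
        return math.exp(sum(nll) / max(len(nll), 1))

    def augment_dataset(dataset, enhance_fn, perplexity_fn, n_iter=10):
        """For each piece of data, generate n_iter enhanced word sequences,
        score them with the language model, keep the candidate with the
        lowest perplexity, remove duplicates, and add it to the data set."""
        augmented = list(dataset)
        seen = {tuple(seq) for seq in dataset}
        for word_sequence in dataset:
            candidates = [enhance_fn(word_sequence) for _ in range(n_iter)]
            candidates.sort(key=perplexity_fn)         # ascending perplexity
            best = candidates[0]
            if tuple(best) not in seen:                # duplicate removal
                seen.add(tuple(best))
                augmented.append(best)
        return augmented

In use, perplexity_fn would be bound to a concrete model, for example lambda seq: unigram_perplexity(seq, probs) with probs estimated from the original corpus.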
The beneficial effects of the above technical scheme are: 1. the [ MASK ] is used for covering words with random probability, so that the noise proportion in data can be controlled, and meanwhile, in a supervised learning task, the importance degree of each word to a real label can be learned by a neural network model;
2. words with the same part of speech are replaced randomly, and the enhanced text is filtered by using the language model, so that the readability and the fluency of the data enhanced text can be improved as much as possible, different semantic features are introduced, and the diversity of the data is improved;
3. through the label-free data enhancement method, the data scale can be enlarged, the knowledge of the teacher model can be learned more sufficiently by the student network, and the improvement of the knowledge distillation performance is facilitated.
This embodiment also discloses a device for enhancing text data in a knowledge distillation process. As shown in FIG. 3, the device includes:
an obtaining module 301, configured to obtain a first preset number of current text data;
a determining module 302, configured to determine current text data to obtain a determination result;
an enhancement processing module 303, configured to perform enhancement processing on the current text data according to the determination result;
and an output module 304, configured to output the current text data after the enhancement processing.
In one embodiment, as shown in FIG. 4, the obtaining module includes:
the receiving submodule 3011 is configured to receive first text data that is far greater than a first preset number and is sent by a teacher end;
the duplication checking sub-module 3012 is configured to perform duplication checking processing on the first text data;
the confirming submodule 3013 is configured to confirm the first text data after the duplication checking processing as second text data;
the compressing submodule 3014 is configured to compress the first preset number of second text data; and acquiring the compressed second text data, and determining the compressed second text data as the current text data.
In one embodiment, the decision module includes:
the decompression submodule is used for decompressing the current text data to obtain a first preset number of current text data;
the obtaining submodule is used for obtaining the text content of each current text data in a first preset number of current text data;
a first setting submodule, for setting a first word sequence in the text content of each current text data as {W_1, ..., W_n}, wherein W_1 is the first word in each text content and W_n is the last word in each text content;
a first calculation submodule, for calculating a random number X_i for each word in the first word sequence, wherein X_i takes a value in the range (0, 1);
a second setting submodule, for setting a first threshold parameter P_mask ∈ [0, 1] and a second threshold parameter P_POS ∈ [0, 1];
a decision submodule, for determining the magnitude relation between the random number X_i and the first threshold parameter and the second threshold parameter and obtaining the judgment result.
In one embodiment, an enhancement processing module includes:
a first replacement submodule, for replacing the corresponding word W_i with [MASK] when the judgment submodule determines that the random number X_i is less than the first threshold parameter;
a second replacement submodule, for replacing the word W_i with a word having the same part of speech when the judgment submodule determines that the random number X_i is greater than or equal to the first threshold parameter and less than the sum of the first threshold parameter and the second threshold parameter;
a holding submodule, for leaving the word W_i unchanged when the judgment submodule determines that the random number X_i is greater than or equal to the sum of the first threshold parameter and the second threshold parameter;
the storage submodule is used for storing the changed first word sequence;
the iteration submodule is used for iterating each modified word sequence for N times to obtain N enhanced word sequences;
the second calculation submodule is used for calculating, by using the language model, the perplexity of each modified word sequence and of the N enhanced word sequences corresponding to it, and arranging the calculated perplexities in ascending order;
the selecting submodule is used for selecting the word sequence with the minimum perplexity as a second word sequence;
and the third replacement submodule is used for replacing the first word sequence in the current text data with the second word sequence.
In one embodiment, an output module includes:
the secondary compression submodule is used for carrying out secondary compression on the current text data after the first word sequence in the current text data is completely replaced;
and the sending submodule is used for sending the current text data after the secondary compression to the student end.
It will be understood by those skilled in the art that the terms "first" and "second" in the present invention merely refer to different stages of the application.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method of enhancing textual data during knowledge distillation, comprising the steps of:
acquiring a first preset number of current text data;
judging the current text data to obtain a judgment result;
performing enhancement processing on the current text data according to the judgment result;
and outputting the current text data after enhancement processing.
2. The method for enhancing text data in a knowledge distillation process according to claim 1, wherein the obtaining a first preset number of current text data comprises:
receiving first text data which is far greater than the first preset number and is sent by a teacher end;
carrying out duplication checking processing on the first text data;
confirming the first text data after the duplication checking processing as second text data;
compressing a first preset number of second text data; and acquiring compressed second text data, and determining the compressed second text data as the current text data.
3. The method for enhancing text data in a knowledge distillation process according to claim 1, wherein the determining the current text data to obtain a determination result comprises:
decompressing the current text data to obtain a first preset number of current text data;
acquiring the text content of each current text data in the first preset number of current text data;
setting a first word sequence in the text content of each current text data as {W_1, ..., W_n}, wherein W_1 is the first word in each text content and W_n is the last word in each text content;
calculating a random number X_i for each word in the first word sequence, wherein X_i takes a value in the range (0, 1);
setting a first threshold parameter P_mask ∈ [0, 1] and a second threshold parameter P_POS ∈ [0, 1];
determining the magnitude relation between the random number X_i and the first threshold parameter and the second threshold parameter, and obtaining the judgment result.
4. The method for enhancing text data in the knowledge distillation process according to claim 1, wherein the enhancing the current text data according to the determination result comprises:
when the random number X_i is less than the first threshold parameter, replacing the corresponding word W_i with [MASK];
when the random number X_i is greater than or equal to the first threshold parameter and less than the sum of the first threshold parameter and the second threshold parameter, replacing the word W_i with a word having the same part of speech;
when the random number X_i is greater than or equal to the sum of the first threshold parameter and the second threshold parameter, leaving the word W_i unchanged;
saving the modified first word sequence;
iterating each modified word sequence for N times to obtain N enhanced word sequences;
calculating, by using a language model, the perplexity of each modified word sequence and of the N enhanced word sequences corresponding to it, and arranging the calculated perplexities in ascending order;
selecting the word sequence with the minimum perplexity as a second word sequence;
replacing the first word sequence in the current text data with the second word sequence.
5. The method of claim 1, wherein the outputting the current text data after enhancement processing comprises:
after the first word sequences in the current text data are completely replaced, performing secondary compression on the current text data;
and sending the current text data after secondary compression to a student end.
6. An apparatus for enhancing textual data during knowledge distillation, the apparatus comprising:
the acquisition module is used for acquiring a first preset number of current text data;
the judging module is used for judging the current text data to obtain a judging result;
the enhancement processing module is used for enhancing the current text data according to the judgment result;
and the output module is used for outputting the current text data after the enhancement processing.
7. The apparatus for enhancing text data during knowledge distillation according to claim 6, wherein the obtaining module comprises:
the receiving submodule is used for receiving first text data which is far greater than the first preset number and is sent by a teacher end;
the duplication checking sub-module is used for carrying out duplication checking processing on the first text data;
the confirming submodule is used for confirming the first text data after the duplication checking processing as second text data;
the compression submodule is used for compressing the second text data with a first preset number; and acquiring compressed second text data, and determining the compressed second text data as the current text data.
8. The apparatus for enhancing text data during knowledge distillation according to claim 6, wherein the judging module comprises:
the decompression submodule is used for decompressing the current text data to obtain a first preset number of current text data;
the obtaining submodule is used for obtaining the text content of each current text data in the first preset number of current text data;
a first setting submodule, configured to set a first word sequence in the text content of each current text data as {W_1, ..., W_n}, wherein W_1 is the first word in each text content and W_n is the last word in each text content;
a first calculation submodule, configured to calculate a random number X_i for each word in the first word sequence, wherein X_i takes a value in the range (0, 1);
a second setting submodule, configured to set a first threshold parameter P_mask ∈ [0, 1] and a second threshold parameter P_POS ∈ [0, 1];
a judgment submodule, configured to determine the magnitude relation between the random number X_i and the first threshold parameter and the second threshold parameter, and obtain the judgment result.
9. The apparatus for enhancing text data during knowledge distillation according to claim 6, wherein the enhancement processing module comprises:
a first replacement submodule, configured to replace the corresponding word W_i with [MASK] when the judgment submodule determines that the random number X_i is less than the first threshold parameter;
a second replacement submodule, configured to replace the word W_i with a word having the same part of speech when the judgment submodule determines that the random number X_i is greater than or equal to the first threshold parameter and less than the sum of the first threshold parameter and the second threshold parameter;
a holding submodule, configured to leave the word W_i unchanged when the judgment submodule determines that the random number X_i is greater than or equal to the sum of the first threshold parameter and the second threshold parameter;
the storage submodule is used for storing the changed first word sequence;
the iteration submodule is used for iterating each modified word sequence for N times to obtain N enhanced word sequences;
the second calculation submodule is used for calculating, by using the language model, the perplexity of each modified word sequence and of the N enhanced word sequences corresponding to it, and arranging the calculated perplexities in ascending order;
the selecting submodule is used for selecting the word sequence with the minimum perplexity as a second word sequence;
a third replacement submodule, configured to replace the first word sequence in the current text data with the second word sequence.
10. The apparatus for enhancing text data during knowledge distillation according to claim 6, wherein the output module comprises:
the secondary compression submodule is used for carrying out secondary compression on the current text data after the first word sequence in the current text data is completely replaced;
and the sending submodule is used for sending the current text data after the secondary compression to a student end.
CN202010151299.6A 2020-03-06 2020-03-06 Method and device for enhancing text data in knowledge distillation process Active CN111428130B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010151299.6A CN111428130B (en) 2020-03-06 2020-03-06 Method and device for enhancing text data in knowledge distillation process

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010151299.6A CN111428130B (en) 2020-03-06 2020-03-06 Method and device for enhancing text data in knowledge distillation process

Publications (2)

Publication Number Publication Date
CN111428130A (en) 2020-07-17
CN111428130B CN111428130B (en) 2023-04-18

Family

ID=71546153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010151299.6A Active CN111428130B (en) 2020-03-06 2020-03-06 Method and device for enhancing text data in knowledge distillation process

Country Status (1)

Country Link
CN (1) CN111428130B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190129932A1 (en) * 2017-10-30 2019-05-02 Panasonic Intellectual Property Management Co., Ltd. Information processing method, information processing apparatus, and program
CN109637546A (en) * 2018-12-29 2019-04-16 苏州思必驰信息科技有限公司 Knowledge distillating method and device
CN110458765A (en) * 2019-01-25 2019-11-15 西安电子科技大学 The method for enhancing image quality of convolutional network is kept based on perception
CN110795939A (en) * 2019-10-15 2020-02-14 腾讯科技(深圳)有限公司 Text processing method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
葛仕明; 赵胜伟; 刘文瑜; 李晨钰: "基于深度特征蒸馏的人脸识别" (Face Recognition Based on Deep Feature Distillation) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507209A (en) * 2020-11-10 2021-03-16 中国科学院深圳先进技术研究院 Sequence recommendation method for knowledge distillation based on land moving distance

Also Published As

Publication number Publication date
CN111428130B (en) 2023-04-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant