CN111428130A - Method and device for enhancing text data in knowledge distillation process

Method and device for enhancing text data in knowledge distillation process

Info

Publication number
CN111428130A
Authority
CN
China
Prior art keywords
text data
current text
submodule
threshold parameter
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010151299.6A
Other languages
Chinese (zh)
Other versions
CN111428130B (en)
Inventor
姜姗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unisound Intelligent Technology Co Ltd filed Critical Unisound Intelligent Technology Co Ltd
Priority to CN202010151299.6A priority Critical patent/CN111428130B/en
Publication of CN111428130A publication Critical patent/CN111428130A/en
Application granted granted Critical
Publication of CN111428130B publication Critical patent/CN111428130B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method and a device for enhancing text data in a knowledge distillation process. The method comprises the following steps: acquiring a first preset number of current text data; judging the current text data to obtain a judgment result; performing enhancement processing on the current text data according to the judgment result; and outputting the current text data after the enhancement processing. Acquiring the first preset number of current text data guarantees that the requirement of knowledge distillation is met, and judging the current text data and enhancing it according to the judgment result yields more text data, so that the training model can obtain a large amount of training data. This solves the problems in the prior art that, because the training model cannot obtain enough training data, the learning capacity of the model is reduced and overfitting occurs in the training process.

Description

Method and device for enhancing text data in knowledge distillation process
Technical Field
The invention relates to the technical field of data processing, in particular to a method and a device for enhancing text data in a knowledge distillation process.
Background
Knowledge distillation is a common model compression method that is being applied more and more widely. In a teacher-student framework, the feature knowledge learned by a complex teacher network with strong learning capacity is migrated to a simple student network with weak learning capacity in order to improve the precision of the student network. However, this approach only sends a fixed quantity of text data from the teacher network to the student network, so the data available for training the model between the teacher end and the student end is limited. Because a large amount of data needs to be pushed as the knowledge carrier during distillation, the teacher network cannot meet the requirement of knowledge distillation; and because the training model cannot obtain enough training data, the learning capacity of the model is reduced and overfitting occurs in the training process.
Disclosure of Invention
Aiming at the problems described above, the present invention acquires a preset number of current text data in the knowledge distillation process to ensure that the requirement of knowledge distillation can be met, then judges the current text data, performs enhancement processing on the current text data according to the judgment result, and finally outputs the enhanced current text data, thereby achieving enhancement of the text data in the knowledge distillation process.
A method of enhancing textual data in a knowledge distillation process, comprising the steps of:
acquiring a first preset number of current text data;
judging the current text data to obtain a judgment result;
performing enhancement processing on the current text data according to the judgment result;
and outputting the current text data after enhancement processing.
Preferably, the acquiring the first preset number of current text data includes:
receiving first text data which is far greater than the first preset number and is sent by a teacher end;
carrying out duplication checking processing on the first text data;
confirming the first text data after the duplication checking processing as second text data;
compressing a first preset number of second text data; and acquiring compressed second text data, and determining the compressed second text data as the current text data.
Preferably, the determining the current text data to obtain a determination result includes:
decompressing the current text data to obtain a first preset number of current text data;
acquiring the text content of each current text data in the first preset number of current text data;
setting a first word sequence in the text content of each current text data as {W_1, ..., W_n}, wherein W_1 is the first word in each text content and W_n is the last word in each text content;
calculating a random number X_i for each word in the first word sequence, wherein X_i takes a value in the range (0, 1);
setting a first threshold parameter P_mask ∈ [0, 1] and a second threshold parameter P_POS ∈ [0, 1];
determining the magnitude relation between the random number X_i and the first threshold parameter and the second threshold parameter, and obtaining the judgment result.
Preferably, the enhancing the current text data according to the determination result includes:
when the random number X_i is less than the first threshold parameter, replacing the corresponding word W_i with [MASK];
when the random number X_i is greater than or equal to the first threshold parameter and less than the sum of the first threshold parameter and the second threshold parameter, replacing the word W_i with a word having the same part of speech;
when the random number X_i is greater than or equal to the sum of the first threshold parameter and the second threshold parameter, leaving the word W_i unchanged;
saving the modified first word sequence;
iterating each modified word sequence for N times to obtain N enhanced word sequences;
calculating, by using a language model, the perplexity of each modified word sequence and of the N enhanced word sequences corresponding to it, and arranging the calculated perplexities in ascending order;
selecting the word sequence with the minimum perplexity as a second word sequence;
replacing the first word sequence in the current text data with the second word sequence.
Preferably, the outputting the current text data after the enhancement processing includes:
after the first word sequences in the current text data are completely replaced, performing secondary compression on the current text data;
and sending the current text data after secondary compression to a student end.
An apparatus for enhancing textual data during knowledge distillation, the apparatus comprising:
the acquisition module is used for acquiring a first preset number of current text data;
the judging module is used for judging the current text data to obtain a judging result;
the enhancement processing module is used for enhancing the current text data according to the judgment result;
and the output module is used for outputting the current text data after the enhancement processing.
Preferably, the obtaining module includes:
the receiving submodule is used for receiving first text data which is far greater than the first preset number and is sent by a teacher end;
the duplication checking sub-module is used for carrying out duplication checking processing on the first text data;
the confirming submodule is used for confirming the first text data after the duplication checking processing as second text data;
the compression submodule is used for compressing the second text data with a first preset number; and acquiring compressed second text data, and determining the compressed second text data as the current text data.
Preferably, the determination module includes:
the decompression submodule is used for decompressing the current text data to obtain a first preset number of current text data;
the obtaining submodule is used for obtaining the text content of each current text data in the first preset number of current text data;
a first setting submodule, configured to set a first word sequence in the text content of each current text data as {W_1, ..., W_n}, wherein W_1 is the first word in each text content and W_n is the last word in each text content;
a first calculation submodule, configured to calculate a random number X_i for each word in the first word sequence, wherein X_i takes a value in the range (0, 1);
a second setting submodule, configured to set a first threshold parameter P_mask ∈ [0, 1] and a second threshold parameter P_POS ∈ [0, 1];
a judgment submodule, configured to determine the magnitude relation between the random number X_i and the first threshold parameter and the second threshold parameter, and obtain the judgment result.
Preferably, the enhancement processing module includes:
a first replacement submodule, configured to replace the corresponding word W_i with [MASK] when the judgment submodule determines that the random number X_i is less than the first threshold parameter;
a second replacement submodule, configured to replace the word W_i with a word having the same part of speech when the judgment submodule determines that the random number X_i is greater than or equal to the first threshold parameter and less than the sum of the first threshold parameter and the second threshold parameter;
a holding submodule, configured to leave the word W_i unchanged when the judgment submodule determines that the random number X_i is greater than or equal to the sum of the first threshold parameter and the second threshold parameter;
the storage submodule is used for storing the changed first word sequence;
the iteration submodule is used for iterating each modified word sequence for N times to obtain N enhanced word sequences;
the second calculation submodule is used for calculating, by using the language model, the perplexity of each modified word sequence and of the N enhanced word sequences corresponding to it, and arranging the calculated perplexities in ascending order;
the selecting submodule is used for selecting the word sequence with the minimum perplexity as a second word sequence;
a third replacement submodule, configured to replace the first word sequence in the current text data with the second word sequence.
Preferably, the output module includes:
the secondary compression submodule is used for carrying out secondary compression on the current text data after the first word sequence in the current text data is completely replaced;
and the sending submodule is used for sending the current text data after the secondary compression to a student end.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
FIG. 1 is a flow chart of the operation of a method for enhancing text data in a knowledge distillation process provided by the present invention;
FIG. 2 is another workflow diagram of a method of enhancing textual data during knowledge distillation provided by the present invention;
FIG. 3 is a block diagram of an apparatus for enhancing textual data during knowledge distillation provided by the present invention;
FIG. 4 is another block diagram of an apparatus for enhancing text data in a knowledge distillation process according to the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Knowledge distillation is a common model compression method that is being applied more and more widely. In a teacher-student framework, the feature knowledge learned by a complex teacher network with strong learning capacity is migrated to a simple student network with weak learning capacity in order to improve the precision of the student network. However, this approach only sends a fixed quantity of text data from the teacher network to the student network, so the data available for training the model between the teacher end and the student end is limited. Because a large amount of data needs to be pushed as the knowledge carrier during distillation, the teacher network cannot meet the requirement of knowledge distillation; and because the training model cannot obtain enough training data, the learning capacity of the model is reduced and overfitting occurs in the training process. The data enhancement methods in the prior art make the training model obtain a large amount of training data by adding noise or by synonym replacement, but these methods have the following disadvantages: 1. the noise-adding method can greatly destroy the readability of the text and even damage the text data, causing data loss and property loss; 2. the synonym replacement method can only expand data with the same semantic meaning and contributes little to data diversity. In order to solve the above problems, this embodiment discloses a method for enhancing text data in a knowledge distillation process: a preset number of current text data are acquired in the knowledge distillation process to ensure that the requirement of knowledge distillation can be met, the current text data are then judged, enhancement processing is performed on the current text data according to the judgment result, and finally the enhanced current text data are output.
A method of enhancing textual data in a knowledge distillation process, as shown in FIG. 1, comprising the steps of:
s101, acquiring a first preset number of current text data;
step S102, judging the current text data to obtain a judgment result;
step S103, performing enhancement processing on the current text data according to the judgment result;
and step S104, outputting the current text data after the enhancement processing.
In this embodiment, the first preset number of current text data may be a number of text data that satisfies the requirement of knowledge distillation, and the enhancement processing is to obtain new text data corresponding to the current text data in a different manner.
The working principle of the technical scheme is as follows: the method comprises the steps of obtaining a first preset number of current text data, judging the current text data to obtain a judgment result, performing enhancement processing on the current text data according to the judgment result, and finally outputting the current text data after the enhancement processing.
The beneficial effects of the above technical scheme are as follows: acquiring the first preset number of current text data guarantees that the requirement of knowledge distillation is met, and judging the current text data and enhancing it according to the judgment result yields more text data, so that the training model can obtain a large amount of training data. This solves the problems in the prior art that, because the training model cannot obtain enough training data, the learning capacity of the model is reduced and overfitting occurs in the training process.
In one embodiment, as shown in FIG. 2, obtaining a first preset number of current text data includes:
step S201, receiving first text data which is far greater than the first preset number and is sent by a teacher end;
step S202, carrying out duplicate checking processing on the first text data;
step S203, confirming the first text data after the duplication checking processing as second text data;
s204, compressing the second text data with the first preset number; and acquiring the compressed second text data, and determining the compressed second text data as the current text data.
The beneficial effects of the above technical scheme are as follows: removing the repeated first text data guarantees the quality of the text data, and compressing the second text data makes it unnecessary to transmit all of the second text data at one time and allows it to be selectively encrypted, thereby improving security.
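For illustration only, a minimal Python sketch of steps S201 to S204 might look as follows; the function name, the use of a set for duplicate checking, and gzip for compression are assumptions of this sketch, not details prescribed by the embodiment:

    import gzip
    import json

    def acquire_current_text_data(first_text_data, first_preset_number):
        """Deduplicate the received first text data, keep a first preset
        number of items as second text data, and compress them."""
        seen = set()
        second_text_data = []
        for text in first_text_data:                 # duplicate checking
            key = text.strip()
            if key and key not in seen:
                seen.add(key)
                second_text_data.append(key)
        second_text_data = second_text_data[:first_preset_number]
        # compression: the compressed blob serves as the current text data
        payload = json.dumps(second_text_data, ensure_ascii=False).encode("utf-8")
        return gzip.compress(payload)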
In one embodiment, determining the current text data to obtain a determination result includes:
decompressing the current text data to obtain a first preset number of current text data;
acquiring text content of each current text data in a first preset number of current text data;
setting a first word sequence in the text content of each current text data as {W_1, ..., W_n}, wherein W_1 is the first word in each text content and W_n is the last word in each text content;
calculating a random number X_i for each word in the first word sequence, wherein X_i takes a value in the range (0, 1);
setting a first threshold parameter P_mask ∈ [0, 1] and a second threshold parameter P_POS ∈ [0, 1];
determining the magnitude relation between the random number X_i and the first threshold parameter and the second threshold parameter, and obtaining the judgment result.
The beneficial effects of the above technical scheme are as follows: the judgment result is used to judge the word sequence in each text content so that the word sequence can be conveniently enhanced, and setting the two threshold parameters gives the random value calculated for each word a more precise reference interval, so that the calculation result is more accurate.
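A minimal sketch of this judgment step is given below; the function name, the placeholder threshold values 0.15, and the decision labels are assumptions of this illustration, with X_i drawn uniformly from [0, 1):

    import random

    def judge_word_sequence(word_sequence, p_mask=0.15, p_pos=0.15):
        """For each word W_i draw X_i uniformly from [0, 1) and compare it
        with the threshold parameters P_mask and P_POS; return one decision
        per word as the judgment result."""
        decisions = []
        for word in word_sequence:
            x_i = random.random()                      # random number X_i
            if x_i < p_mask:
                decisions.append((word, x_i, "MASK"))  # replace with [MASK]
            elif x_i < p_mask + p_pos:
                decisions.append((word, x_i, "POS"))   # replace with a same-POS word
            else:
                decisions.append((word, x_i, "KEEP"))  # leave unchanged
        return decisions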
In one embodiment, the enhancement processing of the current text data according to the determination result includes:
when the random number XiWhen the parameter is less than the first threshold value parameter, X is addediReplacement by [ MASK ]];
When the random number XiIs greater than or equal to firstWhen the threshold parameter is less than the sum of the first threshold parameter and the second threshold parameter, the random value X is calculatediReplacing the words with the same parts of speech;
when the random number XiWhen the sum of the first threshold parameter and the second threshold parameter is larger than or equal to the sum of the first threshold parameter and the second threshold parameter, no change is needed;
saving the modified first word sequence;
iterating each modified word sequence for N times to obtain N enhanced word sequences;
calculating, by using a language model, the perplexity of each modified word sequence and of the N enhanced word sequences corresponding to it, and arranging the calculated perplexities in ascending order;
selecting the word sequence with the minimum perplexity as a second word sequence;
replacing the first word sequence in the current text data with the second word sequence.
The beneficial effects of the above technical scheme are as follows: using [MASK] to cover words with a random probability makes the noise proportion in the data controllable, which solves the problems in the prior art that the noise-adding method can greatly destroy the readability of the text and even damage the text data, causing data loss and property loss, and which keeps the text data intact. At the same time, replacing words with words of the same part of speech makes the text data more diverse; compared with the prior-art replacement of words with the same semantic meaning, more content can be replaced, the training model obtains more training data, and the learning capacity of the training model is further improved.
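Continuing the sketch above, the enhancement step can consume those per-word decisions; the part-of-speech lexicon and every name below are hypothetical placeholders for whatever part-of-speech resource an implementation actually uses:

    import random

    # Hypothetical part-of-speech lexicon: POS tag -> candidate words.
    POS_LEXICON = {
        "NN": ["model", "network", "corpus", "teacher", "student"],
        "VB": ["train", "learn", "replace", "compress", "send"],
    }

    def apply_decisions(word_sequence, pos_tags, decisions):
        """Apply the three mutually exclusive rules produced by the judgment
        step: MASK -> '[MASK]', POS -> a random word with the same part of
        speech, KEEP -> leave the word unchanged."""
        enhanced = []
        for word, tag, (_, _, action) in zip(word_sequence, pos_tags, decisions):
            if action == "MASK":
                enhanced.append("[MASK]")
            elif action == "POS":
                candidates = [w for w in POS_LEXICON.get(tag, []) if w != word]
                enhanced.append(random.choice(candidates) if candidates else word)
            else:
                enhanced.append(word)
        return enhanced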
In one embodiment, outputting the current text data after enhancement processing includes:
after the first word sequences in the current text data are completely replaced, performing secondary compression on the current text data;
and sending the current text data after secondary compression to the student end.
The beneficial effects of the above technical scheme are as follows: the compressed version is sent to the student end, so that the student end can receive the current text data at one time; the scale of the text data is enlarged, and the student end can learn the knowledge content of the teacher end more fully.
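As an illustrative sketch only (the transport, the length-prefixed framing, and all names are assumptions, not part of the disclosure), the secondary compression and one-shot transmission to the student end might look like:

    import gzip
    import json
    import socket

    def send_to_student_end(current_text_data, host, port):
        """Perform secondary compression of the fully replaced current text
        data and send it to the student end in a single transmission."""
        payload = gzip.compress(
            json.dumps(current_text_data, ensure_ascii=False).encode("utf-8"))
        with socket.create_connection((host, port)) as conn:
            conn.sendall(len(payload).to_bytes(8, "big"))  # length prefix
            conn.sendall(payload)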
In one embodiment, the method comprises the following steps:
1. For a piece of data W_1, ..., W_n in the standard data set, calculate a random value X_i for each word W_i;
2. set the threshold hyperparameters P_mask ∈ [0, 1] and P_POS ∈ [0, 1];
3. when X_i < P_mask, replace W_i with [MASK]; when P_mask ≤ X_i < P_mask + P_POS, replace W_i with a word having the same part of speech; when X_i ≥ P_mask + P_POS, W_i remains unchanged. The two replacement operations are mutually exclusive and never act on the same word simultaneously;
4. iterate each piece of data N_iter times to generate N_iter enhanced corpus entries; calculate the perplexity of the enhanced corpus with a pre-trained language model, sort the entries in ascending order of perplexity, select the entry with the lowest perplexity, remove duplicates, and add it to the original data set. A minimal sketch of this procedure is given below.
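The following end-to-end sketch of the numbered procedure above is illustrative only; unigram_perplexity is a toy stand-in for the pre-trained language model of step 4 (perplexity as the exponential of the average negative log-probability), and enhance_fn is assumed to wrap the per-word replacement rules of step 3:

    import math

    def unigram_perplexity(word_sequence, unigram_probs, eps=1e-8):
        """Toy language model score: exp of the average negative
        log-probability of the words in the sequence."""
        nll = [-math.log(unigram_probs.get(w, eps)) for w in word_sequence]
        return math.exp(sum(nll) / max(len(nll), 1))

    def augment_dataset(dataset, enhance_fn, perplexity_fn, n_iter=10):
        """For each piece of data, generate n_iter enhanced word sequences,
        score them with the language model, keep the candidate with the
        lowest perplexity, remove duplicates, and add it to the data set."""
        augmented = list(dataset)
        seen = {tuple(seq) for seq in dataset}
        for word_sequence in dataset:
            candidates = [enhance_fn(word_sequence) for _ in range(n_iter)]
            candidates.sort(key=perplexity_fn)         # ascending perplexity
            best = candidates[0]
            if tuple(best) not in seen:                # duplicate removal
                seen.add(tuple(best))
                augmented.append(best)
        return augmented

In use, perplexity_fn would be bound to a concrete model, for example lambda seq: unigram_perplexity(seq, probs) with probs estimated from the original corpus.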
The beneficial effects of the above technical scheme are: 1. the [ MASK ] is used for covering words with random probability, so that the noise proportion in data can be controlled, and meanwhile, in a supervised learning task, the importance degree of each word to a real label can be learned by a neural network model;
2. words with the same part of speech are replaced randomly, and the enhanced text is filtered by using the language model, so that the readability and the fluency of the data enhanced text can be improved as much as possible, different semantic features are introduced, and the diversity of the data is improved;
3. through the label-free data enhancement method, the data scale can be enlarged, the knowledge of the teacher model can be learned more sufficiently by the student network, and the improvement of the knowledge distillation performance is facilitated.
This embodiment also discloses a device for enhancing text data in a knowledge distillation process. As shown in FIG. 3, the device includes:
an obtaining module 301, configured to obtain a first preset number of current text data;
a determining module 302, configured to determine current text data to obtain a determination result;
an enhancement processing module 303, configured to perform enhancement processing on the current text data according to the determination result;
and an output module 304, configured to output the current text data after the enhancement processing.
In one embodiment, as shown in FIG. 4, the obtaining module includes:
the receiving submodule 3011 is configured to receive first text data that is far greater than a first preset number and is sent by a teacher end;
the duplication checking sub-module 3012 is configured to perform duplication checking processing on the first text data;
the confirming submodule 3013 is configured to confirm the first text data after the duplication checking processing as second text data;
the compressing submodule 3014 is configured to compress the first preset number of second text data; and acquiring the compressed second text data, and determining the compressed second text data as the current text data.
In one embodiment, the decision module includes:
the decompression submodule is used for decompressing the current text data to obtain a first preset number of current text data;
the obtaining submodule is used for obtaining the text content of each current text data in a first preset number of current text data;
a first setting submodule, for setting a first word sequence in the text content of each current text data as {W_1, ..., W_n}, wherein W_1 is the first word in each text content and W_n is the last word in each text content;
a first calculation submodule, for calculating a random number X_i for each word in the first word sequence, wherein X_i takes a value in the range (0, 1);
a second setting submodule, for setting a first threshold parameter P_mask ∈ [0, 1] and a second threshold parameter P_POS ∈ [0, 1];
a decision submodule, for determining the magnitude relation between the random number X_i and the first threshold parameter and the second threshold parameter and obtaining the judgment result.
In one embodiment, an enhancement processing module includes:
a first replacement submodule, for replacing the corresponding word W_i with [MASK] when the judgment submodule determines that the random number X_i is less than the first threshold parameter;
a second replacement submodule, for replacing the word W_i with a word having the same part of speech when the judgment submodule determines that the random number X_i is greater than or equal to the first threshold parameter and less than the sum of the first threshold parameter and the second threshold parameter;
a holding submodule, for leaving the word W_i unchanged when the judgment submodule determines that the random number X_i is greater than or equal to the sum of the first threshold parameter and the second threshold parameter;
the storage submodule is used for storing the changed first word sequence;
the iteration submodule is used for iterating each modified word sequence for N times to obtain N enhanced word sequences;
the second calculation submodule is used for calculating, by using the language model, the perplexity of each modified word sequence and of the N enhanced word sequences corresponding to it, and arranging the calculated perplexities in ascending order;
the selecting submodule is used for selecting the word sequence with the minimum perplexity as a second word sequence;
and the third replacement submodule is used for replacing the first word sequence in the current text data with the second word sequence.
In one embodiment, an output module includes:
the secondary compression submodule is used for carrying out secondary compression on the current text data after the first word sequence in the current text data is completely replaced;
and the sending submodule is used for sending the current text data after the secondary compression to the student end.
It will be understood by those skilled in the art that the terms "first" and "second" in the present invention merely refer to different stages of the application.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method of enhancing textual data during knowledge distillation, comprising the steps of:
acquiring a first preset number of current text data;
judging the current text data to obtain a judgment result;
performing enhancement processing on the current text data according to the judgment result;
and outputting the current text data after enhancement processing.
2. The method for enhancing text data in a knowledge distillation process according to claim 1, wherein the obtaining a first preset number of current text data comprises:
receiving first text data which is far greater than the first preset number and is sent by a teacher end;
carrying out duplication checking processing on the first text data;
confirming the first text data after the duplication checking processing as second text data;
compressing a first preset number of second text data; and acquiring compressed second text data, and determining the compressed second text data as the current text data.
3. The method for enhancing text data in a knowledge distillation process according to claim 1, wherein the determining the current text data to obtain a determination result comprises:
decompressing the current text data to obtain a first preset number of current text data;
acquiring the text content of each current text data in the first preset number of current text data;
setting a first word sequence in the text content of each current text data as {W_1, ..., W_n}, wherein W_1 is the first word in each text content and W_n is the last word in each text content;
calculating a random number X_i for each word in the first word sequence, wherein X_i takes a value in the range (0, 1);
setting a first threshold parameter P_mask ∈ [0, 1] and a second threshold parameter P_POS ∈ [0, 1];
determining the magnitude relation between the random number X_i and the first threshold parameter and the second threshold parameter, and obtaining the judgment result.
4. The method for enhancing text data in the knowledge distillation process according to claim 1, wherein the enhancing the current text data according to the determination result comprises:
when the random number X_i is less than the first threshold parameter, replacing the corresponding word W_i with [MASK];
when the random number X_i is greater than or equal to the first threshold parameter and less than the sum of the first threshold parameter and the second threshold parameter, replacing the word W_i with a word having the same part of speech;
when the random number X_i is greater than or equal to the sum of the first threshold parameter and the second threshold parameter, leaving the word W_i unchanged;
saving the modified first word sequence;
iterating each modified word sequence for N times to obtain N enhanced word sequences;
calculating, by using a language model, the perplexity of each modified word sequence and of the N enhanced word sequences corresponding to it, and arranging the calculated perplexities in ascending order;
selecting the word sequence with the minimum perplexity as a second word sequence;
replacing the first word sequence in the current text data with the second word sequence.
5. The method of claim 1, wherein the outputting the current text data after enhancement processing comprises:
after the first word sequences in the current text data are completely replaced, performing secondary compression on the current text data;
and sending the current text data after secondary compression to a student end.
6. An apparatus for enhancing textual data during knowledge distillation, the apparatus comprising:
the acquisition module is used for acquiring a first preset number of current text data;
the judging module is used for judging the current text data to obtain a judging result;
the enhancement processing module is used for enhancing the current text data according to the judgment result;
and the output module is used for outputting the current text data after the enhancement processing.
7. The apparatus for enhancing text data during knowledge distillation according to claim 6, wherein the obtaining module comprises:
the receiving submodule is used for receiving first text data which is far greater than the first preset number and is sent by a teacher end;
the duplication checking sub-module is used for carrying out duplication checking processing on the first text data;
the confirming submodule is used for confirming the first text data after the duplication checking processing as second text data;
the compression submodule is used for compressing the second text data with a first preset number; and acquiring compressed second text data, and determining the compressed second text data as the current text data.
8. The apparatus for enhancing text data during knowledge distillation according to claim 6, wherein the judging module comprises:
the decompression submodule is used for decompressing the current text data to obtain a first preset number of current text data;
the obtaining submodule is used for obtaining the text content of each current text data in the first preset number of current text data;
a first setting submodule, configured to set a first word sequence in the text content of each current text data as {W_1, ..., W_n}, wherein W_1 is the first word in each text content and W_n is the last word in each text content;
a first calculation submodule, configured to calculate a random number X_i for each word in the first word sequence, wherein X_i takes a value in the range (0, 1);
a second setting submodule, configured to set a first threshold parameter P_mask ∈ [0, 1] and a second threshold parameter P_POS ∈ [0, 1];
a judgment submodule, configured to determine the magnitude relation between the random number X_i and the first threshold parameter and the second threshold parameter, and obtain the judgment result.
9. The apparatus for enhancing text data during knowledge distillation according to claim 6, wherein the enhancement processing module comprises:
a first replacement submodule, configured to replace the corresponding word W_i with [MASK] when the judgment submodule determines that the random number X_i is less than the first threshold parameter;
a second replacement submodule, configured to replace the word W_i with a word having the same part of speech when the judgment submodule determines that the random number X_i is greater than or equal to the first threshold parameter and less than the sum of the first threshold parameter and the second threshold parameter;
a holding submodule, configured to leave the word W_i unchanged when the judgment submodule determines that the random number X_i is greater than or equal to the sum of the first threshold parameter and the second threshold parameter;
the storage submodule is used for storing the changed first word sequence;
the iteration submodule is used for iterating each modified word sequence for N times to obtain N enhanced word sequences;
the second calculation submodule is used for calculating, by using the language model, the perplexity of each modified word sequence and of the N enhanced word sequences corresponding to it, and arranging the calculated perplexities in ascending order;
the selecting submodule is used for selecting the word sequence with the minimum perplexity as a second word sequence;
a third replacement submodule, configured to replace the first word sequence in the current text data with the second word sequence.
10. The apparatus for enhancing text data during knowledge distillation according to claim 6, wherein the output module comprises:
the secondary compression submodule is used for carrying out secondary compression on the current text data after the first word sequence in the current text data is completely replaced;
and the sending submodule is used for sending the current text data after the secondary compression to a student end.
CN202010151299.6A 2020-03-06 2020-03-06 Method and device for enhancing text data in knowledge distillation process Active CN111428130B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010151299.6A CN111428130B (en) 2020-03-06 2020-03-06 Method and device for enhancing text data in knowledge distillation process

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010151299.6A CN111428130B (en) 2020-03-06 2020-03-06 Method and device for enhancing text data in knowledge distillation process

Publications (2)

Publication Number Publication Date
CN111428130A (en) 2020-07-17
CN111428130B CN111428130B (en) 2023-04-18

Family

ID=71546153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010151299.6A Active CN111428130B (en) 2020-03-06 2020-03-06 Method and device for enhancing text data in knowledge distillation process

Country Status (1)

Country Link
CN (1) CN111428130B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190129932A1 (en) * 2017-10-30 2019-05-02 Panasonic Intellectual Property Management Co., Ltd. Information processing method, information processing apparatus, and program
CN109637546A (en) * 2018-12-29 2019-04-16 苏州思必驰信息科技有限公司 Knowledge distillating method and device
CN110458765A (en) * 2019-01-25 2019-11-15 西安电子科技大学 The method for enhancing image quality of convolutional network is kept based on perception
CN110795939A (en) * 2019-10-15 2020-02-14 腾讯科技(深圳)有限公司 Text processing method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
葛仕明; 赵胜伟; 刘文瑜; 李晨钰: "基于深度特征蒸馏的人脸识别" (Face Recognition Based on Deep Feature Distillation) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507209A (en) * 2020-11-10 2021-03-16 中国科学院深圳先进技术研究院 Sequence recommendation method for knowledge distillation based on land moving distance

Also Published As

Publication number Publication date
CN111428130B (en) 2023-04-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant