CN114969332A - Method and device for training text audit model - Google Patents
- Publication number
- CN114969332A (application number CN202210546544.2A)
- Authority
- CN
- China
- Prior art keywords
- model
- text
- student
- distillation loss
- teacher
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The disclosure provides a method and a device for training a text audit model, relating to the field of artificial intelligence and in particular to natural language processing. The specific implementation scheme is as follows: acquire a pre-trained language model, a pre-trained language micro model, labeled data, and unlabeled data; input the labeled data into the pre-trained language model for supervised training to obtain a teacher model; input the labeled data into the pre-trained language micro model for supervised training to obtain a student model; and input the unlabeled data into the teacher model and the student model respectively, distilling the student model with the teacher model to obtain the text audit model. This implementation can train on small-scale manually labeled data and large-scale unlabeled data, yielding a text audit model that is both accurate and fast.
Description
Technical Field
The disclosure relates to the field of artificial intelligence, in particular to the field of natural language processing, and specifically relates to a method and a device for training a text audit model.
Background
A text auditing system is an automated, intelligent system, based on natural language processing technology, for judging whether a piece of text complies with the content specifications of platforms such as Internet services and media. Common text auditing application scenarios include user signatures/nicknames, comments/messages, instant-messaging text, user posts, media information, commodity information, live-video bullet comments, image-text information, and the like. Types of prohibited content handled by text auditing include politics, pornography, violent terrorism, advertising promotion, and vulgar abuse. Huge amounts of user data are generated on the Internet every day, and the heavy auditing workload cannot be borne by manual review alone. By using computers and natural language processing technology to automatically detect and identify content violations, a text auditing system leads or assists manual auditing and greatly reduces the labor cost of the personnel involved.
Disclosure of Invention
The present disclosure provides a method, apparatus, device, storage medium, and computer program product for training a text audit model.
According to a first aspect of the present disclosure, there is provided a method of training a text audit model, comprising: acquiring a pre-trained language model, a pre-trained language micro model, labeled data, and unlabeled data; inputting the labeled data into the pre-trained language model for supervised training to obtain a teacher model; inputting the labeled data into the pre-trained language micro model for supervised training to obtain a student model; and inputting the unlabeled data into the teacher model and the student model respectively, and distilling the student model with the teacher model to obtain a text audit model.
According to a second aspect of the present disclosure, there is provided a text auditing method, including: acquiring text information to be audited; and inputting the text information into a text auditing model trained according to the method in any one of the first aspect, and outputting an auditing result.
According to a third aspect of the present disclosure, there is provided an apparatus for training a text audit model, comprising: an acquisition unit configured to acquire a pre-trained language model, a pre-trained language micro model, labeled data, and unlabeled data; a first training unit configured to input the labeled data into the pre-trained language model for supervised training to obtain a teacher model; a second training unit configured to input the labeled data into the pre-trained language micro model for supervised training to obtain a student model; and a distillation unit configured to input the unlabeled data into the teacher model and the student model respectively, and distill the student model with the teacher model to obtain a text audit model.
According to a fourth aspect of the present disclosure, there is provided a text auditing apparatus including: an acquisition unit configured to acquire text information to be audited; and the auditing unit is configured to input the text information into a text auditing model trained by the device according to any one of the third aspects and output auditing results.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the first aspects.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of the first and second aspects.
According to a seventh aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of any one of the first and second aspects.
According to the method and the device for training the text audit model, the knowledge of the pre-trained language model is transferred to the pre-trained language micro model through model distillation, so the prediction speed can be increased by thousands of times with only a small loss in accuracy, which has an important positive effect on deploying text audit models in business.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a method of training a text audit model according to the present disclosure;
FIGS. 3a-3b are schematic diagrams of an application scenario of a method of training a text audit model according to the present disclosure;
FIG. 4 is a flow diagram of one embodiment of a text review method according to the present disclosure;
FIG. 5 is a block diagram illustrating one embodiment of an apparatus for training a text audit model according to the present disclosure;
FIG. 6 is a schematic structural diagram of one embodiment of a text review device according to the present disclosure;
FIG. 7 is a schematic block diagram of a computer system suitable for use with an electronic device implementing embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 illustrates an exemplary system architecture 100 to which a method of training a text audit model, an apparatus for training a text audit model, a text audit method, or a text audit apparatus of embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminals 101, 102, a network 103, a database server 104, and a server 105. The network 103 serves as a medium for providing communication links between the terminals 101, 102, the database server 104 and the server 105. Network 103 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user 110 may use the terminals 101, 102 to interact with the server 105 over the network 103 to receive or send messages or the like. The terminals 101 and 102 may have various client applications installed thereon, such as a model training application, a text auditing application, a shopping application, a payment application, a web browser, an instant messenger, and the like.
Here, the terminals 101 and 102 may be hardware or software. When the terminals 101 and 102 are hardware, they may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), laptop portable computers, desktop computers, and the like. When the terminals 101 and 102 are software, they can be installed in the electronic devices listed above. It may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
The server 105 may also be a server providing various services, such as a background server providing support for various applications displayed on the terminals 101, 102. The background server may train the initial model using samples in the sample set sent by the terminals 101 and 102, and may send a training result (e.g., a generated text audit model) to the terminals 101 and 102. In this way, the user can apply the generated text auditing model to perform text auditing.
Here, the database server 104 and the server 105 may be hardware or software. When they are hardware, they can be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When they are software, they may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein. Database server 104 and server 105 may also be servers of a distributed system or servers that incorporate a blockchain. Database server 104 and server 105 may also be cloud servers, or smart cloud computing servers or smart cloud hosts with artificial intelligence technology.
It should be noted that the method for training the text auditing model or the text auditing method provided by the embodiments of the present disclosure is generally executed by the server 105. Accordingly, a device for training a text audit model or a text audit device is also typically disposed in the server 105.
It is noted that database server 104 may not be provided in system architecture 100, as server 105 may perform the relevant functions of database server 104.
It should be understood that the number of terminals, networks, database servers, and servers in fig. 1 are merely illustrative. There may be any number of terminals, networks, database servers, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method of training a text audit model according to the present disclosure is shown. The method for training the text audit model can comprise the following steps:
Step 201, acquiring a pre-trained language model, a pre-trained language micro model, labeled data, and unlabeled data.
In this embodiment, an execution body of the method for training a text audit model (e.g., the server 105 shown in fig. 1) may obtain the pre-trained language model, the pre-trained language micro model, the unlabeled data, and the labeled data in various ways. For example, the execution body may obtain existing models and data stored in a database server (e.g., the database server 104 shown in fig. 1) through a wired or wireless connection. As another example, a user may collect a sample set, including labeled data and unlabeled data, via a terminal (e.g., the terminals 101 and 102 shown in fig. 1); the execution body may then receive the samples collected by the terminal and store them locally, thereby generating the sample set. Labeled data is textual information annotated with a category label (prohibited or not prohibited), while unlabeled data contains only textual information without category labels. The text audit model may be a binary classifier whose classes are not prohibited and prohibited. It may also be a multi-class classifier whose categories include not prohibited as well as the various types of prohibited content (e.g., politics, pornography, violent terrorism, advertising promotion, vulgar abuse, spam flooding, etc.).
The pre-trained language model may be a general semantic understanding model, such as ERNIE, BERT, and the like.
The pre-trained language micro model (which may be named ERNIE-Tiny) is a lightweight pre-trained language model whose number of network layers is one third or one quarter that of the pre-trained language model. For example, ERNIE may use a 12- or 24-layer network, while ERNIE-Tiny uses a 3-, 4-, or 6-layer network.
Step 202, inputting the labeled data into the pre-trained language model for supervised training to obtain a teacher model.
In this embodiment, a small amount of labeled data (e.g., tens of thousands of items) can be used to fine-tune the pre-trained language model, resulting in a Teacher model. The pre-trained language model is a very large deep model that achieves good results after fine-tuning on small-scale labeled data, but its large size makes prediction very slow. The labeled data can be selected according to the audit task type; for example, for pornographic-text auditing, pornographic and non-pornographic texts are selected as the labeled data.
Step 203, inputting the labeled data into the pre-trained language micro model for supervised training to obtain a student model.
In this embodiment, the pre-trained language micro model is supervised-trained using the same labeled data, and a pre-trained language micro model for performing a specific audit task can be generated as a Student model (Student model).
Step 204, inputting the unlabeled data into the teacher model and the student model respectively, and distilling the student model with the teacher model to obtain a text audit model.
In this embodiment, since the Teacher model outperforms the Student model, model distillation can be used to transfer the Teacher model's ability to the Student model. The total loss value may be computed as a weighted sum of the mean squared errors between the output vectors of each layer (or of particular layers). During training, the student model's network parameters are adjusted so that its layer output vectors move closer to those of the teacher model, i.e., so that the total loss decreases. Optionally, the output vectors of different layer types may be weighted differently; for example, the prediction layer's output vector may receive the largest weight and the embedding layer's the smallest.
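As a minimal sketch of this weighted layer-matching objective (the function names, toy vectors, and weights below are illustrative assumptions, not taken from the patent):

```python
def mse(a, b):
    """Mean squared error between two equal-length output vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def total_distillation_loss(teacher_layers, student_layers, weights):
    """Weighted sum of per-layer MSEs between teacher and student outputs.

    teacher_layers / student_layers: lists of output vectors, one per
    matched layer (e.g. embedding, hidden, prediction layers); weights:
    one scalar per layer, e.g. largest for the prediction layer.
    """
    return sum(w * mse(t, s)
               for w, t, s in zip(weights, teacher_layers, student_layers))
```

Training then adjusts the student's parameters to drive `total_distillation_loss` down, pulling each student layer toward its teacher counterpart.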
In this embodiment's method for training the text audit model, the knowledge of the pre-trained language model is transferred to the pre-trained language micro model by model distillation, so the prediction speed can be increased by thousands of times with only a small loss in accuracy, which is of great practical value for deploying text audit models in business. The text audit model can be deployed on low-performance mobile terminals while maintaining audit accuracy.
In some optional implementations of this embodiment, inputting the unlabeled data into the teacher model and the student model and distilling the student model with the teacher model to obtain a text audit model includes: inputting the unlabeled data into the teacher model and outputting a soft label vector; inputting the unlabeled data into the student model and outputting a prediction probability vector; calculating a soft label distillation loss from the soft label vector and the prediction probability vector; and adjusting the network parameters of the student model based on the soft label distillation loss to obtain the text audit model. The soft label (Soft-label) vector is the probability distribution the model outputs at the final classification layer. Taking pornography recognition as an example, the task is binary classification, where 0 means no violation and 1 means violation, so the soft label is the probability the model assigns to classes 0 and 1 for a given input. By having the ERNIE-Tiny model (student) learn the soft label vector of the ERNIE model (teacher), the student is driven to produce the same prediction as the teacher for the same input.
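A minimal sketch of the soft-label loss, assuming cross-entropy between the teacher's soft labels and the student's softmax output (the patent does not fix the exact formula, so this choice and the function names are assumptions):

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    exps = [math.exp(z - max(logits)) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label_loss(teacher_probs, student_logits):
    """Cross-entropy between the teacher's soft-label distribution and
    the student's predicted distribution (one common choice)."""
    student_probs = softmax(student_logits)
    return -sum(t * math.log(max(p, 1e-12))
                for t, p in zip(teacher_probs, student_probs))
```

When the student's distribution matches the teacher's, the loss reduces to the teacher distribution's entropy, its minimum over the student's parameters.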
In some optional implementations of this embodiment, inputting the unlabeled data into the teacher model and the student model respectively and distilling the student model with the teacher model to obtain a text audit model includes: acquiring a first output vector from the embedding layer of the teacher model and a second output vector from the embedding layer of the student model; calculating the mean squared error between the two embedding-layer output vectors as the embedding-layer distillation loss; and adjusting the network parameters of the student model based on the embedding-layer distillation loss to obtain the text audit model. The embedding distance between the ERNIE model (teacher) and the ERNIE-Tiny model (student) is constrained with an MSE (Mean Squared Error) loss function, and the student's network parameters are adjusted until the loss between the embedding-layer output vectors falls below a preset embedding-layer loss threshold. This can accelerate model convergence and improve text audit accuracy.
In some optional implementations of this embodiment, inputting the unlabeled data into the teacher model and the student model respectively and distilling the student model with the teacher model to obtain a text audit model includes: acquiring a first output vector from a hidden layer of the teacher model and a second output vector from the corresponding hidden layer of the student model; calculating the mean squared error between the two hidden-layer output vectors as the hidden-layer distillation loss; and adjusting the network parameters of the student model based on the hidden-layer distillation loss to obtain the text audit model. First, the layers of the ERNIE and ERNIE-Tiny models are mapped in an arithmetic progression: if ERNIE-Tiny has 4 layers and ERNIE has 12, transformer layers 1, 2, 3, and 4 of ERNIE-Tiny correspond to layers 3, 6, 9, and 12 of ERNIE, respectively. Hidden-layer model distillation is achieved by constraining the distance between the hidden-layer output vectors of each mapped layer pair of the ERNIE model (teacher) and the ERNIE-Tiny model (student). The student's network parameters are adjusted until the loss between the hidden-layer output vectors falls below a preset hidden-layer loss threshold. This can accelerate model convergence and improve text audit accuracy.
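The arithmetic-progression layer mapping can be sketched as follows (the helper name is an assumption; it reproduces the 4-to-12-layer example above and assumes the teacher depth is a multiple of the student depth):

```python
def map_student_to_teacher_layers(n_student, n_teacher):
    """Evenly map student layers 1..n_student to teacher layers, as in
    the example where a 4-layer student matches teacher layers 3, 6, 9, 12.
    """
    step = n_teacher // n_student
    return {i: i * step for i in range(1, n_student + 1)}
```

The hidden-layer MSE is then computed only between each student layer and its mapped teacher layer.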
In some optional implementations of this embodiment, inputting the unlabeled data into the teacher model and the student model respectively and distilling the student model with the teacher model to obtain a text audit model includes: for each element of the unlabeled data, acquiring a first attention matrix of the element in the teacher model and a second attention matrix of the element in the student model; calculating the mean squared error between the first and second attention matrices of each element as the attention distillation loss; and adjusting the network parameters of the student model based on the attention distillation loss to obtain the text audit model. The attention matrix corresponding to each element of an unlabeled word sequence reflects that element's dependencies on the other elements in the sequence; by having the ERNIE-Tiny model (student) learn the attention matrices of the ERNIE model (teacher) at each layer, the student acquires richer linguistic knowledge. The student's network parameters are adjusted until the loss between the attention-layer outputs falls below a preset attention-layer loss threshold. This can accelerate model convergence and improve text audit accuracy.
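A sketch of the attention matching for one mapped layer pair, assuming an element-wise MSE over the flattened matrices (an assumption consistent with the other MSE-based losses; the function name is illustrative):

```python
def attention_distillation_loss(teacher_attn, student_attn):
    """MSE between a teacher attention matrix and the student's matrix
    for the same input, both given as lists of rows of equal shape."""
    flat_t = [v for row in teacher_attn for v in row]
    flat_s = [v for row in student_attn for v in row]
    return sum((t - s) ** 2 for t, s in zip(flat_t, flat_s)) / len(flat_t)
```

In practice this would be summed over all mapped layers and all input elements.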
In some optional implementations of this embodiment, inputting the unlabeled data into the teacher model and the student model respectively and distilling the student model with the teacher model to obtain a text audit model includes: obtaining the soft label distillation loss, embedding-layer distillation loss, hidden-layer distillation loss, and attention distillation loss; and adjusting the network parameters of the student model based on a weighted sum of these four losses to obtain the text audit model. The soft label, embedding-layer, hidden-layer, and attention distillation losses can each be obtained by the methods described above, and the weighted sum of these 4 losses is taken as the total loss. The student's network parameters are adjusted until the total loss falls below a preset loss threshold, yielding the text audit model. A loss weight may be 0, i.e., the total loss may be computed from any combination of the 4 losses. The combination can also be chosen according to the model structure; for example, if there is no attention layer, the attention loss is not used. Combining the embedding layer, the intermediate (hidden) layers, the attention mechanism, and the soft label during distillation lets the student model learn more of the teacher model's knowledge, achieving a better prediction effect.
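The weighted combination of the four losses might look like the following sketch; the default weight values are purely illustrative (the patent does not specify them), and a zero weight drops the corresponding term as described:

```python
def combined_distillation_loss(soft, emb, hid, attn,
                               weights=(1.0, 0.25, 0.25, 0.5)):
    """Total loss as a weighted sum of the four distillation losses.

    Setting a weight to 0 removes that term, e.g. (1, 1, 1, 0) for a
    student architecture without an attention layer.
    """
    w_soft, w_emb, w_hid, w_attn = weights
    return w_soft * soft + w_emb * emb + w_hid * hid + w_attn * attn
```

Training stops (or the model is accepted) once this total drops below the preset loss threshold.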
With further reference to fig. 3a-3b, fig. 3a-3b are schematic diagrams of an application scenario of the method for training a text audit model according to the present embodiment. In the application scenario of fig. 3a-3b, in the training phase, three phases of training steps are mainly included:
1. and training the general ERNIE model by using the task data to obtain the pornographic text recognition Teacher model.
2. And training a universal ERNIE-Tiny model by using the task data to obtain a pornographic text recognition Student model.
3. Based on a large amount of unlabeled audit business data, model distillation is performed on the Student model using the Teacher model, transferring the knowledge learned by the ERNIE model to the ERNIE-Tiny model, finally yielding an ERNIE-Tiny pornographic-text recognition model whose prediction effect approaches that of the Teacher model.
In the prediction phase, for each user input text data, the ERNIE-Tiny porn text recognition model predicts whether it violates rules.
The specific flow of the model distillation stage is shown in FIG. 3b: each input text is fed into the Teacher model and the Student model respectively, and model distillation is realized through four distillation losses: the embedding-layer distillation loss, the hidden-layer (hidden state) distillation loss, the attention distillation loss, and the soft-label distillation loss.
Referring to fig. 4, a flow 400 of one embodiment of a text review method provided by the present disclosure is shown. The text auditing method can comprise the following steps:
Step 401, acquiring text information to be audited.
In this embodiment, an execution body of the text auditing method (e.g., the server 105 shown in fig. 1) may acquire the text information to be audited in various ways. For example, it may obtain text information to be audited stored in a database server (e.g., the database server 104 shown in fig. 1) through a wired or wireless connection. As another example, it may receive text information to be audited collected by a terminal (e.g., the terminals 101 and 102 shown in fig. 1) or another device. Common text auditing application scenarios include user signatures/nicknames, comments/messages, instant-messaging (IM) text, user posts, media information, commodity information, live-video bullet comments, image-text information, and the like. Types of prohibited content handled by text auditing include politics, pornography, violent terrorism, advertising promotion, vulgar abuse, and the like.
If the content to be audited is an image, OCR can be applied to extract the text to be audited.
Optionally, the text information to be audited may be preprocessed, for example by removing punctuation or truncating the text to a predetermined number of words (within 500). Keyword screening may also be performed first, directly filtering out text information that contains forbidden words.
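The preprocessing and keyword-screening step could be sketched as below (whitespace tokenization, the function name, and the screening order are assumptions; the 500-word cap follows the description):

```python
def preprocess(text, banned_words, max_len=500):
    """Truncate to at most max_len tokens and short-circuit on banned words.

    Returns (tokens, verdict): verdict is "banned" if a forbidden word is
    present, so the text is filtered out before the model sees it;
    otherwise verdict is None and tokens go on to the audit model.
    """
    tokens = text.split()[:max_len]
    if any(tok in banned_words for tok in tokens):
        return tokens, "banned"
    return tokens, None
```

Usage: texts whose verdict is `"banned"` skip the model entirely, which saves inference cost on obvious violations.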
And step 402, inputting the text information into a text auditing model and outputting an auditing result.
In this embodiment, the execution body may input the text information acquired in step 401 into a text audit model to generate an audit result. The audit result may be prohibited or not prohibited, or a specific type of prohibition.
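Mapping the model's output class probabilities to such an audit result might look like this sketch (the label names are illustrative, following the prohibition types listed earlier; the multi-class variant is shown):

```python
def audit_result(probs,
                 labels=("not_prohibited", "politics", "pornography",
                         "violent_terrorism", "ad_promotion", "vulgar_abuse")):
    """Return the verdict label for the highest-probability class."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best]
```

For the binary variant, `labels` would simply be `("not_prohibited", "prohibited")` with two probabilities.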
In this embodiment, the text audit model may be generated by the method described in the embodiment of fig. 2. For a specific generation process, reference may be made to the related description of the embodiment in fig. 2, which is not described herein again.
It should be noted that the text auditing method of the present embodiment may be used to test the text auditing model generated by the foregoing embodiments, and the model may then be continuously optimized according to the test results. The method may also be a practical application of the text auditing model generated by the foregoing embodiments. Auditing text with this model helps improve auditing efficiency and accuracy while reducing labor cost. At the same time, the auditing time is shortened, so the audit is imperceptible to the user and does not degrade the user experience.
With continuing reference to FIG. 5, as an implementation of the methods illustrated in the above figures, the present disclosure provides one embodiment of an apparatus for training a text audit model. The embodiment of the device corresponds to the embodiment of the method shown in fig. 2, and the device can be applied to various electronic devices.
As shown in FIG. 5, the apparatus 500 for training a text audit model according to this embodiment may include: an acquisition unit 501, a first training unit 502, a second training unit 503, and a distillation unit 504. The acquisition unit 501 is configured to acquire a pre-training language model, a pre-training language micro model, labeled data, and label-free data; the first training unit 502 is configured to input the labeled data into the pre-training language model for supervised training, resulting in a teacher model; the second training unit 503 is configured to input the labeled data into the pre-training language micro model for supervised training, resulting in a student model; the distillation unit 504 is configured to input the label-free data into the teacher model and the student model, respectively, and distill the student model with the teacher model to obtain a text audit model.
In some alternative implementations of the present embodiment, the distillation unit 504 is further configured to: inputting the label-free data into the teacher model and outputting a soft label vector; inputting the label-free data into the student model and outputting a prediction probability vector; calculating soft label distillation loss according to the soft label vector and the prediction probability vector; and adjusting the network parameters of the student model based on the soft label distillation loss to obtain a text auditing model.
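As an illustrative sketch of the soft-label step (not the patented implementation): the teacher's logits are softened with a temperature into a soft label vector, and the loss compares that vector against the student's prediction probability vector. The temperature value and the choice of cross-entropy as the comparison are assumptions here:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def soft_label_distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between the teacher's softened distribution (soft
    label vector) and the student's prediction probability vector."""
    soft_labels = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return float(-(soft_labels * np.log(student_probs + 1e-12)).sum(axis=-1).mean())
```

When the student's logits match the teacher's exactly, the loss reduces to the entropy of the soft labels, its minimum over the student's outputs; any divergence between the two distributions increases it.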
In some optional implementations of the present embodiment, the distillation unit 504 is further configured to: acquire a first output vector of an embedding layer of the teacher model and a second output vector of an embedding layer of the student model; calculate a mean square error of the first output vector and the second output vector as an embedding layer distillation loss; and adjust the network parameters of the student model based on the embedding layer distillation loss to obtain a text auditing model.
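The embedding layer loss can be sketched as a plain mean square error. Because a smaller student typically has a narrower hidden size than the teacher, a projection matrix is often interposed (as in TinyBERT-style distillation); that projection is an assumption of this sketch, not something stated in the disclosure:

```python
import numpy as np

def embedding_distillation_loss(student_emb, teacher_emb, projection=None):
    """MSE between student and teacher embedding-layer outputs.
    `projection` (student_dim x teacher_dim) maps the student output
    into the teacher's space when the hidden sizes differ."""
    if projection is not None:
        student_emb = student_emb @ projection
    return float(np.mean((student_emb - teacher_emb) ** 2))
```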
In some optional implementations of the present embodiment, the distillation unit 504 is further configured to: acquire a first output vector of a hidden layer of the teacher model and a second output vector of a hidden layer of the student model; calculate the mean square error of the first output vector and the second output vector as the hidden layer distillation loss; and adjust the network parameters of the student model based on the hidden layer distillation loss to obtain a text auditing model.
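The hidden layer loss has the same MSE form. Since the student generally has fewer layers than the teacher, each student layer is commonly matched to every k-th teacher layer; this uniform layer mapping is an illustrative assumption, not a detail given in the disclosure:

```python
import numpy as np

def hidden_distillation_loss(student_states, teacher_states):
    """Average MSE over matched layers, mapping student layer i to
    teacher layer (i + 1) * k - 1, where k = teacher_layers // student_layers."""
    k = len(teacher_states) // len(student_states)
    losses = [np.mean((s_h - teacher_states[(i + 1) * k - 1]) ** 2)
              for i, s_h in enumerate(student_states)]
    return float(np.mean(losses))
```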
In some optional implementations of the present embodiment, the distillation unit 504 is further configured to: for each element of the label-free data, acquire a first attention matrix of the element in the teacher model and a second attention matrix of the element in the student model; calculate the mean square error of the first attention matrix and the second attention matrix of each element as the attention distillation loss; and adjust the network parameters of the student model based on the attention distillation loss to obtain a text auditing model.
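A sketch of the attention loss follows: for each input element, the teacher's and student's attention matrices are compared entrywise. This assumes the two models share the same number of attention heads and the same sequence length; when they differ, head selection or averaging would be needed:

```python
import numpy as np

def attention_distillation_loss(teacher_attns, student_attns):
    """Average, over input elements, of the MSE between the first (teacher)
    and second (student) attention matrices; each list entry is an array
    of shape (heads, seq_len, seq_len) for one element."""
    per_element = [np.mean((t_a - s_a) ** 2)
                   for t_a, s_a in zip(teacher_attns, student_attns)]
    return float(np.mean(per_element))
```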
In some optional implementations of the present embodiment, the distillation unit 504 is further configured to: obtain the soft label distillation loss, the embedding layer distillation loss, the hidden layer distillation loss, and the attention distillation loss; and adjust the network parameters of the student model based on a weighted sum of the soft label distillation loss, the embedding layer distillation loss, the hidden layer distillation loss, and the attention distillation loss to obtain a text auditing model.
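Combining the four components then reduces to a weighted sum. The equal weights below are arbitrary illustrative defaults (the disclosure does not fix them); in practice they would be tuned as hyperparameters:

```python
def total_distillation_loss(soft_label, embedding, hidden, attention,
                            weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four scalar distillation losses; the student's
    network parameters are adjusted by descending this total loss."""
    w_soft, w_emb, w_hid, w_att = weights
    return (w_soft * soft_label + w_emb * embedding
            + w_hid * hidden + w_att * attention)
```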
With continuing reference to FIG. 6, as an implementation of the method illustrated in the above figure, the present disclosure provides one embodiment of a text auditing apparatus. The embodiment of the apparatus corresponds to the embodiment of the method shown in FIG. 4, and the apparatus can be applied to various electronic devices.
As shown in fig. 6, the text auditing apparatus 600 of the present embodiment may include: an acquisition unit 601 configured to acquire text information to be audited; the auditing unit 602 is configured to input the text information into a text auditing model trained by the apparatus 500, and output an auditing result.
An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of flows 200 or 400.
A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of flow 200 or 400.
A computer program product comprising a computer program which, when executed by a processor, implements the method of flow 200 or 400.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the personal information of the related user are all in accordance with the regulations of related laws and regulations and do not violate the good customs of the public order.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in FIG. 7, the device 700 comprises a computing unit 701, which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 can also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Various components in the device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server with a combined blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.
Claims (17)
1. A method of training a text audit model, comprising:
acquiring a pre-training language model, a pre-training language micro model, labeled data and label-free data;
inputting the labeled data into a pre-training language model for supervised training to obtain a teacher model;
inputting the labeled data into a pre-training language micro model for supervised training to obtain a student model;
and inputting the label-free data into the teacher model and the student model respectively, and distilling the student model by using the teacher model to obtain a text auditing model.
2. The method of claim 1, wherein said inputting said label-free data into said teacher model and said student model, respectively, and distilling the student model using the teacher model to obtain a text audit model comprises:
inputting the label-free data into the teacher model and outputting a soft label vector;
inputting the label-free data into the student model and outputting a prediction probability vector;
calculating soft label distillation loss according to the soft label vector and the prediction probability vector;
and adjusting the network parameters of the student model based on the soft label distillation loss to obtain a text auditing model.
3. The method of claim 1, wherein said inputting said label-free data into said teacher model and said student model, respectively, and distilling the student model using the teacher model to obtain a text audit model comprises:
acquiring a first output vector of an embedding layer of the teacher model and a second output vector of an embedding layer of the student model;
calculating a mean square error of the first output vector and the second output vector as an embedded layer distillation loss;
and adjusting the network parameters of the student model based on the embedded layer distillation loss to obtain a text auditing model.
4. The method of claim 1, wherein said inputting said label-free data into said teacher model and said student model, respectively, and distilling the student model using the teacher model to obtain a text audit model comprises:
acquiring a first output vector of a hidden layer of the teacher model and a second output vector of a hidden layer of the student model;
calculating the mean square error of the first output vector and the second output vector as the hidden layer distillation loss;
and adjusting the network parameters of the student model based on the hidden layer distillation loss to obtain a text auditing model.
5. The method of claim 1, wherein said inputting said label-free data into said teacher model and said student model, respectively, and distilling the student model using the teacher model to obtain a text audit model comprises:
for each element of the unlabeled data, acquiring a first attention matrix of the element in the teacher model and a second attention matrix of the element in the student model;
calculating the mean square error of the first attention matrix and the second attention matrix of each element as the attention distillation loss;
and adjusting network parameters of the student model based on the attention distillation loss to obtain a text auditing model.
6. The method of claim 1, wherein said inputting said label-free data into said teacher model and said student model, respectively, and distilling the student model using the teacher model to obtain a text audit model comprises:
obtaining soft label distillation loss, embedded layer distillation loss, hidden layer distillation loss and attention distillation loss;
and adjusting the network parameters of the student model based on a weighted sum of the soft label distillation loss, the embedded layer distillation loss, the hidden layer distillation loss, and the attention distillation loss to obtain a text auditing model.
7. A text auditing method comprises the following steps:
acquiring text information to be audited;
inputting the text information into a text auditing model trained according to the method of any one of claims 1-6, and outputting the auditing result.
8. An apparatus for training a text audit model, comprising:
an acquisition unit configured to acquire a pre-training language model, a pre-training language micro model, labeled data, and label-free data;
a first training unit configured to input the annotation data into a pre-training language model for supervised training, resulting in a teacher model;
the second training unit is configured to input the marking data into a pre-training language micro model for supervised training to obtain a student model;
and the distillation unit is configured to input the label-free data into the teacher model and the student model respectively, and distill the student model by using the teacher model to obtain a text auditing model.
9. The apparatus of claim 8, wherein the distillation unit is further configured to:
inputting the label-free data into the teacher model and outputting a soft label vector;
inputting the label-free data into the student model and outputting a prediction probability vector;
calculating soft label distillation loss according to the soft label vector and the prediction probability vector;
and adjusting the network parameters of the student model based on the soft label distillation loss to obtain a text auditing model.
10. The apparatus of claim 8, wherein the distillation unit is further configured to:
acquiring a first output vector of an embedding layer of the teacher model and a second output vector of an embedding layer of the student model;
calculating a mean square error of the first output vector and the second output vector as an embedded layer distillation loss;
and adjusting the network parameters of the student model based on the embedded layer distillation loss to obtain a text auditing model.
11. The apparatus of claim 8, wherein the distillation unit is further configured to:
acquiring a first output vector of a hidden layer of the teacher model and a second output vector of a hidden layer of the student model;
calculating the mean square error of the first output vector and the second output vector as the hidden layer distillation loss;
and adjusting the network parameters of the student model based on the hidden layer distillation loss to obtain a text auditing model.
12. The apparatus of claim 8, wherein the distillation unit is further configured to:
for each element of the unlabeled data, acquiring a first attention matrix of the element in the teacher model and a second attention matrix of the element in the student model;
calculating the mean square error of the first attention matrix and the second attention matrix of each element as the attention distillation loss;
and adjusting network parameters of the student model based on the attention distillation loss to obtain a text auditing model.
13. The apparatus of claim 8, wherein the distillation unit is further configured to:
obtaining soft label distillation loss, embedded layer distillation loss, hidden layer distillation loss and attention distillation loss;
and adjusting the network parameters of the student model based on a weighted sum of the soft label distillation loss, the embedded layer distillation loss, the hidden layer distillation loss, and the attention distillation loss to obtain a text auditing model.
14. A text review apparatus comprising:
an acquisition unit configured to acquire text information to be audited;
an auditing unit configured to input the text information into a text auditing model trained by the apparatus according to any one of claims 8-13, and output auditing results.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210546544.2A CN114969332A (en) | 2022-05-18 | 2022-05-18 | Method and device for training text audit model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114969332A true CN114969332A (en) | 2022-08-30 |
Family
ID=82985952
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210546544.2A Pending CN114969332A (en) | 2022-05-18 | 2022-05-18 | Method and device for training text audit model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114969332A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111611377A (en) * | 2020-04-22 | 2020-09-01 | 淮阴工学院 | Knowledge distillation-based multi-layer neural network language model training method and device |
CN112613273A (en) * | 2020-12-16 | 2021-04-06 | 上海交通大学 | Compression method and system of multi-language BERT sequence labeling model |
CN112949766A (en) * | 2021-04-07 | 2021-06-11 | 成都数之联科技有限公司 | Target area detection model training method, system, device and medium |
CN113592007A (en) * | 2021-08-05 | 2021-11-02 | 哈尔滨理工大学 | Knowledge distillation-based bad picture identification system and method, computer and storage medium |
US20220067274A1 (en) * | 2020-09-02 | 2022-03-03 | Zhejiang Lab | Compression method and platform of pre-training language model based on knowledge distillation |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116186200A (en) * | 2023-01-19 | 2023-05-30 | 北京百度网讯科技有限公司 | Model training method, device, electronic equipment and storage medium |
CN116186200B (en) * | 2023-01-19 | 2024-02-09 | 北京百度网讯科技有限公司 | Model training method, device, electronic equipment and storage medium |
CN117292395A (en) * | 2023-09-27 | 2023-12-26 | 自然资源部地图技术审查中心 | Training method and training device for drawing-examining model and drawing-examining method and device |
CN117292395B (en) * | 2023-09-27 | 2024-05-24 | 自然资源部地图技术审查中心 | Training method and training device for drawing-examining model and drawing-examining method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||