CN114969332A - Method and device for training text audit model - Google Patents

Method and device for training text audit model

Info

Publication number
CN114969332A
Authority
CN
China
Prior art keywords
model
text
student
distillation loss
teacher
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210546544.2A
Other languages
Chinese (zh)
Inventor
王赞博
曹宇慧
黄硕
陈永锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210546544.2A priority Critical patent/CN114969332A/en
Publication of CN114969332A publication Critical patent/CN114969332A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure provides a method and a device for training a text audit model, and relates to the field of artificial intelligence, in particular to natural language processing. The specific implementation scheme is as follows: acquiring a pre-training language model, a pre-training language micro model, labeled data and unlabeled data; inputting the labeled data into the pre-training language model for supervised training to obtain a teacher model; inputting the labeled data into the pre-training language micro model for supervised training to obtain a student model; and inputting the unlabeled data into the teacher model and the student model respectively, and distilling the student model with the teacher model to obtain a text audit model. This implementation can be trained on small-scale manually labeled data and large-scale unlabeled data, yielding a text audit model that is both accurate and fast.

Description

Method and device for training text audit model
Technical Field
The disclosure relates to the field of artificial intelligence, in particular to the field of natural language processing, and specifically relates to a method and a device for training a text audit model.
Background
A text auditing system is an automated, intelligent system, based on natural language processing technology, that judges whether a piece of text complies with the content specifications of platforms such as the internet and the media. Common text auditing scenarios include user signatures/nicknames, comments/messages, instant-messaging text, user posts, media information, product information, live-stream bullet comments, and image-with-text content. The types of prohibited content handled by text auditing include politics, pornography, violence and terrorism, advertising promotion, and vulgar abuse. Huge amounts of user data are generated on the internet every day, and such a heavy auditing workload cannot be borne by manual review alone. By using computers and natural language processing technology to automatically detect and identify content violations, a text auditing system leads or assists manual review and greatly reduces the labor cost of the personnel involved.
Disclosure of Invention
The present disclosure provides a method, apparatus, device, storage medium, and computer program product for training a text audit model.
According to a first aspect of the present disclosure, there is provided a method of training a text audit model, comprising: acquiring a pre-training language model, a pre-training language micro model, labeled data and non-labeled data; inputting the labeled data into a pre-training language model for supervised training to obtain a teacher model; inputting the labeled data into a pre-training language micro model for supervised training to obtain a student model; and inputting the label-free data into the teacher model and the student model respectively, and distilling the student model by using the teacher model to obtain a text auditing model.
According to a second aspect of the present disclosure, there is provided a text auditing method, including: acquiring text information to be audited; and inputting the text information into a text auditing model trained according to the method in any one of the first aspect, and outputting an auditing result.
According to a third aspect of the present disclosure, there is provided an apparatus for training a text audit model, comprising: an acquisition unit configured to acquire a pre-training language model, a pre-training language micro model, labeled data, and unlabeled data; a first training unit configured to input the labeled data into the pre-training language model for supervised training to obtain a teacher model; a second training unit configured to input the labeled data into the pre-training language micro model for supervised training to obtain a student model; and a distillation unit configured to input the unlabeled data into the teacher model and the student model respectively, and distill the student model by using the teacher model to obtain a text audit model.
According to a fourth aspect of the present disclosure, there is provided a text auditing apparatus including: an acquisition unit configured to acquire text information to be audited; and the auditing unit is configured to input the text information into a text auditing model trained by the device according to any one of the third aspects and output auditing results.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the first aspects.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of the first and second aspects.
According to a seventh aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of any one of the first and second aspects.
According to the method and the device for training the text audit model of the present disclosure, the knowledge of the pre-training language model is transferred to the pre-training language micro model through model distillation, so that the prediction speed can be increased by thousands of times with only a small loss in accuracy, which has an important positive effect on deploying text audit models in real business.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a method of training a text audit model according to the present disclosure;
FIGS. 3a-3b are schematic diagrams of an application scenario of a method of training a text audit model according to the present disclosure;
FIG. 4 is a flow diagram of one embodiment of a text review method according to the present disclosure;
FIG. 5 is a block diagram illustrating one embodiment of an apparatus for training a text audit model according to the present disclosure;
FIG. 6 is a schematic structural diagram of one embodiment of a text review device according to the present disclosure;
FIG. 7 is a schematic block diagram of a computer system suitable for use with an electronic device implementing embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 illustrates an exemplary system architecture 100 to which a method of training a text audit model, an apparatus for training a text audit model, a text audit method, or a text audit apparatus of embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminals 101, 102, a network 103, a database server 104, and a server 105. The network 103 serves as a medium for providing communication links between the terminals 101, 102, the database server 104 and the server 105. Network 103 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user 110 may use the terminals 101, 102 to interact with the server 105 over the network 103 to receive or send messages or the like. The terminals 101 and 102 may have various client applications installed thereon, such as a model training application, a text auditing application, a shopping application, a payment application, a web browser, an instant messenger, and the like.
Here, the terminals 101 and 102 may be hardware or software. When the terminals 101 and 102 are hardware, they may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), laptop portable computers, desktop computers, and the like. When the terminals 101 and 102 are software, they can be installed in the electronic devices listed above. It may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
Database server 104 may be a database server that provides various services. For example, a database server may have a sample set stored therein. The sample set contains a large number of samples. Wherein the sample may include non-annotated data and annotated data. In this way, the user 110 may also select samples from a set of samples stored by the database server 104 via the terminals 101, 102.
The server 105 may also be a server providing various services, such as a background server providing support for various applications displayed on the terminals 101, 102. The background server may train the initial model using samples in the sample set sent by the terminals 101 and 102, and may send a training result (e.g., a generated text audit model) to the terminals 101 and 102. In this way, the user can apply the generated text auditing model to perform text auditing.
Here, the database server 104 and the server 105 may be hardware or software. When they are hardware, they can be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When they are software, they may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein. Database server 104 and server 105 may also be servers of a distributed system or servers that incorporate a blockchain. Database server 104 and server 105 may also be cloud servers, or smart cloud computing servers or smart cloud hosts with artificial intelligence technology.
It should be noted that the method for training the text auditing model or the text auditing method provided by the embodiments of the present disclosure is generally executed by the server 105. Accordingly, a device for training a text audit model or a text audit device is also typically disposed in the server 105.
It is noted that database server 104 may not be provided in system architecture 100, as server 105 may perform the relevant functions of database server 104.
It should be understood that the number of terminals, networks, database servers, and servers in fig. 1 are merely illustrative. There may be any number of terminals, networks, database servers, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method of training a text audit model according to the present disclosure is shown. The method for training the text audit model can comprise the following steps:
step 201, obtaining a pre-training language model, a pre-training language micro model, labeled data and non-labeled data.
In this embodiment, the executing entity (e.g., the server 105 shown in fig. 1) of the method for training a text audit model may obtain the pre-trained language model, the pre-trained language micro model, the unlabeled data and the labeled data in various ways. For example, the executing entity may obtain an existing pre-trained language model, pre-trained language micro model, unlabeled data and labeled data stored in a database server (e.g., the database server 104 shown in fig. 1) through a wired or wireless connection. As another example, a user may collect a sample set via a terminal (e.g., terminals 101, 102 shown in fig. 1), including labeled data and unlabeled data. In this way, the executing entity may receive the samples collected by the terminal and store them locally, thereby generating a sample set. Labeled data refers to text information annotated with a category label (contraband or non-contraband), while unlabeled data includes only text information without a category label. The text audit model may be a binary classifier whose classes are non-contraband and contraband. The text audit model may also be a multi-class classifier whose categories include non-contraband as well as various types of contraband (e.g., politics, pornography, violence and terrorism, advertising promotion, vulgar abuse, spam flooding, etc.).
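For illustration only, the two kinds of samples might be represented as follows; the field names and the label scheme are assumptions of this sketch and are not prescribed by the disclosure.

```python
# Hypothetical sample formats; field names and label values are illustrative only.
labeled_data = [
    {"text": "Limited-time offer, click this link to buy now!", "label": 1},  # 1 = contraband (e.g., advertising)
    {"text": "Great article, thanks for sharing.", "label": 0},               # 0 = non-contraband
]

unlabeled_data = [
    {"text": "Check out my new profile picture."},   # text only, no category label
    {"text": "Anyone watching the game tonight?"},
]
```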
The pre-trained language model may be a general semantic understanding model, such as ERNIE, BERT, and the like.
The pre-training language micro model (which may be named ERNIE-Tiny) is a lightweight pre-trained language model whose number of network layers is one third or one fourth of that of the pre-training language model. For example, ERNIE may employ a 12- or 24-layer network, while ERNIE-Tiny employs a three-, four-, or six-layer network.
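As a rough illustration of this size difference (the exact hyper-parameters are not specified by the disclosure and are assumed here), the two models might be configured as follows.

```python
# Illustrative layer/width settings; real ERNIE / ERNIE-Tiny configurations may differ.
teacher_config = {"num_layers": 12, "hidden_size": 768, "num_attention_heads": 12}
student_config = {"num_layers": 4, "hidden_size": 768, "num_attention_heads": 12}  # ~1/3 of the teacher's depth
```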
And 202, inputting the labeled data into a pre-training language model for supervised training to obtain a teacher model.
In this embodiment, a small amount of labeled data (e.g., tens of thousands of samples) can be used to fine-tune the pre-training language model, resulting in a Teacher model. The pre-training language model is an ultra-large-scale deep model that can achieve a good effect after fine-tuning on small-scale labeled data, but because the model is large, its prediction speed is very low. The labeled data can be selected according to the type of audit task; for example, for a pornography audit task, pornographic and non-pornographic texts are selected as the labeled data.
And 203, inputting the labeled data into the pre-training language micro model for supervised training to obtain a student model.
In this embodiment, the pre-training language micro model is supervised-trained with the same labeled data, generating a micro model adapted to the specific audit task, which serves as the Student model.
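A minimal sketch of the supervised fine-tuning used in steps 202 and 203, written here in PyTorch; the disclosure does not prescribe a framework, and the model interface, optimizer settings and binary label scheme are assumptions.

```python
import torch
import torch.nn.functional as F

def finetune(model, dataloader, epochs=3, lr=2e-5):
    """Supervised fine-tuning of a pre-trained (micro) language model on labeled audit data."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for input_ids, attention_mask, labels in dataloader:
            logits = model(input_ids, attention_mask)   # [batch, num_classes]
            loss = F.cross_entropy(logits, labels)      # labels: 0 = non-contraband, 1 = contraband
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model

# teacher = finetune(pretrained_language_model, labeled_loader)   # step 202
# student = finetune(pretrained_micro_model, labeled_loader)      # step 203
```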
And step 204, inputting the non-labeled data into the teacher model and the student model respectively, and distilling the student model by using the teacher model to obtain a text auditing model.
In this embodiment, since the Teacher model performs better than the Student model, knowledge distillation can be used to transfer the teacher's capability to the student. The total loss value may be calculated as a weighted sum of the mean square errors between the corresponding output vectors of each layer (or of particular layers) of the two models. The network parameters of the student model are adjusted during training so that the output vectors of each layer of the student model move closer to those of the teacher model, i.e., the total loss value becomes smaller. Optionally, the output vectors of different types of layers may be weighted differently; for example, the output vector of the prediction layer may carry the largest weight and that of the embedding layer the smallest.
In the method for training the text audit model of this embodiment, the knowledge of the pre-training language model is transferred to the pre-training language micro model through model distillation, so that the prediction speed can be increased by thousands of times with only a small loss in accuracy, which has an important positive effect on deploying text audit models in real business. The text audit model can be deployed on low-performance mobile terminals while the auditing accuracy is maintained.
In some optional implementations of this embodiment, inputting the unlabeled data into the teacher model and the student model, and distilling the student model with the teacher model to obtain a text audit model includes: inputting the unlabeled data into the teacher model and outputting a soft label vector; inputting the unlabeled data into the student model and outputting a prediction probability vector; calculating a soft label distillation loss from the soft label vector and the prediction probability vector; and adjusting the network parameters of the student model based on the soft label distillation loss to obtain the text audit model. The soft label (Soft-label) vector is the probability distribution output by the model at the final classification layer. Taking the pornography recognition task as an example, the task is binary classification, where 0 means no violation and 1 means violation, so the soft label is the probability the model predicts for classes 0 and 1 for a given input. By letting the ERNIE-Tiny model (student) learn the soft label vector of the ERNIE model (teacher), the student is driven to produce the same prediction as the teacher for the same input.
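A sketch of one way to compute the soft-label distillation loss, in PyTorch; the KL-divergence form and the temperature parameter are assumptions, as the disclosure only requires a loss computed from the teacher's soft label vector and the student's prediction probability vector (MSE or cross-entropy would also fit).

```python
import torch.nn.functional as F

def soft_label_distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Pull the student's predicted distribution toward the teacher's soft labels on unlabeled text."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)          # teacher soft label vector
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)  # student prediction (log-probs)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2
```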
In some optional implementations of this embodiment, inputting the unlabeled data into the teacher model and the student model respectively, and distilling the student model with the teacher model to obtain a text audit model includes: acquiring a first output vector of the embedding layer of the teacher model and a second output vector of the embedding layer of the student model; calculating the mean square error between the first and second embedding-layer output vectors as the embedding-layer distillation loss; and adjusting the network parameters of the student model based on the embedding-layer distillation loss to obtain the text audit model. The distance between the embeddings of the ERNIE model (teacher) and the ERNIE-Tiny model (student) is constrained with an MSE (Mean Square Error) loss function. The network parameters of the student model are adjusted until the loss between the embedding-layer output vectors is smaller than a preset embedding-layer loss threshold. This accelerates the convergence of the model and improves the accuracy of text auditing.
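A minimal sketch of the embedding-layer distillation loss; the optional linear projection for mismatched embedding widths is an assumption, not something required by the disclosure.

```python
import torch.nn.functional as F

def embedding_distillation_loss(teacher_emb, student_emb, proj=None):
    """MSE between the embedding-layer outputs of teacher and student."""
    if proj is not None:
        student_emb = proj(student_emb)    # optional mapping if the student's embedding width differs
    return F.mse_loss(student_emb, teacher_emb)
```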
In some optional implementations of this embodiment, inputting the unlabeled data into the teacher model and the student model respectively, and distilling the student model with the teacher model to obtain a text audit model includes: acquiring a first output vector of a hidden layer of the teacher model and a second output vector of a hidden layer of the student model; calculating the mean square error between the first and second hidden-layer output vectors as the hidden-layer distillation loss; and adjusting the network parameters of the student model based on the hidden-layer distillation loss to obtain the text audit model. First, the layers of the ERNIE model and the ERNIE-Tiny model are mapped in an arithmetic progression; if the ERNIE-Tiny model has 4 layers and the ERNIE model has 12 layers, the 1st, 2nd, 3rd and 4th transformer layers of the ERNIE-Tiny model correspond to the 3rd, 6th, 9th and 12th layers of the ERNIE model, respectively. Hidden-layer model distillation is achieved by constraining the distance between the hidden-layer output vectors of the mapped layers of the ERNIE model (teacher) and the ERNIE-Tiny model (student). The network parameters of the student model are adjusted until the loss between the hidden-layer output vectors is smaller than a preset hidden-layer loss threshold. This accelerates the convergence of the model and improves the accuracy of text auditing.
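A sketch of the hidden-layer distillation with the arithmetic-progression layer mapping described above; the per-layer lists are assumed to exclude the embedding output, and equal hidden widths are assumed for brevity.

```python
import torch.nn.functional as F

def hidden_distillation_loss(teacher_hidden_states, student_hidden_states):
    """MSE between mapped hidden layers: student layer i vs teacher layer i * (T / S),
    e.g. a 4-layer student maps to layers 3, 6, 9, 12 of a 12-layer teacher."""
    step = len(teacher_hidden_states) // len(student_hidden_states)   # 12 // 4 = 3
    loss = 0.0
    for i, s_hidden in enumerate(student_hidden_states, start=1):
        t_hidden = teacher_hidden_states[i * step - 1]                # teacher layers 3, 6, 9, 12 (1-indexed)
        loss = loss + F.mse_loss(s_hidden, t_hidden)
    return loss / len(student_hidden_states)
```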
In some optional implementations of this embodiment, inputting the unlabeled data into the teacher model and the student model respectively, and distilling the student model with the teacher model to obtain a text audit model includes: for each element of the unlabeled data, acquiring a first attention matrix of the element in the teacher model and a second attention matrix of the element in the student model; calculating the mean square error between the first and second attention matrices of each element as the attention distillation loss; and adjusting the network parameters of the student model based on the attention distillation loss to obtain the text audit model. The attention matrix corresponding to each element of an unlabeled word sequence reflects that element's dependency on the other elements in the sequence; by letting the ERNIE-Tiny model (student) learn the attention matrices of the ERNIE model (teacher) at each layer, richer linguistic knowledge is learned. The network parameters of the student model are adjusted until the loss between the attention-layer output vectors is smaller than a preset attention-layer loss threshold. This accelerates the convergence of the model and improves the accuracy of text auditing.
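A sketch of the attention distillation loss; reusing the same arithmetic-progression layer mapping as for the hidden states, and assuming teacher and student use the same number of attention heads, are both assumptions of this sketch.

```python
import torch.nn.functional as F

def attention_distillation_loss(teacher_attentions, student_attentions):
    """MSE between the attention matrices of mapped layers, so the student imitates
    how the teacher attends over the token sequence."""
    step = len(teacher_attentions) // len(student_attentions)
    loss = 0.0
    for i, s_attn in enumerate(student_attentions, start=1):
        t_attn = teacher_attentions[i * step - 1]      # [batch, heads, seq_len, seq_len]
        loss = loss + F.mse_loss(s_attn, t_attn)
    return loss / len(student_attentions)
```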
In some optional implementations of this embodiment, inputting the unlabeled data into the teacher model and the student model respectively, and distilling the student model with the teacher model to obtain a text audit model includes: obtaining the soft label distillation loss, the embedding-layer distillation loss, the hidden-layer distillation loss and the attention distillation loss; and adjusting the network parameters of the student model based on a weighted sum of the soft label distillation loss, the embedding-layer distillation loss, the hidden-layer distillation loss and the attention distillation loss to obtain the text audit model. Each of the four losses can be obtained by the methods described above. The weighted sum of these four losses is then calculated as the total loss, and the network parameters of the student model are adjusted until the total loss is smaller than a preset loss threshold, yielding the text audit model. A loss weight may be 0, i.e., the total loss value can be calculated from any combination of the four losses. The choice can also be made according to the model structure; for example, if there is no attention layer, the attention loss is not used. Combining embedding-layer, intermediate-layer, attention and soft-label distillation lets the student model learn more of the teacher model's knowledge and achieve a better prediction effect.
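A sketch of how the four losses might be combined and applied in a distillation step; the weight values are illustrative assumptions, and only the student's parameters are updated.

```python
def total_distillation_loss(losses, weights):
    """Weighted sum of the four distillation losses; a weight of 0 simply drops that term
    (e.g., no attention loss if the student has no attention layers)."""
    return sum(w * l for l, w in zip(losses, weights))

# Example weighting (assumed): the prediction-layer (soft label) term largest, the embedding term smallest.
# loss = total_distillation_loss(
#     [soft_label_loss, embedding_loss, hidden_loss, attention_loss],
#     weights=[1.0, 0.1, 0.5, 0.5],
# )
# loss.backward()
# student_optimizer.step()   # only the student model's parameters are adjusted
```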
With further reference to fig. 3a-3b, fig. 3a-3b are schematic diagrams of an application scenario of the method for training a text audit model according to the present embodiment. In the application scenario of fig. 3a-3b, in the training phase, three phases of training steps are mainly included:
1. and training the general ERNIE model by using the task data to obtain the pornographic text recognition Teacher model.
2. And training a universal ERNIE-Tiny model by using the task data to obtain a pornographic text recognition Student model.
3. Based on a large amount of unmarked auditing business data, model distillation is carried out on the Student model by using the Teacher model, the learned knowledge in the ERNIE model is transferred to the ERNIE-Tiny model, and finally the ERNIE-Tiny pornography text recognition model with the model prediction effect similar to that of the TEAcher model is obtained.
In the prediction phase, for each user input text data, the ERNIE-Tiny porn text recognition model predicts whether it violates rules.
Wherein the specific flow of the model distillation stage is shown in FIG. 3 b: for each input text, inputting the input text into a Teacher model and a Student model respectively, and realizing model distillation through the following four parts of distillation loss: embedded layer distillation loss (embedding layer distillation loss), hidden layer distillation loss (hidden state distillation loss), attention distillation loss (attentional distillation loss), soft-label distillation loss (soft-label distillation loss).
Referring to fig. 4, a flow 400 of one embodiment of a text review method provided by the present disclosure is shown. The text auditing method can comprise the following steps:
step 401, obtaining text information to be audited.
In this embodiment, the executing entity (e.g., the server 105 shown in fig. 1) of the text auditing method may acquire the text information to be audited in various ways. For example, the executing entity may obtain text information to be audited that is stored in a database server (e.g., database server 104 shown in fig. 1) through a wired or wireless connection. As another example, the executing entity may receive text information to be audited collected by a terminal (e.g., the terminals 101 and 102 shown in fig. 1) or another device. Common text auditing scenarios include user signatures/nicknames, comments/messages, instant-messaging text, user posts, media information, product information, live-stream bullet comments, image-with-text content and the like. The types of prohibited content handled by text auditing include politics, pornography, violence and terrorism, advertising promotion, vulgar abuse, and the like.
If the content to be audited is an image, OCR recognition can be performed first to extract the text to be audited.
Optionally, the text information to be audited may be preprocessed, for example by removing punctuation or truncating the text to a predetermined number of words (within 500). Keyword screening may also be performed first, directly filtering out text information that contains forbidden words.
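A minimal sketch of such preprocessing; the word limit of 500 comes from the paragraph above, while the regular expression and the placeholder blacklist are assumptions.

```python
import re

FORBIDDEN_WORDS = {"example_banned_word"}   # placeholder; the real forbidden-word lexicon is not disclosed
MAX_WORDS = 500

def preprocess(text):
    """Strip punctuation, truncate to a predetermined number of words, and short-circuit
    texts that hit the keyword blacklist before they ever reach the model."""
    cleaned = re.sub(r"[^\w\s]", " ", text)          # remove punctuation
    words = cleaned.split()[:MAX_WORDS]              # whitespace split; Chinese text would need a word segmenter
    if any(w in FORBIDDEN_WORDS for w in words):
        return None, "rejected_by_keyword_filter"    # filtered out without model inference
    return " ".join(words), None
```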
And step 402, inputting the text information into a text auditing model and outputting an auditing result.
In this embodiment, the executing entity may input the text information acquired in step 401 into the text audit model to generate an audit result. The audit result may be contraband or non-contraband, or a specific type of contraband.
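A sketch of this inference step in PyTorch; the tokenizer interface and the meaning of the class indices are assumptions made for illustration.

```python
import torch

def audit_text(student_model, tokenizer, text):
    """Run the distilled text audit model on one piece of text and return the audit result.
    `tokenizer` is a hypothetical callable returning (input_ids, attention_mask) tensors;
    class 1 is assumed to mean contraband and class 0 non-contraband."""
    student_model.eval()
    input_ids, attention_mask = tokenizer(text)
    with torch.no_grad():
        logits = student_model(input_ids, attention_mask)
    pred = int(logits.argmax(dim=-1))
    return "contraband" if pred == 1 else "non-contraband"
```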
In this embodiment, the text audit model may be generated by the method described in the embodiment of fig. 2. For a specific generation process, reference may be made to the related description of the embodiment in fig. 2, which is not described herein again.
It should be noted that the text auditing method of this embodiment may be used to test the text audit model generated by the foregoing embodiments, and the text audit model can then be continuously optimized according to the test results. The method may also be a practical application of the text audit model generated by the above embodiments. Auditing text with the text audit model generated by the above embodiments helps improve the performance of the model, raises auditing efficiency and accuracy, and reduces labor cost. Meanwhile, the auditing time is shortened, so that the review is imperceptible to the user and does not affect the user experience.
With continuing reference to FIG. 5, as an implementation of the methods illustrated in the above figures, the present disclosure provides one embodiment of an apparatus for training a text audit model. The embodiment of the device corresponds to the embodiment of the method shown in fig. 2, and the device can be applied to various electronic devices.
As shown in fig. 5, the apparatus 500 for training a text audit model according to this embodiment may include: an acquisition unit 501, a first training unit 502, a second training unit 503, and a distillation unit 504. The acquiring unit 501 is configured to acquire a pre-training language model, a pre-training language micro model, labeled data, and label-free data; a first training unit 502 configured to input the annotation data into a pre-training language model for supervised training, resulting in a teacher model; a second training unit 503, configured to input the labeled data into a pre-training language micro model for supervised training, so as to obtain a student model; a distilling unit 504 configured to input the label-free data into the teacher model and the student model, respectively, and distill the student model with the teacher model to obtain a text audit model.
In some alternative implementations of the present embodiment, the distillation unit 504 is further configured to: inputting the label-free data into the teacher model and outputting a soft label vector; inputting the label-free data into the student model and outputting a prediction probability vector; calculating soft label distillation loss according to the soft label vector and the prediction probability vector; and adjusting the network parameters of the student model based on the soft label distillation loss to obtain a text auditing model.
In some optional implementations of the present embodiment, the distillation unit 504 is further configured to: acquiring a first output vector of an embedding layer of the teacher model and a second output vector of an embedding layer of the student model; calculating a mean square error of the first embedded layer output vector and the second embedded layer output vector as an embedded layer distillation loss; and adjusting the network parameters of the student model based on the distillation loss of the embedded layer to obtain a text auditing model.
In some alternative implementations of the present embodiment, the distillation unit 504 is further configured to: acquiring a hidden layer first output vector of the teacher model and a hidden layer second output vector of the student model; calculating the mean square error of the first output vector of the hidden layer and the second output vector of the hidden layer as the distillation loss of the hidden layer; and adjusting the network parameters of the student model based on the hidden layer distillation loss to obtain a text auditing model.
In some optional implementations of the present embodiment, the distillation unit 504 is further configured to: for each element of the unlabeled data, acquiring a first attention matrix of the element in the teacher model and a second attention matrix of the element in the student model; calculating the mean square error of the first attention matrix and the second attention matrix of each element as the attention distillation loss; and adjusting network parameters of the student model based on the attention distillation loss to obtain a text auditing model.
In some optional implementations of the present embodiment, the distillation unit 504 is further configured to: obtaining soft label distillation loss, embedded layer distillation loss, hidden layer distillation loss and attention distillation loss; and adjusting the network parameters of the student model based on a weighted sum of the soft label distillation loss, the embedded layer distillation loss, the hidden layer distillation loss and the attention distillation loss to obtain a text auditing model.
With continuing reference to FIG. 6, the present disclosure provides one embodiment of a text review device as an implementation of the methods illustrated in the above figures. The embodiment of the device corresponds to the embodiment of the method shown in fig. 4, and the device can be applied to various electronic devices.
As shown in fig. 6, the text auditing apparatus 600 of the present embodiment may include: an acquisition unit 601 configured to acquire text information to be audited; the auditing unit 602 is configured to input the text information into a text auditing model trained by the apparatus 500, and output an auditing result.
An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of flows 200 or 400.
A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of flow 200 or 400.
A computer program product comprising a computer program which, when executed by a processor, implements the method of flow 200 or 400.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the personal information of the related user are all in accordance with the regulations of related laws and regulations and do not violate the good customs of the public order.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the device 700 comprises a computing unit 701, which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM)702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 can also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in the device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
Computing unit 701 may be a variety of general purpose and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 performs the various methods and processes described above, such as the method of training a text audit model. For example, in some embodiments, the method of training a text audit model may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 708. In some embodiments, part or all of a computer program may be loaded onto and/or installed onto device 700 via ROM 702 and/or communications unit 709. When the computer program is loaded into RAM 703 and executed by the computing unit 701, one or more steps of the method of training a text audit model described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured by any other suitable means (e.g., by means of firmware) to perform a method of training a text audit model.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server with a combined blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (17)

1. A method of training a text audit model, comprising:
acquiring a pre-training language model, a pre-training language micro model, labeled data and label-free data;
inputting the labeled data into a pre-training language model for supervised training to obtain a teacher model;
inputting the labeled data into a pre-training language micro model for supervised training to obtain a student model;
and inputting the label-free data into the teacher model and the student model respectively, and distilling the student model by using the teacher model to obtain a text auditing model.
2. The method of claim 1, wherein said inputting said label-free data into said teacher model and said student model, respectively, and distilling the student model using the teacher model to obtain a text audit model comprises:
inputting the label-free data into the teacher model and outputting a soft label vector;
inputting the label-free data into the student model and outputting a prediction probability vector;
calculating soft label distillation loss according to the soft label vector and the prediction probability vector;
and adjusting the network parameters of the student model based on the soft label distillation loss to obtain a text auditing model.
3. The method of claim 1, wherein said inputting said label-free data into said teacher model and said student model, respectively, and distilling the student model using the teacher model to obtain a text audit model comprises:
acquiring a first output vector of an embedding layer of the teacher model and a second output vector of an embedding layer of the student model;
calculating a mean square error of the first embedded layer output vector and the second embedded layer output vector as an embedded layer distillation loss;
and adjusting the network parameters of the student model based on the distillation loss of the embedded layer to obtain a text auditing model.
4. The method of claim 1, wherein said inputting said label-free data into said teacher model and said student model, respectively, and distilling the student model using the teacher model to obtain a text audit model comprises:
acquiring a hidden layer first output vector of the teacher model and a hidden layer second output vector of the student model;
calculating the mean square error of the first output vector of the hidden layer and the second output vector of the hidden layer as the distillation loss of the hidden layer;
and adjusting the network parameters of the student model based on the hidden layer distillation loss to obtain a text auditing model.
5. The method of claim 1, wherein said inputting said label-free data into said teacher model and said student model, respectively, and distilling the student model using the teacher model to obtain a text audit model comprises:
for each element of the unlabeled data, acquiring a first attention matrix of the element in the teacher model and a second attention matrix of the element in the student model;
calculating the mean square error of the first attention matrix and the second attention matrix of each element as the attention distillation loss;
and adjusting network parameters of the student model based on the attention distillation loss to obtain a text auditing model.
6. The method of claim 1, wherein said inputting said label-free data into said teacher model and said student model, respectively, and distilling the student model using the teacher model to obtain a text audit model comprises:
obtaining soft label distillation loss, embedded layer distillation loss, hidden layer distillation loss and attention distillation loss;
and adjusting the network parameters of the student model based on a weighted sum of the soft label distillation loss, the embedded layer distillation loss, the hidden layer distillation loss and the attention distillation loss to obtain a text auditing model.
7. A text auditing method comprises the following steps:
acquiring text information to be audited;
inputting the text information into a text auditing model trained according to the method of any one of claims 1-6, and outputting the auditing result.
8. An apparatus for training a text audit model, comprising:
an acquisition unit configured to acquire a pre-training language model, a pre-training language micro model, labeled data, and label-free data;
a first training unit configured to input the annotation data into a pre-training language model for supervised training, resulting in a teacher model;
a second training unit configured to input the labeled data into a pre-training language micro model for supervised training to obtain a student model;
and the distillation unit is configured to input the label-free data into the teacher model and the student model respectively, and distill the student model by using the teacher model to obtain a text auditing model.
9. The apparatus of claim 8, wherein the distillation unit is further configured to:
inputting the label-free data into the teacher model and outputting a soft label vector;
inputting the label-free data into the student model and outputting a prediction probability vector;
calculating soft label distillation loss according to the soft label vector and the prediction probability vector;
and adjusting the network parameters of the student model based on the soft label distillation loss to obtain a text auditing model.
10. The apparatus of claim 8, wherein the distillation unit is further configured to:
acquiring a first output vector of an embedding layer of the teacher model and a second output vector of an embedding layer of the student model;
calculating a mean square error of the first embedded layer output vector and the second embedded layer output vector as an embedded layer distillation loss;
and adjusting the network parameters of the student model based on the distillation loss of the embedded layer to obtain a text auditing model.
11. The apparatus of claim 8, wherein the distillation unit is further configured to:
acquiring a hidden layer first output vector of the teacher model and a hidden layer second output vector of the student model;
calculating the mean square error of the first output vector of the hidden layer and the second output vector of the hidden layer as the distillation loss of the hidden layer;
and adjusting the network parameters of the student model based on the hidden layer distillation loss to obtain a text auditing model.
12. The apparatus of claim 8, wherein the distillation unit is further configured to:
for each element of the unlabeled data, acquiring a first attention matrix of the element in the teacher model and a second attention matrix of the element in the student model;
calculating the mean square error of the first attention matrix and the second attention matrix of each element as the attention distillation loss;
and adjusting network parameters of the student model based on the attention distillation loss to obtain a text auditing model.
13. The apparatus of claim 8, wherein the distillation unit is further configured to:
obtaining soft label distillation loss, embedded layer distillation loss, hidden layer distillation loss and attention distillation loss;
and adjusting the network parameters of the student model based on a weighted sum of the soft label distillation loss, the embedded layer distillation loss, the hidden layer distillation loss and the attention distillation loss to obtain a text auditing model.
14. A text review apparatus comprising:
an acquisition unit configured to acquire text information to be audited;
an auditing unit configured to input the text information into a text auditing model trained by the apparatus according to any one of claims 8-13, and output auditing results.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.
CN202210546544.2A 2022-05-18 2022-05-18 Method and device for training text audit model Pending CN114969332A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210546544.2A CN114969332A (en) 2022-05-18 2022-05-18 Method and device for training text audit model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210546544.2A CN114969332A (en) 2022-05-18 2022-05-18 Method and device for training text audit model

Publications (1)

Publication Number Publication Date
CN114969332A true CN114969332A (en) 2022-08-30

Family

ID=82985952

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210546544.2A Pending CN114969332A (en) 2022-05-18 2022-05-18 Method and device for training text audit model

Country Status (1)

Country Link
CN (1) CN114969332A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111611377A (en) * 2020-04-22 2020-09-01 淮阴工学院 Knowledge distillation-based multi-layer neural network language model training method and device
US20220067274A1 (en) * 2020-09-02 2022-03-03 Zhejiang Lab Compression method and platform of pre-training language model based on knowledge distillation
CN112613273A (en) * 2020-12-16 2021-04-06 上海交通大学 Compression method and system of multi-language BERT sequence labeling model
CN112949766A (en) * 2021-04-07 2021-06-11 成都数之联科技有限公司 Target area detection model training method, system, device and medium
CN113592007A (en) * 2021-08-05 2021-11-02 哈尔滨理工大学 Knowledge distillation-based bad picture identification system and method, computer and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116186200A (en) * 2023-01-19 2023-05-30 北京百度网讯科技有限公司 Model training method, device, electronic equipment and storage medium
CN116186200B (en) * 2023-01-19 2024-02-09 北京百度网讯科技有限公司 Model training method, device, electronic equipment and storage medium
CN117292395A (en) * 2023-09-27 2023-12-26 自然资源部地图技术审查中心 Training method and training device for drawing-examining model and drawing-examining method and device
CN117292395B (en) * 2023-09-27 2024-05-24 自然资源部地图技术审查中心 Training method and training device for drawing-examining model and drawing-examining method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination