CN117473339B

CN117473339B - Content auditing method and device, electronic equipment and storage medium

Info

Publication number: CN117473339B
Application number: CN202311824927.2A
Authority: CN
Inventors: 杜萌; 蒋树; 薛娇; 李大海
Original assignee: Zhizhe Sihai Beijing Technology Co Ltd
Current assignee: Zhizhe Sihai Beijing Technology Co Ltd
Priority date: 2023-12-28
Filing date: 2023-12-28
Publication date: 2024-04-30
Anticipated expiration: 2043-12-28
Also published as: CN117473339A

Abstract

The embodiment of the invention provides a content auditing method, a device, electronic equipment and a storage medium, belonging to the field of data processing, wherein the method comprises the following steps: inputting the content to be audited into an audit model trained in advance with content standard pairs to obtain a content characterization of the content to be audited, wherein each content standard pair comprises a content sample and an audit standard hit by the content sample; matching the content characterization with the standard characterizations in a standard characterization library to obtain a matching score between the content characterization and each standard characterization, wherein each standard characterization is obtained by processing an audit standard with the audit model; and determining the auditing result of the content to be audited according to the matching scores and all audit standards. Because the auditing result is determined by matching the audit model's understanding of the audit standards (the standard characterizations) against its understanding of the content to be audited (the content characterization), the accuracy of content auditing can be improved, and the method is suitable for content auditing in any scene.

Description

Content auditing method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of data processing, and in particular, to a method and apparatus for auditing content, an electronic device, and a storage medium.

Background

Content classification refers to automatic classification of content sets (pictures, texts, audios and videos, etc.) according to a certain classification system or standard. With the development of machine learning, machine learning has been largely applied to the field of content classification.

At present, a model is usually trained by using a set of already-labeled training contents, so that the model learns the relation between the characteristics of the contents and the categories of the contents, and then category identification is carried out on new contents by using the model with the learned relation. The method is only suitable for content auditing classification of the appointed scene, and when the auditing classification scene changes or is updated, the model needs to be retrained so as to ensure the accuracy of the content auditing of the model. However, the manner of retraining is costly.

Disclosure of Invention

Accordingly, the present invention is directed to a method, an apparatus, an electronic device, and a storage medium for content auditing, which can improve accuracy of content auditing, and is suitable for content auditing in any scenario, thereby greatly reducing cost of content auditing.

In order to achieve the above object, the technical scheme adopted by the embodiment of the invention is as follows:

in a first aspect, an embodiment of the present invention provides a content auditing method, including:

Inputting the content to be audited into a pre-trained audit model to obtain the content characterization of the content to be audited; the audit model is a characterization extraction model obtained after training with content standard pairs, and each content standard pair comprises a content sample and an audit standard hit by the content sample;

Matching the content characterization with standard characterization in a preset standard characterization library to obtain a matching score between the content characterization and each standard characterization; the standard characterization in the standard characterization library is obtained after the auditing model processes the auditing standard;

and determining the auditing result of the content to be audited according to the matching score and all auditing standards.

In one possible embodiment, the method further comprises the step of training the audit model, comprising:

Constructing a plurality of content standard pairs according to the content samples and the auditing standards hit by the content samples, and dividing the plurality of content standard pairs into a plurality of sample batches; wherein each content standard pair in the sample batch is a positive sample, and the content sample of any one content standard pair in the sample batch and the audit standard of any other content standard pair form a negative sample;

and based on the plurality of sample batches, performing contrast learning training on the model to be trained to obtain an auditing model.

In a possible implementation manner, the step of performing contrast learning training on the model to be trained based on the plurality of sample batches to obtain an audit model includes:

Extracting a sample batch from the plurality of sample batches, and inputting the sample batch into a model to be trained to obtain a content characterization matrix and a standard characterization matrix;

Computing the dot product of the content characterization matrix and the standard characterization matrix to obtain a matching matrix, and calculating a loss value of the matching matrix based on the label matrix of the sample batch;

If the loss value and the iteration number do not meet the ending condition, adjusting parameters of the model to be trained according to the loss value, returning to execute the step of extracting one sample batch from the plurality of sample batches, and inputting the sample batch into the model to be trained to obtain a content characterization matrix and a standard characterization matrix until the loss value or the iteration number meets the ending condition;

and if the loss value or the iteration number meets the ending condition, taking the current model to be trained as an auditing model.

In one possible implementation, the model to be trained includes a backbone network, a content optimization network, and a standard optimization network;

the step of inputting the sample batch into the model to be trained to obtain a content characterization matrix and a standard characterization matrix comprises the following steps:

Processing the sample batch through the backbone network to obtain the content implicit representation of each content sample in the sample batch and the standard implicit representation of each auditing standard;

Optimizing and normalizing each content implicit representation through the content optimizing network to obtain a content representation corresponding to each content implicit representation, and integrating the content representations to obtain a content representation matrix;

Optimizing and normalizing each standard implicit representation through the standard optimization network to obtain a standard representation corresponding to each standard implicit representation, and integrating the standard representations to obtain a standard representation matrix.

In one possible embodiment, the step of calculating the loss value of the matching matrix based on the label matrix of the sample batch includes:

Calculating a loss value by adopting a cross entropy loss function according to the label matrix of the sample batch and the matching matrix.

In a possible implementation manner, the step of determining the auditing result of the content to be audited according to the matching score and all the auditing standards includes:

Determining the maximum matching score among all the matching scores, taking the auditing standard whose standard characterization corresponds to the maximum matching score as the hit standard of the content to be audited, and taking the category to which the hit standard belongs as the content category of the content to be audited.
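The maximum-score decision described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `decide_by_max_score`, the standard identifiers and the category names are all hypothetical, and the matching scores are assumed to have been computed already.

```python
from typing import Dict, Tuple


def decide_by_max_score(scores: Dict[str, float],
                        standard_to_category: Dict[str, str]) -> Tuple[str, str]:
    """Return (hit standard, content category) for the highest matching score."""
    # The standard whose characterization has the largest matching score is the hit standard.
    hit_standard = max(scores, key=scores.get)
    # The content category is the category that the hit standard belongs to.
    return hit_standard, standard_to_category[hit_standard]


# Hypothetical matching scores and standard-to-category mapping.
scores = {"A.1": 0.91, "A.2": 0.40, "B.1": 0.15}
standard_to_category = {"A.1": "violence", "A.2": "violence", "B.1": "pornography"}
hit, category = decide_by_max_score(scores, standard_to_category)
```

Here `hit` is the single best-matching standard, so the decision depends on one score only.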

In a possible implementation manner, the step of determining the auditing result of the content to be audited according to the matching score and all the auditing standards includes:

Sorting all the matching scores in descending order, and taking the auditing standards whose standard characterizations correspond to the matching scores ranked within a preset number of top positions as preselected standards;

and voting on the content categories according to the preselected standards, and taking the content category with the largest number of votes as the content category of the content to be audited.
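A minimal sketch of the sort-and-vote variant follows. It assumes that each preselected standard casts one vote for the category it belongs to, and that ties go to the category of the higher-ranked standard; neither detail is stated in the text. The names `decide_by_vote` and `top_k` and the example data are hypothetical.

```python
from collections import Counter
from typing import Dict


def decide_by_vote(scores: Dict[str, float],
                   standard_to_category: Dict[str, str],
                   top_k: int) -> str:
    """Vote on the content category using the top_k best-matching standards."""
    # Sort standards by matching score, from large to small.
    ranked = sorted(scores, key=scores.get, reverse=True)
    preselected = ranked[:top_k]
    # Assumption: one vote per preselected standard, for its own category.
    votes = Counter(standard_to_category[s] for s in preselected)
    # most_common keeps first-seen order among equal counts, so ties favor
    # the category of the higher-ranked standard.
    return votes.most_common(1)[0][0]


# Hypothetical data: the best single match is "A.1", but two of the top three
# standards belong to the other category.
scores = {"A.1": 0.9, "B.1": 0.8, "B.2": 0.7, "A.2": 0.1}
standard_to_category = {"A.1": "violence", "A.2": "violence",
                        "B.1": "pornography", "B.2": "pornography"}
category = decide_by_vote(scores, standard_to_category, top_k=3)
```

In this example the vote selects a different category than the maximum-score rule would, which shows how the two variants can diverge.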

In a possible implementation manner, the step of matching the content characterization with standard characterizations in a preset standard characterization library to obtain a matching score between the content characterization and each standard characterization includes:

For each standard characterization in the standard characterization library, computing the dot product of the content characterization and the standard characterization to obtain the matching score between the content characterization and that standard characterization.
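As a rough illustration of this matching step, the sketch below computes one dot product per library entry. The library layout (a dictionary from a standard identifier to a vector), the function name `match_scores` and the two-dimensional vectors are assumptions made for the example.

```python
from typing import Dict, List


def match_scores(content_vec: List[float],
                 standard_library: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the content characterization with every standard characterization."""
    return {
        standard_id: sum(c * s for c, s in zip(content_vec, standard_vec))
        for standard_id, standard_vec in standard_library.items()
    }


# Hypothetical, already L2-normalized characterizations.
library = {"A.1": [1.0, 0.0], "A.2": [0.0, 1.0]}
scores = match_scores([0.6, 0.8], library)
```

With a large library, the same computation would normally be done as a single matrix-vector product rather than a Python loop.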

In one possible implementation, the audit model includes a backbone network and a content optimization network;

the step of inputting the content to be audited into a pre-trained audit model to obtain the content characterization of the content to be audited comprises the following steps:

Processing the content to be audited through the backbone network to obtain the content implicit characterization of the content to be audited;

optimizing and normalizing the content implicit characterization of the content to be audited through the content optimization network to obtain the content characterization of the content to be audited.

In one possible implementation, the audit model includes a backbone network and a standard optimization network;

the method further comprises the step of obtaining a standard characterization library, comprising:

processing all auditing standards through the backbone network to obtain at least one standard implicit characterization of each auditing standard;

And optimizing and normalizing all the standard implicit characterizations through the standard optimization network to obtain the standard characterizations, and storing each standard characterization in the standard characterization library in association with its auditing standard.

In one possible embodiment, the method further comprises:

When a new audit standard is acquired, inputting the new audit standard into the audit model to obtain a new standard characterization, and storing the new standard characterization in the standard characterization library in association with the new audit standard.
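The library update can be sketched as below. The point of the sketch is that only the library changes when a standard is added; the model is untouched. `toy_audit_model` is a hypothetical stand-in that merely produces a unit-length vector from text, not the trained audit model, and the dictionary layout of the library is likewise an assumption.

```python
import math
from typing import Callable, Dict, List


def l2_normalize(vec: List[float]) -> List[float]:
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def toy_audit_model(text: str) -> List[float]:
    """Hypothetical stand-in for the trained audit model (standard branch)."""
    raw = [float(len(text)), float(text.count(" ") + 1), 1.0]
    return l2_normalize(raw)


def add_standard(library: Dict[str, dict], standard_id: str, standard_text: str,
                 model: Callable[[str], List[float]] = toy_audit_model) -> Dict[str, dict]:
    """Store the standard characterization in association with its audit standard."""
    library[standard_id] = {"text": standard_text, "vector": model(standard_text)}
    return library


library: Dict[str, dict] = {}
add_standard(library, "A.1", "depicts physical violence")
# A new standard appears later: only the library is extended, no retraining.
add_standard(library, "A.3", "new rule added after deployment")
```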

In a second aspect, an embodiment of the present invention provides a content auditing apparatus, including a characterization acquisition module, a standard matching module, and an auditing determination module;

the characterization acquisition module is used for inputting the content to be audited into a pre-trained audit model to obtain the content characterization extracted by the audit model; the audit model is a characterization extraction model obtained after training with content standard pairs, and each content standard pair comprises a content sample and an audit standard hit by the content sample;

The standard matching module is used for matching the content characterization with the standard characterizations in a preset standard characterization library to obtain matching scores between the content characterization and each standard characterization; each standard characterization in the standard characterization library is obtained after the audit model processes an audit standard;

And the auditing judging module is used for determining auditing results of the content to be audited according to the matching score and all auditing standards.

In one possible embodiment, the apparatus further comprises a model training module;

the model training module is used for:

In a third aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory, where the memory stores machine executable instructions executable by the processor, the processor being capable of executing the machine executable instructions to implement a content auditing method according to any one of the possible implementations of the first aspect.

In a fourth aspect, an embodiment of the present invention provides a storage medium having stored thereon a computer program which, when executed by a processor, implements a content auditing method according to any of the possible implementations of the first aspect.

The embodiment of the invention provides a content auditing method, a device, electronic equipment and a storage medium, wherein the method comprises the following steps: inputting the content to be audited into an audit model which is obtained after training by adopting a content standard pair in advance to obtain a content representation of the content to be audited, wherein the content standard pair comprises a content sample and an audit standard hit by the content sample; matching the content characterization with standard characterization in a standard characterization library to obtain a matching score between the content characterization and each standard characterization, wherein the standard characterization is obtained by processing an audit standard by using an audit model; and determining the auditing result of the content to be audited according to the matching score and all auditing standards. Therefore, the auditing model respectively understands the auditing standard and the content to be audited, so that the accuracy of content auditing can be improved according to matching and auditing result determination of the auditing standard and the understanding result (namely the content characterization and the standard characterization) of the content to be audited, and the method is suitable for content auditing in any scene.

In addition, when a new auditing standard appears due to the change or update of the content auditing scene, only the standard characterization library is required to be updated, and a model is not required to be retrained, so that auditing cost is greatly reduced, and auditing time is shortened.

In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 shows a schematic structural diagram of a content auditing system according to an embodiment of the present invention.

Fig. 2 shows a first flowchart of a content auditing method according to an embodiment of the present invention.

Fig. 3 shows a second flowchart of a content auditing method according to an embodiment of the present invention.

Fig. 4 shows a schematic flow chart of a partial sub-step of step S23 in fig. 3.

Fig. 5 shows a schematic structural diagram of a model to be trained according to an embodiment of the present invention.

Fig. 6 shows a schematic flow chart of a partial sub-step of step S15 in fig. 2.

Fig. 7 is a schematic structural diagram of a content auditing apparatus according to an embodiment of the present invention.

Fig. 8 shows a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Reference numerals illustrate: 1000-a content auditing system; 10-client; 20-auditing equipment; 30-training equipment; 40-content auditing means; 401-a characterization acquisition module; 402-a standard matching module; 403-an audit decision module; 50-electronic device.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present invention.

It is noted that relational terms such as "first" and "second", and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The content auditing method provided by the embodiment of the invention can be applied to the content auditing system 1000 shown in fig. 1, and the content auditing system 1000 can comprise a client 10, auditing equipment 20 and training equipment 30. Audit device 20 may be communicatively coupled to client 10 and training device 30 via a network.

Training device 30 is configured to train to obtain an audit model, and migrate the audit model to audit device 20.

The client 10 is configured to send the content to be audited to the auditing apparatus 20.

The auditing equipment 20 is used for deploying an auditing model, receiving the content to be audited sent by any client 10, and auditing the content to be audited by adopting the content auditing method provided by the embodiment of the invention.

Wherein training device 30 and auditing device 20 may be, but are not limited to: independent servers and server clusters, etc., and training device 30 and auditing device 20 may be the same independent server or server cluster, or may be different independent servers or server clusters. The client 10 includes, but is not limited to: personal computers, notebook computers, tablet computers, mobile terminals, wearable portable devices, and the like.

In one possible embodiment, a method of content auditing is provided, and referring to fig. 2, may include the following steps. In the present embodiment, the content auditing method is applied to the auditing apparatus 20 in fig. 1 for illustration.

S11, inputting the content to be audited into a pre-trained audit model to obtain the content characterization of the content to be audited.

In this embodiment, the audit model is a characterization extraction model obtained after training with content standard pairs, and each content standard pair includes a content sample and the audit standard hit by that content sample.

S13, matching the content characterization with the standard characterizations in a preset standard characterization library to obtain a matching score between the content characterization and each standard characterization.

In this embodiment, each standard characterization in the standard characterization library is obtained after the audit model processes an audit standard.

S15, determining an auditing result of the content to be audited according to the matching scores and all auditing standards.

Training device 30 may use content standard pairs for model training to obtain a characterization extraction model, i.e., the audit model. The audit model is essentially a model that understands both content and audit standards. After training, training device 30 processes all the auditing standards with the audit model to obtain at least one standard characterization corresponding to each auditing standard, and stores all the standard characterizations, in association with their auditing standards, in a standard characterization library. Further, training device 30 may migrate the audit model and the standard characterization library onto auditing device 20.

It should be noted that an audit standard may be a single standard entry of the classification standard for any kind of content. For example, when identifying violating content, the violation types may include pornography-related, violence-related, and so on, and the judgment standard of each violation type has multiple entries, such as standard a.1 and standard a.2 for violence. When identifying content categories, the content categories may include traditional culture, intangible cultural heritage, random, science and education, and the like, and the judgment standard of each content category likewise has a plurality of entries, such as standard 1.1 and standard 1.2 for science and technology. The minimum-unit entry is the audit standard; that is, an audit standard is the minimum-unit judgment standard of a violation type or content category.

When content to be audited is generated on the client 10, the client 10 sends it to the auditing equipment 20. The auditing equipment 20 inputs the content to be audited into the deployed audit model, and the audit model outputs at least one content characterization of the content to be audited. The auditing equipment 20 matches each content characterization against all the standard characterizations in the standard characterization library one by one to obtain a matching score between the content characterization and each standard characterization. Further, the auditing equipment 20 may determine the auditing result of the content to be audited according to the matching scores and all auditing standards.

In the traditional content auditing method, the category to which the content belongs is used as the label of the content during labeling, and then the labeled content is used for training the model so as to enable the model to learn the association relationship between the content and the label. Once the classification scene (i.e. the audit standard or the classification standard) changes, the label corresponding to the content also changes, and at this time, in order to ensure the accuracy of the model, the model needs to be retrained, which is long in time and high in cost.

According to the content auditing method provided by the embodiment of the invention, through steps S11 to S15, the audit model separately understands the audit standards and the content to be audited, and the auditing result is determined by matching the resulting understandings (namely, the content characterization and the standard characterizations). This can improve the accuracy of content auditing and makes the method suitable for content auditing in any scene.

In order to describe the content auditing method provided by the embodiment of the present invention in detail, a model training stage and a model using stage (i.e., the implementation of steps S11 to S15) are described below.

Optionally, the content auditing method provided by the embodiment of the present invention further includes a step of training to obtain the audit model. Referring to fig. 3, the model training stage may include the following implementation.

S21, constructing a plurality of content standard pairs according to the content samples and the auditing standards hit by the content samples, and dividing the plurality of content standard pairs into a plurality of sample batches.

In this embodiment, each content standard pair in the sample batch is a positive sample, and the content sample of any one content standard pair in the sample batch and the audit standard of any other content standard pair constitute a negative sample.

S23, based on a plurality of sample batches, performing contrast learning training on the model to be trained to obtain an audit model.

For step S21, the content sample may be a minimal sample that hits only one audit standard, and each content sample within a sample batch hits only one audit standard within that sample batch. Therefore, each content standard pair within the sample batch is a positive sample, and the content sample of a content standard pair together with the audit standard of any other content standard pair constitutes a negative sample. For example, if the sample batch includes content standard pairs numbered 1 to 32, the content standard pair numbered 1 is a positive sample, and the content sample of the content standard pair numbered 1 forms a negative sample with the audit standard of each of the content standard pairs numbered 2 to 32.

Thus, each sample batch is internally provided with a plurality of positive samples and a plurality of negative samples, so that the model to be trained can be subjected to contrast learning training based on the positive samples and the negative samples in the sample batch.
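The in-batch construction of positive and negative samples can be sketched as follows, using the batch of 32 pairs from the example above. The function name `build_in_batch_pairs` and the placeholder strings are hypothetical; the sketch relies on the stated condition that every content sample hits exactly one audit standard within the batch.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (content sample, audit standard)


def build_in_batch_pairs(batch: List[Pair]) -> Tuple[List[Pair], List[Pair]]:
    """Split all content/standard combinations of a batch into positives and negatives."""
    positives, negatives = [], []
    for i, (content, _) in enumerate(batch):
        for j, (_, standard) in enumerate(batch):
            # Same index: the original content standard pair (positive sample).
            # Different index: content paired with another pair's standard (negative sample).
            (positives if i == j else negatives).append((content, standard))
    return positives, negatives


batch = [(f"content {k}", f"standard {k}") for k in range(1, 33)]
positives, negatives = build_in_batch_pairs(batch)
```

A batch of 32 pairs therefore yields 32 positive samples and 32 × 31 = 992 negative samples without any extra labeling.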

For step S23, referring to fig. 4, the process of performing contrast learning training on the model to be trained to obtain the audit model may include the following embodiments.

S231, extracting one sample batch from the plurality of sample batches, and inputting the sample batch into the model to be trained to obtain the content characterization matrix and the standard characterization matrix.

S232, computing the dot product of the content characterization matrix and the standard characterization matrix to obtain a matching matrix, and calculating a loss value of the matching matrix based on the label matrix of the sample batch.

S233, judging whether the loss value or the current iteration number meets the end condition. If yes, step S234 is executed, and if no (i.e., the loss value and the current iteration number do not satisfy the end condition), step S235 is executed.

S234, taking the current model to be trained as an audit model.

S235, adjusting parameters of the model to be trained according to the loss value, and then returning to step S231.

The model to be trained may include a backbone network, a content optimization network, and a standard optimization network. The backbone network can be any large language model with more than 10B parameters (the large language model is obtained through training on a large-scale internet text corpus). The content optimization network may include a feedforward neural network (FNN) and a normalization layer. The standard optimization network may also include a feedforward neural network and a normalization layer, and the structure of the model to be trained may be as shown in fig. 5.

Alternatively, the normalization layer may be an L2-Norm layer. After L2-Norm normalization, the dot product (i.e. the matching score) of a content characterization and a standard characterization is equal to their cosine similarity, so its value lies in [-1, 1], where 1 indicates that the content characterization and the standard characterization are most relevant, i.e. best matched. Thus, the matching degree between the content characterization and the standard characterization can be interpreted more intuitively.
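The equivalence between the dot product of L2-normalized vectors and cosine similarity can be checked numerically with a short sketch. The two vectors below are arbitrary illustrative values, not characterizations produced by any model.

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (what an L2-Norm layer does)."""
    norm = math.sqrt(dot(vec, vec))
    return [v / norm for v in vec]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


content = [3.0, 4.0]    # un-normalized illustrative vectors
standard = [4.0, 3.0]
score = dot(l2_normalize(content), l2_normalize(standard))
reference = cosine_similarity(content, standard)
```

Both `score` and `reference` come out at 0.96 up to floating-point rounding, and any such score is bounded by -1 and 1.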

For step S231, after the sample batch is input into the model to be trained, the sample batch is first processed through the backbone network to obtain the content implicit characterization of each content sample and the standard implicit characterization of each auditing standard in the sample batch. Next, each content implicit characterization is optimized and normalized through the content optimization network to obtain the content characterization corresponding to each content implicit characterization, and the content characterizations are integrated to obtain the content characterization matrix. At the same time, each standard implicit characterization is optimized and normalized through the standard optimization network to obtain the standard characterization corresponding to each standard implicit characterization, and the standard characterizations are integrated to obtain the standard characterization matrix.
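The forward pass of step S231 can be sketched structurally as a shared backbone followed by two separate heads. Everything below is a toy stand-in: `toy_backbone` produces deterministic pseudo-features instead of running a large language model, each head is a single randomly initialized linear layer followed by L2 normalization, and the dimensions (4 hidden, 3 output) are arbitrary assumptions.

```python
import math
import random
from typing import List, Tuple

Head = Tuple[List[List[float]], List[float]]  # (weights, bias)


def linear(vec: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """One feedforward layer; weights holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def l2_normalize(vec: List[float]) -> List[float]:
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def toy_backbone(text: str, hidden_dim: int = 4) -> List[float]:
    """Hypothetical stand-in for the backbone: text -> implicit characterization."""
    rng = random.Random(text)  # deterministic pseudo-features per text
    return [rng.uniform(-1.0, 1.0) for _ in range(hidden_dim)]


def make_head(in_dim: int, out_dim: int, seed: int) -> Head:
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def encode(text: str, head: Head) -> List[float]:
    """Backbone, then the optimization network (FNN + L2-Norm)."""
    weights, bias = head
    return l2_normalize(linear(toy_backbone(text), weights, bias))


content_head = make_head(4, 3, seed=0)    # content optimization network
standard_head = make_head(4, 3, seed=1)   # standard optimization network

batch = [("content sample 1", "standard a.1"), ("content sample 2", "standard b.1")]
content_matrix = [encode(content, content_head) for content, _ in batch]
standard_matrix = [encode(standard, standard_head) for _, standard in batch]
```

Row `i` of `content_matrix` and row `i` of `standard_matrix` come from the same content standard pair, which is what lets the positive samples line up by position later.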

The content characterizations in the content characterization matrix and the standard characterizations in the standard characterization matrix can be arranged in a preset order, and the number of content characterizations is the same as the number of standard characterizations, so the two matrices have the same dimensions.

For step S232, the dot product of the content characterization matrix and the standard characterization matrix is computed, and the obtained matching matrix is a square matrix whose side length equals the number of content standard pairs in the sample batch. The content characterization and the standard characterization at the same position in the two matrices belong to the same content standard pair. Each matching score in the matching matrix is the dot product of one content characterization and one standard characterization, and each matching score is the matching score of a positive sample or a negative sample of the sample batch.

In addition, the label matrix of the sample batch is the matrix of expected predicted values for the positive samples and negative samples of the sample batch. It may be a 0-1 matrix, i.e., the expected predicted value of a positive sample is 1 and the expected predicted value of a negative sample is 0.

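For example, assume a 3×3 matching matrix in which 0.9, 0.6 and 0.8 are the matching scores of the positive samples and the remaining entries are the matching scores of negative samples. Since the positive samples lie on the diagonal, the label matrix may then be: $$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$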
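The relationship between the two characterization matrices, the matching matrix and the label matrix can be sketched as below. The characterization values are hypothetical unit vectors chosen for easy arithmetic and are not the values of the example above; the helper names are also illustrative.

```python
from typing import List

Matrix = List[List[float]]


def matmul_transpose(a: Matrix, b: Matrix) -> Matrix:
    """Return a @ b^T, i.e. the dot product of every row of a with every row of b."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def identity(n: int) -> List[List[int]]:
    """0-1 label matrix: 1 on the diagonal (positive samples), 0 elsewhere (negative samples)."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


# Hypothetical L2-normalized characterizations for a batch of three pairs.
content_matrix = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
standard_matrix = [[0.8, 0.6], [0.0, 1.0], [1.0, 0.0]]

matching_matrix = matmul_transpose(content_matrix, standard_matrix)  # 3 x 3
label_matrix = identity(len(content_matrix))
```

Entry `[i][j]` of `matching_matrix` scores content `i` against standard `j`, so the diagonal holds the positive-sample scores that the label matrix marks with 1.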

The manner of calculating the loss function may be flexibly set, for example, any loss function may be used to calculate the loss value, or the loss value may be calculated by customizing the loss function, which is not limited in this embodiment.

In a possible implementation, step S232 may calculate the loss value with a cross entropy loss function, according to the label matrix of the sample batch and the matching matrix.
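One common way to apply cross entropy to a matching matrix is to treat each row as logits over the standards in the batch and average the per-row losses. The text does not specify which cross-entropy variant is used, so the row-wise softmax form below is an assumption, as are the function name and the example off-diagonal scores.

```python
import math
from typing import List


def row_softmax_cross_entropy(matching_matrix: List[List[float]],
                              label_matrix: List[List[int]]) -> float:
    """Mean cross entropy between softmax(row of scores) and the 0-1 label row."""
    total = 0.0
    for scores, labels in zip(matching_matrix, label_matrix):
        # Numerically stable log-sum-exp of the row.
        max_score = max(scores)
        log_sum = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
        # Negative log-probability assigned to the entries labeled 1.
        total += -sum(label * (s - log_sum) for s, label in zip(scores, labels))
    return total / len(matching_matrix)


labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Hypothetical matching matrices: positives score high in the first, low in the second.
good = [[0.9, 0.1, 0.0], [0.2, 0.6, 0.1], [0.0, 0.3, 0.8]]
bad = [[0.0, 0.1, 0.9], [0.2, 0.1, 0.6], [0.8, 0.3, 0.0]]
uniform = [[0.5, 0.5, 0.5] for _ in range(3)]

good_loss = row_softmax_cross_entropy(good, labels)
bad_loss = row_softmax_cross_entropy(bad, labels)
uniform_loss = row_softmax_cross_entropy(uniform, labels)
```

The loss is lower when the diagonal scores dominate their rows, and a matrix of identical scores gives ln 3 for a batch of three, which serves as a sanity check.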

For step S233, the end condition may include: (I) the loss value is less than the loss threshold; and (II) the iteration number is greater than the iteration threshold. When either condition is met, training is complete; otherwise, training continues.
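The control flow of steps S231 to S235 with the two end conditions can be sketched as a loop. The callbacks `compute_loss` and `update_parameters` are hypothetical placeholders for the forward pass and the parameter adjustment, and cycling through the batches in order is an assumption about how a batch is extracted.

```python
from typing import Callable, List, Tuple


def train(batches: List[object],
          compute_loss: Callable[[object], float],
          update_parameters: Callable[[float], None],
          loss_threshold: float,
          max_iterations: int) -> Tuple[float, int]:
    """Run the S231-S235 loop until either end condition is met."""
    iteration = 0
    while True:
        batch = batches[iteration % len(batches)]   # S231: extract one sample batch
        loss = compute_loss(batch)                  # S231-S232: forward pass and loss
        iteration += 1
        # S233: stop if (I) loss below threshold or (II) iteration count above threshold.
        if loss < loss_threshold or iteration > max_iterations:
            return loss, iteration                  # S234: current model is the audit model
        update_parameters(loss)                     # S235: adjust parameters, then repeat


# Toy stand-ins: the "model" is one number whose loss halves at every update.
state = {"loss": 1.0}


def toy_loss(batch: object) -> float:
    return state["loss"]


def toy_update(loss: float) -> None:
    state["loss"] = loss * 0.5


final_loss, iterations = train(["batch 1", "batch 2"], toy_loss, toy_update,
                               loss_threshold=0.1, max_iterations=100)
```

In this toy run the loss condition ends training first; with a stricter loss threshold the iteration limit would end it instead.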

For step S235, the manner of adjusting the parameters of the model to be trained may be flexibly set. For example, the parameters of the backbone network, the content optimization network and the standard optimization network may be adjusted with the Adam optimization algorithm, with any other parameter adjustment algorithm, or according to a set rule, which is not limited in this embodiment.
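For reference, a single Adam update can be sketched on a flat list of parameters. This uses the standard Adam update rule with its usual default hyperparameters; the gradients are assumed to be supplied by whatever computes the loss, and the parameter and gradient values are illustrative only.

```python
import math
from typing import Dict, List


def adam_step(params: List[float], grads: List[float], state: Dict[str, object],
              lr: float = 0.001, beta1: float = 0.9, beta2: float = 0.999,
              eps: float = 1e-8) -> List[float]:
    """Apply one Adam update and return the new parameter values."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Exponential moving averages of the gradient and the squared gradient.
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        # Bias correction for the zero-initialized averages.
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params


params = [1.0, -2.0]      # illustrative parameters
grads = [0.5, -0.25]      # illustrative gradients of the loss
state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
params = adam_step(params, grads, state, lr=0.01)
```

On the first step each parameter moves by almost exactly the learning rate in the direction opposite to its gradient, regardless of the gradient's magnitude.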

In one possible implementation manner, for the backbone network, a first preferred parameter is searched out with a parameter tuning algorithm according to the past loss values and the backbone network parameters corresponding to each loss value, and the first preferred parameter is used as the parameter of the backbone network. For example, a loss value-parameter curve may be fitted from the past loss values and their corresponding backbone network parameters, and the parameter that minimizes the loss value may be selected from the curve with the parameter tuning algorithm as the parameter of the backbone network.

For the content optimization network, a second preferred parameter is searched out with the parameter tuning algorithm according to the past loss values and the content optimization network parameters corresponding to each loss value, and the second preferred parameter is used as the parameter of the content optimization network. For example, a loss value-parameter curve may be fitted from the past loss values and their corresponding content optimization network parameters, and the parameter that minimizes the loss value may be selected from the curve with the parameter tuning algorithm as the parameter of the content optimization network.

For the standard optimization network, a third preferred parameter is searched out with the parameter tuning algorithm according to the past loss values and the standard optimization network parameters corresponding to each loss value, and the third preferred parameter is used as the parameter of the standard optimization network. For example, a loss value-parameter curve may be fitted from the past loss values and their corresponding standard optimization network parameters, and the parameter that minimizes the loss value may be selected from the curve with the parameter tuning algorithm as the parameter of the standard optimization network.

Through steps S231 to S235, the matching scores of negative samples are made as small as possible and the matching scores of positive samples are made as high as possible. The backbone network, the content optimization network and the standard optimization network in the model to be trained are thereby trained, so that the backbone network and the standard optimization network learn to understand the audit standards, and the backbone network and the content optimization network learn to understand the content. An audit model that understands both the audit standards and the content is thus obtained. Because the audit model can cope with changes or updates to the audit standards, the model does not need to be retrained when an audit standard is changed or updated, and the audit model is suitable for content auditing in any scenario (any audit standard).

Thus, after the training stage is completed, a well-trained audit model can be obtained, wherein the audit model comprises a backbone network, a content optimization network and a standard optimization network.

The following embodiments may be included in the model use stage, that is, the above steps S11 to S15.

To enable steps S11 to S15 to be carried out quickly and accurately, a step of processing all audit standards using the audit model to obtain a standard characterization library may further be included before step S11, either before or after the audit model is deployed to the audit device 20.

Optionally, first, all audit standards are processed through the backbone network to obtain at least one standard implicit characterization of each audit standard. Second, all the standard implicit characterizations are optimized and normalized through the standard optimization network to obtain the standard characterizations, and each standard characterization is stored in association with its audit standard in the standard characterization library. Finally, the standard characterization library is deployed to the audit device 20.

The standard characterizations and audit standards in the standard characterization library may be stored as key-value pairs, e.g., with each audit standard as the value and the standard characterization corresponding to that audit standard as the key.

Optionally, in step S11, after the content to be audited is input into the audit model, the content to be audited is first processed through the backbone network to obtain a content implicit characterization of the content to be audited. Second, the content implicit characterization is optimized and normalized through the content optimization network to obtain the content characterization of the content to be audited.

For step S13, the process of obtaining the matching scores may be implemented as follows: for each standard characterization in the standard characterization library, the dot product of the content characterization and the standard characterization is computed to obtain the matching score between the content characterization and that standard characterization.
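The dot-product matching of step S13 can be sketched as follows, assuming that characterizations are plain lists of floats and that the library is a dictionary keyed by the (normalized) standard characterization with the audit standard as the value. The helper names `normalize` and `match_scores`, the two-dimensional vectors and the standard names are hypothetical and chosen only for illustration.

```python
import math

def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def match_scores(content_repr, standard_library):
    """Return {audit standard: dot-product score} for every library entry."""
    return {
        standard: sum(c * s for c, s in zip(content_repr, standard_repr))
        for standard_repr, standard in standard_library.items()
    }

# Keys are standard characterizations (tuples, so they are hashable);
# values are the associated audit standards.
library = {
    tuple(normalize([1.0, 0.0])): "standard A",
    tuple(normalize([0.0, 2.0])): "standard B",
}
content = normalize([3.0, 4.0])
scores = match_scores(content, library)
```

Because both sides are normalized, each score here is the cosine similarity between the content characterization and a standard characterization.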

For step S15, the manner of determining the audit result of the content to be audited according to the matching scores and all audit standards may be flexibly set. For example, a voting method may be adopted, or the determination may be made according to a preset rule, which is not limited in this embodiment.

In one possible implementation manner, in step S15, the maximum matching score may be determined from all the matching scores, the audit standard of the standard characterization corresponding to the maximum matching score is used as the hit standard of the content to be audited, and the category to which the hit standard belongs is used as the content category of the content to be audited.

For example, if the maximum matching score corresponds to standard characterization A, standard characterization A corresponds to audit standard A, and audit standard A belongs to category A, then the hit standard of the content to be audited is audit standard A, and the content category of the content to be audited is category A.
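A minimal sketch of this maximum-score decision, assuming the matching scores are held in a dictionary keyed by audit standard and that a separate mapping gives the category of each standard. The function name `hit_standard_and_category`, both dictionaries and the score values are hypothetical.

```python
def hit_standard_and_category(scores, category_of):
    """Pick the audit standard with the highest score and look up its category."""
    best_standard = max(scores, key=scores.get)
    return best_standard, category_of[best_standard]

# Illustrative scores and categories (assumed values).
scores = {"audit standard A": 0.82, "audit standard B": 0.35, "audit standard C": 0.10}
category_of = {
    "audit standard A": "category A",
    "audit standard B": "category B",
    "audit standard C": "category A",
}
hit, category = hit_standard_and_category(scores, category_of)
```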

Alternatively, referring to fig. 6, the following embodiment may be further included in step S15.

S151, sorting all the matching scores in descending order, and taking the audit standards of the standard characterizations corresponding to the matching scores ranked within a preset top rank as preselected standards.

S152, voting for each content category according to the preselected standards, and taking the content category with the most votes as the content category of the content to be audited.

It can be understood that if the preset rank is top 5, the audit standards corresponding to the five highest matching scores are selected as the preselected standards. Each content category is then voted for using the preselected standards. For example, if the categories include science and technology, education, traditional culture, etc., a preselected standard related to science and technology casts a vote for science and technology, and a preselected standard related to traditional culture casts a vote for traditional culture. Finally, the content category with the most votes is taken as the content category of the content to be audited.
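Steps S151 and S152 can be sketched as a top-k vote. The sketch assumes the same dictionary layout as a per-standard score table plus a standard-to-category mapping; the function name `vote_category`, the standard identifiers and the score values are hypothetical, while the category names follow the example in the text. Tie-breaking is not specified by the embodiment; here a tie goes to the category whose vote was encountered first.

```python
from collections import Counter

def vote_category(scores, category_of, top_k=5):
    """Rank standards by score, keep the top_k, and return the most-voted category."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # S151: descending order
    preselected = ranked[:top_k]
    votes = Counter(category_of[s] for s in preselected)   # S152: one vote each
    return votes.most_common(1)[0][0]

scores = {"s1": 0.9, "s2": 0.8, "s3": 0.7, "s4": 0.6, "s5": 0.5, "s6": 0.1}
category_of = {
    "s1": "science and technology",
    "s2": "education",
    "s3": "science and technology",
    "s4": "science and technology",
    "s5": "traditional culture",
    "s6": "education",
}
category = vote_category(scores, category_of, top_k=5)
```

In this example the top five standards cast three votes for science and technology and one each for education and traditional culture, so science and technology is selected.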

Through the above embodiments of step S15, the hit audit standard and/or the content category of the content to be audited can be determined more accurately.

In order to further widen the application range of the content auditing method, in a possible implementation manner, the content auditing method provided by the embodiment of the invention may further include a step of updating the standard characterization library: when a newly added audit standard is acquired, the newly added audit standard is input into the audit model to obtain a newly added standard characterization, and the newly added standard characterization and the newly added audit standard are stored in association in the standard characterization library.

The audit model processes the newly added audit standard through the backbone network to obtain a new standard implicit characterization, optimizes and normalizes the new standard implicit characterization through the standard optimization network to obtain a new standard characterization, and then stores the new standard characterization and the newly added audit standard in the standard characterization library as a key-value pair.

Therefore, when an audit standard is added or the audit scenario changes, the audit model is used to update the standard characterization library and the model does not need to be retrained, which saves the cost and time of model training. The method also has a wider application range.
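The library update can be sketched as a single dictionary insertion, with no training step involved. In the sketch below, `toy_encode_standard` is a hypothetical stand-in for the trained audit model (backbone network followed by the standard optimization network) and is not the real model; the function name `add_standard` and the example standards are likewise assumptions.

```python
import math

def toy_encode_standard(text):
    # Hypothetical placeholder encoder: NOT the trained audit model.
    # It maps text to a normalized 2-D vector so the example is runnable.
    raw = [float(len(text)), float(text.count(" ") + 1)]
    norm = math.sqrt(sum(x * x for x in raw))
    return tuple(x / norm for x in raw)

def add_standard(library, encode_standard, new_standard):
    """Store the new characterization (key) with its audit standard (value)."""
    library[encode_standard(new_standard)] = new_standard
    return library

# Existing library with one entry, then one newly added audit standard.
library = {toy_encode_standard("no spam links"): "no spam links"}
add_standard(library, toy_encode_standard, "no medical misinformation")
```

Only the encoder is called on the new standard; the parameters of the model are left unchanged.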

In a possible implementation manner, the embodiment of the present invention further provides a content auditing apparatus 40, referring to fig. 7, including a characterization acquisition module 401, a standard matching module 402, and an audit determination module 403.

The characterization acquisition module 401 is configured to input the content to be audited into a pre-trained audit model and obtain the content characterization extracted by the audit model. The audit model is a characterization extraction model obtained after training with content standard pairs, and each content standard pair comprises a content sample and an audit standard hit by the content sample.

The standard matching module 402 is configured to match the content characterization with the standard characterizations in a preset standard characterization library, so as to obtain a matching score between the content characterization and each standard characterization. The standard characterizations in the standard characterization library are obtained after the audit model processes the audit standards.

The audit determination module 403 is configured to determine the audit result of the content to be audited according to the matching scores and all audit standards.

Optionally, a model training module and an updating module can be further included.

Model training module for:

constructing a plurality of content standard pairs according to the content samples and the auditing standards hit by the content samples, and dividing the plurality of content standard pairs into a plurality of sample batches; wherein each content standard pair in the sample batch is a positive sample, and the content sample of any content standard pair in the sample batch and the audit standard of any other content standard pair form a negative sample;

Based on a plurality of sample batches, the model to be trained is subjected to contrast learning training, and an audit model is obtained.

An updating module for: when the newly added auditing standard is acquired, inputting the newly added auditing standard into an auditing model to obtain a newly added standard representation, and storing the newly added standard representation and the newly added auditing standard in a standard representation library in a correlated manner.

In the content auditing apparatus 40, through the cooperation of the characterization acquisition module 401, the standard matching module 402 and the audit determination module 403, the audit model understands the audit standards and the content to be audited separately, and matching and audit result determination are then performed on the understanding results (namely, the standard characterizations and the content characterization). The accuracy of content auditing can thereby be improved, and the apparatus is suitable for content auditing in any scenario.

It should be noted that the content auditing apparatus 40 provided in this embodiment may execute the method flow shown in the foregoing content auditing method embodiments, so as to achieve the corresponding technical effects. For brevity, for anything not mentioned in this embodiment, refer to the corresponding parts of the above embodiments.

Optionally, the characterization acquisition module 401, the standard matching module 402, the audit determination module 403, the model training module and the updating module may be provided separately, or may be integrated in one unit, that is, a processing unit; the specific implementation of these modules is not specifically limited.

Optionally, the content auditing apparatus 40 may further include a storage unit in which a program or instructions are stored. The program or instructions, when executed by the characterization acquisition module 401, the standard matching module 402, the audit determination module 403, the model training module and the updating module, enable the content auditing apparatus 40 to perform any one of the possible implementations of the content auditing method of the present invention.

The content auditing device 40 may be a central control system of the auditing apparatus 20, a server, an auditing module of an audio-video sharing platform, or a computer apparatus connected to the audio-video sharing platform in a communication manner, for example, a mobile phone, a tablet computer, a notebook computer, a server, etc., which is not limited in this invention.

In addition, the technical effects of the content auditing apparatus 40 may be the technical effects of the method described in the content auditing method embodiment, and will not be described herein.

The following provides an electronic device 50, which may be a central control system of the audit device 20, a server, or an audit module of an audio/video sharing platform, or may be a computer device communicatively connected to the audio/video sharing platform, such as a mobile phone, a tablet computer, a notebook computer or a server. The electronic device 50, shown in fig. 8, can implement the above method. Specifically, the electronic device 50 includes a processor, a memory, and a communication module connected by a system bus. The processor may be a CPU. The memory is used for storing one or more programs, and when the one or more programs are executed by the processor, the content auditing method provided by the embodiment is executed. The memory, the processor and the communication module are electrically connected with each other directly or indirectly so as to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines.

The memory is used for storing programs or data. The memory may be, but is not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), etc.

The processor is configured to read/write data or programs stored in the memory, such as the programs and data for realizing the functions of the processing unit, and to perform the method provided by any embodiment of the present invention.

The communication module is used for establishing communication connection between the electronic device 50 and other communication terminals through a network, and is used for receiving and transmitting data through the network.

It should be understood that the configuration shown in fig. 8 is merely a schematic diagram of the electronic device 50, and that the electronic device 50 may also include more or fewer components than those shown in fig. 8, or have a different configuration than that shown in fig. 8.

Embodiments of the present invention also provide an electronic device 50 comprising a processor and a memory for storing one or more programs; when the one or more programs are executed by the processor, the content auditing method according to any one of the possible implementations of the method embodiments of the present invention is implemented.

Embodiments of the present invention provide a computer-readable storage medium including: a computer program (also referred to as code, or instructions), when executed, causes a computer to perform the content auditing method of any one of the possible implementations of the method embodiments of the present invention. The storage medium may include memory, flash memory, registers, combinations thereof, or the like.

In summary, the content auditing method, device, electronic equipment and storage medium provided by the embodiment of the invention have the following beneficial effects: (1) The method has strong interpretability: the output of the audit device is a (content, audit standard, category) triple, which is self-explanatory; (2) The method can cope with changes or updates to content audit standards: when a new audit standard is suddenly added, the new or updated standard is input into the audit model and the resulting characterization is added to the standard characterization library; (3) The method can be migrated to other scenarios: only the labels of the other scenario need to be input into the audit model to obtain the standard characterization library of that scenario.

In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.

In addition, functional modules in the embodiments of the present invention may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.

The above description covers only the preferred embodiments of the present invention and is not intended to limit the present invention; various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method of content auditing, the method comprising:

Inputting the content to be audited into a pre-trained audit model to obtain the content characterization of the content to be audited; the audit model is a characterization extraction model obtained after training with content standard pairs, wherein each content standard pair comprises a content sample and an audit standard hit by the content sample, and the content to be audited comprises text, pictures, audio and video;

Determining an auditing result of the content to be audited according to the matching score and all auditing standards;

the step of determining the auditing result of the content to be audited according to the matching score and all auditing standards comprises the following steps:

Voting each content category according to the preselected standard, and taking the content category with the largest vote number as the content category of the content to be audited;

The method also comprises the step of training to obtain an audit model, and comprises the following steps:

if the loss value or the iteration number meets the ending condition, taking the current model to be trained as an auditing model;

the model to be trained comprises a backbone network, a content optimization network and a standard optimization network;

2. The content auditing method according to claim 1, characterized in that the step of adjusting parameters of the model to be trained according to the loss value comprises:

For the backbone network, searching a first preferred parameter by utilizing a parameter adjustment algorithm according to the loss value and backbone network parameters corresponding to the loss value in the past, and taking the first preferred parameter as a parameter of the backbone network;

Aiming at the content optimization network, searching a second preferred parameter by utilizing a parameter adjustment algorithm according to the loss value and the content optimization network parameter corresponding to the loss value in the past, and taking the second preferred parameter as the parameter of the content optimization network;

and searching a third preferred parameter for the standard optimization network according to the loss value and the standard optimization network parameter corresponding to the loss value in the past by utilizing a parameter adjustment algorithm, and taking the third preferred parameter as the parameter of the standard optimization network.

3. The content auditing method according to claim 1, characterized in that the step of calculating a loss value of the matching score matrix based on a label matrix of the sample lot comprises:

4. A content auditing method according to any of claims 1-3, in which the step of determining the auditing results of the content to be audited based on the matching scores and all of the auditing criteria includes:

5. A content auditing method according to any of claims 1 to 3, in which the step of matching the content representation with standard representations in a pre-set standard representation library to obtain a match score between the content representation and each of the standard representations comprises:

6. A content auditing method according to any of claims 1 to 3, in which the auditing model includes a backbone network and a content optimization network;

7. The content auditing method of claim 1, in which the auditing model includes a backbone network and a standard optimization network;

8. The content auditing method of claim 7, further comprising:

9. The content auditing device is characterized by comprising a characterization acquisition module, a standard matching module, an auditing judging module and a model training module;

The characterization acquisition module is used for inputting the content to be audited into a pre-trained audit model to obtain the content characterization extracted by the audit model; the audit model is a characterization extraction model obtained after training with content standard pairs, wherein each content standard pair comprises a content sample and an audit standard hit by the content sample, and the content to be audited comprises text, pictures, audio and video;

The auditing judging module is used for determining auditing results of the content to be audited according to the matching score and all auditing standards, and comprises the following steps: sorting all the matching scores according to the sequence from large to small, and taking the auditing standard of standard characterization corresponding to the matching score before the preset sequence as a preselected standard; voting each content category according to the preselected standard, and taking the content category with the largest vote number as the content category of the content to be audited;

The model training module is used for constructing a plurality of content standard pairs according to the content samples and the auditing standards hit by the content samples, and dividing the plurality of content standard pairs into a plurality of sample batches; wherein each content standard pair in the sample batch is a positive sample, and the content sample of any one content standard pair in the sample batch and the audit standard of any other content standard pair form a negative sample;

the model training module is further configured to:

10. An electronic device comprising a processor and a memory, the memory storing machine executable instructions executable by the processor to implement the content auditing method of any of claims 1-8.

11. A storage medium having stored thereon a computer program which, when executed by a processor, implements the content auditing method of any of claims 1 to 8.