CN112765402A - Sensitive information identification method, device, equipment and storage medium


Info

Publication number
CN112765402A
Authority
CN
China
Prior art keywords
video sequence
training
training sample
sample
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011639261.XA
Other languages
Chinese (zh)
Inventor
刘庆宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202011639261.XA
Publication of CN112765402A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
        • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/70 Information retrieval of video data
                • G06F16/73 Querying
                    • G06F16/732 Query formulation
                        • G06F16/7328 Query by example, e.g. a complete video frame or video sequence
                • G06F16/75 Clustering; Classification
                • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
                    • G06F16/783 Retrieval using metadata automatically derived from the content
                        • G06F16/7844 Retrieval using original textual content or text extracted from visual content or transcript of audio data
                        • G06F16/7847 Retrieval using low-level visual features of the video content
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
        • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
                • G06N3/04 Architecture, e.g. interconnection topology
                    • G06N3/044 Recurrent networks, e.g. Hopfield networks
                    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the invention provides a sensitive information identification method, apparatus, device and storage medium. The method includes the following steps: obtaining metadata of a sample to be identified, wherein the metadata comprises a video sequence and reference information, and the reference information comprises a title and/or a cover picture; performing feature extraction on the video sequence and the reference information to obtain, respectively, a video sequence feature vector and a reference feature vector of the sample to be identified; and inputting the video sequence feature vector and the reference feature vector into a pre-trained full connection layer model, analyzing the probability that each feature vector represents sensitive feature information, and combining the analysis results to obtain the probability that the sample to be identified contains sensitive information. In this way, the sensitive information in the sample does not need to be identified manually, which reduces labor cost, achieves a better identification effect on relatively obscure sensitive information, and improves the efficiency of video identification.

Description

Sensitive information identification method, device, equipment and storage medium
Technical Field
The present invention relates to the field of video recognition technologies, and in particular, to a method, an apparatus, a device, and a storage medium for recognizing sensitive information.
Background
With the development of informatization and the popularity of social networks, a large number of videos are uploaded to video websites. As platforms facing the general public, video websites need to review the uploaded videos and promptly delete or block sensitive videos containing sensitive information, so as to avoid the adverse effects caused by the spread of such information.
At present, identifying sensitive videos relies mainly on manual review by reviewers. However, with the explosive growth in the number of videos on video websites, the time and cost consumed by manual review rise rapidly; manual review also suffers from missed detections, false detections and highly subjective review standards, so its identification effect is poor and it hampers the operation of video websites.
Therefore, a sensitive information identification method that does not rely on manual labor is needed.
Disclosure of Invention
The embodiments of the present invention aim to provide a sensitive information identification method, apparatus, device and storage medium, so as to realize sensitive information identification that does not rely on manual labor. The specific technical solutions are as follows:
in a first aspect of the present invention, there is provided a sensitive information identification method, including:
acquiring metadata of a sample to be identified, wherein the metadata comprises a video sequence and reference information, and the reference information comprises a title and/or a cover picture;
extracting the characteristics of the video sequence and the reference information to respectively obtain a video sequence characteristic vector and a reference characteristic vector of the sample to be identified;
inputting the video sequence feature vector and the reference feature vector into a full-connection layer model obtained through pre-training, analyzing the probability that the video sequence feature vector and the reference feature vector are sensitive feature information, and calculating an analysis result to obtain the probability that the sample to be identified contains sensitive information.
Optionally, the performing feature extraction on the video sequence to obtain the video sequence feature vector of the sample to be identified includes:
extracting key frames from the video sequence;
inputting the key frame into a video sequence feature extraction model obtained by pre-training for feature extraction to obtain a key frame feature vector;
and aggregating the key frame feature vectors to obtain the video sequence feature vectors of the samples to be identified.
Optionally, the extracting key frames from the video sequence includes:
a first preset number of key frames is extracted from the video sequence.
Optionally, in a case that the reference information includes a title, the reference feature vector includes a text feature vector;
the extracting the features of the reference information to obtain the reference feature vector of the sample to be identified includes:
and inputting the title into a text feature extraction model obtained by pre-training for feature extraction to obtain a text feature vector of the sample to be recognized.
Optionally, in a case that the reference information includes a cover picture, the reference feature vector includes an image feature vector;
the extracting the features of the reference information to obtain the reference feature vector of the sample to be identified includes:
and inputting the cover picture into an image feature extraction model obtained by pre-training for feature extraction to obtain an image feature vector of the sample to be identified.
Optionally, the calculating the analysis result to obtain the probability that the sample to be identified contains the sensitive information includes:
and carrying out weighted summation on the analysis results to obtain the probability that the sample to be identified contains sensitive information.
In a second aspect of the present invention, there is also provided a method for training a sensitive information recognition model, where the method includes:
acquiring metadata of a training sample and an annotation result of the training sample, wherein the annotation result of the training sample is used for indicating whether the training sample contains sensitive information;
extracting the characteristics of the video sequence of the training sample and the reference information of the training sample to respectively obtain a video sequence characteristic vector and a reference characteristic vector of the training sample;
inputting the video sequence feature vector and the reference feature vector of the training sample into a preset full-connection layer model, analyzing the probability that the video sequence feature vector and the reference feature vector of the training sample are sensitive feature information, and calculating the analysis result to obtain the probability that the training sample contains the sensitive information;
calculating a first loss value between the probability that the training sample contains sensitive information and the labeling result of the training sample, adjusting the parameters of the preset full-connection layer model until the first loss value is smaller than a first preset threshold value, and taking the obtained full-connection layer model as the sensitive information identification model.
Optionally, the performing feature extraction on the video sequence of the training sample to obtain a video sequence feature vector of the training sample includes:
extracting key frames from the video sequence of the training samples;
inputting the key frame of the training sample into a video sequence feature extraction model obtained by pre-training for feature extraction to obtain a key frame feature vector of the training sample;
and aggregating the key frame feature vectors to obtain the predicted video sequence feature vectors of the training samples.
Optionally, the extracting key frames from the video sequence of the training samples includes:
and extracting a first preset number of key frames from the video sequence of the training sample.
Optionally, the obtaining metadata of the training sample and the labeling result of the training sample further includes:
acquiring a labeling result of the video sequence of the training sample, wherein the labeling result of the video sequence is used for indicating whether the video sequence of the training sample contains sensitive information;
after aggregating the keyframe feature vectors to obtain the predicted video sequence feature vectors of the training samples, the method further includes:
calculating a fourth loss value between the predicted video sequence feature vector and a sensitive information labeling result of the video sequence of the training sample, adjusting parameters of the video sequence feature extraction model until the fourth loss value is smaller than a fourth preset threshold value to obtain a new video sequence feature extraction model, and taking the video sequence feature extraction model and the full connection layer model as the sensitive information identification model.
Optionally, in a case that the reference information includes a title, the reference feature vector includes a text feature vector;
the extracting the features of the reference information of the training sample to obtain the reference feature vector of the training sample includes:
and inputting the title of the training sample into a text feature extraction model obtained by pre-training for feature extraction to obtain a predicted text feature vector of the training sample.
Optionally, the obtaining metadata of the training sample and the labeling result of the training sample further includes:
obtaining a labeling result of the title of the training sample, wherein the labeling result of the title is used for indicating whether the title of the training sample contains sensitive information;
after the title of the training sample is input to a text feature extraction model obtained by pre-training for feature extraction to obtain a predicted text feature vector of the training sample, the method further comprises:
calculating a second loss value between the predicted text feature vector and the labeling result of the title of the training sample, adjusting parameters of the text feature extraction model until the second loss value is smaller than a second preset threshold value to obtain a new text feature extraction model, and taking the video sequence feature extraction model, the text feature extraction model and the full connection layer model as the sensitive information identification model.
Optionally, in a case that the reference information includes a cover picture, the reference feature vector includes an image feature vector;
the extracting the features of the reference information of the training sample to obtain the reference feature vector of the training sample includes:
and inputting the cover picture of the training sample into an image feature extraction model obtained by pre-training for feature extraction to obtain a predicted image feature vector of the training sample.
Optionally, the obtaining metadata of the training sample and the labeling result of the training sample further includes:
acquiring a labeling result of a cover picture of the training sample, wherein the labeling result of the cover picture is used for indicating whether the cover picture of the training sample contains sensitive information;
after the cover picture of the training sample is input to an image feature extraction model obtained by pre-training for feature extraction, and a predicted image feature vector of the training sample is obtained, the method further comprises the following steps:
calculating a third loss value between the predicted image feature vector and the labeling result of the cover map of the training sample, adjusting the parameters of the image feature extraction model until the third loss value is smaller than a third preset threshold value to obtain a new image feature extraction model, and taking the video sequence feature extraction model, the image feature extraction model and the full connection layer model as the sensitive information identification model.
Optionally, the calculating the analysis result to obtain the probability that the training sample contains the sensitive information includes:
and carrying out weighted summation on the analysis results to obtain the probability that the training sample contains sensitive information.
In a third aspect of the present invention, there is also provided a sensitive information identification apparatus, including:
the system comprises a to-be-identified sample acquisition module, a to-be-identified sample acquisition module and a to-be-identified sample identification module, wherein the to-be-identified sample acquisition module is used for acquiring metadata of a to-be-identified sample, the metadata comprises a video sequence and reference information, and the reference information comprises a title and/or a cover picture;
the characteristic extraction module is used for extracting the characteristics of the video sequence and the reference information to respectively obtain a video sequence characteristic vector and a reference characteristic vector of the sample to be identified;
and the sensitive information analysis module is used for inputting the video sequence feature vector and the reference feature vector into a full connection layer model obtained by pre-training, analyzing the probability that the video sequence feature vector and the reference feature vector are sensitive feature information, and calculating an analysis result to obtain the probability that the sample to be identified contains sensitive information.
In a fourth aspect of the present invention, there is also provided an apparatus for training a sensitive information recognition model, the apparatus including:
the training sample acquisition module is used for acquiring metadata of a training sample and an annotation result of the training sample, wherein the annotation result of the training sample is used for indicating whether the training sample contains sensitive information;
the characteristic extraction module is used for extracting characteristics of the video sequence of the training sample and the reference information of the training sample to respectively obtain a video sequence characteristic vector and a reference characteristic vector of the training sample;
the sensitive information analysis module is used for inputting the video sequence feature vector and the reference feature vector of the training sample into a preset full-connection layer model, analyzing the probability that the video sequence feature vector and the reference feature vector of the training sample are sensitive feature information, and calculating an analysis result to obtain the probability that the training sample contains sensitive information;
and the parameter adjusting module is used for calculating a first loss value between the probability that the training sample contains sensitive information and the labeling result of the training sample, adjusting the parameters of the preset full-connection layer model until the first loss value is smaller than a first preset threshold value, and taking the obtained full-connection layer model as the sensitive information recognition model.
In another aspect of the present invention, there is also provided an electronic device, including a processor, a communication interface, a memory and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing any one of the sensitive information identification methods when executing the program stored in the memory.
In yet another aspect of the present invention, there is also provided a computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to execute any one of the above sensitive information identification or sensitive information recognition model training methods.
In yet another aspect of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any one of the above sensitive information identification or sensitive information recognition model training methods.
According to the sensitive information identification method, apparatus, device and storage medium provided by the embodiments of the present invention, metadata of a sample to be identified is obtained, wherein the metadata comprises a video sequence and reference information, and the reference information comprises a title and/or a cover picture; feature extraction is performed on the video sequence and the reference information to obtain, respectively, a video sequence feature vector and a reference feature vector of the sample to be identified; and the video sequence feature vector and the reference feature vector are input into a pre-trained full connection layer model, the probability that each feature vector represents sensitive feature information is analyzed, and the analysis results are combined to obtain the probability that the sample to be identified contains sensitive information.
In this way, the probability that the sample to be identified contains sensitive information is calculated by combining the video sequence of the sample with either or both of its title and cover picture. The sensitive information in the sample no longer needs to be identified manually, which reduces labor cost, achieves a better identification effect on relatively obscure sensitive information, and improves the efficiency of video identification.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a schematic flowchart of a sensitive information identification method according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of another sensitive information identification method according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a sensitive information recognition model training method according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart illustrating another sensitive information recognition model training method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of another embodiment of a sensitive information recognition model training method according to the present invention;
fig. 6 is a schematic structural diagram of a sensitive information identification apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a sensitive information recognition model training apparatus according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
In the related art, identifying sensitive videos relies mainly on manual review by reviewers. However, with the explosive growth in the number of videos on video websites, the time and cost consumed by manual review rise rapidly; manual review also suffers from missed detections, false detections and highly subjective review standards, so its identification effect is poor and it hampers the operation of video websites.
In order to solve the above problem, an embodiment of the present invention provides a sensitive information identification method. An overview of the method is given first; it includes the following steps:
acquiring metadata of a sample to be identified, wherein the metadata comprises a video sequence and reference information, and the reference information comprises a title and/or a cover picture;
extracting the characteristics of the video sequence and the reference information to respectively obtain a video sequence characteristic vector and a reference characteristic vector of a sample to be identified;
inputting the video sequence feature vector and the reference feature vector into a full-connection layer model obtained through pre-training, analyzing the probability that the video sequence feature vector and the reference feature vector are sensitive feature information, and calculating the analysis result to obtain the probability that the sample to be identified contains the sensitive information.
As can be seen from the above, the sensitive information identification method provided by the embodiment of the present invention calculates the probability that a sample to be identified contains sensitive information by combining the video sequence of the sample with either or both of its title and cover picture. The sensitive information in the sample no longer needs to be identified manually, which reduces labor cost, achieves a better identification effect on relatively obscure sensitive information, and thus improves the efficiency of video identification.
The sensitive information identification method provided by the embodiment of the present invention is described in detail below through specific embodiments.
Fig. 1 is a schematic flow chart of a sensitive information identification method provided in an embodiment of the present invention; the method includes the following steps:
S101: obtaining metadata of a sample to be identified, wherein the metadata comprises a video sequence and reference information, and the reference information comprises a title and/or a cover picture.
In some scenarios, the sample to be identified may be a video on a video website. With the development of informatization and the popularity of social networks, a large number of videos are uploaded to video websites; as platforms facing the general public, video websites need to review the uploaded videos and promptly delete or block sensitive videos containing sensitive information, so as to avoid the adverse effects caused by the spread of such information.
Each sample to be identified has reference information and a video sequence. The reference information may include a title and/or a cover picture of the sample: the title is text information describing the sample, and the cover picture is an image reflecting the content of the sample, which may be a frame of the sample's video sequence or an image that does not appear in the video sequence at all. The video sequence consists of a plurality of video frames of the sample. Sensitive information may be present in any one or more of the title, the cover picture and the video sequence, and it is often relatively hidden.
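For illustration only, the metadata can be thought of as a small record like the following Python sketch (the patent prescribes no language, field names or types; everything here is an assumption):

```python
from dataclasses import dataclass
from typing import List, Optional

import numpy as np


@dataclass
class SampleMetadata:
    """Metadata of a sample to be identified. Field names are illustrative
    assumptions; the patent only requires a video sequence plus a title
    and/or a cover picture as reference information."""
    video_frames: List[np.ndarray]       # decoded frames of the video sequence, (H, W, 3)
    title: Optional[str] = None          # title text, if present
    cover: Optional[np.ndarray] = None   # cover picture, if present
```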
S102: and performing feature extraction on the video sequence and the reference information to respectively obtain a video sequence feature vector and a reference feature vector of the sample to be identified.
In this step, feature extraction is performed on two parts of the metadata: the video sequence and the reference information.
Performing feature extraction on the video sequence to obtain the video sequence feature vector of the sample to be identified may specifically include: first extracting key frames from the video sequence, then inputting the key frames into a pre-trained video sequence feature extraction model for feature extraction to obtain key frame feature vectors, and finally aggregating the key frame feature vectors into the video sequence feature vector of the sample to be identified.
A key frame is a video frame in which an object in the video sequence performs a key action; it carries more information than a reference frame, so using key frames extracted from the video sequence to identify sensitive information reduces, as far as possible, the omission of information during analysis.
The key frames extracted from the video sequence may be all key frames in the video sequence or a first preset number of key frames; this is not specifically limited. In this step the first preset number may be 8 or any other preset natural number; extracting only a first preset number of key frames reduces the computation required to analyze the video sequence. A sampling sketch is given below.
After the key frames are extracted, they may be adjusted to a preset size before being input to the video feature layer for feature extraction, so that the key frames input to the video sequence feature extraction model are always of the same size, which makes the results more comparable and more accurate.
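As an aid to understanding, the following is a minimal Python sketch of this extraction step, assuming OpenCV for decoding and uniform sampling in place of true key-frame detection (neither the library nor the sampling strategy is specified by the patent):

```python
import cv2  # OpenCV video decoding; the patent does not name a library


def extract_key_frames(video_path: str, first_preset_number: int = 8,
                       preset_size: tuple = (224, 224)):
    """Sample a first preset number of frames (8 here, as in the example
    above) and adjust each to the preset size."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(first_preset_number):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * total // first_preset_number)
        ok, frame = cap.read()
        if ok:
            frames.append(cv2.resize(frame, preset_size))  # adjust to preset size
    cap.release()
    return frames
```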
In one implementation, the video sequence feature extraction model includes a video feature layer, a video feature pooling layer and a video feature attention layer, and inputting the key frames into the pre-trained video sequence feature extraction model for feature extraction to obtain the key frame feature vectors may include: first inputting the key frames into the video feature layer for feature extraction to obtain key frame feature extraction results; then inputting the key frame feature extraction results into the video feature pooling layer for global pooling to obtain pooling results; and finally inputting the pooling results into the video feature attention layer, which assigns a weight to each pooling result so that key frame feature extraction results containing sensitive information are weighted higher than those that do not, thereby obtaining the key frame feature vectors.
For example, a NeXtVLAD method may be used to aggregate the key frame feature vectors. NeXtVLAD is a modified VLAD (Vector of Locally Aggregated Descriptors) method, an encoding method that represents a global feature by aggregating local features.
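A simplified NeXtVLAD-style aggregation could look like the following PyTorch sketch (the framework, the hyperparameters and the simplifications are assumptions; the patent only names NeXtVLAD):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NeXtVLAD(nn.Module):
    """Simplified NeXtVLAD aggregation of per-key-frame features into one
    video sequence feature vector. Cluster count, group count and expansion
    factor are illustrative, not values from the patent."""
    def __init__(self, dim: int, clusters: int = 64, groups: int = 8, expansion: int = 2):
        super().__init__()
        self.groups, self.clusters = groups, clusters
        expanded = dim * expansion
        self.gdim = expanded // groups                     # per-group feature size
        self.expand = nn.Linear(dim, expanded)
        self.group_attn = nn.Linear(expanded, groups)      # per-group attention
        self.assign = nn.Linear(expanded, groups * clusters)
        self.centers = nn.Parameter(0.01 * torch.randn(clusters, self.gdim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape                                  # x: (B, T, dim) key frame vectors
        x = self.expand(x)                                 # (B, T, dim * expansion)
        attn = torch.sigmoid(self.group_attn(x))           # (B, T, groups)
        soft = F.softmax(self.assign(x).view(B, T, self.groups, self.clusters), dim=-1)
        soft = soft * attn.unsqueeze(-1)                   # attention-weighted assignments
        xg = x.view(B, T, self.groups, self.gdim)
        # VLAD residuals: sum over time and groups of a_k(x_t) * (x_t - c_k)
        vlad = torch.einsum('btgk,btgd->bkd', soft, xg) \
             - soft.sum(dim=(1, 2)).unsqueeze(-1) * self.centers
        return F.normalize(vlad.flatten(1), dim=1)         # (B, clusters * gdim)
```

The residual form is what makes this a VLAD-family encoder: each key frame feature contributes its offset from the cluster centers it is softly assigned to, so the global vector summarizes the whole sequence.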
Performing feature extraction on the reference information to obtain the reference feature vector of the sample to be identified means performing feature extraction on the cover picture and/or the title to obtain an image feature vector and/or a text feature vector, respectively. Understandably, extracting features from both the cover picture and the title yields a better sensitive information identification effect than extracting features from only one of them.
The step of extracting features from the title may include: inputting the title into a pre-trained text feature extraction model for feature extraction to obtain the text feature vector of the sample to be identified. The text feature vector reflects the distribution of the words in the title.
In one implementation, the text feature extraction model may include a text feature layer and a text attention layer, and inputting the title into the pre-trained text feature extraction model for feature extraction to obtain the text feature vector of the sample to be identified may include: first performing word segmentation on the title with a conditional random field model to obtain a word segmentation result; then inputting the word segmentation result into a word vector extraction model to obtain word vectors; then inputting the word vectors into the text feature layer for feature extraction to obtain text feature extraction results; and finally inputting the text feature extraction results into the text attention layer, which assigns a weight to each text feature extraction result so that results containing sensitive information are weighted higher than those that do not, thereby obtaining the text feature vector of the sample to be identified.
The word vector extraction model may be an unsupervised word2vec model. When the word segmentation result is input into the word vector extraction model, the words containing sensitive information may first be labeled, so that they receive higher weights in the resulting word vectors and the obtained text feature vector is more sensitive to the identification of sensitive information.
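A minimal sketch of the title branch, assuming PyTorch and an LSTM-based text feature layer (the patent mentions CNN and LSTM models elsewhere; word segmentation and word2vec lookup are assumed to happen upstream and to yield token ids):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TitleEncoder(nn.Module):
    """Title branch: word vectors -> text feature layer -> text attention
    layer. All sizes are illustrative assumptions."""
    def __init__(self, vocab_size: int, emb_dim: int = 200, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)   # initialize from word2vec in practice
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)             # text attention layer

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h, _ = self.lstm(self.embed(token_ids))          # (B, T, 2 * hidden)
        w = F.softmax(self.attn(h).squeeze(-1), dim=1)   # per-token attention weights
        return (w.unsqueeze(-1) * h).sum(dim=1)          # (B, 2 * hidden) text feature vector
```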
The step of extracting features from the cover picture may include: inputting the cover picture into a pre-trained image feature extraction model for feature extraction to obtain the image feature vector of the sample to be identified. The image feature vector reflects the distribution of features in the cover picture.
In one implementation, the image feature extraction model may include an image feature layer, an image pooling layer and an image attention layer, and inputting the cover picture into the pre-trained image feature extraction model for feature extraction to obtain the image feature vector of the sample to be identified may include: first inputting the cover picture into the image feature layer for feature extraction to obtain image feature extraction results; then inputting the image feature extraction results into the image pooling layer for global pooling to obtain pooling results; and finally inputting the pooling results into the image attention layer, which assigns a weight to each pooling result so that image feature extraction results containing sensitive information are weighted higher than those that do not, thereby obtaining the image feature vector of the sample to be identified.
Before the cover picture is input to the image feature layer for feature extraction, it may be adjusted to a preset size, for example 224 x 224, so that the images input to the image feature extraction model are always of the same size, which makes the results more comparable and more accurate. The image attention layer may be the attention layer of a CBAM (Convolutional Block Attention Module) or another attention model; this is not specifically limited.
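A minimal sketch of the cover picture branch, assuming PyTorch/torchvision; since Xception is not shipped with torchvision, a ResNet-50 backbone stands in, and the CBAM attention is simplified to squeeze-and-excitation-style channel re-weighting (both substitutions are assumptions):

```python
import torch.nn as nn
import torchvision.models as models


class CoverEncoder(nn.Module):
    """Cover picture branch: image feature layer -> global pooling -> image
    attention layer. Sizes are illustrative assumptions."""
    def __init__(self, out_dim: int = 512):
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        self.features = nn.Sequential(*list(backbone.children())[:-2])  # drop pool + fc
        self.pool = nn.AdaptiveAvgPool2d(1)                             # global pooling
        self.attn = nn.Sequential(nn.Linear(2048, 128), nn.ReLU(),
                                  nn.Linear(128, 2048), nn.Sigmoid())
        self.proj = nn.Linear(2048, out_dim)

    def forward(self, img):                            # img: (B, 3, 224, 224)
        f = self.pool(self.features(img)).flatten(1)   # (B, 2048) pooled features
        f = f * self.attn(f)                           # re-weight channels by attention
        return self.proj(f)                            # (B, out_dim) image feature vector
```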
S103: inputting the video sequence feature vector and the reference feature vector into a full-connection layer model obtained through pre-training, analyzing the probability that the video sequence feature vector and the reference feature vector are sensitive feature information, and calculating the analysis result to obtain the probability that the sample to be identified contains the sensitive information.
In this step, the reference feature vector and the video sequence feature vector may be input into the pre-trained full connection layer model, the probability that each of the video sequence feature vector and the reference feature vector represents sensitive feature information is analyzed, and the analysis results are weighted and summed to obtain the probability that the sample to be identified contains sensitive information, a value in the interval (0, 1).
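A minimal sketch of this fusion step, assuming PyTorch; the branch dimensions and summation weights are illustrative, not values from the patent:

```python
import torch
import torch.nn as nn


class FusionHead(nn.Module):
    """Full connection layer fusion: each branch vector gets its own
    sensitive-information probability, and a weighted sum gives the final
    probability in (0, 1)."""
    def __init__(self, dims=(1024, 256, 512), weights=(0.5, 0.2, 0.3)):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(d, 1) for d in dims])
        self.register_buffer('w', torch.tensor(weights))

    def forward(self, video_vec, text_vec, image_vec):
        probs = torch.cat([torch.sigmoid(h(v)) for h, v in
                           zip(self.heads, (video_vec, text_vec, image_vec))], dim=1)
        return (probs * self.w).sum(dim=1)   # (B,) probability of sensitive information
```

Keeping one linear head per branch makes the per-branch probabilities available for inspection, which matches the patent's description of analyzing each feature vector before the weighted summation.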
As can be seen from the above, the sensitive information identification method provided by the embodiment of the present invention calculates the probability that a sample to be identified contains sensitive information by combining the video sequence of the sample with either or both of its title and cover picture. The sensitive information in the sample no longer needs to be identified manually, which reduces labor cost, achieves a better identification effect on relatively obscure sensitive information, and thus improves the efficiency of video identification.
Fig. 2 is a schematic flow chart of another sensitive information identification method provided in an embodiment of the present invention; the method includes the following steps:
S201: obtaining metadata of a sample to be identified, wherein the metadata comprises a video sequence and reference information, and the reference information comprises a title and a cover picture.
In some scenarios, the sample to be identified may be a video on a video website. With the development of informatization and the popularity of social networks, a large number of videos are uploaded to video websites; as platforms facing the general public, video websites need to review the uploaded videos and promptly delete or block sensitive videos containing sensitive information, so as to avoid the adverse effects caused by the spread of such information.
Each sample to be identified has reference information and a video sequence. The reference information may include a title and a cover picture of the sample: the title is text information describing the sample, and the cover picture is an image reflecting the content of the sample, which may be a frame of the sample's video sequence or an image that does not appear in the video sequence at all. The video sequence consists of a plurality of video frames of the sample. Sensitive information may be present in any one or more of the title, the cover picture and the video sequence, and it is often relatively hidden.
S202: and inputting the title into a text feature extraction model obtained by pre-training for feature extraction to obtain a text feature vector of the sample to be identified.
The text feature vector reflects the distribution of the words in the title.
In one implementation, the text feature extraction model may include a text feature layer and a text attention layer, and inputting the title into the pre-trained text feature extraction model for feature extraction to obtain the text feature vector of the sample to be identified may include: first performing word segmentation on the title with a conditional random field model to obtain a word segmentation result; then inputting the word segmentation result into a word vector extraction model to obtain word vectors; then inputting the word vectors into the text feature layer for feature extraction to obtain text feature extraction results; and finally inputting the text feature extraction results into the text attention layer, which assigns a weight to each text feature extraction result so that results containing sensitive information are weighted higher than those that do not, thereby obtaining the text feature vector of the sample to be identified.
The word vector extraction model may be an unsupervised word2vec model. When the word segmentation result is input into the word vector extraction model, the words containing sensitive information may first be labeled, so that they receive higher weights in the resulting word vectors and the obtained text feature vector is more sensitive to the identification of sensitive information.
S203: and inputting the cover picture into an image feature extraction model obtained by pre-training for feature extraction to obtain an image feature vector of the sample to be identified.
The image feature vector reflects the distribution of features in the cover picture.
In one implementation, the image feature extraction model may include an image feature layer, an image pooling layer and an image attention layer, and inputting the cover picture into the pre-trained image feature extraction model for feature extraction to obtain the image feature vector of the sample to be identified may include: first inputting the cover picture into the image feature layer for feature extraction to obtain image feature extraction results; then inputting the image feature extraction results into the image pooling layer for global pooling to obtain pooling results; and finally inputting the pooling results into the image attention layer, which assigns a weight to each pooling result so that image feature extraction results containing sensitive information are weighted higher than those that do not, thereby obtaining the image feature vector of the sample to be identified.
Before the cover picture is input to the image feature layer for feature extraction, it may be adjusted to a preset size, for example 224 x 224, so that the images input to the image feature extraction model are always of the same size, which makes the results more comparable and more accurate. The image attention layer may be the attention layer of a CBAM (Convolutional Block Attention Module) or another attention model; this is not specifically limited.
S204: and extracting key frames from the video sequence, inputting the key frames into a video sequence feature extraction model obtained by pre-training for feature extraction to obtain key frame feature vectors, and further aggregating the key frame feature vectors to obtain the video sequence feature vectors of the samples to be identified.
A key frame is a video frame in which an object in the video sequence performs a key action; it carries more information than a reference frame, so using key frames extracted from the video sequence to identify sensitive information reduces, as far as possible, the omission of information during analysis.
The key frames extracted from the video sequence may be all key frames in the video sequence or a first preset number of key frames; this is not specifically limited. In this step the first preset number may be 8 or any other preset natural number; extracting only a first preset number of key frames reduces the computation required to analyze the video sequence.
After the key frames are extracted, they may be adjusted to a preset size before being input to the video feature layer for feature extraction, so that the key frames input to the video sequence feature extraction model are always of the same size, which makes the results more comparable and more accurate.
In one implementation, the video sequence feature extraction model includes a video feature layer, a video feature pooling layer and a video feature attention layer, and inputting the key frames into the pre-trained video sequence feature extraction model for feature extraction to obtain the key frame feature vectors may include: first inputting the key frames into the video feature layer for feature extraction to obtain key frame feature extraction results; then inputting the key frame feature extraction results into the video feature pooling layer for global pooling to obtain pooling results; and finally inputting the pooling results into the video feature attention layer, which assigns a weight to each pooling result so that key frame feature extraction results containing sensitive information are weighted higher than those that do not, thereby obtaining the key frame feature vectors.
For example, a NeXtVLAD method may be used to aggregate the key frame feature vectors. NeXtVLAD is a modified VLAD (Vector of Locally Aggregated Descriptors) method, an encoding method that represents a global feature by aggregating local features.
S205: inputting the video sequence feature vector, the text feature vector and the image feature vector into a full-connection layer model obtained through pre-training, analyzing the probability that the video sequence feature vector, the text feature vector and the image feature vector are sensitive feature information, and calculating the analysis result to obtain the probability that the sample to be identified contains the sensitive information.
In this step, the text feature vector, the image feature vector and the video sequence feature vector may be input into the pre-trained full connection layer model, the probability that each of them represents sensitive feature information is analyzed, and the analysis results are weighted and summed to obtain the probability that the sample to be identified contains sensitive information, a value in the interval (0, 1).
As can be seen from the above, the sensitive information identification method provided by the embodiment of the present invention calculates the probability that a sample to be identified contains sensitive information by combining the video sequence of the sample with either or both of its title and cover picture. The sensitive information in the sample no longer needs to be identified manually, which reduces labor cost, achieves a better identification effect on relatively obscure sensitive information, and thus improves the efficiency of video identification.
Corresponding to the above sensitive information identification method, an embodiment of the present invention further provides a sensitive information recognition model training method. Fig. 3 is a schematic flow chart of a sensitive information recognition model training method provided in an embodiment of the present invention; the method includes the following steps:
S301: acquiring metadata of the training samples and the labeling results of the training samples, wherein the labeling results of the training samples are used for indicating whether the training samples contain sensitive information.
S302: and performing feature extraction on the video sequence of the training sample and the reference information of the training sample to respectively obtain a video sequence feature vector and a reference feature vector of the training sample.
In this step, feature extraction is performed on two parts of the metadata of the training sample: the video sequence and the reference information.
Performing feature extraction on the video sequence of the training sample to obtain the video sequence feature vector of the training sample includes: extracting key frames from the video sequence of the training sample; inputting the key frames of the training sample into a pre-trained video sequence feature extraction model for feature extraction to obtain the key frame feature vectors of the training sample; and aggregating the key frame feature vectors to obtain the predicted video sequence feature vector of the training sample. A first preset number of key frames may be extracted from the video sequence of the training sample.
In one implementation manner, in the step of obtaining the metadata of the training sample and the annotation result of the training sample, the annotation result of the video sequence of the training sample may also be obtained, where the annotation result of the video sequence is used to indicate whether the video sequence of the training sample contains sensitive information.
After the key frame feature vectors are aggregated into the predicted video sequence feature vector of the training sample, a fourth loss value between the predicted video sequence feature vector and the sensitive information labeling result of the video sequence of the training sample may be calculated, and the parameters of the video sequence feature extraction model are adjusted until the fourth loss value is smaller than a fourth preset threshold, yielding a new video sequence feature extraction model. A sketch of this auxiliary loss is given below.
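A sketch of the fourth loss, assuming PyTorch and a small auxiliary scoring head (the head itself is an assumption; the patent only states that the predicted video sequence feature vector is compared with the video-level labeling result):

```python
import torch
import torch.nn as nn

video_dim = 1024                      # assumed size of the video sequence feature vector
aux_head = nn.Linear(video_dim, 1)    # assumed auxiliary scoring head on the video branch
aux_criterion = nn.BCEWithLogitsLoss()


def fourth_loss(video_vec: torch.Tensor, video_label: torch.Tensor) -> torch.Tensor:
    """video_vec: (B, video_dim); video_label: (B,), 1 = contains sensitive info."""
    return aux_criterion(aux_head(video_vec).squeeze(-1), video_label.float())
```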
The video sequence feature extraction model may be an Xception network pre-trained on ImageNet; in other words, it may be a pre-trained network, so its parameters only need to be fine-tuned during training, which keeps the training difficulty low.
Performing feature extraction on the reference information to obtain the reference feature vector of the training sample means performing feature extraction on the cover picture and/or the title to obtain an image feature vector and/or a text feature vector, respectively. Understandably, extracting features from both the cover picture and the title yields a better sensitive information identification effect than extracting features from only one of them.
The step of extracting the features of the title may include: and inputting the title of the training sample into a text feature extraction model obtained by pre-training for feature extraction to obtain a predicted text feature vector of the training sample.
In one implementation manner, in the step of obtaining the metadata of the training sample and the labeling result of the training sample, a labeling result of a title of the training sample may also be obtained, where the labeling result of the title is used to indicate whether the title of the training sample contains sensitive information.
After the title of the training sample is input into a text feature extraction model obtained by pre-training for feature extraction, and a predicted text feature vector of the training sample is obtained, a second loss value between the predicted text feature vector and a labeling result of the title of the training sample can be calculated, parameters of the text feature extraction model are adjusted until the second loss value is smaller than a second preset threshold value, a new text feature extraction model is obtained, and the video sequence feature extraction model, the text feature extraction model and the full connection layer model are used as a sensitive information identification model.
The text feature extraction model may be a CNN (Convolutional Neural Network) model and an LSTM (Long Short-Term Memory) model; the LSTM is a recurrent neural network model suitable for processing and predicting important events with relatively long intervals and delays in a time series.
The step of extracting features of the cover map may include: and inputting the cover picture of the training sample into an image feature extraction model obtained by pre-training for feature extraction to obtain a predicted image feature vector of the training sample.
In one implementation manner, in the step of obtaining the metadata of the training sample and the labeling result of the training sample, a labeling result of a cover map of the training sample may also be obtained, where the labeling result of the cover map is used to indicate whether the cover map of the training sample contains sensitive information;
after inputting the cover picture of the training sample into the image feature extraction model obtained by pre-training for feature extraction to obtain the predicted image feature vector of the training sample, calculating a third loss value between the predicted image feature vector and the labeling result of the cover picture of the training sample, adjusting the parameters of the image feature extraction model until the third loss value is smaller than a third preset threshold value to obtain a new image feature extraction model, and taking the video sequence feature extraction model, the image feature extraction model and the full connection layer model as the sensitive information identification model.
The image feature extraction model may be an Xception (Extreme Inception, a deep neural network with depthwise separable convolutions) network pre-trained on ImageNet. ImageNet is a computer vision recognition project and currently the world's largest image recognition database; the Xception network improves on the Inception deep neural network structure by introducing depthwise separable convolutions and a residual network structure.
In other words, the image feature extraction model may be a pre-trained network, so only its parameters need to be fine-tuned during training; the training difficulty is low, and the operation results of the resulting image feature extraction model are more accurate.
S303: inputting the video sequence feature vector and the reference feature vector of the training sample into a preset full connection layer model, analyzing the probability that the video sequence feature vector and the reference feature vector of the training sample are sensitive feature information, and calculating the analysis result to obtain the probability that the training sample contains sensitive information.
Specifically, the analysis results may be weighted and summed to obtain the probability that the training sample contains sensitive information, a value in the interval (0, 1).
S304: calculating a first loss value between the probability that the training sample contains the sensitive information and the labeling result of the training sample, adjusting the parameters of the preset full-connection layer model until the first loss value is smaller than a first preset threshold value, and taking the obtained full-connection layer model as the sensitive information recognition model.
The full connection layer model can be trained with a cross entropy loss function, and a dropout algorithm can be added to reduce network overfitting, as in the sketch below.
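A sketch of one such training step, assuming PyTorch; the layer sizes, optimizer, learning rate and first preset threshold are illustrative:

```python
import torch
import torch.nn as nn

fc_model = nn.Sequential(nn.Linear(1792, 256), nn.ReLU(), nn.Dropout(p=0.5),
                         nn.Linear(256, 1))       # 1792 = assumed fused feature size
criterion = nn.BCEWithLogitsLoss()                # binary cross entropy loss
optimizer = torch.optim.Adam(fc_model.parameters(), lr=1e-4)


def train_step(fused_features: torch.Tensor, labels: torch.Tensor,
               first_preset_threshold: float = 1e-3) -> bool:
    """Runs one parameter update and reports whether the first loss value
    has fallen below the first preset threshold (i.e. training can stop)."""
    optimizer.zero_grad()
    logits = fc_model(fused_features).squeeze(-1)
    first_loss = criterion(logits, labels.float())   # first loss value
    first_loss.backward()
    optimizer.step()
    return first_loss.item() < first_preset_threshold
```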
In the embodiment of the present invention, the training samples may be divided into a training set, a validation set and a test set at a ratio of 8:1:1, where the training set is used to train the model parameters, the validation set is used to validate the model parameters, and the test set is used to test the stability of the trained model. The training samples may be a preset number of video data sets from a video website.
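A sketch of the 8:1:1 split in plain Python (shuffling and the fixed seed are assumptions):

```python
import random


def split_samples(samples: list, seed: int = 0):
    """Divide training samples into training, validation and test sets at
    the 8:1:1 ratio described above."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    return (shuffled[:int(0.8 * n)],                 # training set
            shuffled[int(0.8 * n):int(0.9 * n)],     # validation set
            shuffled[int(0.9 * n):])                 # test set
```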
As can be seen from the above, the sensitive information recognition model obtained by the training method provided in the embodiment of the present invention combines the video sequence of the sample to be recognized with either or both of its title and cover picture to calculate the probability that the sample contains sensitive information. Because the sensitive information no longer needs to be recognized manually, labor cost is reduced; at the same time, a better recognition effect is obtained for more obscure sensitive information, so the efficiency of video recognition is improved.
In addition, in order to learn the influence of different types of features on the models, the embodiment of the invention adopts a multi-task learning mechanism: either or both of the text feature extraction model and the image feature extraction model are combined with the video sequence feature extraction model and the full-connection layer model to identify sensitive information, fusing multi-source features and multiple tasks. The loss values of the feature extraction models and the loss value of the full-connection layer model are optimized jointly, which makes the models easy to interpret and improves their generalization and robustness. Meanwhile, a new warm-start training mode is adopted: the feature extraction models for the reference information and the video sequence of the sample to be recognized are each pre-trained separately, and the full-connection layer model is then warm-started from those model parameters, which reduces mutual disturbance among the learning tasks.
As shown in fig. 4, a schematic flow chart of another sensitive information recognition model training method provided in the embodiment of the present invention includes the following steps:
S401: acquiring metadata of the training samples and the labeling results of the training samples, where the labeling result of a training sample is used to indicate whether that training sample contains sensitive information.
S402: inputting the title of the training sample into a text feature extraction model obtained by pre-training for feature extraction to obtain a predicted text feature vector of the training sample; and calculating a second loss value between the predicted text feature vector and the labeling result of the title of the training sample, and adjusting the parameters of the text feature extraction model until the second loss value is smaller than a second preset threshold value to obtain a new text feature extraction model.
The text feature extraction model can be a CNN (Convolutional Neural Network) model or an LSTM (Long Short-Term Memory) model; the LSTM model is a recurrent neural network model suited to processing and predicting important events with relatively long intervals and delays in a time series.
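A minimal sketch of such a text branch follows (Python/PyTorch; the vocabulary size, embedding dimension and pooling choice are hypothetical), using a bidirectional LSTM as in the scheme of fig. 5:

import torch
import torch.nn as nn

class TextFeatureExtractor(nn.Module):
    # Embedding -> bidirectional LSTM -> mean pooling over time,
    # yielding one text feature vector per title.
    def __init__(self, vocab_size=50000, embed_dim=300, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, token_ids):              # token_ids: (batch, seq_len)
        out, _ = self.lstm(self.embed(token_ids))
        return out.mean(dim=1)                  # (batch, 2 * hidden)

vec = TextFeatureExtractor()(torch.randint(0, 50000, (2, 16)))  # shape (2, 512)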
S403: inputting the cover picture of the training sample into the image feature extraction model obtained by pre-training for feature extraction to obtain the predicted image feature vector of the training sample; calculating a third loss value between the predicted image feature vector and the labeling result of the cover picture of the training sample, adjusting the parameters of the image feature extraction model until the third loss value is smaller than a third preset threshold value to obtain a new image feature extraction model, and taking the video sequence feature extraction model, the image feature extraction model and the full-connection layer model as the sensitive information identification model.
The image feature extraction model may be an Xception (Extreme Inception) network pre-trained on ImageNet. ImageNet is a computer vision recognition project and is currently the world's largest image recognition database; the Xception network improves on the Inception deep neural network structure by introducing depthwise separable convolutions and residual connections.
In other words, because the image feature extraction model may be a pre-trained network, only its parameters need to be fine-tuned during training, so the training difficulty is low and the operation results of the resulting image feature extraction model are more accurate.
S404: extracting key frames from a video sequence of training samples; inputting the key frame of the training sample into a video sequence feature extraction model obtained by pre-training for feature extraction to obtain a key frame feature vector of the training sample; aggregating the key frame feature vectors to obtain a prediction video sequence feature vector of the training sample; and calculating a fourth loss value between the predicted video sequence feature vector and the sensitive information labeling result of the video sequence of the training sample, and adjusting the parameters of the video sequence feature extraction model until the fourth loss value is smaller than a fourth preset threshold value to obtain a new video sequence feature extraction model.
Here, a first preset number of key frames may be extracted from the video sequence of the training sample. The video sequence feature extraction model can be an Xception network pre-trained on ImageNet; that is, it can be a pre-trained network whose parameters only need to be fine-tuned during training, so the training difficulty is low.
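Key frame extraction is not fixed by this disclosure; one illustrative choice, sketched below with OpenCV (the first preset number of 32 is hypothetical), samples frames uniformly across the video:

import cv2
import numpy as np

def extract_key_frames(video_path, first_preset_number=32):
    # Uniformly sample a first preset number of frames as key frames.
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, max(total - 1, 0), first_preset_number).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames  # each frame is then fed to the video sequence feature extraction model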
The embodiment of the invention adopts a multi-task learning mechanism: either or both of the text feature extraction model and the image feature extraction model are combined with the video sequence feature extraction model and the full-connection layer model to identify sensitive information, fusing multi-source features and multiple tasks; the loss values of the feature extraction models and the loss value of the full-connection layer model are optimized jointly, so the models are easy to interpret and their generalization and robustness can be improved.
S405: inputting the video sequence feature vector, the text feature vector and the image feature vector of the training sample into a preset full-connection layer model, analyzing the probability that the video sequence feature vector, the text feature vector and the image feature vector of the training sample are sensitive feature information, and calculating the analysis result to obtain the probability that the training sample contains the sensitive information.
Specifically, the analysis results may be subjected to weighted summation to obtain the probability that the training sample contains sensitive information, where the probability takes a value in the interval (0, 1).
S406: calculating a first loss value between the probability that the training sample contains sensitive information and the labeling result of the training sample, adjusting the parameters of the preset full-connection layer model until the first loss value is smaller than a first preset threshold value, and taking the obtained full-connection layer model, the video sequence feature extraction model, the text feature extraction model and the image feature extraction model together as the sensitive information recognition model.
The full-connection layer model can be trained using a cross-entropy loss function, and a dropout (discarding) algorithm can be added to reduce network overfitting.
In the embodiment of the invention, the training process can divide the training samples into a training set, a validation set and a test set in an 8:1:1 ratio, where the training set is used to train the model parameters, the validation set is used to validate the model parameters, and the test set is used to test the stability of the trained model. The training samples may be a preset number of video data sets from a video website.
As can be seen from the above, the sensitive information recognition model obtained by the training method provided in the embodiment of the present invention combines the video sequence of the sample to be recognized with either or both of its title and cover picture to calculate the probability that the sample contains sensitive information. Because the sensitive information no longer needs to be recognized manually, labor cost is reduced; at the same time, a better recognition effect is obtained for more obscure sensitive information, so the efficiency of video recognition is improved.
Meanwhile, a new warm-start training mode is adopted: the feature extraction models for the reference information and the video sequence of the sample to be recognized are each pre-trained separately, and the full-connection layer model is then warm-started from those model parameters, which reduces mutual disturbance among the learning tasks.
Fig. 5 is a schematic diagram of a scheme of a sensitive information recognition model training method in an embodiment of the present invention.
The title of the training sample first passes through a word vector extraction model to obtain word vectors, and a dropout layer may be added to reduce network overfitting. The word vectors are input into a bidirectional LSTM model to obtain the text feature vector; the loss value of the text feature extraction model is then calculated through a fully connected neural network, and the text feature extraction model is obtained by iterative training.
The cover picture of a training sample first passes through an Xception network. The Xception network structure comprises a plurality of modules; in this application, the output of the 14th module may be used as the input of a channel attention model. The feature map is processed by the channel attention model and a spatial attention model and then globally pooled to obtain the image feature vector; the loss value of the image feature extraction model is then calculated through a fully connected neural network, and the image feature extraction model is obtained by iterative training.
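The disclosure does not give the attention equations; one common formulation (CBAM-style channel and spatial attention), sketched below in Python/PyTorch with hypothetical tensor sizes, shows the shape such a module could take:

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    # Pool away the spatial dimensions, then re-weight the channels.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(channels, channels // reduction),
                                 nn.ReLU(),
                                 nn.Linear(channels // reduction, channels))

    def forward(self, x):                           # x: (B, C, H, W)
        w = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))))
        return x * w[:, :, None, None]

class SpatialAttention(nn.Module):
    # Re-weight spatial positions from pooled channel statistics.
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.max(1, keepdim=True)[0]], dim=1)
        return x * torch.sigmoid(self.conv(pooled))

feat = torch.randn(2, 728, 19, 19)   # e.g. an intermediate Xception feature map
feat = SpatialAttention()(ChannelAttention(728)(feat))
image_vector = feat.mean(dim=(2, 3)) # global pooling -> (2, 728)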
The key frames in the video sequence of a training sample pass through an Xception network to obtain frame features. These features pass through a fully connected neural network and are input into the soft-assignment clustering model or the grouped attention model of a NeXtVLAD network for aggregation; the result passes through L2 regularization, a dropout layer and a fully connected neural network, and is then processed by an SE (Squeeze-and-Excitation) nonlinear unit to obtain the video sequence feature vector. The loss value of the video sequence feature extraction model is then calculated through a fully connected neural network, and the video sequence feature extraction model is obtained by iterative training.
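The SE nonlinear unit can take the following form (a sketch in Python/PyTorch; the descriptor dimension and reduction ratio are hypothetical):

import torch
import torch.nn as nn

class SEGate(nn.Module):
    # Squeeze-and-Excitation gating applied to the aggregated video descriptor.
    def __init__(self, dim, reduction=8):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim // reduction)
        self.fc2 = nn.Linear(dim // reduction, dim)

    def forward(self, v):                          # v: (batch, dim)
        gate = torch.sigmoid(self.fc2(torch.relu(self.fc1(v))))
        return v * gate                            # channel-wise re-weighting

video_vector = SEGate(1024)(torch.randn(4, 1024)) # hypothetical descriptor size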
Meanwhile, the full-connection layer model can be trained according to the text feature vector, the image feature vector and the video sequence feature vector of the training sample: the loss value of the full-connection layer model is calculated, and the full-connection layer model is obtained by iterative training.
In this way, after the sample to be recognized is obtained, the title of the sample to be recognized can be input into the text feature extraction model for feature extraction to obtain the text feature vector of the sample to be recognized; the cover picture of the sample to be recognized is input into the image feature extraction model for feature extraction to obtain the image feature vector of the sample to be recognized; a first preset number of key frames are extracted from the video sequence of the sample to be recognized and input into the video sequence feature extraction model for feature extraction to obtain key frame feature vectors, which are aggregated to obtain the video sequence feature vector of the sample to be recognized; and then the text feature vector, the image feature vector and the video sequence feature vector of the sample to be recognized are input into the full-connection layer model obtained by pre-training for calculation to obtain the probability that the sample to be recognized contains sensitive information.
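Put together, the inference flow can be sketched as follows (Python/PyTorch; the model objects and the sample fields title_ids, cover and key_frames are hypothetical placeholders for the components described above):

import torch

def identify_sensitive(sample, text_model, image_model, video_model, fc_model):
    with torch.no_grad():
        text_vec = text_model(sample["title_ids"])        # title branch
        image_vec = image_model(sample["cover"])          # cover picture branch
        frame_vecs = video_model(sample["key_frames"])    # (num_frames, dim)
        video_vec = frame_vecs.mean(dim=0, keepdim=True)  # aggregate key frames
        fused = torch.cat([video_vec, text_vec, image_vec], dim=1)
        return torch.sigmoid(fc_model(fused)).item()      # probability in (0, 1)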
As shown in fig. 6, an embodiment of the present invention further provides a sensitive information identification apparatus, where the apparatus includes:
a to-be-identified sample obtaining module 501, configured to obtain metadata of a sample to be identified, where the metadata includes a video sequence and reference information, and the reference information includes a title and/or a cover picture;
a feature extraction module 502, configured to perform feature extraction on the video sequence and the reference information to obtain a video sequence feature vector and a reference feature vector of the sample to be identified respectively;
the sensitive information analysis module 503 is configured to input the video sequence feature vector and the reference feature vector into the full-connection layer model obtained through pre-training, analyze the probability that the video sequence feature vector and the reference feature vector are sensitive feature information, and calculate the analysis result to obtain the probability that the sample to be identified contains sensitive information.
As can be seen from the above, the sensitive information identification device provided in the embodiment of the present invention combines the video sequence of the sample to be identified with either or both of its title and cover picture to calculate the probability that the sample contains sensitive information. Because the sensitive information no longer needs to be identified manually, labor cost is reduced; at the same time, a better identification effect is obtained for more obscure sensitive information, so the efficiency of video identification is improved.
The embodiment of the invention also provides a training device for the sensitive information identification model, corresponding to the sensitive information identification device above. As shown in fig. 7, a schematic structural diagram of the sensitive information recognition model training device provided in an embodiment of the present invention, the device includes:
a training sample obtaining module 601, configured to obtain metadata of a training sample and an annotation result of the training sample, where the annotation result of the training sample is used to indicate whether the training sample contains sensitive information;
a feature extraction module 602, configured to perform feature extraction on the video sequence of the training sample and the reference information of the training sample to obtain a video sequence feature vector and a reference feature vector of the training sample, respectively;
the sensitive information analysis module 603 is configured to input the video sequence feature vector and the reference feature vector of the training sample into a preset full-connected layer model, analyze the probability that the video sequence feature vector and the reference feature vector of the training sample are sensitive feature information, and calculate an analysis result to obtain the probability that the training sample contains sensitive information;
the parameter adjusting module 604 is configured to calculate a first loss value between a probability that the training sample includes sensitive information and a labeling result of the training sample, adjust a parameter of the preset full-link layer model until the first loss value is smaller than a first preset threshold, and use the obtained full-link layer model as the sensitive information identification model.
As can be seen from the above, the sensitive information recognition model obtained by the training device provided in the embodiment of the present invention combines the video sequence of the sample to be recognized with either or both of its title and cover picture to calculate the probability that the sample contains sensitive information. Because the sensitive information no longer needs to be recognized manually, labor cost is reduced; at the same time, a better recognition effect is obtained for more obscure sensitive information, so the efficiency of video recognition is improved.
An embodiment of the present invention further provides an electronic device, as shown in fig. 8, including a processor 701, a communication interface 702, a memory 703 and a communication bus 704, where the processor 701, the communication interface 702, and the memory 703 complete mutual communication through the communication bus 704,
a memory 703 for storing a computer program;
the processor 701 is configured to implement the following steps when executing the program stored in the memory 703:
acquiring metadata of a sample to be identified, wherein the metadata comprises a video sequence and reference information, and the reference information comprises a title and/or a cover picture;
extracting the characteristics of the video sequence and the reference information to respectively obtain a video sequence characteristic vector and a reference characteristic vector of a sample to be identified;
inputting the video sequence feature vector and the reference feature vector into a full-connection layer model obtained through pre-training, analyzing the probability that the video sequence feature vector and the reference feature vector are sensitive feature information, and calculating the analysis result to obtain the probability that the sample to be identified contains the sensitive information.
Alternatively, the following steps are implemented:
acquiring metadata of a training sample and an annotation result of the training sample, wherein the annotation result of the training sample is used for indicating whether the training sample contains sensitive information;
extracting the characteristics of the video sequence of the training sample and the reference information of the training sample to respectively obtain a video sequence characteristic vector and a reference characteristic vector of the training sample;
inputting the video sequence feature vector and the reference feature vector of the training sample into a preset full-connection layer model, analyzing the probability that the video sequence feature vector and the reference feature vector of the training sample are sensitive feature information, and calculating the analysis result to obtain the probability that the training sample contains the sensitive information;
calculating a first loss value between the probability that the training sample contains sensitive information and the labeling result of the training sample, adjusting the parameters of the preset full-connection layer model until the first loss value is smaller than a first preset threshold value, and taking the obtained full-connection layer model as the sensitive information identification model.
The communication bus mentioned above for the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic device and other equipment.
The memory may include a Random Access Memory (RAM) or a non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, a computer-readable storage medium is further provided, which stores instructions that, when executed on a computer, cause the computer to execute the sensitive information recognition or the sensitive information recognition model training method described in any of the above embodiments.
In yet another embodiment of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the sensitive information recognition or the sensitive information recognition model training method described in any of the above embodiments.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the invention to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (19)

1. A method for identifying sensitive information, the method comprising:
acquiring metadata of a sample to be identified, wherein the metadata comprises a video sequence and reference information, and the reference information comprises a title and/or a cover picture;
extracting the characteristics of the video sequence and the reference information to respectively obtain a video sequence characteristic vector and a reference characteristic vector of the sample to be identified;
inputting the video sequence feature vector and the reference feature vector into a full-connection layer model obtained through pre-training, analyzing the probability that the video sequence feature vector and the reference feature vector are sensitive feature information, and calculating an analysis result to obtain the probability that the sample to be identified contains sensitive information.
2. The method according to claim 1, wherein the performing feature extraction on the video sequence to obtain a video sequence feature vector of the sample to be identified comprises:
extracting key frames from the video sequence;
inputting the key frame into a video sequence feature extraction model obtained by pre-training for feature extraction to obtain a key frame feature vector;
and aggregating the key frame feature vectors to obtain the video sequence feature vectors of the samples to be identified.
3. The method of claim 2, wherein said extracting key frames from said video sequence comprises:
a first preset number of key frames is extracted from the video sequence.
4. The method according to claim 1, wherein in the case where the reference information includes a title, the reference feature vector includes a text feature vector;
the extracting the features of the reference information to obtain the reference feature vector of the sample to be identified includes:
and inputting the title into a text feature extraction model obtained by pre-training for feature extraction to obtain a text feature vector of the sample to be recognized.
5. The method according to claim 1, wherein in the case where the reference information includes a cover picture, the reference feature vector includes an image feature vector;
the extracting the features of the reference information to obtain the reference feature vector of the sample to be identified includes:
and inputting the cover picture into an image feature extraction model obtained by pre-training for feature extraction to obtain an image feature vector of the sample to be identified.
6. The method of claim 1, wherein the calculating the analysis result to obtain the probability that the sample to be identified contains sensitive information comprises:
and carrying out weighted summation on the analysis results to obtain the probability that the sample to be identified contains sensitive information.
7. A training method for a sensitive information recognition model is characterized by comprising the following steps:
acquiring metadata of a training sample and an annotation result of the training sample, wherein the annotation result of the training sample is used for indicating whether the training sample contains sensitive information;
extracting the characteristics of the video sequence of the training sample and the reference information of the training sample to respectively obtain a video sequence characteristic vector and a reference characteristic vector of the training sample;
inputting the video sequence feature vector and the reference feature vector of the training sample into a preset full-connection layer model, analyzing the probability that the video sequence feature vector and the reference feature vector of the training sample are sensitive feature information, and calculating the analysis result to obtain the probability that the training sample contains the sensitive information;
calculating a first loss value between the probability that the training sample contains sensitive information and the labeling result of the training sample, adjusting the parameters of the preset full-connection layer model until the first loss value is smaller than a first preset threshold value, and taking the obtained full-connection layer model as the sensitive information identification model.
8. The method according to claim 7, wherein the performing feature extraction on the video sequence of the training samples to obtain the video sequence feature vector of the training samples comprises:
extracting key frames from the video sequence of the training samples;
inputting the key frame of the training sample into a video sequence feature extraction model obtained by pre-training for feature extraction to obtain a key frame feature vector of the training sample;
and aggregating the key frame feature vectors to obtain the predicted video sequence feature vectors of the training samples.
9. The method of claim 8, wherein extracting key frames from the video sequence of training samples comprises:
and extracting a first preset number of key frames from the video sequence of the training sample.
10. The method of claim 8, wherein obtaining metadata of training samples and labeling results of the training samples further comprises:
acquiring a labeling result of the video sequence of the training sample, wherein the labeling result of the video sequence is used for indicating whether the video sequence of the training sample contains sensitive information;
after aggregating the keyframe feature vectors to obtain the predicted video sequence feature vectors of the training samples, the method further includes:
calculating a fourth loss value between the predicted video sequence feature vector and a sensitive information labeling result of the video sequence of the training sample, adjusting parameters of the video sequence feature extraction model until the fourth loss value is smaller than a fourth preset threshold value to obtain a new video sequence feature extraction model, and taking the video sequence feature extraction model and the full connection layer model as the sensitive information identification model.
11. The method according to claim 8, wherein in the case where the reference information includes a title, the reference feature vector includes a text feature vector;
the extracting the features of the reference information of the training sample to obtain the reference feature vector of the training sample includes:
and inputting the title of the training sample into a text feature extraction model obtained by pre-training for feature extraction to obtain a predicted text feature vector of the training sample.
12. The method of claim 11, wherein obtaining metadata of training samples and labeling results of the training samples further comprises:
obtaining a labeling result of the title of the training sample, wherein the labeling result of the title is used for indicating whether the title of the training sample contains sensitive information;
after the title of the training sample is input to a text feature extraction model obtained by pre-training for feature extraction to obtain a predicted text feature vector of the training sample, the method further comprises:
calculating a second loss value between the predicted text feature vector and the labeling result of the title of the training sample, adjusting parameters of the text feature extraction model until the second loss value is smaller than a second preset threshold value to obtain a new text feature extraction model, and taking the video sequence feature extraction model, the text feature extraction model and the full connection layer model as the sensitive information identification model.
13. The method according to claim 8, wherein in the case where the reference information includes a cover picture, the reference feature vector includes an image feature vector;
the extracting the features of the reference information of the training sample to obtain the reference feature vector of the training sample includes:
and inputting the cover picture of the training sample into an image feature extraction model obtained by pre-training for feature extraction to obtain a predicted image feature vector of the training sample.
14. The method of claim 13, wherein obtaining metadata of training samples and labeling results of the training samples further comprises:
acquiring a labeling result of a cover picture of the training sample, wherein the labeling result of the cover picture is used for indicating whether the cover picture of the training sample contains sensitive information;
after the cover picture of the training sample is input to an image feature extraction model obtained by pre-training for feature extraction, and a predicted image feature vector of the training sample is obtained, the method further comprises the following steps:
calculating a third loss value between the predicted image feature vector and the labeling result of the cover map of the training sample, adjusting the parameters of the image feature extraction model until the third loss value is smaller than a third preset threshold value to obtain a new image feature extraction model, and taking the video sequence feature extraction model, the image feature extraction model and the full connection layer model as the sensitive information identification model.
15. The method of claim 7, wherein the calculating the analysis result to obtain the probability that the training sample contains sensitive information comprises:
and carrying out weighted summation on the analysis results to obtain the probability that the training sample contains sensitive information.
16. An apparatus for identifying sensitive information, the apparatus comprising:
the system comprises a to-be-identified sample acquisition module, a to-be-identified sample acquisition module and a to-be-identified sample identification module, wherein the to-be-identified sample acquisition module is used for acquiring metadata of a to-be-identified sample, the metadata comprises a video sequence and reference information, and the reference information comprises a title and/or a cover picture;
the characteristic extraction module is used for extracting the characteristics of the video sequence and the reference information to respectively obtain a video sequence characteristic vector and a reference characteristic vector of the sample to be identified;
and the sensitive information analysis module is used for inputting the video sequence feature vector and the reference feature vector into a full connection layer model obtained by pre-training, analyzing the probability that the video sequence feature vector and the reference feature vector are sensitive feature information, and calculating an analysis result to obtain the probability that the sample to be identified contains sensitive information.
17. An apparatus for training a sensitive information recognition model, the apparatus comprising:
the training sample acquisition module is used for acquiring metadata of a training sample and an annotation result of the training sample, wherein the annotation result of the training sample is used for indicating whether the training sample contains sensitive information;
the characteristic extraction module is used for extracting characteristics of the video sequence of the training sample and the reference information of the training sample to respectively obtain a video sequence characteristic vector and a reference characteristic vector of the training sample;
the sensitive information analysis module is used for inputting the video sequence feature vector and the reference feature vector of the training sample into a preset full-connection layer model, analyzing the probability that the video sequence feature vector and the reference feature vector of the training sample are sensitive feature information, and calculating an analysis result to obtain the probability that the training sample contains sensitive information;
and the parameter adjusting module is used for calculating a first loss value between the probability that the training sample contains sensitive information and the labeling result of the training sample, adjusting the parameters of the preset full-connection layer model until the first loss value is smaller than a first preset threshold value, and taking the obtained full-connection layer model as the sensitive information recognition model.
18. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1-15 when executing a program stored in the memory.
19. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-15.