CN113033610B - Multi-mode fusion sensitive information classification detection method - Google Patents

Multi-mode fusion sensitive information classification detection method Download PDF

Info

Publication number
CN113033610B
CN113033610B (application CN202110203458.7A)
Authority
CN
China
Prior art keywords
sensitive
emotion
classification
text
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110203458.7A
Other languages
Chinese (zh)
Other versions
CN113033610A (en)
Inventor
张志勇
宋斌
张蓝方
梁腾翔
徐艳艳
苗坤霖
赵长伟
黄帅娜
李静
张孝国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University of Science and Technology
Original Assignee
Henan University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University of Science and Technology
Priority to CN202110203458.7A
Publication of CN113033610A
Application granted
Publication of CN113033610B
Legal status: Active
Anticipated expiration

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/355Class or cluster creation or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Abstract

A multi-modal fusion sensitive information classification detection method comprises: step 1, primary sensitivity detection of texts and pictures; step 2, emotion-based judgment of text sensitivity; and step 3, multi-modal sensitivity detection with image-text fusion. Sensitivity detection is performed on the text and the picture separately, and the sensitivity of the content can be judged accurately by taking into account the influence of emotion polarity and intensity on the sensitive information. The image-text sensitivity problem is solved with a suitable fusion method, and the detection precision is high.

Description

Multi-mode fusion sensitive information classification detection method
Technical Field
The invention relates to the technical field of internet, in particular to a multi-mode fusion sensitive information classification detection method.
Background
The number of netizens worldwide is huge, and online social networks have become the preferred platform for information interaction. As social networks become ever more popular and widely used, their information, carried by pictures, texts, audio and video, shows trends of diversification, complication and massive growth, and a large amount of sensitive information floods social networks, seriously affecting network safety and people's physical and mental health. How to detect sensitive information efficiently and accurately with artificial intelligence technology has become an urgent problem for academia and industry.
Most existing research on sensitive information detection performs sensitive identification with single-modal features, i.e., single-modal data analysis. For example, the paper "Hate Speech on Twitter: A Pragmatic Approach to Collect Hateful and Offensive Expressions and Perform Hate Speech Detection" by Watanabe et al. proposes a hate-speech detection method for Twitter that automatically detects linguistic patterns and the most common hateful phrase combinations and, by combining emotion and semantic features, classifies tweets as hateful, offensive or clean. The article "Sensitive Information Detection on Cyber-Space" by Lin M. et al. proposes an iteration-based semi-supervised deep learning model and a humming-melody-based search model to detect abnormal audio and video information. The paper "Text classification based on deep belief network and softmax regression" by Jiang M. et al. proposes a hybrid text classification model based on a deep belief network and Softmax regression: the deep belief network solves the sparse high-dimensional matrix computation problem of text data, and after the DBN extracts features, Softmax regression classifies texts in the learned feature space. The paper "Convolutional Neural Network for Pornographic Images Classification" by I Made Artha Agastya et al. proposes pornographic picture classification based on a convolutional neural network, adapting it to pornographic picture detection by changing the learning rate, the algorithm and the structure of the fully connected layers, which improves the accuracy of the detection result. The paper "Bag of Tricks for Efficient Text Classification" by Joulin A. et al. proposes a fast text classifier that matches deep learning classifiers in accuracy while being many orders of magnitude faster in training and evaluation. The paper "Multimodal Sentiment Analysis To Explore the Structure of Emotions" by Anthony Hu et al. proposes a multimodal sentiment analysis method that uses fusion technology to combine multimodal features, fully mining users' emotion types and improving emotion classification accuracy.
Some achievements have been made in sensitive information detection, but the following problems remain: 1) The internal connections and complementary roles between multi-modal data features are not considered. In fact, in sensitive information detection it is necessary to consider the interaction among modalities, since mutually complementary information gives a fuller understanding of the sensitive information. 2) Although the importance of emotion factors for text sensitivity detection has been partially considered, the influence of emotion polarity and emotion intensity on the text sensitivity judgment is ignored. 3) The picture classification problem is neglected: detecting illegal pictures is in essence a picture classification task. Sensitive pictures have numerous sensitive features whose feature parts are difficult to extract, and a simple binary sensitive/non-sensitive classification yields low accuracy. Compared with single-modal sensitivity methods, a global sensitivity analysis of the tweet has unique advantages, so it is more reasonable and accurate to consider how the data of the modalities act jointly on the result.
Disclosure of Invention
In order to solve the above technical problems, and aiming at the insufficient and inaccurate detection of sensitive information in social networks, the invention provides a deep-learning-based multi-modal fusion sensitive information classification detection method that combines the two modalities of text and picture.
In order to achieve this technical purpose, the adopted technical scheme is as follows: a multi-modal fusion sensitive information classification detection method comprises the following steps:
step 1, carrying out sensitivity primary detection on texts and pictures
Adopting FastText to detect the sensitivity of the text, judging which sensitive class or non-sensitive class the text belongs to, and obtaining a classification probability set of the text, recorded as:
P_w = [p_w1, p_w2, ..., p_wn]
detecting the sensitivity of the picture with an InceptionV3 network, judging which sensitive class or non-sensitive class the picture belongs to, and obtaining a classification probability set of the picture, recorded as:
P_t = [p_t1, p_t2, ..., p_tn]
wherein n represents the number of classifications of the pictures or the texts, the number of picture classifications and the number of text classifications being equal; if the text belongs to a sensitive class, step 2 is executed, and if the text belongs to the non-sensitive class, step 3 is executed;
step 2, judging the sensitivity of the text based on the emotion
Step 2.1, segmenting the text into a plurality of words by adopting jieba word segmentation, matching the words with the existing emotion word library and sensitive word library to obtain an emotion word set and a sensitive word set, and performing Cartesian product operation on the two sets to judge whether the emotion words and the sensitive words co-occur or not, wherein the emotion words have emotion polarity intensity comprising emotion polarity and emotion intensity;
step 2.2, judging the sensitivity of the text by combining the emotional polarity intensity of the emotional words and the sensitive words, wherein the calculation method comprises the following steps:
PositiveSensitiveCount = Σ_{i=1..n} Σ_j λ C+(w_i, w_j)  (2)
NegativeSensitiveCount = Σ_{i=1..n} Σ_j β C-(w_i, w_j)  (3)
AllSensitiveCount = PositiveSensitiveCount - NegativeSensitiveCount  (4)
wherein PositiveSensitiveCount represents the positive emotion score of the sensitive words, NegativeSensitiveCount represents the negative emotion score of the sensitive words, AllSensitiveCount represents the overall emotion score of the sensitive words, C+(w_i, w_j) is the number of co-occurrences of sensitive word w_i with positive emotion word w_j, C-(w_i, w_j) is the number of co-occurrences of sensitive word w_i with negative emotion word w_j, n is the total number of words after jieba word segmentation, λ is the positive emotion intensity of the emotion word, and β is the negative emotion intensity of the emotion word;
step 2.3, if the overall emotion score AllSensitiveCount is greater than 0, the text is directly judged as its original sensitive classification, and the probability set formed by the sensitive classification probabilities is still recorded as:
P_w = [p_w1, p_w2, ..., p_wn]
when AllSensitiveCount is less than or equal to 0, a secondary judgment is needed: the word frequency of the sensitive words is calculated, and when it is greater than a set threshold the original sensitive classification is kept, the probability set formed by the sensitive classification probabilities still being recorded as:
P_w = [p_w1, p_w2, ..., p_wn]
otherwise, the text is judged as the other classification, and the probability set formed by the sensitive classification probabilities is recorded as: P_g = [0, 0, ..., 0, 1];
Step 3, multi-modal sensitivity detection of image-text fusion
The sensitivity classification probability of the text and the sensitivity classification probability of the picture are passed through the fusion algorithm to obtain the sensitivity type probabilities P_i; MAX(P_i) takes the maximum sensitivity type probability P, and the sensitive classification corresponding to P is taken as the final sensitive classification result.
The final sensitive classification probability P is calculated as follows:
P_i = w · p_wi + (1 - w) · p_ti  (5)
P = MAX(P_i)  (6)
wherein w is the fusion weight with value range [0, 1], and P_i is the sensitive type probability distribution.
Whether an emotion word and a sensitive word co-occur is judged by the shortest-distance principle: within the sentence, the emotion word at the minimum distance from the sensitive word is taken as co-occurring with the sensitive word.
The beneficial effects of the invention are as follows: the invention provides a deep-learning-based multi-modal fusion sensitive information classification detection method that can accurately judge the sensitivity of content by combining the influence of emotion polarity and intensity on sensitive information. The image-text sensitivity problem is solved with a suitable fusion method, and the detection precision is high.
Drawings
FIG. 1 is a diagram of a detection framework of the present invention;
FIG. 2 is a flow chart of the detection according to the present invention.
Detailed Description
The multi-modal fusion sensitive information classification detection method provided by the invention can be roughly divided into three stages: an image-text sensitive feature extraction stage, a sensitivity detection and classification stage, and an image-text feature fusion stage; the complete framework is shown in FIG. 1. It mainly comprises three parts: text sensitive information classification, picture sensitive information classification, and image-text fusion sensitive classification based on the two. The text sensitive classification model performs the final sensitive classification of the text by combining the training result of the FastText model with the text classification word library. The picture sensitive classification model first loads a skeleton model and adjusts the parameters of each layer, then performs fine-tuning training on the picture training set, and finally applies the trained model to picture classification on the test set. Combining the sensitive classification probabilities of the text and the picture, the final image-text fusion sensitive classification probability is calculated according to the fusion formula.
The invention combines two modes of text and picture to detect and classify sensitive information. The sensitivity detection needs to be performed on the text and the picture respectively, and then the detection results are subjected to fusion processing to obtain the final detection result.
1. Primary detection of text sensitivity. The invention uses FastText to detect the sensitivity of the text, determining which of the sensitive classes or the non-sensitive class the text belongs to; for example, the text may be classified into four classes: sensitive class A, sensitive class B, sensitive class C, and others. The sensitive classes comprise sensitive class A, sensitive class B and sensitive class C (three sensitive classes), while "others" can be a single class or several classes that do not belong to the sensitive classes. FastText is a machine learning training tool that integrates word2vec, text classification and more, and is a simple, efficient text classification model. The FastText model has three layers: an input layer, a hidden layer and an output layer. The model first decomposes the text vocabulary into character-level N-gram features and adds them to the original words to obtain a text sequence (x_1, x_2, ..., x_{n-1}, x_n) as the data input of the network input layer. The hidden layer is the averaged superposition of the word vectors, and the output layer finally outputs the classified category. FastText adopts hierarchical Softmax, constructing a Huffman tree according to class frequency, which greatly reduces the number of model prediction targets and improves the training and classification efficiency of the model.
The text is classified for sensitivity with the FastText method, and the probability set of each sensitive classification is obtained, recorded as:
P_w = [p_w1, p_w2, ..., p_wn]
where n, the number of sensitive classifications of the text, is consistent with the number of picture classifications.
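As an illustration only, this primary text detection step maps naturally onto the open-source fasttext Python package; in the sketch below the training file name train.txt, the label names and the hyper-parameter values are assumptions, not values fixed by the invention.

```python
# A minimal sketch of the FastText primary text classification step.
# train.txt is assumed to hold one jieba-segmented sample per line, e.g.
#   __label__sensitiveA word1 word2 word3 ...
import fasttext

# loss="hs" selects hierarchical softmax, mirroring the Huffman-tree
# construction described above.
model = fasttext.train_supervised(
    input="train.txt", lr=0.1, epoch=25, wordNgrams=2, loss="hs")

LABELS = ["sensitiveA", "sensitiveB", "sensitiveC", "other"]  # assumed names

def text_probability_set(segmented_text):
    """Return the probability set P_w in a fixed label order, so it can be
    compared element-wise with the picture set P_t later."""
    labels, probs = model.predict(segmented_text, k=len(LABELS))
    by_label = dict(zip(labels, probs))
    return [float(by_label.get(f"__label__{c}", 0.0)) for c in LABELS]
```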
2. A text sensitivity judgment method based on fine-grained emotion. Detecting and classifying text sensitivity directly with the FastText model carries a certain error. For example, a tweet may contain a certain amount of terrorism-related information while its context reflects objection to and condemnation of that sensitive information; directly labelling such text as a sensitive type would certainly be biased. The author's subjective emotion in the text therefore plays a decisive role in judging the sensitivity of the text, and the invention introduces emotion polarity to judge the overall sensitivity of the text. If the text belongs to a sensitive class in the primary detection of text sensitivity, the fine-grained-emotion text sensitivity judgment method is executed; if it belongs to the non-sensitive class, the method is not performed.
(1) Fine-grained emotion analysis. Taking the emotion polarity of the text into account, fine-grained emotion analysis is adopted according to the emotion polarity and strength of the emotion words in the text, and a sensitive information identification method based on co-occurrence analysis of emotion words and sensitive words is proposed. The invention matches the emotion words in the text against the Dalian University of Technology emotion vocabulary ontology library, in which each emotion word carries one of three emotion polarities: positive (1), negative (-1) and neutral (0). An emotion word has an emotion polarity strength comprising emotion polarity and emotion intensity; emotion polarity strength is a higher-level treatment in emotion analysis. The invention sets the value range of the emotion strength to [-3, 3]: the sign of the value represents the emotion polarity (negative values for negative emotion, positive values for positive emotion, 0 for a neutral attitude) and its magnitude represents the emotion intensity. The text is segmented into words with jieba word segmentation, the words are matched against the existing emotion word library and sensitive word library to obtain an emotion word set and a sensitive word set, and the Cartesian product of the two sets is computed. The word frequency of co-occurring emotion words and sensitive words and the emotion intensity of each emotion word are obtained according to whether the elements of the Cartesian product co-occur, from which the emotion polarity of the text is calculated. Co-occurrence in this method means an emotion word and a sensitive word occurring together: by the shortest-distance principle, the emotion word at the minimum distance from the sensitive word within the sentence is taken as co-occurring with it. The distance between the two is calculated as follows:
dis(w_i, w_j) = |index(w_i) - index(w_j)|  (1)
wherein dis(w_i, w_j) denotes the distance between w_i and w_j, and index(w_i) and index(w_j) are the position subscripts of the two words in the word sequence after jieba segmentation; the subscript of the first word is 1 and increases in order.
(2) Emotion-based sensitivity determination. The invention reflects the emotional characteristics of the text in two aspects, emotion polarity and emotion intensity, and judges the sensitivity of the text by combining the emotion polarity strength of the emotion words with the sensitive words. The calculation is as follows:
PositiveSensitiveCount = Σ_{i=1..n} Σ_j λ C+(w_i, w_j)  (2)
NegativeSensitiveCount = Σ_{i=1..n} Σ_j β C-(w_i, w_j)  (3)
AllSensitiveCount = PositiveSensitiveCount - NegativeSensitiveCount  (4)
wherein PositiveSensitiveCount represents the positive emotion score of the sensitive words, NegativeSensitiveCount represents the negative emotion score of the sensitive words, AllSensitiveCount represents the overall emotion score of the sensitive words, C+(w_i, w_j) is the number of co-occurrences of sensitive word w_i with positive emotion word w_j, C-(w_i, w_j) is the number of co-occurrences of sensitive word w_i with negative emotion word w_j, n is the total number of words after jieba word segmentation, λ is the positive emotion intensity of the emotion word, and β is the negative emotion intensity of the emotion word.
The invention divides emotion polarity into three categories, following experience and the results of most researchers. Because most sensitive words carry a negative part of speech, combining them with positive emotion indicates that the sensitive content is supported or condoned, so sensitive text containing positive emotion more readily leads to the conclusion of sensitive information. If the overall emotion score AllSensitiveCount is greater than 0, the text is directly judged as its original sensitive classification, and the probability set formed by the sensitive classification probabilities is still recorded as:
P_w = [p_w1, p_w2, ..., p_wn]
When AllSensitiveCount is less than or equal to 0, a secondary judgment is needed: the word frequency of the sensitive words is calculated, and when it is greater than a set threshold the original sensitive classification is kept, the probability set still being recorded as:
P_w = [p_w1, p_w2, ..., p_wn]
Otherwise the text is judged as the other classification, and the probability set formed by the sensitive classification probabilities is recorded as P_g = [0, 0, ..., 0, 1], i.e. the probability of the "others" class is 1 and all remaining probabilities are 0.
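Continuing the sketch, formulas (2)-(4) and the step 2.3 decision could be computed as below; the frequency threshold of 5 is taken from the worked example in the embodiment and, like the helper names, is an assumption.

```python
def emotion_based_sensitivity(words, pairs, emotion_lexicon,
                              sensitive_lexicon, P_w, freq_threshold=5):
    """Apply formulas (2)-(4), then the step 2.3 decision rule."""
    positive = negative = 0.0
    for ws, we in pairs:
        strength = emotion_lexicon[we]     # polarity strength in [-3, 3]
        if strength > 0:
            positive += strength           # (2): lambda summed over co-occurrences
        elif strength < 0:
            negative += -strength          # (3): beta summed over co-occurrences
    all_count = positive - negative        # (4): AllSensitiveCount
    if all_count > 0:
        return P_w                         # keep the original sensitive class
    freq = sum(1 for w in words if w in sensitive_lexicon)
    if freq > freq_threshold:
        return P_w                         # secondary judgment: still sensitive
    return [0.0] * (len(P_w) - 1) + [1.0]  # P_g = [0, 0, ..., 0, 1]
```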
3. Picture sensitivity detection. The invention uses an InceptionV3 network to detect the sensitivity of pictures. The number of sensitive classes is the same as for the text; for example, if the text is classified into four classes (sensitive class A, sensitive class B, sensitive class C and others), the picture is likewise classified into those four classes. The skeleton model is loaded first to build a pre-trained model without a classifier; a global average pooling layer is then added, which saves a large number of parameters, accelerates computation and reduces overfitting while adding a layer of non-linearity that expands the model's expressive capability; this is followed by a fully connected layer with 512 nodes and finally an output layer with 4 nodes, with probability filtering by a Softmax activation function. The picture's sensitivity is detected through the InceptionV3 network, the sensitive or non-sensitive class it belongs to is judged, and its classification probability set is obtained, recorded as:
P_t = [p_t1, p_t2, ..., p_tn]
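The picture branch as described (pre-trained skeleton without a classifier, global average pooling, a 512-node fully connected layer and a 4-node Softmax output) maps directly onto Keras; the ImageNet weights and the 224 × 224 input size of the embodiment below are assumptions of this sketch.

```python
# Sketch of the InceptionV3 picture sensitivity classifier.
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras import layers, models

base = InceptionV3(weights="imagenet", include_top=False,
                   input_shape=(224, 224, 3))       # skeleton, no classifier
x = layers.GlobalAveragePooling2D()(base.output)    # fewer parameters, less overfitting
x = layers.Dense(512, activation="relu")(x)         # 512-node fully connected layer
out = layers.Dense(4, activation="softmax")(x)      # classes A, B, C, others
picture_model = models.Model(base.input, out)       # predicts the set P_t
```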
4. Multi-modal sensitivity detection with image-text fusion. The invention adopts a decision-layer fusion strategy: the sensitivity classification probability of the text and the sensitivity classification probability of the picture are passed through the fusion algorithm to obtain the sensitivity type probabilities P_i, MAX(P_i) takes the maximum sensitivity type probability P, and the sensitive classification corresponding to P is taken as the final sensitive classification result.
Compared with single-modal sensitivity detection, the image-text fusion mode can effectively make the features complementary. The final classification probability distribution is calculated as follows:
P_i = w · p_wi + (1 - w) · p_ti  (5)
P = MAX(P_i)  (6)
wherein w is the fusion weight with value range [0, 1] and P_i is the sensitive type probability distribution. By introducing the fusion weight w, detection is converted from a single-modal method (text or picture alone) into a multi-modal method determined jointly by both: p_wi and p_ti are multiplied by their respective fusion weights and added to obtain the sensitive type probability P_i, MAX(P_i) takes the maximum type probability, and after the fusion algorithm the corresponding classification is taken as the final sensitive classification result.
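The decision-layer fusion of formulas (5) and (6) then reduces to a weighted element-wise sum followed by taking the maximum; a short sketch (the class names are assumed) follows.

```python
CLASSES = ["sensitiveA", "sensitiveB", "sensitiveC", "other"]  # assumed order

def fuse(P_w, P_t, w=0.5):
    """Formulas (5) and (6): fuse the text and picture probability sets."""
    P_i = [w * pw + (1 - w) * pt for pw, pt in zip(P_w, P_t)]  # (5)
    best = max(range(len(P_i)), key=P_i.__getitem__)           # (6) MAX(P_i)
    return CLASSES[best], P_i[best]
```

With w = 0.5, as in the embodiment below, text and picture contribute equally to the fused probability.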
Example 1
In order to verify the effectiveness of the invention, different types of pictures crawled from the web by a crawler program are divided into four classes: sensitive class A, sensitive class B, sensitive class C, and others. The sensitive text data set is obtained by manually splicing and recombining related sensitive words and likewise comprises sensitive class A, sensitive class B, sensitive class C, and others. The normal text data set is derived from a set of normal microblog comments. The technical scheme of the invention can be implemented as follows:
(1) Sensitive model training stage. The text model is trained with the sensitive word library under jieba word segmentation. For the picture model, the data set is first expanded by randomly flipping, cropping and zooming the input pictures to increase their diversity; the pictures are then normalized to the same size with consistent label lengths, the input size being uniformly set to 3 × 224 × 224 and the batch_size to 32. To further improve performance, the model is fine-tuned with rmsprop as the optimizer, the cross-entropy function as the loss function and accuracy as the evaluation metric, the learning rate being set to 0.001, as in the sketch below.
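Under the stated configuration, the fine-tuning stage could be wired up in Keras roughly as follows, reusing picture_model from the earlier sketch; the directory layout and the epoch count are assumptions.

```python
# Sketch of augmentation and fine-tuning: random flips/shifts/zooms,
# 224 x 224 x 3 inputs, batch_size 32, rmsprop, cross-entropy, lr 0.001.
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1.0 / 255, horizontal_flip=True,
                             zoom_range=0.2, width_shift_range=0.1,
                             height_shift_range=0.1)
train_gen = datagen.flow_from_directory("pictures/train",      # assumed path
                                        target_size=(224, 224), batch_size=32)
picture_model.compile(optimizer=RMSprop(learning_rate=0.001),
                      loss="categorical_crossentropy", metrics=["accuracy"])
picture_model.fit(train_gen, epochs=10)                        # assumed epochs
```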
(2) Image-text detection result fusion stage. Setting the fusion weight w to 0.5 indicates that the text and the picture have equal influence on the sensitivity determination.
(3) Detection result evaluation stage. The method performs sensitivity detection on 1000 tweets containing both pictures and text, and the detection results are evaluated by accuracy, recall rate and F value.
(4) A specific example is given.
After a tweet whose text reads "support the public security department's severe crackdown on X work, oppose X education" and which contains pictures related to X work activities is published on a social network, the algorithm first uses FastText to classify the text as sensitive class B information. Then, according to the emotion word and sensitive word co-occurrence algorithm, the emotion word set is {"support", "severe crackdown", "oppose"} and the sensitive word set is {"X work", "X education"}; the Cartesian product of the two sets is {("support", "X work"), ("support", "X education"), ("severe crackdown", "X work"), ("severe crackdown", "X education"), ("oppose", "X work"), ("oppose", "X education")}, which the co-occurrence screening reduces to {("severe crackdown", "X work"), ("oppose", "X education")}. Formulas (2), (3) and (4) give PositiveSensitiveCount = 0 and NegativeSensitiveCount = 3 (assuming the emotion intensity of "severe crackdown" is 2, with sensitive words specified as negative and negative emotion words also taking negative values), hence AllSensitiveCount = -3; since the number of sensitive words is less than the threshold 5, the text is judged to be of a non-sensitive type. The InceptionV3 model then detects the picture content, the detection result is a sensitive class B picture, and according to the multi-modal fusion algorithm the whole tweet is judged to be sensitive class B. In reality the tweet is clearly opposing the X work, but because the picture is sensitive class B content, which is sensitive information not allowed to be published, the whole tweet still counts as sensitive and is therefore judged to be sensitive information, namely sensitive class B. Conversely, if the picture were a normal (others class) picture, the tweet would be non-sensitive information.

Claims (3)

1. A multi-mode fusion sensitive information classification detection method, characterized by comprising the following steps:
step 1, carrying out sensitivity primary detection on texts and pictures
Adopting FastText to detect the sensitivity of the text, judging which sensitive class or non-sensitive class the text belongs to, and obtaining a classification probability set of the text, recorded as:
P_w = [p_w1, p_w2, ..., p_wn]
detecting the sensitivity of the picture with an InceptionV3 network, judging which sensitive class or non-sensitive class the picture belongs to, and obtaining a classification probability set of the picture, recorded as:
P_t = [p_t1, p_t2, ..., p_tn]
wherein n represents the number of classifications of the pictures or the texts, the number of picture classifications and the number of text classifications being equal; if the text belongs to a sensitive class, step 2 is executed, and if the text belongs to the non-sensitive class, step 3 is executed;
step 2, judging the text sensitivity based on emotion
Step 2.1, segmenting the text into a plurality of words by adopting jieba word segmentation, matching the words with the existing emotion word library and sensitive word library to obtain an emotion word set and a sensitive word set, and performing Cartesian product operation on the two sets to judge whether emotion and sensitive words coexist, wherein the emotion words have emotion polarity strength including emotion polarity and emotion strength;
step 2.2, judging the sensitivity of the text by combining the emotional polarity intensity of the emotional words and the sensitive words, wherein the calculation method comprises the following steps:
PositiveSensitiveCount = Σ_{i=1..n} Σ_j λ C+(w_i, w_j)  (2)
NegativeSensitiveCount = Σ_{i=1..n} Σ_j β C-(w_i, w_j)  (3)
AllSensitiveCount = PositiveSensitiveCount - NegativeSensitiveCount  (4)
wherein PositiveSensitiveCount represents the positive emotion score of the sensitive words, NegativeSensitiveCount represents the negative emotion score of the sensitive words, AllSensitiveCount represents the overall emotion score of the sensitive words, C+(w_i, w_j) is the number of co-occurrences of sensitive word w_i with positive emotion word w_j, C-(w_i, w_j) is the number of co-occurrences of sensitive word w_i with negative emotion word w_j, n is the total number of words after jieba word segmentation, λ is the positive emotion intensity of the emotion word, and β is the negative emotion intensity of the emotion word;
step 2.3, if the overall emotion score AllSensitiveCount is greater than 0, the text is directly judged as its original sensitive classification, and the probability set formed by the sensitive classification probabilities is still recorded as:
P_w = [p_w1, p_w2, ..., p_wn]
when AllSensitiveCount is less than or equal to 0, a secondary judgment is needed: the word frequency of the sensitive words is calculated, and when it is greater than a set threshold the original sensitive classification is kept, the probability set formed by the sensitive classification probabilities still being recorded as:
P_w = [p_w1, p_w2, ..., p_wn]
otherwise, the text is judged as the other classification, and the probability set formed by the sensitive classification probabilities is recorded as: P_g = [0, 0, ..., 0, 1];
Step 3, multi-modal sensitivity detection of image-text fusion
Obtaining the sensitivity type probabilities P_i by the fusion algorithm from the sensitivity classification probability of the text and the sensitivity classification probability of the picture, taking the maximum sensitivity type probability P through MAX(P_i), and taking the sensitive classification corresponding to P as the final sensitive classification result.
2. The method according to claim 1, characterized in that the final sensitive classification probability P is calculated as follows:
P_i = w · p_wi + (1 - w) · p_ti  (5)
P = MAX(P_i)  (6)
wherein w is the fusion weight with value range [0, 1], and P_i is the sensitive type probability distribution.
3. The method according to claim 1, characterized in that whether an emotion word and a sensitive word co-occur is judged by the shortest-distance principle: within the sentence, the emotion word at the minimum distance from the sensitive word is taken as co-occurring with the sensitive word.
CN202110203458.7A 2021-02-23 2021-02-23 Multi-mode fusion sensitive information classification detection method Active CN113033610B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110203458.7A CN113033610B (en) 2021-02-23 2021-02-23 Multi-mode fusion sensitive information classification detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110203458.7A CN113033610B (en) 2021-02-23 2021-02-23 Multi-mode fusion sensitive information classification detection method

Publications (2)

Publication Number Publication Date
CN113033610A (en) 2021-06-25
CN113033610B (en) 2022-09-13

Family

ID=76460956

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110203458.7A Active CN113033610B (en) 2021-02-23 2021-02-23 Multi-mode fusion sensitive information classification detection method

Country Status (1)

Country Link
CN (1) CN113033610B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113627550A (en) * 2021-08-17 2021-11-09 北京计算机技术及应用研究所 Image-text emotion analysis method based on multi-mode fusion
CN115909374A (en) * 2021-09-30 2023-04-04 腾讯科技(深圳)有限公司 Information identification method, device, equipment, storage medium and program product
CN114579964A (en) * 2022-04-29 2022-06-03 成都明途科技有限公司 Information monitoring method and device, electronic equipment and storage medium
CN114782670A (en) * 2022-05-11 2022-07-22 中航信移动科技有限公司 Multi-mode sensitive information identification method, equipment and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832663A (en) * 2017-09-30 2018-03-23 天津大学 A kind of multi-modal sentiment analysis method based on quantum theory
CN107908715A (en) * 2017-11-10 2018-04-13 中国民航大学 Microblog emotional polarity discriminating method based on Adaboost and grader Weighted Fusion
CN108984530A (en) * 2018-07-23 2018-12-11 北京信息科技大学 A kind of detection method and detection system of network sensitive content
CN109934260A (en) * 2019-01-31 2019-06-25 中国科学院信息工程研究所 Image, text and data fusion sensibility classification method and device based on random forest
CN110874531A (en) * 2020-01-20 2020-03-10 湖南蚁坊软件股份有限公司 Topic analysis method and device and storage medium
CN112256878A (en) * 2020-10-29 2021-01-22 沈阳农业大学 Rice knowledge text classification method based on deep convolution

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200293874A1 (en) * 2019-03-12 2020-09-17 Microsoft Technology Licensing, Llc Matching based intent understanding with transfer learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832663A (en) * 2017-09-30 2018-03-23 天津大学 A kind of multi-modal sentiment analysis method based on quantum theory
CN107908715A (en) * 2017-11-10 2018-04-13 中国民航大学 Microblog emotional polarity discriminating method based on Adaboost and grader Weighted Fusion
CN108984530A (en) * 2018-07-23 2018-12-11 北京信息科技大学 A kind of detection method and detection system of network sensitive content
CN109934260A (en) * 2019-01-31 2019-06-25 中国科学院信息工程研究所 Image, text and data fusion sensibility classification method and device based on random forest
CN110874531A (en) * 2020-01-20 2020-03-10 湖南蚁坊软件股份有限公司 Topic analysis method and device and storage medium
CN112256878A (en) * 2020-10-29 2021-01-22 沈阳农业大学 Rice knowledge text classification method based on deep convolution

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Survey of Popular Image and Text Analysis Techniques; Rahul Suresh et al.; 2019 4th International Conference on Computational Systems and Information Technology for Sustainable Solution (CSITSS); 2020-03-12; pp. 1-8 *
Recognition of sensitive text in pictures based on spatial-transformation dense convolutional network; Lin Jinzhao et al.; Computer Systems & Applications; January 2020; Vol. 29, No. 1; pp. 137-143 *

Also Published As

Publication number Publication date
CN113033610A (en) 2021-06-25

Similar Documents

Publication Publication Date Title
CN113033610B (en) Multi-mode fusion sensitive information classification detection method
CN109933664B (en) Fine-grained emotion analysis improvement method based on emotion word embedding
CN110245229B (en) Deep learning theme emotion classification method based on data enhancement
KR102008845B1 (en) Automatic classification method of unstructured data
Sundararajan et al. Multi-rule based ensemble feature selection model for sarcasm type detection in twitter
CN109670039B (en) Semi-supervised e-commerce comment emotion analysis method based on three-part graph and cluster analysis
CN107092596A (en) Text emotion analysis method based on attention CNNs and CCR
CN113343126B (en) Rumor detection method based on event and propagation structure
CN116578705A (en) Microblog emotion classification method based on pre-training language model and integrated neural network
CN110909529A (en) User emotion analysis and prejudgment system of company image promotion system
CN106599824A (en) GIF cartoon emotion identification method based on emotion pairs
Rauf et al. Using bert for checking the polarity of movie reviews
CN115329085A (en) Social robot classification method and system
Saha et al. The Corporeality of Infotainment on Fans Feedback Towards Sports Comment Employing Convolutional Long-Short Term Neural Network
CN113486143A (en) User portrait generation method based on multi-level text representation and model fusion
CN111259651A (en) User emotion analysis method based on multi-model fusion
Shalinda et al. Hate words detection among sri lankan social media text messages
Al-Onazi et al. Modified Seagull Optimization with Deep Learning for Affect Classification in Arabic Tweets
Jalani et al. Performance of Sentiment Classification on Tweets of Clothing Brands
Gao et al. Chinese short text classification method based on word embedding and Long Short-Term Memory Neural Network
Cumalat Puig Sentiment analysis on short Spanish and Catalan texts using contextual word embeddings
Lora et al. Ben-sarc: A corpus for sarcasm detection from bengali social media comments and its baseline evaluation
Lv et al. Stakeholder opinion classification for supporting large-scale transportation project decision making
Agbesi et al. Multichannel 2D-CNN Attention-Based BiLSTM Method for Low-Resource Ewe Sentiment Analysis
CN113535948B (en) LSTM-Attention text classification method introducing essential point information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant