CN113033610B - Multi-mode fusion sensitive information classification detection method - Google Patents
- Publication number
- CN113033610B (Application CN202110203458.7A)
- Authority
- CN
- China
- Prior art keywords
- sensitive
- emotion
- classification
- text
- words
- Prior art date
- Legal status: Active (assumption, not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/55—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
A multi-modal fusion sensitive information classification detection method comprises: step 1, primary sensitivity detection of texts and pictures; step 2, emotion-based judgment of text sensitivity; and step 3, multi-modal sensitivity detection with image-text fusion. Sensitivity detection is performed on the text and the picture separately, and the sensitivity of the content is judged accurately by combining the influence of emotion polarity and intensity on the sensitive information. The image-text sensitivity problem is solved by an appropriate fusion method, and the detection precision is high.
Description
Technical Field
The invention relates to the technical field of internet, in particular to a multi-mode fusion sensitive information classification detection method.
Background
With a vast number of netizens worldwide, online social networks have become the preferred platform for information interaction. As social networks grow ever more popular and widely used, their information, carried by pictures, texts, audio and video, has become increasingly diverse, complex and voluminous, and a large amount of sensitive information floods the networks, seriously affecting network safety and people's physical and mental health. How to detect sensitive information efficiently and accurately with artificial intelligence technology has become an urgent problem for academia and industry.
Most existing research on sensitive information detection performs sensitive identification with single-modal features, i.e., single-modal data analysis. For example, the paper "Hate Speech on Twitter: A Pragmatic Approach to Collect Hateful and Offensive Expressions and Perform Hate Speech Detection" by Watanabe et al. proposes a hate speech detection method for Twitter that automatically detects hateful linguistic patterns and the most common phrase combinations, classifying tweets into hateful, offensive and clean by combining emotion and semantic features. The paper "Sensitive Information Detection on Cyber-Space" by Lin M. et al. proposes an iteration-based semi-supervised deep learning model and a humming-melody-based search model to detect abnormal audio and video information. The paper "Text classification based on deep belief network and softmax regression" by Jiang M. et al. proposes a hybrid text classification model based on a deep belief network (DBN) and Softmax regression; it solves the sparse high-dimensional matrix computation problem of text data by introducing the DBN, and, after the DBN extracts features, classifies texts in the learned feature space with Softmax regression. The paper "Convolutional Neural Network for Pornographic Images Classification" by I Made Artha Agastya et al. proposes pornographic picture classification based on a convolutional neural network; it adapts to pornographic picture detection by changing the learning rate, the algorithm, the structure of the fully connected layer and so on, improving the accuracy of the detection result.
The paper "Bag of Tricks for Efficient Text Classification" by Joulin A. et al. proposes a fast text classifier that matches deep learning classifiers in accuracy while being orders of magnitude faster in training and evaluation. The paper "Multimodal Sentiment Analysis To Explore the Structure of Emotions" by Anthony Hu et al. proposes a multimodal emotion analysis method that uses fusion technology to combine multimodal features, fully mining the emotion types of users and improving emotion classification accuracy.
Some achievements have been made in sensitive information detection, but the following problems remain: 1) The internal connections and complementary roles among multi-modal data features are not considered. In fact, in sensitive information detection it is necessary to consider the interaction among modalities; mutually supplementary information allows sensitive content to be understood more fully. 2) Although the importance of emotion factors for text sensitivity detection is partially considered, the influence of emotion polarity and emotion intensity on text sensitivity judgment is ignored. 3) The picture classification problem is neglected: detecting illicit pictures is in essence a picture classification task. Sensitive pictures have numerous sensitive features whose feature parts are difficult to extract, and a simple binary sensitive/non-sensitive classification yields low accuracy. Compared with single-modal sensitivity methods, global sensitivity analysis of a tweet has unique advantages, so considering the joint effect of data across modalities on the result is more reasonable and accurate.
Disclosure of Invention
In order to solve the technical problems, the invention provides a multi-mode fusion sensitive information classification detection method based on deep learning by combining two modes of texts and pictures aiming at the problems of insufficient and inaccurate detection of sensitive information in a social network.
In order to realize the technical purpose, the adopted technical scheme is as follows: a multi-modal fusion sensitive information classification detection method comprises the following steps:
step 1, carrying out sensitivity primary detection on texts and pictures
Adopting FastText to detect the sensitivity of the text, judging which sensitive class or the non-sensitive class the text belongs to, and obtaining a classification probability set of the text, recorded as P_t = [p_1, p_2, ..., p_n]; detecting the sensitivity of the picture by adopting an InceptionV3 network, judging which sensitive class or the non-sensitive class the picture belongs to, and obtaining a classification probability set of the picture, recorded as P_p = [q_1, q_2, ..., q_n]; wherein n represents the number of classifications of the pictures or the texts, and the number of picture classifications equals the number of text classifications; if the text belongs to a sensitive class, executing step 2, and if the text belongs to the non-sensitive class, executing step 3;
step 2, judging the sensitivity of the text based on the emotion
Step 2.1, segmenting the text into a plurality of words by adopting jieba word segmentation, matching the words with the existing emotion word library and sensitive word library to obtain an emotion word set and a sensitive word set, and performing Cartesian product operation on the two sets to judge whether the emotion words and the sensitive words co-occur or not, wherein the emotion words have emotion polarity intensity comprising emotion polarity and emotion intensity;
step 2.2, judging the sensitivity of the text by combining the emotional polarity intensity of the emotional words and the sensitive words, wherein the calculation method comprises the following steps:
PositiveSensitiveCount = Σ λ_j · C+(w_i, w_j)   (2)
NegativeSensitiveCount = Σ |β_j| · C-(w_i, w_j)   (3)
AllSensitiveCount = PositiveSensitiveCount − NegativeSensitiveCount   (4)
wherein PositiveSensitiveCount represents the positive emotion score of the sensitive words, NegativeSensitiveCount represents the negative emotion score of the sensitive words, AllSensitiveCount represents the overall emotion score of the sensitive words, C+(w_i, w_j) is the number of co-occurrences of sensitive word w_i with positive emotion word w_j, C-(w_i, w_j) is the number of co-occurrences of sensitive word w_i with negative emotion word w_j, the sums run over the n words obtained by jieba word segmentation, λ is the positive emotion intensity of the emotion words, and β is the negative emotion intensity of the emotion words;
step 2.3, if the overall emotion score AllSensitiveCount is greater than 0, the text is directly judged as the original sensitive classification, and the probability set formed by the sensitive classification probabilities is still recorded as P_t = [p_1, p_2, ..., p_n]; when AllSensitiveCount is less than or equal to 0, a secondary judgment is needed: the word frequency of the sensitive words is calculated, and when the word frequency is greater than a set threshold, the text is directly judged as the original sensitive classification, the probability set still being recorded as P_t = [p_1, p_2, ..., p_n]; otherwise, the text is judged as the other classification, and the probability set formed by the sensitive classification probabilities is recorded as P_g = [0, 0, ..., 0, 1];
Step 3, multi-modal sensitivity detection of image-text fusion
According to the sensitivity classification probabilities of the text and of the picture, the sensitivity type probabilities P_i are obtained through the fusion algorithm; MAX(P_i) takes the maximum sensitivity type probability P, and the sensitivity classification corresponding to P is taken as the final sensitivity classification result.
The final sensitive classification probability P is calculated as follows:
P_i = w · P_t,i + (1 − w) · P_p,i   (5)
P = MAX(P_i)   (6)
wherein w is the fusion weight with value range [0, 1], and P_i is the sensitive type probability distribution.
Whether an emotion word and a sensitive word co-occur is judged by the nearest-distance principle: within the sentence where they are located, the emotion word at the minimum distance from the sensitive word is taken as co-occurring with it.
The invention has the beneficial effects that: the invention provides a multi-mode fusion sensitive information classification detection method based on deep learning, which can accurately judge the sensitivity of contents by combining the influence of emotion polarity and intensity on sensitive information. The problem of image-text sensitivity is solved according to a proper fusion method, and the detection precision is high.
Drawings
FIG. 1 is a diagram of a detection framework of the present invention;
FIG. 2 is a flow chart of the detection according to the present invention.
Detailed Description
The multi-modal fusion sensitive information classification detection method provided by the invention can be roughly divided into three stages: image-text sensitive feature extraction, sensitivity detection and classification, and image-text feature fusion; the complete framework is shown in Figure 1. It mainly comprises three parts: text sensitive information classification, picture sensitive information classification, and image-text fusion sensitivity classification based on the two. The text sensitive classification model performs the final sensitive classification of the text by combining the training result of the FastText model with the text classification word library. The picture sensitive classification model first loads a skeleton model and adjusts the parameters of each layer, then performs fine-tuning training on the picture training set, and finally applies the trained model to picture classification on the test set. The final image-text fusion sensitive classification probability is calculated by the fusion formula from the sensitive classification probabilities of the text and the picture.
The invention combines two modes of text and picture to detect and classify sensitive information. The sensitivity detection needs to be performed on the text and the picture respectively, and then the detection results are subjected to fusion processing to obtain the final detection result.
1. Primary detection of text sensitivity. The invention uses FastText to detect the sensitivity of the text, determining which sensitive class or the non-sensitive class the text belongs to; for example, the text is classified into four classes: sensitive class A, sensitive class B, sensitive class C, and others. The sensitive classes comprise sensitive class A, sensitive class B and sensitive class C (three sensitive classes); the "others" can be a single class or several classes that do not belong to the sensitive classes. FastText is a machine learning tool that integrates word2vec, text classification and the like, and is a simple and efficient text classification model. The FastText model includes three layers: an input layer, a hidden layer and an output layer. The model first decomposes the text vocabulary into character-level N-gram features, which are added to the original words to obtain a text sequence (x_1, x_2, ..., x_{n-1}, x_n) as the data input of the network input layer. The hidden layer is the superposed average of the word vectors, and the output layer finally outputs the classification categories. FastText adopts hierarchical Softmax, constructing a Huffman tree according to class frequency, which greatly reduces the number of model prediction targets and improves the training efficiency and classification efficiency of the model.
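The character-level n-gram decomposition described above can be sketched in a few lines. This is a minimal illustration of the FastText subword idea, not the patent's implementation; the function names and the 3-gram setting are assumptions.

```python
# Minimal sketch of FastText-style character n-gram decomposition.
def char_ngrams(word, n_min=3, n_max=3):
    """Character n-grams of a word with FastText-style boundary markers."""
    marked = f"<{word}>"
    return [marked[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(marked) - n + 1)]

def input_sequence(words):
    """Original words plus their n-grams: the sequence (x_1, ..., x_n)
    fed to the input layer."""
    seq = list(words)
    for w in words:
        seq.extend(char_ngrams(w))
    return seq
```

The boundary markers `<` and `>` let the model distinguish prefixes and suffixes from word-internal n-grams.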
The text is sensitivity-classified with the FastText method, and the probability set of the sensitivity classifications is obtained, recorded as P_t = [p_1, p_2, ..., p_n]; the number n of text sensitivity classifications is consistent with the number of picture classifications.
2. A text sensitivity judgment method based on fine-grained emotion. Directly using the FastText model for sensitivity detection and classification of text carries certain errors. For example, a tweet may contain a certain amount of information related to terrorism while the context of the text reflects objection to and reprimand of that sensitive information; directly labeling such a text as a sensitive type would certainly be biased. The subjective emotion of the author therefore plays a decisive role in judging the sensitivity of the text, so the invention introduces emotion polarity to judge the overall sensitivity of the text. If the text belongs to a sensitive class in the primary sensitivity detection, the fine-grained emotion-based text sensitivity judgment method is executed; if the text belongs to the non-sensitive class, it is not.
(1) Fine-grained emotion analysis. Taking the emotion polarity of the text into consideration, fine-grained emotion analysis is adopted according to the emotion polarity and intensity of the emotion words in the text, and a sensitive information identification method based on co-occurrence analysis of emotion words and sensitive words is proposed. The invention uses the emotion vocabulary ontology library of Dalian University of Technology to match the emotion words in the text; each emotion word in the ontology library has one of three emotion polarities: positive (1), negative (-1) and neutral (0). An emotion word has an emotion polarity intensity comprising emotion polarity and emotion intensity; emotion polarity intensity is a higher-level treatment of emotion analysis. The invention sets the value range of the emotion intensity to [-3, 3]: the sign of the value represents the emotion polarity (negative values represent negative emotion, positive values positive emotion, 0 a neutral attitude), and the magnitude represents the emotion intensity. The text is divided into words with jieba word segmentation, the words are matched against the existing emotion word library and sensitive word library to obtain an emotion word set and a sensitive word set, and the Cartesian product of the two sets is computed.
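The lexicon matching and Cartesian-product step can be sketched as follows. The mini-lexicons below are hypothetical stand-ins for the emotion ontology library and the sensitive word library, and the function name is illustrative; intensity values follow the [-3, 3] convention above (sign = polarity).

```python
from itertools import product

# Hypothetical mini-lexicons; intensities in [-3, 3], sign encodes polarity.
EMOTION_LEXICON = {"support": 2, "oppose": -2, "condemn": -3}
SENSITIVE_LEXICON = {"contraband", "extremism"}

def match_and_pair(words):
    """Match segmented words against both lexicons and return the Cartesian
    product of (sensitive word, emotion word) candidate pairs."""
    emotion_words = [w for w in words if w in EMOTION_LEXICON]
    sensitive_words = [w for w in words if w in SENSITIVE_LEXICON]
    return list(product(sensitive_words, emotion_words))
```

Each candidate pair is then screened by the co-occurrence (distance) criterion described next.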
The word frequency of co-occurring emotion words and sensitive words, and the emotion intensity of each emotion word, are acquired according to whether the elements of the Cartesian product co-occur, so as to calculate the emotion polarity of the text. In the sensitive information identification method based on co-occurrence analysis of emotion words and sensitive words, co-occurrence refers to the joint occurrence of an emotion word and a sensitive word: by the nearest-distance principle, the pair at the minimum distance within the sentence where they are located is taken as a co-occurrence. The distance between the two is calculated as follows:
dis(w_i, w_j) = |index(w_i) − index(w_j)|   (1)
wherein dis(w_i, w_j) denotes the distance between w_i and w_j, and index(w_i) and index(w_j) denote the position subscripts of the two in the word sequence after jieba segmentation; the subscript of the first word is 1 and increases sequentially.
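Formula (1) and the nearest-distance screening can be sketched like this. The helper names are illustrative assumptions; only the distance definition and the minimum-distance pairing come from the description.

```python
def dis(index_i, index_j):
    """Formula (1): distance between two words as the absolute difference of
    their 1-based position subscripts after segmentation."""
    return abs(index_i - index_j)

def nearest_cooccurrence(words, sensitive_set, emotion_set):
    """Pair each sensitive word with the emotion word at minimum distance in
    the same word sequence (nearest-distance principle)."""
    pairs = []
    for i, w in enumerate(words):
        if w not in sensitive_set:
            continue
        candidates = [(dis(i + 1, j + 1), words[j])
                      for j in range(len(words)) if words[j] in emotion_set]
        if candidates:
            _, emotion_word = min(candidates)
            pairs.append((w, emotion_word))
    return pairs
```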
(2) And determining sensitivity based on emotion. The invention reflects the emotional characteristics of the text from two aspects of the text emotional polarity and the emotional intensity, and judges the sensitivity of the text by combining the emotional polarity intensity of the emotional words and the sensitive words. The calculation method has the following formula:
PositiveSensitiveCount = Σ λ_j · C+(w_i, w_j)   (2)
NegativeSensitiveCount = Σ |β_j| · C-(w_i, w_j)   (3)
AllSensitiveCount = PositiveSensitiveCount − NegativeSensitiveCount   (4)
wherein PositiveSensitiveCount represents the positive emotion score of the sensitive words, NegativeSensitiveCount represents the negative emotion score of the sensitive words, AllSensitiveCount represents the overall emotion score of the sensitive words, C+(w_i, w_j) is the number of co-occurrences of sensitive word w_i with positive emotion word w_j, C-(w_i, w_j) is the number of co-occurrences of sensitive word w_i with negative emotion word w_j, the sums run over the n words obtained by jieba word segmentation, λ is the positive emotion intensity of the emotion words, and β is the negative emotion intensity of the emotion words.
The invention divides emotion polarity into three categories. From experience and the results of most researchers, most sensitive words carry a negative part of speech; their combination with positive emotion indicates that the sensitive words are supported or acquiesced in, so a sensitive information text containing positive emotion more readily leads to the conclusion of sensitive information. If the overall emotion score AllSensitiveCount is greater than 0, the text is directly judged as the original sensitive classification, and the probability set formed by the sensitive classification probabilities is still recorded as P_t = [p_1, p_2, ..., p_n]. When AllSensitiveCount is less than or equal to 0, a secondary judgment is needed: the word frequency of the sensitive words is calculated, and when it is greater than a set threshold, the text is directly judged as the original sensitive classification, the probability set still being recorded as P_t = [p_1, p_2, ..., p_n]; otherwise, the text is judged as the other classification, with the probability set recorded as P_g = [0, 0, ..., 0, 1], i.e., the probability of the "other" class is 1 and all other probabilities are 0.
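The emotion-based decision of step 2.2/2.3 can be sketched as below. This is a simplified sketch: it scores each co-occurring pair directly by the emotion word's signed intensity, and the function names and the `"original"`/`"other"` labels are illustrative assumptions.

```python
def all_sensitive_count(pairs, intensity):
    """Formulas (2)-(4) in sketch form: pairs are co-occurring
    (sensitive word, emotion word) tuples; intensity maps an emotion word
    to a value in [-3, 3] whose sign is its polarity."""
    positive = sum(intensity[e] for _, e in pairs if intensity[e] > 0)
    negative = sum(-intensity[e] for _, e in pairs if intensity[e] < 0)
    return positive - negative   # AllSensitiveCount

def judge(pairs, intensity, sensitive_word_freq, threshold=5):
    """Step 2.3: keep the original sensitive class when the overall emotion
    score is positive, or when it is not but the sensitive-word frequency
    exceeds the threshold; otherwise reclassify as the 'other' class."""
    if all_sensitive_count(pairs, intensity) > 0:
        return "original"
    if sensitive_word_freq > threshold:
        return "original"
    return "other"
```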
3. Picture sensitivity detection. The invention uses an InceptionV3 network to detect the sensitivity of pictures. The number of sensitive classes is the same as for the text; for example, if the text is classified into four classes (sensitive class A, sensitive class B, sensitive class C and others), the picture is also classified into those four classes. First the skeleton model is loaded and a pre-trained model without a classifier is constructed; then a global average pooling layer is added, which saves a large number of parameters, accelerates computation and reduces overfitting while adding a layer of nonlinearity that expands the model's expressive capability; then a fully connected layer with 512 nodes is attached, and finally an output layer with 4 nodes performs probability filtering with the Softmax activation function. The sensitivity of a picture is detected through the InceptionV3 network, which sensitive class or the non-sensitive class the picture belongs to is judged, and the classification probability set of the picture is obtained, recorded as P_p = [q_1, q_2, ..., q_n].
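The picture-model architecture above can be sketched in Keras. This is a hedged configuration sketch, not the patent's exact code: the relu activation on the 512-node layer and the function name are assumptions, while the layer sizes, the rmsprop optimizer, the cross-entropy loss and the 0.001 learning rate come from the description and embodiment.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_picture_classifier(num_classes=4):
    # Skeleton model: pre-trained InceptionV3 without its top classifier.
    base = keras.applications.InceptionV3(include_top=False, weights="imagenet")
    x = layers.GlobalAveragePooling2D()(base.output)   # saves parameters, reduces overfitting
    x = layers.Dense(512, activation="relu")(x)        # 512-node fully connected layer
    out = layers.Dense(num_classes, activation="softmax")(x)  # 4-node softmax output
    model = keras.Model(base.input, out)
    # Fine-tuning setup from the embodiment: rmsprop, cross-entropy, lr 0.001.
    model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=0.001),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

Building the model downloads the ImageNet weights, so this sketch is meant for an environment with TensorFlow and network access.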
4. Multi-modal sensitivity detection with image-text fusion. The invention adopts a decision-layer fusion strategy: from the sensitivity classification probabilities of the text and of the picture, the fusion algorithm obtains the sensitivity type probabilities P_i; MAX(P_i) takes the maximum sensitivity type probability P, and the sensitivity classification corresponding to P is taken as the final sensitivity classification result.
Compared with single-modal sensitivity detection, the image-text fusion mode can effectively form feature complementation. The final classification probability distribution is calculated as follows:
P_i = w · P_t,i + (1 − w) · P_p,i   (5)
P = MAX(P_i)   (6)
wherein w is the fusion weight with value range [0, 1] and P_i is the sensitive type probability distribution. By adding the fusion weight w, the single-modal text or picture method is converted into a multi-modal detection method jointly determined by both: P_t and P_p are multiplied by their corresponding fusion weights and added to obtain the sensitivity type probabilities P_i; MAX(P_i) takes the maximum type probability, and after the fusion algorithm the corresponding classification is taken as the final sensitive classification result.
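The decision-layer fusion can be sketched as follows. The weighted-sum form P_i = w · P_t,i + (1 − w) · P_p,i is an assumption inferred from the description ("multiplied by their corresponding fusion weights and added"); the function name is illustrative.

```python
def fuse(p_text, p_pic, w=0.5):
    """Decision-layer fusion sketch: weighted sum of the text and picture
    class probabilities, then the class with the maximum fused probability
    (formula (6)) is the final result."""
    assert len(p_text) == len(p_pic)
    p = [w * t + (1 - w) * q for t, q in zip(p_text, p_pic)]
    best = max(range(len(p)), key=p.__getitem__)
    return best, p[best]
```

With w = 0.5, as in the embodiment, text and picture contribute equally to the decision.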
Example 1
To verify the effectiveness of the invention, a picture data set was built from different types of pictures crawled from the web by a crawler program and divided into four classes: sensitive class A, sensitive class B, sensitive class C and others. The sensitive text data set was obtained by manually splicing and recombining related sensitive words, and likewise comprises sensitive class A, sensitive class B, sensitive class C and others. The normal text data set is derived from a normal comment set of microblogs. The technical scheme of the invention can be implemented as follows:
(1) In the sensitive model training phase. The text model is trained with a sensitive word library under jieba word segmentation. In training the picture model, the data set is first expanded by randomly flipping, cropping and scaling the input pictures, increasing picture diversity; the pictures are then normalized to the same size with consistent label lengths, the input picture size is uniformly set to 3 × 224 × 224, and batch_size is set to 32. To further improve model performance, fine-tuning training is performed with rmsprop as the optimizer, the cross-entropy function as the loss function and accuracy as the evaluation metric; the learning rate of the model is set to 0.001.
(2) In the stage of fusing the image-text detection results. Setting the fusion weight w to 0.5 indicates that the text and the picture have equal influence on the sensitivity determination.
(3) In the detection result evaluation stage. The method performs sensitivity detection on 1000 tweets containing pictures and texts, and the detection results are evaluated through accuracy, recall and the F value.
(4) Specific examples are given.
Suppose a tweet whose text content is "Support the public security department's strict crackdown on 'X work'; oppose X education", together with related pictures of X-work activities, is published on a social network. The algorithm first classifies the text content as sensitive class B sensitive information with FastText. Then, according to the emotion-word/sensitive-word co-occurrence algorithm, the emotion word set is {'support', 'strict crackdown', 'oppose'} and the sensitive word set is {'X work', 'X education'}; the Cartesian product of the two sets is {('support', 'X work'), ('support', 'X education'), ('strict crackdown', 'X work'), ('strict crackdown', 'X education'), ('oppose', 'X work'), ('oppose', 'X education')}. Screening by the co-occurrence algorithm yields the set {('strict crackdown', 'X work'), ('oppose', 'X education')}. According to formulas (2), (3) and (4), PositiveSensitiveCount = 0 and NegativeSensitiveCount = 3 (assuming the emotion intensity of 'strict crackdown' is 2, the sensitive word is specified as a negative value, and the negative emotion word is also negative), so AllSensitiveCount = −3; since the number of sensitive words is less than the threshold 5, the text is judged as the non-sensitive type. The picture content is then detected with the InceptionV3 model, the detection result is a sensitive class B picture, and according to the multi-modal fusion algorithm the whole tweet is judged as sensitive class B. In reality the tweet as a whole obviously opposes the X work; however, since the picture is sensitive class B content that is not allowed to be published, the whole tweet still counts as sensitive, and is therefore still judged as sensitive information, namely sensitive class B.
Otherwise, if the picture were a normal (other-class) picture, the tweet would be judged non-sensitive information.
Claims (3)
1. A multi-modal fusion sensitive information classification detection method, characterized by comprising the following steps:
step 1, carrying out sensitivity primary detection on texts and pictures
adopting FastText to detect the sensitivity of the text, judging which sensitive class or the non-sensitive class the text belongs to, and obtaining a classification probability set of the text, recorded as P_t = [p_1, p_2, ..., p_n]; detecting the sensitivity of the picture by adopting an InceptionV3 network, judging which sensitive class or the non-sensitive class the picture belongs to, and obtaining a classification probability set of the picture, recorded as P_p = [q_1, q_2, ..., q_n]; wherein n represents the number of classifications of the pictures or the texts, and the number of picture classifications equals the number of text classifications; if the text belongs to a sensitive class, executing step 2, and if the text belongs to the non-sensitive class, executing step 3;
step 2, judging the text sensitivity based on emotion
Step 2.1, segmenting the text into a plurality of words by adopting jieba word segmentation, matching the words with the existing emotion word library and sensitive word library to obtain an emotion word set and a sensitive word set, and performing Cartesian product operation on the two sets to judge whether emotion and sensitive words coexist, wherein the emotion words have emotion polarity strength including emotion polarity and emotion strength;
step 2.2, judging the sensitivity of the text by combining the emotional polarity intensity of the emotional words and the sensitive words, wherein the calculation method comprises the following steps:
PositiveSensitiveCount = Σ λ_j · C+(w_i, w_j)   (2)
NegativeSensitiveCount = Σ |β_j| · C-(w_i, w_j)   (3)
AllSensitiveCount = PositiveSensitiveCount − NegativeSensitiveCount   (4)
wherein PositiveSensitiveCount represents the positive emotion score of the sensitive words, NegativeSensitiveCount represents the negative emotion score of the sensitive words, AllSensitiveCount represents the overall emotion score of the sensitive words, C+(w_i, w_j) is the number of co-occurrences of a sensitive word with a positive emotion word, C-(w_i, w_j) is the number of co-occurrences of a sensitive word with a negative emotion word, the sums run over the n words obtained by jieba word segmentation, λ is the positive emotion intensity of the emotion words, and β is the negative emotion intensity of the emotion words;
step 2.3, if the overall emotion score AllSensitiveCount is greater than 0, the text is directly judged as the original sensitive classification, and the probability set formed by the sensitive classification probabilities is still recorded as P_t = [p_1, p_2, ..., p_n]; when AllSensitiveCount is less than or equal to 0, a secondary judgment is needed: the word frequency of the sensitive words is calculated, and when the word frequency is greater than a set threshold, the text is directly judged as the original sensitive classification, the probability set still being recorded as P_t = [p_1, p_2, ..., p_n]; otherwise, the text is judged as the other classification, and the probability set formed by the sensitive classification probabilities is recorded as P_g = [0, 0, ..., 0, 1];
Step 3, multi-modal sensitivity detection of image-text fusion
Obtaining the sensitivity type probability P by the fusion algorithm according to the sensitivity classification probability of the text and the sensitivity classification probability of the picture i Through MAX (P) i ) And taking the maximum sensitivity type probability P, and taking the sensitivity classification corresponding to the maximum sensitivity type probability P as a final sensitivity classification result.
2. The multi-modal fusion sensitive information classification detection method according to claim 1, wherein the final sensitive classification probability P is calculated by the following formula:

P = MAX(P_i) (6)

wherein w is a fusion weight with a value range of [0, 1], and P_i is the sensitivity type probability distribution obtained by the fusion algorithm.
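Claim 2 names a fusion weight w in [0, 1] but this excerpt does not reproduce the formula that produces P_i, so the sketch below assumes the simplest fusion consistent with it: a per-class convex combination of the text and picture probability distributions, followed by P = MAX(P_i). The function names are illustrative.

```python
def fuse(p_text, p_image, w=0.5):
    """Assumed fusion: convex combination of the per-class text and
    picture probabilities with weight w in [0, 1]."""
    return [w * t + (1.0 - w) * g for t, g in zip(p_text, p_image)]

def final_classification(p_text, p_image, w=0.5):
    """Apply P = MAX(P_i): return the index of the winning sensitive
    classification and its fused probability."""
    p_i = fuse(p_text, p_image, w)
    p = max(p_i)
    return p_i.index(p), p

# Text model favors class 0, picture model favors class 1; equal weighting
# lets the stronger text evidence win.
idx, p = final_classification([0.8, 0.1, 0.1], [0.2, 0.7, 0.1], w=0.5)
print(idx)  # 0
```

With w = 1 the decision reduces to the text classifier alone; with w = 0, to the picture classifier, which is the usual role of such a weight.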
3. The multi-modal fusion sensitive information classification detection method according to claim 1, wherein whether an emotion word and a sensitive word co-occur is judged by the minimum-distance principle: within the sentence in which they appear, the emotion word and sensitive word separated by the minimum distance are taken as the co-occurring pair.
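The minimum-distance judgment of claim 3 can be sketched directly: over all emotion/sensitive word pairs in a sentence, keep the pair at the smallest token distance. Lexicons and names are hypothetical stand-ins.

```python
def min_distance_pair(tokens, emotion_lex, sensitive_lex):
    """Return (distance, emotion_word, sensitive_word) for the closest
    emotion/sensitive pair in the sentence, or None if either set is empty."""
    emotion_pos = [(i, t) for i, t in enumerate(tokens) if t in emotion_lex]
    sensitive_pos = [(i, t) for i, t in enumerate(tokens) if t in sensitive_lex]
    best = None
    for ei, ew in emotion_pos:
        for si, sw in sensitive_pos:
            d = abs(ei - si)
            if best is None or d < best[0]:
                best = (d, ew, sw)
    return best

tokens = ["good", "news", "about", "the", "leak", "is", "bad"]
print(min_distance_pair(tokens, {"good", "bad"}, {"leak"}))  # (2, 'bad', 'leak')
```

Here "bad" (2 tokens from "leak") beats "good" (4 tokens away), so the negative emotion word is the one paired with the sensitive word when scoring co-occurrence.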
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110203458.7A CN113033610B (en) | 2021-02-23 | 2021-02-23 | Multi-mode fusion sensitive information classification detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113033610A CN113033610A (en) | 2021-06-25 |
CN113033610B true CN113033610B (en) | 2022-09-13 |
Family
ID=76460956
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110203458.7A Active CN113033610B (en) | 2021-02-23 | 2021-02-23 | Multi-mode fusion sensitive information classification detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113033610B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113627550A (en) * | 2021-08-17 | 2021-11-09 | 北京计算机技术及应用研究所 | Image-text emotion analysis method based on multi-mode fusion |
CN115909374A (en) * | 2021-09-30 | 2023-04-04 | 腾讯科技(深圳)有限公司 | Information identification method, device, equipment, storage medium and program product |
CN114579964A (en) * | 2022-04-29 | 2022-06-03 | 成都明途科技有限公司 | Information monitoring method and device, electronic equipment and storage medium |
CN114782670A (en) * | 2022-05-11 | 2022-07-22 | 中航信移动科技有限公司 | Multi-mode sensitive information identification method, equipment and medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107832663A (en) * | 2017-09-30 | 2018-03-23 | 天津大学 | A kind of multi-modal sentiment analysis method based on quantum theory |
CN107908715A (en) * | 2017-11-10 | 2018-04-13 | 中国民航大学 | Microblog emotional polarity discriminating method based on Adaboost and grader Weighted Fusion |
CN108984530A (en) * | 2018-07-23 | 2018-12-11 | 北京信息科技大学 | A kind of detection method and detection system of network sensitive content |
CN109934260A (en) * | 2019-01-31 | 2019-06-25 | 中国科学院信息工程研究所 | Image, text and data fusion sensibility classification method and device based on random forest |
CN110874531A (en) * | 2020-01-20 | 2020-03-10 | 湖南蚁坊软件股份有限公司 | Topic analysis method and device and storage medium |
CN112256878A (en) * | 2020-10-29 | 2021-01-22 | 沈阳农业大学 | Rice knowledge text classification method based on deep convolution |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200293874A1 (en) * | 2019-03-12 | 2020-09-17 | Microsoft Technology Licensing, Llc | Matching based intent understanding with transfer learning |
- 2021-02-23: CN application CN202110203458.7A, patent CN113033610B, status Active
Non-Patent Citations (2)
Title |
---|
A Survey of Popular Image and Text Analysis Techniques; Rahul Suresh et al.; 2019 4th International Conference on Computational Systems and Information Technology for Sustainable Solution (CSITSS); 2020-03-12; pp. 1-8 *
Recognition of Sensitive Text in Images Based on Spatial Transformation Dense Convolutional Networks (基于空间变换密集卷积网络的图片敏感文字识别); Lin Jinchao et al. (林金朝等); Computer Systems & Applications (《计算机系统应用》); 2020-01-31; Vol. 29, No. 1; pp. 137-143 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113033610B (en) | Multi-mode fusion sensitive information classification detection method | |
CN109933664B (en) | Fine-grained emotion analysis improvement method based on emotion word embedding | |
CN110245229B (en) | Deep learning theme emotion classification method based on data enhancement | |
KR102008845B1 (en) | Automatic classification method of unstructured data | |
Sundararajan et al. | Multi-rule based ensemble feature selection model for sarcasm type detection in twitter | |
CN109670039B (en) | Semi-supervised e-commerce comment emotion analysis method based on three-part graph and cluster analysis | |
CN107092596A (en) | Text emotion analysis method based on attention CNNs and CCR | |
CN113343126B (en) | Rumor detection method based on event and propagation structure | |
CN116578705A (en) | Microblog emotion classification method based on pre-training language model and integrated neural network | |
CN110909529A (en) | User emotion analysis and prejudgment system of company image promotion system | |
CN106599824A (en) | GIF cartoon emotion identification method based on emotion pairs | |
Rauf et al. | Using bert for checking the polarity of movie reviews | |
CN115329085A (en) | Social robot classification method and system | |
Saha et al. | The Corporeality of Infotainment on Fans Feedback Towards Sports Comment Employing Convolutional Long-Short Term Neural Network | |
CN113486143A (en) | User portrait generation method based on multi-level text representation and model fusion | |
CN111259651A (en) | User emotion analysis method based on multi-model fusion | |
Shalinda et al. | Hate words detection among sri lankan social media text messages | |
Al-Onazi et al. | Modified Seagull Optimization with Deep Learning for Affect Classification in Arabic Tweets | |
Jalani et al. | Performance of Sentiment Classification on Tweets of Clothing Brands | |
Gao et al. | Chinese short text classification method based on word embedding and Long Short-Term Memory Neural Network | |
Cumalat Puig | Sentiment analysis on short Spanish and Catalan texts using contextual word embeddings | |
Lora et al. | Ben-sarc: A corpus for sarcasm detection from bengali social media comments and its baseline evaluation | |
Lv et al. | Stakeholder opinion classification for supporting large-scale transportation project decision making | |
Agbesi et al. | Multichannel 2D-CNN Attention-Based BiLSTM Method for Low-Resource Ewe Sentiment Analysis | |
CN113535948B (en) | LSTM-Attention text classification method introducing essential point information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||