CN111832588A - Riot and terrorist image labeling method based on integrated classification - Google Patents

Riot and terrorist image labeling method based on integrated classification

Info

Publication number
CN111832588A
Authority
CN
China
Prior art keywords
image
network
sub
label
labeling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910315804.3A
Other languages
Chinese (zh)
Inventor
何小海
严靓
周欣
熊淑华
卿粼波
吴小强
滕奇志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN201910315804.3A priority Critical patent/CN111832588A/en
Publication of CN111832588A publication Critical patent/CN111832588A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Abstract

The invention discloses a riot and terrorist image labeling method based on integrated classification. The method comprises the following steps: normalizing the image to be labeled to 224 × 224 and inputting it into the image labeling integrated network; each sub-network in the integrated network maps the extracted image features into a label semantic space to obtain a 1 × N label probability vector for the image to be labeled; combining the label probability vectors output by the sub-networks into a matrix and operating on it together with the weight distribution matrix to obtain the final 1 × N label probability vector of the image to be labeled; and applying a threshold to the computed label probability vector, with all labels above the threshold forming the final labeling result of the image. The labeling method of the invention has short network training time, high labeling accuracy and strong stability; compared with traditional machine learning it greatly improves labeling precision and recall, and has practical value in the specific field of terrorism-related information.

Description

Riot and terrorist image labeling method based on integrated classification
Technical Field
The invention provides an ensemble-classification-based riot and terrorist image labeling method, and relates to the technical fields of deep learning and computer vision.
Background
With the rapid development of internet social platforms and the popularization of image acquisition devices such as mobile phones and digital cameras, the image and video data that people can access every day show explosive growth. While this mass of image data brings convenience to daily life, some violent and terrorist images negatively affect social harmony and the healthy growth of teenagers. How to manage these data effectively has become a problem that urgently needs to be solved. Automatic image annotation, which adds text information reflecting the content of an image to the image automatically, has gradually become one of the key technologies in the fields of image analysis and application.
Current automatic image labeling methods fall mainly into two categories: methods based on the classification idea and methods based on correlation models. Classification-based methods treat automatic image labeling as an image classification problem, regarding each labeling keyword as a category; the image is first divided into regular regions, and the divided regions are then classified. Correlation-model-based methods first extract visual features such as color and texture from an image or image region, then compute the joint probability distribution between the visual features and the labeling keywords, and finally label the image or region through the resulting probabilistic correlation model.
Traditional methods have made some progress in the field of image annotation, but because features must be selected manually, information is lost, so labeling precision is insufficient and recall is low. Although deep learning models have achieved strong results in image recognition and classification, most of this work targets network improvements or single-label learning; application to and improvement of image labeling, which belongs to multi-label learning, remain limited. Imbalance among label classes in multi-label databases can also reduce the labeling quality of the trained model.
Disclosure of Invention
The invention provides an integrated-classification-based riot and terrorist image labeling method, which, following the classification idea, treats the labeling of riot and terrorist images as a multi-label classification problem. Considering the uneven distribution of labeling keywords, it improves the combination module of the integrated network so that the model's labeling accuracy on each label class is raised.
The invention realizes the purpose through the following technical scheme, which comprises the following steps:
Step one: after the image to be labeled is normalized to 224 × 224, it is input into the riot and terrorist image labeling integrated network, and the image features extracted by each sub-network in the integrated network are obtained.
Step two: each sub-network maps the extracted image features to a label semantic space to obtain a label probability vector of the image to be labeled; the label probability vector obtained by each sub-network has size 1 × N.
Step three: the label probability vectors output by the sub-networks are combined into a matrix and operated on together with the weight distribution matrix to obtain the final label probability vector of the image to be labeled, of size 1 × N.
Step four: a threshold is applied to the label probability vector computed in step three, and all labels with probability greater than the threshold constitute the final labeling result of the image to be labeled.
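Steps two to four can be sketched as follows, a minimal numpy illustration in which the sub-network outputs and the weight distribution matrix are assumed inputs (the patent does not prescribe an implementation language or framework):

```python
import numpy as np

def annotate(O, W, threshold=0.5):
    """Combine sub-network outputs into the final labeling.

    O: M x N matrix whose row j is sub-network j's 1 x N label
       probability vector for the image (step two).
    W: N x M weight distribution matrix; W[i] holds the weights of
       label class i over the M sub-networks (step three).
    Returns the weighted 1 x N probability vector and the indices of
    all labels above the threshold (step four).
    """
    M, N = O.shape
    # Prob[i] = W_i . O_i^T : the weighted vote for label class i.
    prob = np.array([W[i] @ O[:, i] for i in range(N)])
    labels = np.where(prob > threshold)[0]
    return prob, labels

# Three sub-networks, four label classes, equal (uniform) weights.
O = np.array([[0.9, 0.2, 0.6, 0.1],
              [0.8, 0.3, 0.4, 0.2],
              [0.7, 0.1, 0.5, 0.3]])
W = np.full((4, 3), 1 / 3)
prob, labels = annotate(O, W)   # prob ≈ [0.8, 0.2, 0.5, 0.2]
```

With uniform weights this reduces to averaging; the weight distribution described in step three replaces the uniform rows of W with per-class, per-sub-network weights.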
In step one, the riot and terrorist image labeling integrated network is composed of a plurality of sub-networks, which are trained as follows:
(1) Constructing the data set. The data set consists of images containing a plurality of riot and terror elements; the corresponding label is an N-dimensional vector, and for each class of riot and terror element, the corresponding dimension of the label vector is 1 if the image contains that class and 0 otherwise. To address the uneven label distribution of the data set, the invention applies random sampling with replacement to the images containing each label class, and merges the samples into a data-balanced sampling set for training the sub-networks.
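The balanced sampling can be sketched as below; the per-class sample count and the example label names are assumptions for illustration, since the patent does not fix them:

```python
import random

def balanced_sampling_set(images_by_class, per_class, seed=0):
    """Random sampling with replacement from the images of each label
    class, merged into a data-balanced sampling set for sub-network
    training."""
    rng = random.Random(seed)
    samples = []
    for cls, imgs in images_by_class.items():
        # Sampling WITH replacement lets a rare class be oversampled
        # up to the same count as common classes.
        samples.extend((rng.choice(imgs), cls) for _ in range(per_class))
    rng.shuffle(samples)
    return samples

# A rare class ("armored_vehicle", one image) is oversampled to match
# the more common "gun" class.
data = {"gun": ["g1", "g2", "g3", "g4"], "armored_vehicle": ["a1"]}
train_set = balanced_sampling_set(data, per_class=4)
```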
(2) The sub-network structure is a convolutional neural network. All parameters obtained by training on the ImageNet classification task are applied to the riot and terrorist image labeling task through transfer learning, and an additional fully connected layer is appended at the end of the sub-network for training.
The weight distribution matrix in step three is obtained, after the sub-networks have been trained, by assigning weights to the sub-networks according to their different classification accuracies on the same label class. The weight distribution matrix W_{N×M} can be represented as:
W_{N×M} = [W_1, W_2, ..., W_N]^T
where W_i = [w_{i1}, w_{i2}, ..., w_{iM}] denotes the weight distribution of the i-th label class over the M sub-networks. Let O_i = [o_{i1}, o_{i2}, ..., o_{iM}] denote the prediction probability values of the i-th label class of the input image on the M sub-networks. The final label probability vector Prob_{1×N} is then computed from the products of W_i and O_i:
Prob_{1×N} = [W_1·O_1^T, W_2·O_2^T, ..., W_N·O_N^T].
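One simple way to realize the weight distribution is sketched below, under the assumption that each class's weights are its per-sub-network accuracies normalized to sum to 1 (the patent states only that weights follow per-class classification accuracy, so the normalization scheme is an assumption):

```python
import numpy as np

def weight_distribution_matrix(acc):
    """acc: N x M matrix; acc[i][j] is sub-network j's classification
    accuracy on label class i.  Returns W (N x M), each row W_i
    normalized so the weights of class i over the M sub-networks
    sum to 1."""
    acc = np.asarray(acc, dtype=float)
    return acc / acc.sum(axis=1, keepdims=True)

# Two label classes, two sub-networks: the sub-network that is more
# accurate on a class receives the larger weight for that class.
W = weight_distribution_matrix([[0.9, 0.6],    # class 0 accuracies
                                [0.5, 0.75]])  # class 1 accuracies
# W[0] = [0.6, 0.4], W[1] = [0.4, 0.6]
```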
The threshold in step four is usually set to the empirical value of 0.5; testing shows that this threshold achieves a relatively good image labeling effect.
The invention mainly aims to provide two improvements:
(1) Improvement of the sub-network training mode. Unlike the common transfer learning practice of changing the output dimension of the network's last fully connected layer to the number of target-task classes, all parameter information of the pre-trained model is retained, and an additional fully connected layer is appended at the end of the sub-network for training.
(2) Improvement of the sub-network combination mode. The sub-networks are fused by assigning each sub-network a weight according to its classification accuracy on each label class. This improvement addresses the label class imbalance in the data set and raises the integrated network's labeling accuracy on every label class.
Drawings
FIG. 1 is a schematic diagram of the labeling result of an image of a riot and terrorist
FIG. 2 is a flowchart of an ensemble classification-based violence and terrorist image labeling method according to the present invention
FIG. 3 is a flow chart of the present invention for training subnetworks based on transfer learning
Detailed Description
The invention is further described below with reference to the accompanying drawings:
Fig. 1 is a schematic diagram of a riot and terrorist image labeling result. The riot and terrorist image labeling network can label riot and terrorist elements in an image, such as guns, ships, airplanes, fires, armored vehicles and artillery.
In fig. 2, a riot and terrorist image labeling method based on ensemble classification includes the following steps:
Step one: random sampling with replacement is applied to the images containing each label class in the initial training set, and the samples are merged into a data-balanced sampling set for training each sub-network (i.e., each individual learner in the figure) of the integrated network.
Step two: parameters obtained by training a convolutional neural network on the ImageNet classification task are applied to the riot and terrorist image labeling task through transfer learning, generating mutually independent sub-networks.
Step three: the output results of the sub-networks are combined by weight distribution to obtain the final labeling result.
FIG. 3 is a flow chart of training the sub-networks based on transfer learning. The common transfer learning training method changes the output of the neural network's last fully connected layer from 1000 dimensions to the number of target-task classes. In the riot and terrorist image labeling task, in order to keep the pre-training information of the fully connected layer, the pre-trained convolutional neural network is transferred to the task in its entirety, and a fully connected layer with a 1000-dimensional input and a label-dimensional output is appended at the end of the network for training. This training mode shortens training time and improves the labeling accuracy of the sub-networks.
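The appended layer can be sketched as a fully connected head with a 1000-dimensional input and N sigmoid outputs, trained by binary cross-entropy on the frozen network's 1000-dimensional outputs. The dimensions follow the text; everything else (initialization, learning rate, the random stand-in features) is an assumption for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class AppendedHead:
    """Extra fully connected layer appended after the frozen,
    fully retained pre-trained network: input 1000-d, output N labels."""
    def __init__(self, n_labels, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.01, (1000, n_labels))
        self.b = np.zeros(n_labels)

    def forward(self, feats):                 # feats: batch x 1000
        return sigmoid(feats @ self.W + self.b)

    def train_step(self, feats, targets, lr=0.1):
        """One gradient step on the binary cross-entropy loss; only
        the appended layer's parameters are updated."""
        p = self.forward(feats)
        self.W -= lr * feats.T @ (p - targets) / len(feats)
        self.b -= lr * (p - targets).mean(axis=0)
        eps = 1e-9
        return -np.mean(targets * np.log(p + eps)
                        + (1 - targets) * np.log(1 - p + eps))

rng = np.random.default_rng(1)
feats = rng.normal(size=(32, 1000))   # stand-in for frozen 1000-d outputs
targets = rng.integers(0, 2, (32, 6)).astype(float)   # N = 6 labels
head = AppendedHead(n_labels=6, seed=0)
losses = [head.train_step(feats, targets) for _ in range(50)]
```

Only the appended head is updated, which is why this training mode is cheaper than fine-tuning the whole network.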
To verify the effectiveness of the ensemble-classification-based riot and terrorist image labeling method, the proposed method is compared with the classical machine learning algorithms SVM, decision tree C4.5 and KNN. Each method is evaluated on three indexes of the labeling results: average precision P, average recall R and the F1 value. The experimental results of each image labeling method are shown in Table 1.
TABLE 1 Experimental results of each image labeling method
[Table 1 is provided as an image in the original publication and is not reproduced here.]
Experiments show that the SVM performs best among the traditional machine learning methods. Compared with the SVM, the proposed method improves precision by 26% and recall by 30%, a substantial improvement.
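The three evaluation indexes can be computed as sketched below, assuming per-image averaging of predicted and ground-truth label sets (the exact averaging convention is not stated in the text, and the example label names are illustrative):

```python
def precision_recall_f1(predicted, truth):
    """Average precision P, average recall R and F1 over images, where
    predicted[k] and truth[k] are the label sets of image k."""
    ps, rs = [], []
    for pred, true in zip(predicted, truth):
        tp = len(pred & true)                      # correctly predicted labels
        ps.append(tp / len(pred) if pred else 0.0)
        rs.append(tp / len(true) if true else 0.0)
    P = sum(ps) / len(ps)
    R = sum(rs) / len(rs)
    F1 = 2 * P * R / (P + R) if P + R else 0.0
    return P, R, F1

P, R, F1 = precision_recall_f1(
    [{"gun"}, {"fire", "tank"}],          # predicted label sets
    [{"gun", "fire"}, {"fire"}],          # ground-truth label sets
)   # P = 0.75, R = 0.75, F1 = 0.75
```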
To verify the effectiveness of the improved combination mode, the proposed weight distribution combination is compared with the averaging and majority voting combinations commonly used in ensemble learning. The accuracy of the different combination modes on each label class is shown in Table 2.
TABLE 2 comparison of accuracy of different combinations on each label class
[Table 2 is provided as an image in the original publication and is not reproduced here.]
As the table shows, the weight distribution combination achieves the highest labeling accuracy on most label classes, with small numerical fluctuation. This indicates that a labeling network using the weight distribution combination is stable across label classes, avoiding the large per-class accuracy differences caused by the uneven label distribution of the data set.

Claims (5)

1. A riot and terrorist image labeling method based on integrated classification, characterized by comprising the following steps:
step one: normalizing the image to be labeled to 224 × 224 and inputting it into the riot and terrorist labeling integrated network to obtain the image features extracted by each sub-network in the integrated network;
step two: each sub-network mapping the extracted image features to a label semantic space to obtain a label probability vector of the image to be labeled, the label probability vector obtained by each sub-network having size 1 × N;
step three: combining the label probability vectors output by the sub-networks into a matrix and operating on it together with the weight distribution matrix to obtain the final label probability vector of the image to be labeled, of size 1 × N;
step four: applying a threshold to the label probability vector computed in step three, wherein all labels with probability greater than the threshold constitute the final labeling result of the image to be labeled.
2. The method of claim 1, characterized in that the riot and terrorist image labeling integrated network comprises a plurality of sub-networks, the sub-networks being trained by the following steps:
(1) constructing a data set of images containing a plurality of riot and terror elements, each image having a corresponding N-dimensional label vector; for each class of riot and terror element, the corresponding dimension of the label vector is 1 if the image contains that class and 0 otherwise;
(2) using a convolutional neural network as the sub-network structure, and applying the parameters obtained by training on the ImageNet classification task to the riot and terrorist image labeling task through transfer learning, which accelerates sub-network training; the loss function used to guide sub-network training is the cross-entropy loss.
3. The method of claim 1, characterized in that the weight distribution matrix is obtained, after the sub-networks have been trained, by assigning weights to the sub-networks according to their different classification accuracies on the same label class, thereby further improving the accuracy of the labeling result.
4. The method of claim 1, characterized in that the threshold is set to the empirical value of 0.5, which testing shows achieves a good image labeling effect.
5. The method of claim 1 or 2, characterized in that N is an integer greater than 1.
CN201910315804.3A 2019-04-18 2019-04-18 Riot and terrorist image labeling method based on integrated classification Pending CN111832588A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910315804.3A CN111832588A (en) 2019-04-18 2019-04-18 Riot and terrorist image labeling method based on integrated classification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910315804.3A CN111832588A (en) 2019-04-18 2019-04-18 Riot and terrorist image labeling method based on integrated classification

Publications (1)

Publication Number Publication Date
CN111832588A true CN111832588A (en) 2020-10-27

Family

ID=72915038

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910315804.3A Pending CN111832588A (en) 2019-04-18 2019-04-18 Riot and terrorist image labeling method based on integrated classification

Country Status (1)

Country Link
CN (1) CN111832588A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801237A (en) * 2021-04-15 2021-05-14 北京远鉴信息技术有限公司 Training method and device for violence and terrorism content recognition model and readable storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101281521A (en) * 2007-04-05 2008-10-08 中国科学院自动化研究所 Method and system for filtering sensitive web page based on multiple classifier amalgamation
CN102298606A (en) * 2011-06-01 2011-12-28 清华大学 Random walking image automatic annotation method and device based on label graph model
CN104166548A (en) * 2014-08-08 2014-11-26 同济大学 Deep learning method based on motor imagery electroencephalogram data
EP2991003A2 (en) * 2014-08-28 2016-03-02 Baidu Online Network Technology (Beijing) Co., Ltd Method and apparatus for classification
CN108023876A (en) * 2017-11-20 2018-05-11 西安电子科技大学 Intrusion detection method and intruding detection system based on sustainability integrated study
CN108111614A (en) * 2017-12-27 2018-06-01 北京工业大学 Agriculture and forestry pest and disease monitoring manages system
CN108416384A (en) * 2018-03-05 2018-08-17 苏州大学 A kind of image tag mask method, system, equipment and readable storage medium storing program for executing
CN109376773A (en) * 2018-09-30 2019-02-22 福州大学 Crack detecting method based on deep learning
CN109471942A (en) * 2018-11-07 2019-03-15 合肥工业大学 Chinese comment sensibility classification method and device based on evidential reasoning rule
CN109492105A (en) * 2018-11-10 2019-03-19 上海文军信息技术有限公司 A kind of text sentiment classification method based on multiple features integrated study

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101281521A (en) * 2007-04-05 2008-10-08 中国科学院自动化研究所 Method and system for filtering sensitive web page based on multiple classifier amalgamation
CN102298606A (en) * 2011-06-01 2011-12-28 清华大学 Random walking image automatic annotation method and device based on label graph model
CN104166548A (en) * 2014-08-08 2014-11-26 同济大学 Deep learning method based on motor imagery electroencephalogram data
EP2991003A2 (en) * 2014-08-28 2016-03-02 Baidu Online Network Technology (Beijing) Co., Ltd Method and apparatus for classification
CN108023876A (en) * 2017-11-20 2018-05-11 西安电子科技大学 Intrusion detection method and intruding detection system based on sustainability integrated study
CN108111614A (en) * 2017-12-27 2018-06-01 北京工业大学 Agriculture and forestry pest and disease monitoring manages system
CN108416384A (en) * 2018-03-05 2018-08-17 苏州大学 A kind of image tag mask method, system, equipment and readable storage medium storing program for executing
CN109376773A (en) * 2018-09-30 2019-02-22 福州大学 Crack detecting method based on deep learning
CN109471942A (en) * 2018-11-07 2019-03-15 合肥工业大学 Chinese comment sensibility classification method and device based on evidential reasoning rule
CN109492105A (en) * 2018-11-10 2019-03-19 上海文军信息技术有限公司 A kind of text sentiment classification method based on multiple features integrated study

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
严寒: "Research on Image Classification Algorithms Based on Convolutional Neural Networks", China Master's Theses Full-text Database, Information Science and Technology *
李艳秋: "Research on Face Recognition Based on Ensemble Learning", China Doctoral Dissertations Full-text Database, Information Science and Technology *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801237A (en) * 2021-04-15 2021-05-14 北京远鉴信息技术有限公司 Training method and device for violence and terrorism content recognition model and readable storage medium

Similar Documents

Publication Publication Date Title
CN112308158B (en) Multi-source field self-adaptive model and method based on partial feature alignment
WO2021042828A1 (en) Neural network model compression method and apparatus, and storage medium and chip
CN110162593B (en) Search result processing and similarity model training method and device
CN111881714B (en) Unsupervised cross-domain pedestrian re-identification method
CN111126386B (en) Sequence domain adaptation method based on countermeasure learning in scene text recognition
CN107480261B (en) Fine-grained face image fast retrieval method based on deep learning
CN110909820B (en) Image classification method and system based on self-supervision learning
CN105184303B (en) A kind of image labeling method based on multi-modal deep learning
CN110826337A (en) Short text semantic training model obtaining method and similarity matching algorithm
CN103064903B (en) Picture retrieval method and device
CN108446334B (en) Image retrieval method based on content for unsupervised countermeasure training
CN112819023A (en) Sample set acquisition method and device, computer equipment and storage medium
CN107292349A (en) The zero sample classification method based on encyclopaedic knowledge semantically enhancement, device
CN113806746A (en) Malicious code detection method based on improved CNN network
CN113672693B (en) Label recommendation method of online question-answering platform based on knowledge graph and label association
CN110647919A (en) Text clustering method and system based on K-means clustering and capsule network
CN112434628A (en) Small sample polarization SAR image classification method based on active learning and collaborative representation
CN112925904B (en) Lightweight text classification method based on Tucker decomposition
CN112861524A (en) Deep learning-based multilevel Chinese fine-grained emotion analysis method
CN114048295A (en) Cross-modal retrieval method and system for data processing
CN113779283B (en) Fine-grained cross-media retrieval method with deep supervision and feature fusion
CN108920451A (en) Text emotion analysis method based on dynamic threshold and multi-categorizer
Lucchi et al. Joint image and word sense discrimination for image retrieval
CN111191033A (en) Open set classification method based on classification utility
US11914641B2 (en) Text to color palette generator

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination