CN107169954B - Image saliency detection method based on parallel convolutional neural network - Google Patents
- Publication number: CN107169954B (application CN201710253255.2A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- convolutional neural
- parallel convolutional
- layer
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
Abstract
The invention discloses an image saliency detection method based on a parallel convolutional neural network, comprising the following steps: (1) designing a parallel convolutional neural network structure; (2) designing two kinds of network input images and defining superpixel-based labels for the inputs; (3) performing dataset balancing and input preprocessing; (4) model training, where the model comprises a data preprocessing module and the parallel convolutional neural network structure; (5) computing a saliency map for a target image using the trained model. The method can effectively detect the internal semantics of a salient subject and its difference from the background, detects saliency from both global and local perspectives, and realizes automatic saliency detection for images.
Description
Technical Field
The invention relates to an image detection method, in particular to an image saliency detection method based on a parallel convolutional neural network.
Background
The purpose of image saliency detection is to identify the visually most prominent regions of an image, a very important problem in computer vision and image processing. As a preprocessing step, saliency detection has wide application in computer vision and image processing, such as multimedia information transmission, image and video reconstruction, and image and video quality assessment. Saliency detection is also widely used in high-level visual tasks such as object detection and identity recognition. As a well-studied topic, it has attracted a large number of saliency detection models from the research community.
Traditional saliency detection models fall into hand-crafted-feature-based methods and prior-knowledge-based methods. Hand-crafted-feature-based methods focus on designing features such as color, brightness, and texture; when an image has complex semantics, they cannot effectively detect the salient subject, and when the subject differs little from the background in color and brightness, they cannot effectively separate the two. Prior-knowledge-based methods define common characteristics of salient subjects; for example, background-prior methods assume that regions near the image border are background, but in some images the salient subject lies at the border, which limits such methods.
Disclosure of Invention
In order to overcome the above disadvantages and shortcomings of the prior art, the present invention aims to provide an image saliency detection method based on a parallel convolutional neural network that effectively detects the internal semantics of a salient subject and its difference from the background, detects saliency from both global and local perspectives, and realizes automatic saliency detection for images.
The purpose of the invention is realized by the following technical scheme:
an image saliency detection method based on a parallel convolutional neural network comprises the following steps:
(1) designing a parallel convolutional neural network structure; the parallel convolutional neural network structure comprises a global angle detection module CNN-G and a local angle detection module CNN-L;
the global angle detection module CNN-G is a single-path convolutional neural network; the local angle detection module CNN-L is a two-path parallel convolutional neural network; the global angle detection module CNN-G and the local angle detection module CNN-L are joined in parallel through a fully-connected layer;
(2) designing two kinds of network input images and defining superpixel-based labels for the inputs; the network input images comprise a global padded image and local cropped images;
the global padded image is centered on a superpixel, contains all information of the original image, represents global features, and serves as the input of the global angle detection module CNN-G;
a local cropped image is centered on a superpixel, contains the detail information of the superpixel's neighborhood, represents local features, and serves as the input of the local angle detection module CNN-L;
(3) performing dataset balancing and input preprocessing;
(4) model training: the model comprises a data preprocessing module and the parallel convolutional neural network structure;
(5) computing a saliency map for the target image using the trained model.
Defining the superpixel-based labels for the inputs in step (2) is specifically as follows:
a superpixel's label is determined by its overlap ratio with the ground-truth saliency map: if the overlap ratio is greater than a set threshold, the label is 1 and the superpixel is regarded as salient; if the overlap ratio is smaller than a set threshold, the label is 0 and the superpixel is regarded as non-salient.
The dataset balancing in step (3) specifically comprises:
all positive samples obtained from an image are used, and an equal number of negative samples is randomly selected; all samples are normalized to 256 × 256.
The first 5 layers of the parallel convolutional neural network structure in step (1) are convolutional layers: the first convolutional layer has 96 kernels of size 11 × 11 × 3; the second has 256 kernels of size 5 × 5 × 48; the third has 384 kernels of size 3 × 3 × 256; the fourth has 384 kernels of size 3 × 3 × 192; the fifth has 256 kernels of size 3 × 3 × 192. The first two and the fifth convolutional layers are each followed by a pooling layer and a normalization layer.
In step (1), convolutional layers at the same depth of the parallel convolutional neural network structure share parameters, so that scale-invariant features are learned.
In the step (4), the training of the parallel convolutional neural network comprises the following steps:
(4-1) initializing network parameters;
(4-2) setting training parameters;
(4-3) loading training data;
and (4-4) iteratively training.
Initializing the network parameters in step (4-1) is specifically: using a fine-tuning strategy, the first six layers of the parallel convolutional neural network are initialized with the first six layers' parameters of the AlexNet model; the fully-connected layers are initialized with random values.
The training parameters in step (4-2) are specifically: the initial learning rate of the first 5 layers of the parallel convolutional neural network is set to 0.0001; the initial learning rate of the fully-connected layer parameters is 0.001; during training, the learning rate is reduced by 40% after every 8 passes over the training set.
The iterative training of step (4-4): the parallel convolutional neural network is trained iteratively with a stochastic gradient descent algorithm; the network parameters are saved every 1000 iterations, and the optimal solution of the network is obtained through continued iteration.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention detects saliency from global and local perspectives simultaneously, effectively avoiding the drawbacks of detecting saliency from a single perspective; it also considers multi-scale information, making the detection results clearer and more complete.
2. Compared with methods that use individual pixels as the basic processing unit, the method greatly reduces the amount of computation while improving the algorithm's results to a certain extent.
3. The invention is based on a parallel convolutional neural network, and the trained model can adapt to various conditions, such as images with multiple salient subjects, salient subjects that are too large or too small, salient subjects at the image border, salient subjects similar to the background, and complex image backgrounds.
Drawings
Fig. 1 is a flowchart of an image saliency detection method based on a parallel convolutional neural network according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples, but the embodiments of the present invention are not limited thereto.
Examples
As shown in fig. 1, the image saliency detection method based on a parallel convolutional neural network of the present embodiment includes the following steps:
(1) Designing a parallel convolutional neural network structure; the parallel convolutional neural network structure comprises a global angle detection module CNN-G and a local angle detection module CNN-L.
The global angle detection module CNN-G is a single-path convolutional neural network; the local angle detection module CNN-L is a two-path parallel convolutional neural network; the global angle detection module CNN-G and the local angle detection module CNN-L are joined in parallel through a fully-connected layer.
The first six layers of the AlexNet network [A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), 2012, pp. 1097-1105] are used as the single-path reference network.
The input image size of the parallel convolutional neural network structure is 227 × 227 × 3, the three dimensions being width, height, and number of channels. The first 5 layers are convolutional layers: the first convolutional layer has 96 kernels of size 11 × 11 × 3; the second has 256 kernels of size 5 × 5 × 48; the third has 384 kernels of size 3 × 3 × 256; the fourth has 384 kernels of size 3 × 3 × 192; the fifth has 256 kernels of size 3 × 3 × 192. The first two and the fifth convolutional layers are each followed by a pooling layer and a normalization layer. CNN-G and CNN-L are joined in parallel through a fully-connected layer with 4096 neurons, so that the model detects saliency from both global and local perspectives. The last layer of the parallel convolutional neural network structure is an output layer with only 2 neurons, representing the saliency value of the superpixel to be predicted.
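Since the description gives only kernel counts and sizes, the strides and paddings below are assumed from AlexNet, which the single-path reference network is stated to follow. A minimal sketch tracing the spatial size of the feature maps through the convolutional stack:

```python
def out_size(n, k, s=1, p=0):
    """Spatial output size of a convolution/pooling layer:
    floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

n = 227                    # 227 x 227 x 3 input
n = out_size(n, 11, s=4)   # conv1: 96 kernels, 11 x 11 x 3   -> 55
n = out_size(n, 3, s=2)    # max pooling, 3 x 3 stride 2      -> 27
n = out_size(n, 5, p=2)    # conv2: 256 kernels, 5 x 5 x 48   -> 27
n = out_size(n, 3, s=2)    # max pooling                      -> 13
n = out_size(n, 3, p=1)    # conv3: 384 kernels, 3 x 3 x 256  -> 13
n = out_size(n, 3, p=1)    # conv4: 384 kernels, 3 x 3 x 192  -> 13
n = out_size(n, 3, p=1)    # conv5: 256 kernels, 3 x 3 x 192  -> 13
n = out_size(n, 3, s=2)    # max pooling                      -> 6
print(n)  # 6: each path hands a 6 x 6 x 256 volume to the 4096-neuron FC layer
```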
(2) Designing two kinds of network input images and defining superpixel-based labels for the inputs; the network input images comprise a global padded image and local cropped images. The global padded image is centered on a superpixel, contains all information of the original image, represents global features, and serves as the input of the global angle detection module CNN-G; a local cropped image is centered on a superpixel, contains the detail information of the superpixel's neighborhood, represents local features, and serves as the input of the local angle detection module CNN-L.
In this embodiment, an image is first segmented with the SLIC superpixel segmentation algorithm; then, centered on a given superpixel S, three input images (one global padded image and two local cropped images) are produced by padding or cropping, with any part extending beyond the original image area filled with the mean pixel value of the database. The three images of different sizes are then scaled to the same size, and each serves as the input of one of the three convolutional neural network paths in the parallel network.
How much of the original image's information each of the three input images contains is defined as follows: let (Wo, Ho) be the width and height of the original image and (Wp, Hp) the width and height of an input image; they are related by

(Wp, Hp) = 2 × (Wo, Ho) × cp,

where cp is the cropping factor. Since there are three different input images, cp takes three different values; in the present invention cp = [1, 1/4, 1/8]. For cp = 1, the input image is a padded image containing all information of the original image; it serves as the input of the global network and detects saliency from the global perspective. In the local network, cp = [1/4, 1/8]: the input images contain local detail information of the neighborhood of superpixel S at different scales, and the two cropped images serve as the inputs of the local network to detect local saliency in a multi-scale manner. The parallelism of the network thus gives the whole network the ability to detect saliency from global and local perspectives simultaneously.
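The padding-and-cropping step can be sketched as follows. The function name, the scalar mean fill value, and the integer rounding of the window size are illustrative assumptions, not the patent's exact implementation:

```python
import numpy as np

def crop_around(img, cy, cx, cp, fill):
    """Extract a window of size (2*H*cp, 2*W*cp) centered on the superpixel
    center (cy, cx); pixels outside the original image are set to `fill`
    (the database mean in the patent)."""
    H, W = img.shape[:2]
    h, w = int(2 * H * cp), int(2 * W * cp)
    canvas = np.full((h, w, img.shape[2]), fill, dtype=img.dtype)
    y0, x0 = cy - h // 2, cx - w // 2          # window top-left in image coords
    ys, ye = max(0, y0), min(H, y0 + h)        # overlap with the image
    xs, xe = max(0, x0), min(W, x0 + w)
    if ys < ye and xs < xe:
        canvas[ys - y0:ye - y0, xs - x0:xe - x0] = img[ys:ye, xs:xe]
    return canvas
```

With cp = 1 the window is twice the image in each dimension, so the whole original image always fits inside it (the global padded image); cp = 1/4 and 1/8 yield the two local cropped images.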
The label of a superpixel is defined as follows, where S is the superpixel and G is the ground-truth saliency map: (1) if |S ∩ G| / |S| > 0.9, the label is 1, indicating that the superpixel is salient; (2) if |S ∩ G| / |S| < 0.1, the label is 0, indicating that the superpixel is non-salient; (3) if 0.1 ≤ |S ∩ G| / |S| ≤ 0.9, the superpixel is discarded and not used as training data.
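The three-way label rule can be stated compactly; `superpixel_label` is a hypothetical helper name, not from the patent:

```python
def superpixel_label(overlap):
    """Map the overlap ratio |S ∩ G| / |S| to a training label:
    1 (salient), 0 (non-salient), or None (ambiguous, discarded)."""
    if overlap > 0.9:
        return 1
    if overlap < 0.1:
        return 0
    return None
```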
(3) Data set balancing processing and input preprocessing:
An unbalanced training set harms the classification results and weakens the ability to learn good features. When positive and negative samples are taken according to the method in (2), the number of positive samples obtained from the database is far smaller than the number of negative samples. To make the two consistent, during training all positive samples obtained from an image are used, an equal number of negative samples is randomly selected, and all samples are normalized to 256 × 256.
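A minimal sketch of the balancing step, assuming a simple uniform random subsample of the negatives (the `seed` parameter is an illustrative addition for reproducibility):

```python
import random

def balance_samples(positives, negatives, seed=0):
    """Keep all positive samples from an image; draw an equal-sized
    random subset of the negatives."""
    rng = random.Random(seed)
    k = min(len(positives), len(negatives))
    return list(positives), rng.sample(negatives, k)
```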
(4) Model training: the model comprises a data preprocessing module and a parallel convolutional neural network structure;
the parallel convolutional neural network comprises the following specific training steps:
(4-1) Network parameter initialization: using a fine-tuning strategy, the first six layers of the parallel convolutional neural network are initialized with the first six layers' parameters of the AlexNet model; the fully-connected layers are initialized with random values.
(4-2) Setting training parameters: the initial learning rate of the first 5 layers is set to 0.0001; the initial learning rate of the fully-connected layer parameters is 0.001; during training, the learning rate is reduced by 40% after every 8 passes over the training set.
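The schedule above ("reduce by 40% after every 8 passes") can be written as a step decay with factor 0.6; the helper below is a sketch, not the patent's code:

```python
def learning_rate(base_lr, epoch, decay=0.6, step=8):
    """Per-layer learning rate after `epoch` full passes over the training
    set: cut by 40% (multiply by 0.6) after every 8 passes."""
    return base_lr * decay ** (epoch // step)

# base rates from the patent: 1e-4 for the conv layers, 1e-3 for the FC layers
```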
(4-3) Training data are loaded: the training set consists of 6000 images randomly selected from the MSRA10K database and 3500 images randomly selected from the DUT-OMRON database; the validation set consists of 800 images randomly selected from the MSRA10K database and 468 images randomly selected from the DUT-OMRON database. The training and validation sets do not overlap.
(4-4) The parallel convolutional neural network is trained iteratively with a stochastic gradient descent algorithm; the network parameters are saved every 1000 iterations, and the optimal solution of the network is obtained through continued iteration. The network with high accuracy and a low loss on the validation set is selected as the optimal network of the invention.
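The checkpointing discipline can be sketched as a generic loop; `step_fn` stands in for one mini-batch SGD update of the parallel network, and `save_fn` for writing the parameters to disk (both names are illustrative):

```python
def train(step_fn, save_fn, n_iters, ckpt_every=1000):
    """Run step_fn once per iteration and snapshot the parameters
    every ckpt_every iterations (1000 in the patent)."""
    for it in range(1, n_iters + 1):
        step_fn(it)            # one mini-batch SGD update
        if it % ckpt_every == 0:
            save_fn(it)        # persist network parameters

saved = []
train(step_fn=lambda it: None, save_fn=saved.append, n_iters=2500)
print(saved)  # [1000, 2000]
```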
(5) A saliency map is computed using the trained model for the target image.
With the saliency detection model designed by the invention, once a user provides an image, the system computes its saliency map with the trained and learned deep model.
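Assembling the final saliency map from per-superpixel predictions might look as follows; `score_fn` stands in for a forward pass of the trained parallel CNN on a superpixel's global and local input images:

```python
import numpy as np

def saliency_map(seg, score_fn):
    """Build a per-pixel saliency map from superpixel predictions.

    seg: integer label image from SLIC (each pixel holds its superpixel id).
    score_fn: maps a superpixel id to the network's saliency probability.
    """
    out = np.zeros(seg.shape, dtype=float)
    for sp in np.unique(seg):
        out[seg == sp] = score_fn(sp)
    return out

seg = np.array([[0, 0, 1],
                [0, 1, 1]])
m = saliency_map(seg, {0: 0.2, 1: 0.9}.get)  # dict stands in for the CNN
```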
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.
Claims (8)
1. An image saliency detection method based on a parallel convolutional neural network, characterized by comprising the following steps:
(1) designing a parallel convolutional neural network structure; the parallel convolutional neural network structure comprises a global angle detection module CNN-G and a local angle detection module CNN-L;
the global angle detection module CNN-G is a single-path convolutional neural network; the local angle detection module CNN-L is a two-path parallel convolutional neural network; the global angle detection module CNN-G and the local angle detection module CNN-L are joined in parallel through a fully-connected layer;
(2) designing two kinds of network input images and defining superpixel-based labels for the inputs; defining the superpixel-based labels for the inputs is specifically: a superpixel's label is determined by its overlap ratio with the ground-truth saliency map; if the overlap ratio is greater than a set threshold, the label is 1 and the superpixel is regarded as salient; if the overlap ratio is smaller than a set threshold, the label is 0 and the superpixel is regarded as non-salient;
the network input images comprise a global padded image and local cropped images;
the global padded image is centered on a superpixel, contains all information of the original image, represents global features, and serves as the input of the global angle detection module CNN-G;
a local cropped image is centered on a superpixel, contains the detail information of the superpixel's neighborhood, represents local features, and serves as the input of the local angle detection module CNN-L;
(3) performing dataset balancing and input preprocessing;
(4) model training: the model comprises a data preprocessing module and the parallel convolutional neural network structure;
(5) computing a saliency map for the target image using the trained model.
2. The image saliency detection method based on a parallel convolutional neural network of claim 1, characterized in that the dataset balancing of step (3) is specifically:
all positive samples obtained from an image are used, and an equal number of negative samples is randomly selected; all samples are normalized to 256 × 256.
3. The image saliency detection method based on a parallel convolutional neural network of claim 1, characterized in that the first 5 layers of the parallel convolutional neural network structure of step (1) are convolutional layers: the first convolutional layer has 96 kernels of size 11 × 11 × 3; the second has 256 kernels of size 5 × 5 × 48; the third has 384 kernels of size 3 × 3 × 256; the fourth has 384 kernels of size 3 × 3 × 192; the fifth has 256 kernels of size 3 × 3 × 192; the first two and the fifth convolutional layers are each followed by a pooling layer and a normalization layer.
4. The image saliency detection method based on a parallel convolutional neural network of claim 3, characterized in that in step (1) the parameters of same-layer convolutional layers of the parallel convolutional neural network structure are shared to learn scale-invariant features.
5. The parallel convolutional neural network-based image saliency detection method according to claim 3, wherein in step (4), the training of the parallel convolutional neural network comprises the following steps:
(4-1) initializing network parameters;
(4-2) setting training parameters;
(4-3) loading training data;
and (4-4) iteratively training.
6. The image saliency detection method based on a parallel convolutional neural network of claim 5, characterized in that the network parameter initialization of step (4-1) is specifically: using a fine-tuning strategy, the first six layers of the parallel convolutional neural network are initialized with the first six layers' parameters of the AlexNet model; the fully-connected layers are initialized with random values.
7. The image saliency detection method based on a parallel convolutional neural network of claim 5, characterized in that the training parameters of step (4-2) are specifically: the initial learning rate of the first 5 layers of the parallel convolutional neural network is set to 0.0001; the initial learning rate of the fully-connected layer parameters is 0.001; during training, the learning rate is reduced by 40% after every 8 passes over the training set.
8. The image saliency detection method based on a parallel convolutional neural network of claim 5, characterized in that the iterative training of step (4-4) is: the parallel convolutional neural network is trained iteratively with a stochastic gradient descent algorithm; the network parameters are saved every 1000 iterations, and the optimal solution of the network is obtained through continued iteration.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710253255.2A | 2017-04-18 | 2017-04-18 | Image saliency detection method based on parallel convolutional neural network |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN107169954A | 2017-09-15 |
| CN107169954B | 2020-06-19 |
Family
ID=59812176
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710253255.2A Active CN107169954B (en) | 2017-04-18 | 2017-04-18 | Image significance detection method based on parallel convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107169954B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107729819B (en) * | 2017-09-22 | 2020-05-19 | 华中科技大学 | Face labeling method based on sparse fully-convolutional neural network |
CN107966447B (en) * | 2017-11-14 | 2019-12-17 | 浙江大学 | workpiece surface defect detection method based on convolutional neural network |
CN107833220B (en) * | 2017-11-28 | 2021-06-11 | 河海大学常州校区 | Fabric defect detection method based on deep convolutional neural network and visual saliency |
CN108154150B (en) * | 2017-12-18 | 2021-07-23 | 北京工业大学 | Significance detection method based on background prior |
CN108364281B (en) * | 2018-01-08 | 2020-10-30 | 佛山市顺德区中山大学研究院 | Ribbon edge flaw defect detection method based on convolutional neural network |
CN108230243B (en) * | 2018-02-09 | 2021-04-27 | 福州大学 | Background blurring method based on salient region detection model |
CN108875555B (en) * | 2018-04-25 | 2022-02-25 | 中国人民解放军军事科学院军事医学研究院 | Video interest area and salient object extracting and positioning system based on neural network |
CN108647695A (en) * | 2018-05-02 | 2018-10-12 | 武汉科技大学 | Soft image conspicuousness detection method based on covariance convolutional neural networks |
CN109508627A (en) * | 2018-09-21 | 2019-03-22 | 国网信息通信产业集团有限公司 | The unmanned plane dynamic image identifying system and method for shared parameter CNN in a kind of layer |
CN112016548B (en) * | 2020-10-15 | 2021-02-09 | 腾讯科技(深圳)有限公司 | Cover picture display method and related device |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102176001B (en) * | 2011-02-10 | 2013-05-08 | 哈尔滨工程大学 | Permeable band ratio factor-based water depth inversion method |
JP6135283B2 (en) * | 2013-04-26 | 2017-05-31 | オムロン株式会社 | Image processing apparatus, image processing method, program, and recording medium |
CN107533801A (en) * | 2013-11-01 | 2018-01-02 | 国际智能技术公司 | Use the ground mapping technology of mapping vehicle |
CN104298976B (en) * | 2014-10-16 | 2017-09-26 | 电子科技大学 | Detection method of license plate based on convolutional neural networks |
WO2016197303A1 (en) * | 2015-06-08 | 2016-12-15 | Microsoft Technology Licensing, Llc. | Image semantic segmentation |
CN104933691B (en) * | 2015-06-25 | 2019-02-12 | 中国计量学院 | Image interfusion method based on the detection of phase spectrum vision significance |
CN105701508B (en) * | 2016-01-12 | 2017-12-15 | 西安交通大学 | Global local optimum model and conspicuousness detection algorithm based on multistage convolutional neural networks |
CN106157319B (en) * | 2016-07-28 | 2018-11-02 | 哈尔滨工业大学 | The conspicuousness detection method in region and Pixel-level fusion based on convolutional neural networks |
CN106447658B (en) * | 2016-09-26 | 2019-06-21 | 西北工业大学 | Conspicuousness object detection method based on global and local convolutional network |
CN106446914A (en) * | 2016-09-28 | 2017-02-22 | 天津工业大学 | Road detection based on superpixels and convolution neural network |
EP3151164A3 (en) * | 2016-12-26 | 2017-04-12 | Argosai Teknoloji Anonim Sirketi | A method for foreign object debris detection |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |