CN110322509A - Object localization method, system and computer equipment based on hierarchical class activation maps - Google Patents
Object localization method, system and computer equipment based on hierarchical class activation maps
- Publication number
- CN110322509A CN110322509A CN201910559655.5A CN201910559655A CN110322509A CN 110322509 A CN110322509 A CN 110322509A CN 201910559655 A CN201910559655 A CN 201910559655A CN 110322509 A CN110322509 A CN 110322509A
- Authority
- CN
- China
- Prior art keywords
- layer
- convolutional layer
- unit
- hierarchical class
- convolutional
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20016—Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Image Analysis (AREA)
Abstract
The present invention relates to the fields of deep learning and object detection, and discloses an object localization method, system and computer equipment based on hierarchical class activation maps. The method constructs a hierarchical model in which the conventional fully connected layers behind the convolutional layers are replaced with a global average pooling layer or a pyramid pooling layer, so as to avoid the loss of image structure information caused by fully connected layers. The method collects feature information from multiple lower convolutional layers to obtain a hierarchical class activation map. Because the hierarchical class activation map of the invention is derived not only from the last feature map but also from multiple lower convolutional layers, the loss of low-level image information is reduced and the image localization ability is improved.
Description
Technical field
The present invention relates to the fields of deep learning and object detection, and specifically to target localization in object detection realized with deep learning techniques; in particular, to an object localization method based on hierarchical class activation maps.
Background technique
In recent years, with the rapid rise of deep learning techniques, research on object detection in the image domain has made very important progress. The most popular object detection algorithms fall into two types: (1) two-step methods, which first generate a series of sparse candidate boxes through a CNN network structure and then classify those candidate boxes; (2) one-step methods, which, similar to the idea of SSD, densely sample the image at different locations with different scales and aspect ratios, extract features with a CNN, and classify directly. Target localization in object detection mainly means separating the target of interest from the background in an image or video. Methods for the target localization task are divided into weakly supervised and strongly supervised methods.
Weakly supervised localization methods differ from strongly supervised ones in that they only need image-level annotations and do not require a human to label the location and size information of the target, such as a bounding box. This reduces human workload and, at the same time, the amount of computation. Since datasets with bounding boxes are a minority and most datasets only carry image-level annotations, weakly supervised methods are more widely applicable than strongly supervised ones.
At present many researchers choose weakly supervised methods, but in their experiments they found that while the convolutional layers of a convolutional neural network (CNN) can localize targets directly, the localization ability is lost after the fully connected layers. To enhance target localization, many works propose fully convolutional network structures, such as Network in Network (NIN) and the fully convolutional network (FCN), which minimize the number of parameters by avoiding fully connected layers while maintaining high performance.
Within such network structures, many authors extract feature maps from the topmost convolutional layer and apply a pooling operation to retain spatial characteristics. For example, Oquab et al. replace the fully connected part of the convolutional network with an adaptive convolutional layer and global max pooling, enhancing the weakly supervised localization ability of the network; however, this method can only localize a single point, and its evaluation protocol is defined by the authors themselves and is not general. On this basis, Zhou et al. replace the fully connected layers with a 3*3 convolutional layer of 1024 channels and global average pooling. The localization ability improves greatly over the method of Oquab et al., but only part of the target can be localized. This method extracts feature information only from the top of the convolutional network; for small objects against large backgrounds, low-level information is relatively lacking. Zhiqiang et al., on the basis of the method of Zhou et al., substitute spatial pyramid pooling for global average pooling, further improving localization. All of the above methods share one problem: they extract feature information only from the top of the convolutional layers, which causes a loss of low-level information and a relative reduction in localization accuracy.
Summary of the invention
In view of the problems of the prior art, the present invention addresses the inaccuracy of target localization caused by the loss of low-level information by making some modifications to a basic convolutional network. A 3*3 convolutional layer of 1024 channels is added after each of convolutional layers 4-3, 4-4, 5-3 and 5-4, with the padding of a1 and a2 set to 0, yielding a novel hierarchical network structure. A hierarchical class activation map is obtained from the feature maps of different levels, providing a novel class activation map that compensates for the missing low-level information and thereby improves localization ability.
An object localization method based on hierarchical class activation maps according to the invention comprises: taking the image to be predicted as the input image and feeding it into the convolutional hierarchical structure; extracting the hierarchical features of the image to be predicted; generating the hierarchical class activation map of the image to be predicted; retaining part of the values in the hierarchical class activation map and generating a bounding box that predicts the object to be detected in the image to be predicted; and outputting the localized target position of the image to be predicted according to the bounding box. The position of the bounding box is the position of the target; in target localization and object detection tasks it is clear to those skilled in the art that the position of the target is expressed by a bounding box.
Wherein, the generation of the hierarchical class activation map comprises the following steps:
S1. Build the convolutional hierarchical structure for the image to be predicted: add one custom convolutional layer after each of convolutional layers 4-3, 4-4, 5-3 and 5-4 of the VGG19 network structure;
S2. Set the stride and padding of the custom convolutional layers added in S1;
S3. Superpose, by channel, the custom convolutional layers corresponding to convolutional layers 4-3 and 4-4 in S2 to obtain the first superposed layer; superpose, by channel, the custom convolutional layers corresponding to convolutional layers 5-3 and 5-4 to obtain the second superposed layer;
S4. Pool the first and second superposed layers respectively to obtain TA_n and TB_n;
S5. Feed TA_n and TB_n into a linear layer to obtain the class scores S_c;
S6. According to the class scores S_c, train the convolutional network using the softmax function and the cross-entropy loss function to obtain the weights w_An^c and w_Bn^c;
S7. Compute the saliency maps I_A and I_B of the first and second superposed layers respectively; after enlarging them with bilinear interpolation to match the input image, add the two saliency maps to obtain the hierarchical class activation map; retain the part greater than 20% of the maximum activation value for generating the predicted bounding box.
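The hierarchical structure of steps S1 to S7 can be sketched as follows. This is an illustrative sketch only, not the patented implementation: PyTorch is assumed, random tensors stand in for the VGG19 feature maps of conv4-3/conv4-4 and conv5-3/conv5-4, channel-wise superposition is read here as concatenation, and the class count of 20 is hypothetical.

```python
import torch
import torch.nn as nn

# Random tensors stand in for VGG19 feature maps: conv4-3/conv4-4
# (512 channels, 28x28) and conv5-3/conv5-4 (512 channels, 14x14).
f43 = torch.randn(1, 512, 28, 28)
f44 = torch.randn(1, 512, 28, 28)
f53 = torch.randn(1, 512, 14, 14)
f54 = torch.randn(1, 512, 14, 14)

# Custom 3x3, 1024-channel convolutions (S1/S2): stride 1 throughout,
# padding 0 for a1/a2 (after conv4-3/4-4), padding 1 for b1/b2 (after conv5-3/5-4).
a1 = nn.Conv2d(512, 1024, 3, stride=1, padding=0)
a2 = nn.Conv2d(512, 1024, 3, stride=1, padding=0)
b1 = nn.Conv2d(512, 1024, 3, stride=1, padding=1)
b2 = nn.Conv2d(512, 1024, 3, stride=1, padding=1)

# S3: channel-wise superposition of each pair.
A = torch.cat([a1(f43), a2(f44)], dim=1)  # first superposed layer, 2048 ch, 26x26
B = torch.cat([b1(f53), b2(f54)], dim=1)  # second superposed layer, 2048 ch, 14x14

# S4: global average pooling gives one value TA_n / TB_n per feature map.
TA = A.mean(dim=(2, 3))
TB = B.mean(dim=(2, 3))

# S5: a linear layer over the pooled values produces the class scores S_c.
num_classes = 20  # hypothetical class count
linear = nn.Linear(4096, num_classes)
Sc = linear(torch.cat([TA, TB], dim=1))
print(Sc.shape)  # torch.Size([1, 20])
```

Note that with padding 0 the first superposed layer shrinks to 26*26 while the second stays at 14*14, which is why step S4 pools the two layers separately.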
Further, in step S2, the stride of the custom convolutional layers after convolutional layers 4-3 and 4-4 is set to 1 and their padding to 0; the stride of the custom convolutional layers after convolutional layers 5-3 and 5-4 is set to 1 and their padding to 1.
Further, the class score is computed as:
S_c = Σ_n w_An^c · TA_n + Σ_n w_Bn^c · TB_n, P_c = exp(S_c) / Σ_c exp(S_c),
wherein P_c denotes the probability that the object to be detected belongs to class c; S_c denotes the score of class c; w_An^c denotes the weight of class c for the n-th feature map in the first superposed layer; w_Bn^c denotes the weight of class c for the n-th feature map in the second superposed layer; and n indexes the feature maps.
Further, the unit values of the saliency maps of the first and second superposed layers are computed, in order, as:
I_A(x, y) = Σ_n w_An^c · F_An(x, y), I_B(x, y) = Σ_n w_Bn^c · F_Bn(x, y),
wherein n indexes the feature maps; w_An^c denotes the weight of class c in the first superposed layer; w_Bn^c denotes the weight of class c in the second superposed layer; F_An(x, y) denotes unit (x, y) of the n-th feature map in the first superposed layer; and F_Bn(x, y) denotes unit (x, y) of the n-th feature map in the second superposed layer.
Further, after the saliency maps of the first and second superposed layers are enlarged with bilinear interpolation to match the input image, they are added to obtain the hierarchical class activation map I, and the part of I greater than 20% of the maximum activation value is retained to generate the prediction box. The hierarchical class activation map I is obtained by the formula:
I = I_A + I_B.
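A minimal sketch of this combination step, assuming PyTorch, hypothetical saliency maps at the 26*26 and 14*14 resolutions implied by the padding settings above, and a 224*224 VGG input:

```python
import torch
import torch.nn.functional as F

# Hypothetical saliency maps I_A (26x26) and I_B (14x14) for one class.
IA = torch.rand(1, 1, 26, 26)
IB = torch.rand(1, 1, 14, 14)

# Bilinear interpolation to the input-image size, then addition: I = I_A + I_B.
size = (224, 224)  # assumed VGG input resolution
I = (F.interpolate(IA, size, mode='bilinear', align_corners=False)
     + F.interpolate(IB, size, mode='bilinear', align_corners=False)).squeeze()

# Retain only the region above 20% of the maximum activation value.
mask = I > 0.2 * I.max()
```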
The invention also provides an object localization system based on hierarchical class activation maps, the system comprising:
an image acquisition module for obtaining the image to be predicted;
a hierarchical feature extraction module for extracting the hierarchical features of the image to be predicted;
a hierarchical class activation map construction module for building the hierarchical class activation map from the hierarchical features;
a predicted-bounding-box calculation module for predicting the bounding box of the object to be detected in the image to be predicted according to the hierarchical class activation map.
The hierarchical feature extraction module comprises the VGG19 network structure, a custom convolutional layer unit, a superposition layer unit and a pooling layer unit. The custom convolutional layer unit adds one custom convolutional layer after each of convolutional layers 4-3, 4-4, 5-3 and 5-4 of the VGG19 network structure. The superposition layer unit superposes, by channel, the feature maps output by the custom convolutional layers corresponding to convolutional layers 4-3 and 4-4, and superposes, by channel, the feature maps output by the custom convolutional layers corresponding to convolutional layers 5-3 and 5-4. The pooling layer unit pools the feature maps processed by the superposition layer unit.
The hierarchical class activation map construction module comprises a class score calculation unit, a classification function calculation unit, a loss function unit, a saliency map superposition unit and an activation calculation unit. The class score calculation unit computes the class scores of the output of the pooling layer unit; the classification function calculation unit and the loss function unit are used for training the convolutional neural network; the saliency map superposition unit superposes the saliency maps output by the superposed layers; the activation calculation unit computes the maximum activation value of the hierarchical class activation map and generates the predicted bounding box from the part greater than 20% of the maximum activation value.
Further, a computer device comprises a memory, a processor, and a computer program stored on the memory and runnable on the processor; when the processor executes the program, the object localization method is realized.
Beneficial effects of the present invention:
1. The invention extracts information from relatively low convolutional layers, compensating for the loss of low-level information.
2. The invention has been tested on multiple datasets, and its localization ability is significant.
3. Each input image only needs one forward propagation, which reduces computational complexity and saves time.
4. The invention can be used for tasks such as fine-grained classification and target tracking.
Detailed description of the invention
Fig. 1 is a diagram of the hierarchical class activation map generation process of the present invention;
Fig. 2 is a flow chart of the method of the present invention;
Fig. 3 is a feature extraction diagram of the present invention;
Fig. 4 is a diagram of the hierarchical structure based on class activation maps of the present invention;
Fig. 5 is a hierarchical class activation map of the present invention.
Specific embodiment
In order to make the objectives, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them.
Embodiment 1
An object localization method based on hierarchical class activation maps according to the invention comprises: inputting the image to be predicted into the convolutional hierarchical structure; extracting the hierarchical features of the image to be predicted; generating the hierarchical class activation map of the image to be predicted; retaining part of the values in the hierarchical class activation map; and generating a bounding box that predicts the object to be detected.
As shown in Fig. 1, the generation of the hierarchical class activation map comprises the following steps:
S1. Build the convolutional hierarchical structure for the image to be predicted: add one custom convolutional layer after each of convolutional layers 4-3, 4-4, 5-3 and 5-4 of the VGG19 network structure;
S2. Set the stride and padding of the custom convolutional layers added in S1;
S3. Superpose, by channel, the custom convolutional layers corresponding to convolutional layers 4-3 and 4-4 in S2 to obtain the first superposed layer; superpose, by channel, the custom convolutional layers corresponding to convolutional layers 5-3 and 5-4 to obtain the second superposed layer;
S4. Pool the first and second superposed layers respectively to obtain TA_n and TB_n;
S5. Feed TA_n and TB_n into a linear layer to obtain the class scores S_c;
S6. According to the class scores S_c, train the convolutional network using the softmax function and the cross-entropy loss function to obtain the weights w_An^c and w_Bn^c;
S7. Compute the saliency maps of the first and second superposed layers respectively; after enlarging them with bilinear interpolation to match the input image, add the two saliency maps to obtain the hierarchical class activation map; retain the part greater than 20% of the maximum activation value for generating the predicted bounding box.
Embodiment 2
This embodiment provides another embodiment of the invention. The image under test is input into the model and the loss function is computed; when the loss function converges, the model is trained; otherwise the parameters are updated with a gradient descent algorithm and images continue to be input into the model for training. After the model is trained, the image under test is input, the feature maps of convolutional layers 4-3, 4-4, 5-3 and 5-4 are extracted, the class saliency maps I_A and I_B are determined according to formula (5), and the two saliency maps are superposed to obtain the hierarchical class activation map. Part of the values in the activation map are retained; in this embodiment, the values greater than 20% of the maximum activation value are retained and used to generate the predicted bounding box.
The loss function may be any of several classes of loss functions well known to those of ordinary skill in the art, such as the cross-entropy loss function, the hinge loss function, or a penalty loss function.
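The training criterion described above can be sketched as follows. Assumptions: PyTorch, a hypothetical batch of 4 images and 20 classes, and random scores standing in for the model output; `nn.CrossEntropyLoss` combines the softmax function and cross-entropy loss named in step S6.

```python
import torch
import torch.nn as nn

# `scores` stand in for the class scores S_c from the linear layer;
# `labels` are image-level annotations, the only supervision the
# weakly supervised method needs.
scores = torch.randn(4, 20, requires_grad=True)  # hypothetical batch of 4
labels = torch.randint(0, 20, (4,))

criterion = nn.CrossEntropyLoss()  # softmax + cross-entropy, as in step S6
loss = criterion(scores, labels)
loss.backward()  # gradients for the gradient-descent parameter update
```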
Specifically, as shown in Fig. 3, after the image to be predicted is input, feature extraction is performed in the fourth and fifth parts of the convolutional layers of VGG respectively; max pooling or pyramid pooling is applied to the extracted features to form the hierarchical class activation map; part of the values in the hierarchical class activation map are retained; and the classification result is output, thereby determining the target localization of the image under test.
Each part of the VGG network structure comprises multiple convolutional layers. For example, the fourth part of the VGG19 network structure comprises conv4-1, conv4-2, conv4-3 and conv4-4. The present invention preferably processes the fourth and fifth parts.
As shown in Fig. 4, this embodiment takes VGG19net as the basic network. A 3*3 convolutional layer of 1024 channels is added after each of convolutional layers 4-3, 4-4, 5-3 and 5-4, named in turn convolutional layers a1, a2, b1 and b2. The stride of convolutional layers a1 and a2 is set to 1 with padding 0; the stride of convolutional layers b1 and b2 is set to 1 with padding 1. Convolutional layers a1 and a2, and b1 and b2, are superposed by channel to obtain A and B respectively according to formula (1):
A = [f_a1, f_a2], B = [f_b1, f_b2] (1)
where [·,·] denotes channel-wise superposition of the output feature maps.
The sizes of A and B are not identical, so they are pooled separately. Here global average pooling is taken as the example; spatial pyramid average pooling is similar. Global average pooling is applied to A using formula (2):
TA_n = (1/N_A) Σ_(x,y) F_An(x, y) (2)
where F_An(x, y) denotes unit (x, y) of the n-th feature map in A and N_A is the number of units in each feature map of A. The same pooling operation is applied to B.
The resulting TA_n and TB_n are then passed to the linear layer and the softmax function, as shown in formulas (3) and (4):
S_c = Σ_n w_An^c · TA_n + Σ_n w_Bn^c · TB_n (3)
P_c = exp(S_c) / Σ_c exp(S_c) (4)
where w_An^c and w_Bn^c are the weights of class c and S_c is the score that the image to be predicted belongs to class c. Formula (5) then gives, for unit (x, y) in A, the saliency map I_A of class c; B has a similar I_B:
I_A(x, y) = Σ_n w_An^c · F_An(x, y) (5)
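Formula (5) amounts to a weighted sum over the feature maps of the superposed layer. A minimal numeric sketch, with hypothetical sizes (8 feature maps of 26*26 units) and random values standing in for the trained weights w_An^c:

```python
import numpy as np

# F_An(x, y): 8 hypothetical feature maps of the first superposed layer.
FA = np.random.rand(8, 26, 26)
# w_An^c: linear-layer weights for a fixed class c (random stand-ins).
w = np.random.rand(8)

# I_A(x, y) = sum_n w_An^c * F_An(x, y): contract the feature-map index n.
IA = np.tensordot(w, FA, axes=1)
```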
As shown in Fig. 5, after I_A and I_B are enlarged with bilinear interpolation to the same size as the input image, they are added to obtain the hierarchical class activation map, as shown in formula (6):
I = I_A + I_B (6)
Finally, the part of the hierarchical class activation map I greater than 20% of the maximum activation value is retained for generating the predicted bounding box; the prediction box in Fig. 5 can be used to predict the position of the object to be detected in the image to be predicted.
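One way to turn the retained region into a box is to take the tight bounding box of the above-threshold units. The patent does not spell out the box-fitting rule, so the following is an assumption, sketched with a hypothetical activation map:

```python
import numpy as np

# Hypothetical activation map with one active rectangular region.
I = np.zeros((224, 224))
I[60:120, 80:200] = 1.0

# Retain units above 20% of the maximum activation, then take the
# tight bounding box of the retained region.
ys, xs = np.where(I > 0.2 * I.max())
x1, y1, x2, y2 = int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
print((x1, y1, x2, y2))  # (80, 60, 199, 119)
```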
Embodiment 4
This embodiment provides a description of the object localization system of the present invention.
The invention also provides an object localization system based on hierarchical class activation maps, the system comprising:
an image acquisition module for obtaining the image to be predicted;
a hierarchical feature extraction module for extracting the hierarchical features of the image to be predicted;
a hierarchical class activation map construction module for building the hierarchical class activation map from the hierarchical features;
a predicted-bounding-box calculation module for predicting the bounding box of the object to be detected in the image to be predicted according to the hierarchical class activation map.
The hierarchical feature extraction module comprises the VGG19 network structure, a custom convolutional layer unit, a superposition layer unit and a pooling layer unit. The custom convolutional layer unit adds one custom convolutional layer after each of convolutional layers 4-3, 4-4, 5-3 and 5-4 of the VGG19 network structure. The superposition layer unit superposes, by channel, the feature maps output by the custom convolutional layers corresponding to convolutional layers 4-3 and 4-4, and superposes, by channel, the feature maps output by the custom convolutional layers corresponding to convolutional layers 5-3 and 5-4. The pooling layer unit pools the feature maps processed by the superposition layer unit.
The hierarchical class activation map construction module comprises a class score calculation unit, a classification function calculation unit, a loss function unit, a saliency map superposition unit and an activation calculation unit. The class score calculation unit computes the class scores of the output of the pooling layer unit; the classification function calculation unit and the loss function unit are used for training the convolutional neural network; the saliency map superposition unit superposes the saliency maps output by the superposed layers; the activation calculation unit computes the maximum activation value of the hierarchical class activation map and generates the predicted bounding box from the part greater than 20% of the maximum activation value.
Embodiment 5
The embodiment of the present invention also provides a computer device comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor; when the processor executes the program, the target localization method is realized.
The technical features of the above embodiments can be combined arbitrarily; to avoid repetition, the features of the target localization method, system and computer equipment of the present invention may be cross-referenced.
Those of ordinary skill in the art will appreciate that all or part of the steps in the methods of the above embodiments can be completed by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium, and the storage medium may include ROM, RAM, a magnetic disk or an optical disc.
The embodiments provided above describe the objectives, technical solutions and advantages of the present invention in further detail. It should be understood that the embodiments provided above are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (6)
1. An object localization method based on hierarchical class activation maps, the method comprising: inputting the image to be predicted into the convolutional hierarchical structure; extracting the hierarchical features of the image to be predicted; generating the hierarchical class activation map of the image to be predicted; retaining part of the values in the hierarchical class activation map; and generating a bounding box that predicts the object to be detected in the image to be predicted;
characterized in that the generation of the hierarchical class activation map comprises the following steps:
S1. building the convolutional hierarchical structure for the image to be predicted, including adding one custom convolutional layer after each of convolutional layers 4-3, 4-4, 5-3 and 5-4 of the VGG19 network structure;
S2. setting the stride and padding of the custom convolutional layers added in S1;
S3. superposing, by channel, the feature maps output by the custom convolutional layers corresponding to convolutional layers 4-3 and 4-4 in S2 to obtain the first superposed layer; superposing, by channel, the feature maps output by the custom convolutional layers corresponding to convolutional layers 5-3 and 5-4 to obtain the second superposed layer;
S4. pooling the first and second superposed layers respectively to obtain the pooling-layer outputs TA_n and TB_n;
S5. feeding TA_n and TB_n into a linear layer to obtain the class scores S_c;
S6. according to the class scores S_c, training the convolutional network using the softmax function and the cross-entropy loss function to obtain the weights w_An^c and w_Bn^c;
S7. computing the saliency maps of the first and second superposed layers respectively; after enlarging them with bilinear interpolation to match the image to be predicted, adding the two saliency maps to obtain the hierarchical class activation map; and retaining the part greater than 20% of the maximum activation value for generating the predicted bounding box.
2. The object localization method based on hierarchical class activation maps according to claim 1, characterized in that in step S2 the stride of the custom convolutional layers after convolutional layers 4-3 and 4-4 is set to 1 and their padding to 0, and the stride of the custom convolutional layers after convolutional layers 5-3 and 5-4 is set to 1 and their padding to 1.
3. The object localization method based on hierarchical class activation maps according to claim 1, characterized in that the class score is computed as:
S_c = Σ_n w_An^c · TA_n + Σ_n w_Bn^c · TB_n, P_c = exp(S_c) / Σ_c exp(S_c),
wherein P_c denotes the probability that the object to be detected belongs to class c; S_c denotes the score of class c; w_An^c denotes the weight of class c in the first superposed layer; w_Bn^c denotes the weight of class c in the second superposed layer; and n indexes the feature maps.
4. The object localization method based on hierarchical class activation maps according to claim 1, characterized in that the unit values of the saliency maps of the first and second superposed layers are computed, in order, as:
I_A(x, y) = Σ_n w_An^c · F_An(x, y), I_B(x, y) = Σ_n w_Bn^c · F_Bn(x, y),
wherein n indexes the feature maps; w_An^c denotes the weight of class c in the first superposed layer; w_Bn^c denotes the weight of class c in the second superposed layer; F_An(x, y) denotes unit (x, y) of the n-th feature map in the first superposed layer; and F_Bn(x, y) denotes unit (x, y) of the n-th feature map in the second superposed layer.
5. An object localization system based on hierarchical class activation maps, characterized in that the system comprises:
an image acquisition module for obtaining an image to be predicted;
a hierarchical feature extraction module for extracting hierarchical features from the image to be predicted;
a hierarchical class activation map construction module for constructing a hierarchical class activation map from the hierarchical features; and
a predicted bounding box calculation module for predicting the bounding box of the target object in the image to be predicted according to the hierarchical class activation map;
wherein the hierarchical feature extraction module comprises a VGG19 network structure, a custom convolutional layer unit, a superposition layer unit and a pooling layer unit; the custom convolutional layer unit adds one custom convolutional layer after each of convolutional layers 4-3, 4-4, 5-3 and 5-4 of the VGG19 network structure; the superposition layer unit superimposes, along the channel dimension, the feature maps output by the custom convolutional layers corresponding to convolutional layers 4-3 and 4-4, and likewise superimposes, along the channel dimension, the feature maps output by the custom convolutional layers corresponding to convolutional layers 5-3 and 5-4; and the pooling layer unit pools the feature maps processed by the superposition layer unit;
the hierarchical class activation map construction module comprises a class score calculation unit, a classification function calculation unit, a loss function unit, a saliency map superposition unit and an activation calculation unit; the class score calculation unit calculates class scores from the output of the pooling layer unit; the classification function calculation unit and the loss function unit are used for training the convolutional neural network; the saliency map superposition unit superimposes the saliency maps output by the superposition layers; and the activation calculation unit calculates the maximum activation value of the hierarchical class activation map and generates the predicted bounding box from the region whose activation exceeds 20% of the maximum activation value.
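The activation calculation unit's final step admits a compact sketch: keep the region of the map whose activation exceeds 20% of the maximum value, then take that region's enclosing rectangle as the predicted bounding box. A minimal numpy illustration (function and variable names are hypothetical, not from the patent):

```python
import numpy as np

def bbox_from_cam(cam, ratio=0.2):
    """Predict a bounding box from a class activation map.

    cam: (H, W) activation map; ratio: fraction of the maximum activation
    used as the threshold (0.2 per the claim). Returns the enclosing
    rectangle (x_min, y_min, x_max, y_max) of the above-threshold region.
    For any map with a positive maximum, the maximum itself always
    survives the threshold, so the region is non-empty.
    """
    mask = cam > ratio * cam.max()          # above-threshold region
    ys, xs = np.nonzero(mask)               # coordinates of kept units
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```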
6. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; characterized in that the processor, when executing the program, implements the method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910559655.5A CN110322509B (en) | 2019-06-26 | 2019-06-26 | Target positioning method, system and computer equipment based on hierarchical class activation graph |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910559655.5A CN110322509B (en) | 2019-06-26 | 2019-06-26 | Target positioning method, system and computer equipment based on hierarchical class activation graph |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110322509A (en) | 2019-10-11 |
CN110322509B CN110322509B (en) | 2021-11-12 |
Family
ID=68121186
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910559655.5A Active CN110322509B (en) | 2019-06-26 | 2019-06-26 | Target positioning method, system and computer equipment based on hierarchical class activation graph |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110322509B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110910366A (en) * | 2019-11-18 | 2020-03-24 | Hubei University of Technology | Visualization method for abnormal brain MRI images based on 3D CAM |
CN111026898A (en) * | 2019-12-10 | 2020-04-17 | Yunnan University | Weakly supervised image emotion classification and localization method based on a cross-space pooling strategy |
CN111046939A (en) * | 2019-12-06 | 2020-04-21 | PLA Strategic Support Force Information Engineering University | Attention-based CNN class activation map generation method |
CN111553462A (en) * | 2020-04-08 | 2020-08-18 | Harbin Engineering University | Class activation mapping method |
CN113569860A (en) * | 2021-07-29 | 2021-10-29 | Beijing Horizon Information Technology Co., Ltd. | Instance segmentation method, training method of instance segmentation network and device thereof |
Citations (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104866829A (en) * | 2015-05-25 | 2015-08-26 | Soochow University | Cross-age face verification method based on feature learning |
US20160352733A1 (en) * | 2015-05-27 | 2016-12-01 | Rubicon Labs, Inc. | Distributed and hierarchical device activation mechanisms |
CN107145842A (en) * | 2017-04-19 | 2017-09-08 | Xidian University | Face recognition method combining LBP feature maps and convolutional neural networks |
CN107239565A (en) * | 2017-06-14 | 2017-10-10 | University of Electronic Science and Technology of China | Image retrieval method based on salient regions |
WO2017191463A1 (en) * | 2016-05-06 | 2017-11-09 | Magic Pony Technology Limited | Encoder pre-analyser |
CN107563999A (en) * | 2017-09-05 | 2018-01-09 | Huazhong University of Science and Technology | Chip defect recognition method based on convolutional neural networks |
US20180130203A1 (en) * | 2016-11-06 | 2018-05-10 | International Business Machines Corporation | Automated skin lesion segmentation using deep side layers |
CN108399406A (en) * | 2018-01-15 | 2018-08-14 | Sun Yat-sen University | Method and system for weakly supervised salient object detection based on deep learning |
CN108399380A (en) * | 2018-02-12 | 2018-08-14 | Beijing University of Technology | Video action detection method based on 3D convolution and Faster RCNN |
CN108509954A (en) * | 2018-04-23 | 2018-09-07 | Hefei Zhanda Intelligent Technology Co., Ltd. | Dynamic multi-license-plate recognition method for real-time traffic scenes |
CN108596058A (en) * | 2018-04-11 | 2018-09-28 | Xidian University | Moving obstacle distance measurement method based on computer vision |
CN108647585A (en) * | 2018-04-20 | 2018-10-12 | Zhejiang Gongshang University | Traffic sign detection method based on a multi-scale recurrent attention network |
CN108875812A (en) * | 2018-06-01 | 2018-11-23 | Ningbo University of Technology | Driving behavior classification method based on branch convolutional neural networks |
CN108898078A (en) * | 2018-06-15 | 2018-11-27 | University of Shanghai for Science and Technology | Real-time traffic sign detection and recognition method based on a multi-scale deconvolution neural network |
CN108960184A (en) * | 2018-07-20 | 2018-12-07 | Tianjin Normal University | Pedestrian re-identification method based on a heterogeneous-part deep neural network |
CN108985317A (en) * | 2018-05-25 | 2018-12-11 | Xidian University | Image classification method based on separable convolution and an attention mechanism |
CN109002752A (en) * | 2018-01-08 | 2018-12-14 | Beijing Tushi Technology Development Co., Ltd. | Fast pedestrian detection method for complex common scenes based on deep learning |
CN109214505A (en) * | 2018-08-29 | 2019-01-15 | Sun Yat-sen University | Fully convolutional object detection method with densely connected convolutional neural networks |
US20190065817A1 (en) * | 2017-08-29 | 2019-02-28 | Konica Minolta Laboratory U.S.A., Inc. | Method and system for detection and classification of cells using convolutional neural networks |
CN109583277A (en) * | 2017-09-29 | 2019-04-05 | Dalian Hengrui Technology Co., Ltd. | CNN-based sex determination method from barefoot or socked footprints |
2019-06-26: Application CN201910559655.5A filed in China; granted as patent CN110322509B (legal status: Active)
Patent Citations (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104866829A (en) * | 2015-05-25 | 2015-08-26 | Soochow University | Cross-age face verification method based on feature learning |
US20160352733A1 (en) * | 2015-05-27 | 2016-12-01 | Rubicon Labs, Inc. | Distributed and hierarchical device activation mechanisms |
WO2017191463A1 (en) * | 2016-05-06 | 2017-11-09 | Magic Pony Technology Limited | Encoder pre-analyser |
US20180130203A1 (en) * | 2016-11-06 | 2018-05-10 | International Business Machines Corporation | Automated skin lesion segmentation using deep side layers |
CN107145842A (en) * | 2017-04-19 | 2017-09-08 | Xidian University | Face recognition method combining LBP feature maps and convolutional neural networks |
CN107239565A (en) * | 2017-06-14 | 2017-10-10 | University of Electronic Science and Technology of China | Image retrieval method based on salient regions |
US20190065817A1 (en) * | 2017-08-29 | 2019-02-28 | Konica Minolta Laboratory U.S.A., Inc. | Method and system for detection and classification of cells using convolutional neural networks |
CN107563999A (en) * | 2017-09-05 | 2018-01-09 | Huazhong University of Science and Technology | Chip defect recognition method based on convolutional neural networks |
CN109583277A (en) * | 2017-09-29 | 2019-04-05 | Dalian Hengrui Technology Co., Ltd. | CNN-based sex determination method from barefoot or socked footprints |
CN109002752A (en) * | 2018-01-08 | 2018-12-14 | Beijing Tushi Technology Development Co., Ltd. | Fast pedestrian detection method for complex common scenes based on deep learning |
CN108399406A (en) * | 2018-01-15 | 2018-08-14 | Sun Yat-sen University | Method and system for weakly supervised salient object detection based on deep learning |
CN108399380A (en) * | 2018-02-12 | 2018-08-14 | Beijing University of Technology | Video action detection method based on 3D convolution and Faster RCNN |
CN108596058A (en) * | 2018-04-11 | 2018-09-28 | Xidian University | Moving obstacle distance measurement method based on computer vision |
CN108647585A (en) * | 2018-04-20 | 2018-10-12 | Zhejiang Gongshang University | Traffic sign detection method based on a multi-scale recurrent attention network |
CN108509954A (en) * | 2018-04-23 | 2018-09-07 | Hefei Zhanda Intelligent Technology Co., Ltd. | Dynamic multi-license-plate recognition method for real-time traffic scenes |
CN108985317A (en) * | 2018-05-25 | 2018-12-11 | Xidian University | Image classification method based on separable convolution and an attention mechanism |
CN108875812A (en) * | 2018-06-01 | 2018-11-23 | Ningbo University of Technology | Driving behavior classification method based on branch convolutional neural networks |
CN108898078A (en) * | 2018-06-15 | 2018-11-27 | University of Shanghai for Science and Technology | Real-time traffic sign detection and recognition method based on a multi-scale deconvolution neural network |
CN108960184A (en) * | 2018-07-20 | 2018-12-07 | Tianjin Normal University | Pedestrian re-identification method based on a heterogeneous-part deep neural network |
CN109214505A (en) * | 2018-08-29 | 2019-01-15 | Sun Yat-sen University | Fully convolutional object detection method with densely connected convolutional neural networks |
Non-Patent Citations (2)
Title |
---|
CHEN, C. F. et al.: "Efficient Fusion of Sparse and Complementary Convolutions", arXiv *
ZHENG, DELIAN: "Geometric Information Modeling of Local Features in Image Retrieval", China Master's Theses Full-text Database, Information Science and Technology *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110910366A (en) * | 2019-11-18 | 2020-03-24 | Hubei University of Technology | Visualization method for abnormal brain MRI images based on 3D CAM |
CN110910366B (en) * | 2019-11-18 | 2023-10-24 | Hubei University of Technology | Visualization method for abnormal brain MRI images based on 3D CAM |
CN111046939A (en) * | 2019-12-06 | 2020-04-21 | PLA Strategic Support Force Information Engineering University | Attention-based CNN class activation map generation method |
CN111046939B (en) * | 2019-12-06 | 2023-08-04 | PLA Strategic Support Force Information Engineering University | Attention-based CNN class activation map generation method |
CN111026898A (en) * | 2019-12-10 | 2020-04-17 | Yunnan University | Weakly supervised image emotion classification and localization method based on a cross-space pooling strategy |
CN111553462A (en) * | 2020-04-08 | 2020-08-18 | Harbin Engineering University | Class activation mapping method |
CN113569860A (en) * | 2021-07-29 | 2021-10-29 | Beijing Horizon Information Technology Co., Ltd. | Instance segmentation method, training method of instance segmentation network and device thereof |
CN113569860B (en) * | 2021-07-29 | 2024-02-27 | Beijing Horizon Information Technology Co., Ltd. | Instance segmentation method, training method of instance segmentation network and device thereof |
Also Published As
Publication number | Publication date |
---|---|
CN110322509B (en) | 2021-11-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110322509A (en) | Object localization method, system and computer equipment based on hierarchical class activation maps | |
CN108364097B (en) | Typhoon cloud system prediction method based on generative adversarial networks |
Liu et al. | Super-resolution-based change detection network with stacked attention module for images with different resolutions |
CN110276316B (en) | Human body key point detection method based on deep learning |
CN108334847B (en) | Face recognition method based on deep learning in real-world scenes |
CN109271888A (en) | Gait-based person identification method, apparatus and electronic device |
Ying et al. | Multi-attention object detection model in remote sensing images based on multi-scale |
CN109344821A (en) | Small object detection method based on feature fusion and deep learning |
CN110147743A (en) | Real-time online pedestrian analysis and counting system and method for complex scenes |
CN109918532A (en) | Image retrieval method, apparatus, device and computer-readable storage medium |
CN109902677A (en) | Vehicle detection method based on deep learning |
CN110163187A (en) | Long-range traffic sign detection and recognition method based on F-RCNN |
CN107463919A (en) | Facial expression recognition method based on deep 3D convolutional neural networks |
CN107122736A (en) | Human body orientation prediction method and device based on deep learning |
CN105160400A (en) | Method for improving the generalization ability of convolutional neural networks based on the L21 norm |
CN110163836A (en) | Deep-learning-based excavator detection method for aerial inspection |
Wang et al. | BANet: Small and multi-object detection with a bidirectional attention network for traffic scenes |
CN110472542A (en) | Infrared image pedestrian detection method and detection system based on deep learning |
CN111160111B (en) | Human body key point detection method based on deep learning |
CN113111767A (en) | Fall detection method based on deep-learning 3D pose estimation |
CN109711401A (en) | Text detection method for natural scene images based on Faster Rcnn |
CN104298974A (en) | Human behavior recognition method based on depth video sequences |
CN109034152A (en) | License plate localization method and device based on an LSTM-CNN combined model |
CN107563409A (en) | Image description method based on a regional-feature attention network with nearest-neighbor ranking |
CN110008861A (en) | Pedestrian re-identification method based on global and local feature learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||