CN111640092B - Method for reconstructing target counting network based on multi-task cooperative characteristics - Google Patents

Method for reconstructing target counting network based on multi-task cooperative characteristics

Info

Publication number
CN111640092B
CN111640092B (application CN202010430090.3A)
Authority
CN
China
Prior art keywords
convolution
layer
output
activation function
convolution layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010430090.3A
Other languages
Chinese (zh)
Other versions
CN111640092A (en)
Inventor
成锋娜
张玉言
张镜洋
周宏平
茹煜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Forestry University
Original Assignee
Nanjing Forestry University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Forestry University filed Critical Nanjing Forestry University
Priority to CN202010430090.3A priority Critical patent/CN111640092B/en
Publication of CN111640092A publication Critical patent/CN111640092A/en
Application granted granted Critical
Publication of CN111640092B publication Critical patent/CN111640092B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30242 Counting objects in image
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for reconstructing a target counting network based on multi-task cooperative features. The network adopts a multi-task strategy of mask estimation, density distribution estimation and density level estimation, which respectively learn the foreground/background, local and global context information in the data; this information is then reconstructed to reduce the differences between the tasks and enhance their complementarity, and is combined with the picture features to improve the diversity of the representation. The method reduces the difficulty of direct density regression through a progressive learning mode and has important application value in fields such as forestry, agriculture and traffic.

Description

Method for reconstructing target counting network based on multi-task cooperative characteristics
Technical Field
The invention relates to the technical field of image processing and pattern recognition, and in particular to a method for reconstructing a target counting network based on multi-task cooperative features.
Background
Object counting is an extremely important task in computer vision scene understanding and analysis: by counting the number of objects of interest, it helps people manage and guide production and daily life. For example, in the medical field, counting cells helps researchers determine how many cells have divided; in agriculture, counting the fruits on fruit trees helps growers estimate yield or assess the development of the current plants; in the traffic field, detecting the number of vehicles allows the congestion level of a road to be monitored.
Compared with target detection, this task faces significant challenges such as severe occlusion and scale change. A single task often causes the network to over-optimize for a single information source and prevents it from perceiving and extracting other, more discriminative features. This limits the adaptability of the network, especially in complex environments and tasks. The invention therefore designs, from a multi-task perspective, a target counting network that reconstructs multi-task cooperative features.
Through mask estimation, density distribution estimation and density level learning, the invention enables the network to learn the foreground/background, local and global context information of the data before density estimation. The reconstruction of this information and its combination with the picture features further improve the network's ability to perceive useful information.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
The method for reconstructing a target counting network based on multi-task cooperative features comprises the following steps:
Step 1: building training samples and labels; preprocessing the acquired training pictures and labels to facilitate training of the network. Assume the training data contain N pictures with corresponding labels (the length and width of each picture are 256), and record the pictures in the training set as {I_1, I_2, ..., I_N} and the corresponding labels as {l_1, l_2, ..., l_N}, where the label of the i-th picture is l_i = {x_1, x_2, ..., x_m}; each x_j records the abscissa and ordinate of the j-th target position, and m is the number of targets in the picture.
Step 101: converting the label l_i of the i-th picture I_i into a Gaussian density map den_i, which can be computed by the following formula:
den_i(p) = Σ_{j=1}^{m} N(p; x_j, σ²)
where p denotes a pixel coordinate of the given picture, x_j denotes the j-th labeled target position, N(p; x_j, σ²) = 1/(2πσ²)·exp(−‖p − x_j‖²/(2σ²)) is a Gaussian kernel, and σ² is the variance term. The kernel is normalized so that the Gaussian values over all pixel points sum to 1 for each target; if the Gaussian is evaluated only within a neighborhood of x_j, the values computed in that neighborhood are normalized so that the Gaussian generated for the j-th position still sums to 1. At this point the label l_i of the i-th picture is converted into den_i.
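A minimal sketch of how such a density map label can be generated from the point annotations follows; the function name make_density_map, the fixed σ and the use of scipy.ndimage.gaussian_filter are illustrative assumptions, not the patent's own implementation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_density_map(points, height=256, width=256, sigma=4.0):
    """Convert a list of (row, col) target positions into a Gaussian density map.

    Each annotated target contributes a normalized Gaussian that sums to 1,
    so density.sum() equals the number of targets m (step 101).
    """
    density = np.zeros((height, width), dtype=np.float32)
    for r, c in points:
        r, c = int(round(r)), int(round(c))
        if not (0 <= r < height and 0 <= c < width):
            continue
        impulse = np.zeros((height, width), dtype=np.float32)
        impulse[r, c] = 1.0
        kernel = gaussian_filter(impulse, sigma)
        kernel /= kernel.sum()   # renormalize so the j-th Gaussian sums to exactly 1
        density += kernel
    return density
```

Summing the resulting map recovers the number of targets m, which is the property step 4 later exploits at test time.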
Step 102: applying the operation of step 101 to the 1st to N-th pictures in turn, converting the label of each picture into a Gaussian density map; the labels of the training data thus become the training target density map labels {den_1, den_2, ..., den_N};
Step 103: binarizing the target density map labels generated in step 102 to generate mask labels, i.e. (den_i > 0) = 1, which converts every non-zero value to 1; the labels {den_1, den_2, ..., den_N} of step 102 respectively generate {bina_1, bina_2, ..., bina_N};
Step 104: performing density segmentation on the target density map labels generated in step 102 to generate density distribution labels, i.e. (den_i > θ_1) = 4 (θ_1 defaults to 0.9), (θ_1 ≥ den_i > θ_2) = 3 (θ_2 defaults to 0.6), (θ_2 ≥ den_i > θ_3) = 2 (θ_3 defaults to 0.3), (θ_3 ≥ den_i > θ_4) = 1 (θ_4 defaults to 0), and (den_i ≤ θ_4) = 0; the labels {den_1, den_2, ..., den_N} of step 102 respectively generate {distr_1, distr_2, ..., distr_N};
Step 105: converting the total count of each target density map label generated in step 102 into a density level label: with cnt_i denoting the sum of den_i, (cnt_i > δ_1) = 3 (δ_1 defaults to 150), (δ_1 ≥ cnt_i > δ_2) = 2 (δ_2 defaults to 100), (δ_2 ≥ cnt_i > δ_3) = 1 (δ_3 defaults to 50), and (cnt_i ≤ δ_3) = 0; the labels {den_1, den_2, ..., den_N} of step 102 respectively generate {level_1, level_2, ..., level_N}; a code sketch of these three auxiliary labels is given after this step;
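A minimal sketch of the auxiliary labels derived in steps 103-105; the helper names are assumptions, and the four-class density level 0-3 is an assumption consistent with the 1×4 output of convolution layer 13_5 below:

```python
import numpy as np

def make_mask_label(den):
    """Step 103: foreground/background mask, 1 wherever the density map is non-zero."""
    return (den > 0).astype(np.float32)

def make_distribution_label(den, t1=0.9, t2=0.6, t3=0.3, t4=0.0):
    """Step 104: per-pixel density-distribution class in {0, 1, 2, 3, 4}."""
    distr = np.zeros(den.shape, dtype=np.int64)
    distr[den > t4] = 1      # theta_4 < den <= theta_3
    distr[den > t3] = 2      # theta_3 < den <= theta_2
    distr[den > t2] = 3      # theta_2 < den <= theta_1
    distr[den > t1] = 4      # den > theta_1
    return distr

def make_level_label(den, d1=150.0, d2=100.0, d3=50.0):
    """Step 105: image-level density class in {0, 1, 2, 3} from the total count."""
    cnt = float(den.sum())
    if cnt > d1:
        return 3
    if cnt > d2:
        return 2
    if cnt > d3:
        return 1
    return 0
```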
Step 2: establishing a target counting network based on a multi-task cooperative reconstruction feature, wherein the specific model of the network is as follows:
Convolution layer 1: convolving the x×y×3 input image with 24 3×3 convolution kernels and applying a ReLU activation function to obtain an x×y×24 feature;
Convolution layer 2: convolving the output of convolution layer 1 with 22 3×3 convolution kernels, then applying a ReLU activation function and a 2×2 max-pooling layer to obtain an (x/2)×(y/2)×22 feature;
Convolution layer 3: convolving the output of convolution layer 2 with 41 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/2)×(y/2)×41 feature;
Convolution layer 4: convolving the output of convolution layer 3 with 51 3×3 convolution kernels, then applying a ReLU activation function and a 2×2 max-pooling layer to obtain an (x/4)×(y/4)×51 feature;
Convolution layer 5: convolving the output of convolution layer 4 with 108 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/4)×(y/4)×108 feature;
Convolution layer 6: convolving the output of convolution layer 5 with 89 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/4)×(y/4)×89 feature;
Convolution layer 7: convolving the output of convolution layer 6 with 111 3×3 convolution kernels, then applying a ReLU activation function and a 2×2 max-pooling layer to obtain an (x/8)×(y/8)×111 feature;
Convolution layer 8: convolving the output of convolution layer 7 with 184 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/8)×(y/8)×184 feature;
Convolution layer 9: convolving the output of convolution layer 8 with 276 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/8)×(y/8)×276 feature;
Convolution layer 10: convolving the output of convolution layer 9 with 228 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/8)×(y/8)×228 feature;
Convolution layer 11_1: convolving the output of convolution layer 10 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 11_2: convolving the output of convolution layer 11_1 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 11_3: convolving the output of convolution layer 11_2 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 11_4: convolving the output of convolution layer 11_3 with 1 3×3 convolution kernel and applying a Sigmoid activation function to obtain an (x/8)×(y/8)×1 feature;
Convolution layer 12_1: convolving the output of convolution layer 10 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 12_2: convolving the output of convolution layer 12_1 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 12_3: convolving the output of convolution layer 12_2 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 12_4: convolving the output of convolution layer 12_3 with 1 3×3 convolution kernel and applying a Sigmoid activation function to obtain an (x/8)×(y/8)×1 feature;
Convolution layer 13_1: convolving the output of convolution layer 10 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 13_2: convolving the output of convolution layer 13_1 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 13_3: convolving the output of convolution layer 13_2 with 256 3×3 convolution kernels, then applying a PReLU activation function and an adaptive average-pooling layer to obtain a 1×1×256 feature;
Convolution layer 13_4: multiplying the flattened output of convolution layer 13_3 by a 256×128 linear layer and applying a Sigmoid activation function to obtain a 1×128 feature;
Convolution layer 13_5: multiplying the output of convolution layer 13_4 by a 128×4 linear layer and applying a Sigmoid activation function to obtain a 1×4 feature;
Convolution layer 14_1: convolving the output of convolution layer 11_4 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 14_2: convolving the output of convolution layer 14_1 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 14_3: convolving the output of convolution layer 14_2 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 15_1: convolving the output of convolution layer 12_4 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 15_2: convolving the output of convolution layer 15_1 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 15_3: convolving the output of convolution layer 15_2 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 16_1: deconvolving the reshaped output of convolution layer 13_5 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/64)×(y/64)×256 feature;
Convolution layer 16_2: deconvolving the output of convolution layer 16_1 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/32)×(y/32)×128 feature;
Convolution layer 16_3: deconvolving the output of convolution layer 16_2 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/16)×(y/16)×128 feature;
Convolution layer 16_4: deconvolving the output of convolution layer 16_3 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 17: convolving the output of convolution layer 10 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Aggregation layer 1: concatenating the outputs of convolution layers 14_3, 15_3, 16_4 and 17 along the channel dimension to obtain an (x/8)×(y/8)×512 feature;
Convolution layer 18: convolving the output of aggregation layer 1 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 19: deconvolving the output of convolution layer 18 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/4)×(y/4)×128 feature;
Convolution layer 20: deconvolving the output of convolution layer 19 with 64 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/2)×(y/2)×64 feature;
Convolution layer 21: deconvolving the output of convolution layer 20 with 32 3×3 convolution kernels and applying a PReLU activation function to obtain an x×y×32 feature;
Convolution layer 22: convolving the output of convolution layer 21 with 1 1×1 convolution kernel to obtain the x×y×1 output feature;
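To make the three task branches concrete, a partial PyTorch sketch of the heads that read the output of convolution layer 10 follows. It is an illustration, not the authors' released code; the paddings (chosen to preserve spatial size) and the TaskHeads class name are assumptions:

```python
import torch
import torch.nn as nn

class TaskHeads(nn.Module):
    """Mask head (layers 11_x), density-distribution head (12_x) and density-level head (13_x)."""
    def __init__(self, in_ch=228):
        super().__init__()
        def block(cin, cout):
            # 3x3 convolution + PReLU; padding=1 keeps the (x/8) x (y/8) spatial size
            return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.PReLU())
        # convolution layers 11_1-11_4: foreground/background mask estimation
        self.mask_head = nn.Sequential(
            block(in_ch, 256), block(256, 256), block(256, 256),
            nn.Conv2d(256, 1, 3, padding=1), nn.Sigmoid())
        # convolution layers 12_1-12_4: density distribution estimation
        self.distr_head = nn.Sequential(
            block(in_ch, 256), block(256, 256), block(256, 256),
            nn.Conv2d(256, 1, 3, padding=1), nn.Sigmoid())
        # convolution layers 13_1-13_3 + adaptive average pooling: density level features
        self.level_conv = nn.Sequential(
            block(in_ch, 256), block(256, 256), block(256, 256),
            nn.AdaptiveAvgPool2d(1))                      # -> N x 256 x 1 x 1
        # convolution layers 13_4 and 13_5: linear layers with Sigmoid activations
        self.level_fc = nn.Sequential(
            nn.Linear(256, 128), nn.Sigmoid(),
            nn.Linear(128, 4), nn.Sigmoid())

    def forward(self, feat10):
        pre_bin = self.mask_head(feat10)                  # N x 1 x (x/8) x (y/8)
        pre_den = self.distr_head(feat10)                 # N x 1 x (x/8) x (y/8)
        pooled = self.level_conv(feat10).flatten(1)       # N x 256
        pre_level = self.level_fc(pooled)                 # N x 4
        return pre_bin, pre_den, pre_level
```

Here feat10 stands for the (x/8)×(y/8)×228 output of convolution layer 10; the reconstruction branches 14_x-16_x, the skip branch 17, aggregation layer 1 and the decoder 18-22 would be attached to these outputs in the same fashion.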
Step 3: inputting the training samples from step 1 into the convolutional network model established in step 2 and learning the network parameters with an Adam optimization strategy, specifically as follows:
Step 301: the network designed by the invention trains its parameters in a multi-task manner, and the initial learning rate of the network is set to l;
Step 302: recording the output of convolution layer 11_4 as pre_bin, the output of convolution layer 12_4 as pre_den, the output of convolution layer 13_5 as pre_level, and the output of convolution layer 22 as pre_net; based on the labels given in step 1, the parameters in the network are learned through a loss function defined over these four outputs.
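A minimal training-step sketch is given below; the particular combination of loss terms (binary cross-entropy for the mask, mean squared error for the density distribution, assumed normalized to [0, 1] and downsampled to the 1/8-resolution grid, cross-entropy for the density level, and mean squared error for the final density map, with equal weights) is an assumption, not the loss prescribed by the patent:

```python
import torch
import torch.nn.functional as F

def training_step(model, optimizer, image, bina, distr, level, den):
    """One Adam update; model(image) is assumed to return (pre_bin, pre_den, pre_level, pre_net)."""
    pre_bin, pre_den, pre_level, pre_net = model(image)
    loss = (F.binary_cross_entropy(pre_bin, bina)      # mask estimation
            + F.mse_loss(pre_den, distr)               # density distribution estimation
            + F.cross_entropy(pre_level, level)        # density level classification
            + F.mse_loss(pre_net, den))                # final density map regression
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# optimizer = torch.optim.Adam(model.parameters(), lr=initial_lr)  # "l" in step 301
```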
Step 4: testing a deep network model; after the network is trained in the step 3, parameters of a convolution layer of the network are reserved; changing the size of the test picture to 256×256, and summing the outputs pre_net of the convolutional layer 22 in step 2 to obtain the target number of the current test picture.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention designs a task-cooperative information extraction module: through mask estimation, density distribution estimation and density level learning, the network learns foreground/background, local and global context information in the data before density estimation.
2. The invention designs a task feature reconstruction module: a plain multi-task model ignores the differences and complementarity between tasks, whereas the invention enhances the collaborative interaction between tasks through feature reconstruction.
3. The method reduces the difficulty of density map estimation through a progressive learning mode and enhances the robustness of the network.
Drawings
FIG. 1 is a block diagram of a deep network model according to the present invention.
Detailed Description
The embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings; obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention.
Example 1: referring to FIG. 1, a method for reconstructing a target counting network based on multi-task cooperative features comprises the following steps:
Step 1: building training samples and labels; preprocessing the acquired training pictures and labels to facilitate training of the network. Assume the training data contain N pictures with corresponding labels (the length and width of each picture are 256), and record the pictures in the training set as {I_1, I_2, ..., I_N} and the corresponding labels as {l_1, l_2, ..., l_N}, where the label of the i-th picture is l_i = {x_1, x_2, ..., x_m}; each x_j records the abscissa and ordinate of the j-th target position, and m is the number of targets in the picture.
Step 101: converting the label l_i of the i-th picture I_i into a Gaussian density map den_i, which can be computed by the following formula:
den_i(p) = Σ_{j=1}^{m} N(p; x_j, σ²)
where p denotes a pixel coordinate of the given picture, x_j denotes the j-th labeled target position, N(p; x_j, σ²) = 1/(2πσ²)·exp(−‖p − x_j‖²/(2σ²)) is a Gaussian kernel, and σ² is the variance term. The kernel is normalized so that the Gaussian values over all pixel points sum to 1 for each target; if the Gaussian is evaluated only within a neighborhood of x_j, the values computed in that neighborhood are normalized so that the Gaussian generated for the j-th position still sums to 1. At this point the label l_i of the i-th picture is converted into den_i.
Step 102: applying the operation of step 101 to the 1st to N-th pictures in turn, converting the label of each picture into a Gaussian density map; the labels of the training data thus become the training target density map labels {den_1, den_2, ..., den_N};
Step 103: binarizing the target density map labels generated in step 102 to generate mask labels, i.e. (den_i > 0) = 1, which converts every non-zero value to 1; the labels {den_1, den_2, ..., den_N} of step 102 respectively generate {bina_1, bina_2, ..., bina_N};
Step 104: performing density segmentation on the target density map labels generated in step 102 to generate density distribution labels, i.e. (den_i > θ_1) = 4 (θ_1 defaults to 0.9), (θ_1 ≥ den_i > θ_2) = 3 (θ_2 defaults to 0.6), (θ_2 ≥ den_i > θ_3) = 2 (θ_3 defaults to 0.3), (θ_3 ≥ den_i > θ_4) = 1 (θ_4 defaults to 0), and (den_i ≤ θ_4) = 0; the labels {den_1, den_2, ..., den_N} of step 102 respectively generate {distr_1, distr_2, ..., distr_N};
Step 105: converting the total count of each target density map label generated in step 102 into a density level label: with cnt_i denoting the sum of den_i, (cnt_i > δ_1) = 3 (δ_1 defaults to 150), (δ_1 ≥ cnt_i > δ_2) = 2 (δ_2 defaults to 100), (δ_2 ≥ cnt_i > δ_3) = 1 (δ_3 defaults to 50), and (cnt_i ≤ δ_3) = 0; the labels {den_1, den_2, ..., den_N} of step 102 respectively generate {level_1, level_2, ..., level_N};
Step 2: and establishing a target counting network based on the multi-task cooperative reconstruction characteristic. Specific model of network:
Convolution layer 1: convolving the x×y×3 input image with 24 3×3 convolution kernels and applying a ReLU activation function to obtain an x×y×24 feature;
Convolution layer 2: convolving the output of convolution layer 1 with 22 3×3 convolution kernels, then applying a ReLU activation function and a 2×2 max-pooling layer to obtain an (x/2)×(y/2)×22 feature;
Convolution layer 3: convolving the output of convolution layer 2 with 41 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/2)×(y/2)×41 feature;
Convolution layer 4: convolving the output of convolution layer 3 with 51 3×3 convolution kernels, then applying a ReLU activation function and a 2×2 max-pooling layer to obtain an (x/4)×(y/4)×51 feature;
Convolution layer 5: convolving the output of convolution layer 4 with 108 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/4)×(y/4)×108 feature;
Convolution layer 6: convolving the output of convolution layer 5 with 89 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/4)×(y/4)×89 feature;
Convolution layer 7: convolving the output of convolution layer 6 with 111 3×3 convolution kernels, then applying a ReLU activation function and a 2×2 max-pooling layer to obtain an (x/8)×(y/8)×111 feature;
Convolution layer 8: convolving the output of convolution layer 7 with 184 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/8)×(y/8)×184 feature;
Convolution layer 9: convolving the output of convolution layer 8 with 276 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/8)×(y/8)×276 feature;
Convolution layer 10: convolving the output of convolution layer 9 with 228 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/8)×(y/8)×228 feature;
Convolution layer 11_1: convolving the output of convolution layer 10 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 11_2: convolving the output of convolution layer 11_1 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 11_3: convolving the output of convolution layer 11_2 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 11_4: convolving the output of convolution layer 11_3 with 1 3×3 convolution kernel and applying a Sigmoid activation function to obtain an (x/8)×(y/8)×1 feature;
Convolution layer 12_1: convolving the output of convolution layer 10 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 12_2: convolving the output of convolution layer 12_1 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 12_3: convolving the output of convolution layer 12_2 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 12_4: convolving the output of convolution layer 12_3 with 1 3×3 convolution kernel and applying a Sigmoid activation function to obtain an (x/8)×(y/8)×1 feature;
Convolution layer 13_1: convolving the output of convolution layer 10 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 13_2: convolving the output of convolution layer 13_1 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 13_3: convolving the output of convolution layer 13_2 with 256 3×3 convolution kernels, then applying a PReLU activation function and an adaptive average-pooling layer to obtain a 1×1×256 feature;
Convolution layer 13_4: multiplying the flattened output of convolution layer 13_3 by a 256×128 linear layer and applying a Sigmoid activation function to obtain a 1×128 feature;
Convolution layer 13_5: multiplying the output of convolution layer 13_4 by a 128×4 linear layer and applying a Sigmoid activation function to obtain a 1×4 feature;
Convolution layer 14_1: convolving the output of convolution layer 11_4 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 14_2: convolving the output of convolution layer 14_1 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 14_3: convolving the output of convolution layer 14_2 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 15_1: convolving the output of convolution layer 12_4 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 15_2: convolving the output of convolution layer 15_1 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 15_3: convolving the output of convolution layer 15_2 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 16_1: deconvolving the reshaped output of convolution layer 13_5 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/64)×(y/64)×256 feature;
Convolution layer 16_2: deconvolving the output of convolution layer 16_1 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/32)×(y/32)×128 feature;
Convolution layer 16_3: deconvolving the output of convolution layer 16_2 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/16)×(y/16)×128 feature;
Convolution layer 16_4: deconvolving the output of convolution layer 16_3 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 17: convolving the output of convolution layer 10 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Aggregation layer 1: concatenating the outputs of convolution layers 14_3, 15_3, 16_4 and 17 along the channel dimension to obtain an (x/8)×(y/8)×512 feature;
Convolution layer 18: convolving the output of aggregation layer 1 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 19: deconvolving the output of convolution layer 18 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/4)×(y/4)×128 feature;
Convolution layer 20: deconvolving the output of convolution layer 19 with 64 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/2)×(y/2)×64 feature;
Convolution layer 21: deconvolving the output of convolution layer 20 with 32 3×3 convolution kernels and applying a PReLU activation function to obtain an x×y×32 feature;
Convolution layer 22: convolving the output of convolution layer 21 with 1 1×1 convolution kernel to obtain the x×y×1 output feature;
Step 3: inputting the training samples from step 1 into the convolutional network model established in step 2 and learning the network parameters with an Adam optimization strategy, specifically as follows:
Step 301: the network designed by the invention trains its parameters in a multi-task manner, and the initial learning rate of the network is set to l;
Step 302: recording the output of convolution layer 11_4 as pre_bin, the output of convolution layer 12_4 as pre_den, the output of convolution layer 13_5 as pre_level, and the output of convolution layer 22 as pre_net; based on the labels given in step 1, the parameters in the network are learned through a loss function defined over these four outputs.
Step 4: testing the deep network model; after the network has been trained in step 3, the parameters of its convolution layers are retained; the test picture is resized to 256×256, and the output pre_net of convolution layer 22 in step 2 is summed to obtain the target count of the current test picture.
The foregoing is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art, who is within the scope of the present invention, should make equivalent substitutions or modifications according to the technical scheme of the present invention and the inventive concept thereof, and should be covered by the scope of the present invention.

Claims (4)

1. A method for reconstructing a target count network based on a multitasking cooperative feature, the method comprising the steps of:
step 1: building training samples and labels; preprocessing the acquired training pictures and labels so as to facilitate training of a network;
step 2: establishing a target counting network based on the multi-task cooperative reconstruction characteristic;
step 3: inputting the training sample in the step 1 into the convolutional network model established in the step 2, carrying out parameter learning on the network through an Adam optimization strategy,
step 4: testing a deep network model;
wherein, step 1: building training samples and labels; preprocessing the acquired training pictures and labels so as to facilitate training of the network; specifically: assuming the training data contain N pictures with corresponding labels (the length and width of each picture are 256), recording the pictures in the training set as {I_1, I_2, ..., I_N} and the corresponding labels as {l_1, l_2, ..., l_N}, where the label of the i-th picture is l_i = {x_1, x_2, ..., x_m}, each x_j giving the abscissa and ordinate of the j-th target position, and m representing the number of targets in the picture;
step 101: converting the label l_i of the i-th picture I_i into a Gaussian density map den_i, computed by the following formula:
den_i(p) = Σ_{j=1}^{m} N(p; x_j, σ²)
wherein p denotes a pixel coordinate of the given picture, x_j denotes the j-th labeled target position, N(p; x_j, σ²) = 1/(2πσ²)·exp(−‖p − x_j‖²/(2σ²)) is a Gaussian kernel, and σ² is the variance term; the kernel is normalized so that the Gaussian values over all pixel points sum to 1 for each target; if the Gaussian is evaluated only within a neighborhood of x_j, the values computed in that neighborhood are normalized so that the Gaussian generated for the j-th position still sums to 1; at this time the label l_i of the i-th picture is converted into den_i;
step 102: applying the operation of step 101 to the 1st to N-th pictures in turn, converting the label of each picture into a Gaussian density map; the labels of the training data thus become the training target density map labels {den_1, den_2, ..., den_N};
step 103: binarizing the target density map labels generated in step 102 to generate mask labels, i.e. (den_i > 0) = 1, which converts every non-zero value to 1; the labels {den_1, den_2, ..., den_N} of step 102 respectively generate {bina_1, bina_2, ..., bina_N};
step 104: performing density segmentation on the target density map labels generated in step 102 to generate density distribution labels, i.e. (den_i > θ_1) = 4 (θ_1 defaults to 0.9), (θ_1 ≥ den_i > θ_2) = 3 (θ_2 defaults to 0.6), (θ_2 ≥ den_i > θ_3) = 2 (θ_3 defaults to 0.3), (θ_3 ≥ den_i > θ_4) = 1 (θ_4 defaults to 0), and (den_i ≤ θ_4) = 0; the labels {den_1, den_2, ..., den_N} of step 102 respectively generate {distr_1, distr_2, ..., distr_N};
step 105: converting the total count of each target density map label generated in step 102 into a density level label: with cnt_i denoting the sum of den_i, (cnt_i > δ_1) = 3 (δ_1 defaults to 150), (δ_1 ≥ cnt_i > δ_2) = 2 (δ_2 defaults to 100), (δ_2 ≥ cnt_i > δ_3) = 1 (δ_3 defaults to 50), and (cnt_i ≤ δ_3) = 0; the labels {den_1, den_2, ..., den_N} of step 102 respectively generate {level_1, level_2, ..., level_N}.
2. The method for reconstructing a target-counting network based on a multi-tasking collaborative feature according to claim 1,
step 2: establishing a target counting network based on the multi-task cooperative reconstruction characteristic; the specific model of the network is as follows:
Convolution layer 1: convolving the x×y×3 input image with 24 3×3 convolution kernels and applying a ReLU activation function to obtain an x×y×24 feature;
Convolution layer 2: convolving the output of convolution layer 1 with 22 3×3 convolution kernels, then applying a ReLU activation function and a 2×2 max-pooling layer to obtain an (x/2)×(y/2)×22 feature;
Convolution layer 3: convolving the output of convolution layer 2 with 41 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/2)×(y/2)×41 feature;
Convolution layer 4: convolving the output of convolution layer 3 with 51 3×3 convolution kernels, then applying a ReLU activation function and a 2×2 max-pooling layer to obtain an (x/4)×(y/4)×51 feature;
Convolution layer 5: convolving the output of convolution layer 4 with 108 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/4)×(y/4)×108 feature;
Convolution layer 6: convolving the output of convolution layer 5 with 89 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/4)×(y/4)×89 feature;
Convolution layer 7: convolving the output of convolution layer 6 with 111 3×3 convolution kernels, then applying a ReLU activation function and a 2×2 max-pooling layer to obtain an (x/8)×(y/8)×111 feature;
Convolution layer 8: convolving the output of convolution layer 7 with 184 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/8)×(y/8)×184 feature;
Convolution layer 9: convolving the output of convolution layer 8 with 276 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/8)×(y/8)×276 feature;
Convolution layer 10: convolving the output of convolution layer 9 with 228 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/8)×(y/8)×228 feature;
Convolution layer 11_1: convolving the output of convolution layer 10 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 11_2: convolving the output of convolution layer 11_1 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 11_3: convolving the output of convolution layer 11_2 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 11_4: convolving the output of convolution layer 11_3 with 1 3×3 convolution kernel and applying a Sigmoid activation function to obtain an (x/8)×(y/8)×1 feature;
Convolution layer 12_1: convolving the output of convolution layer 10 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 12_2: convolving the output of convolution layer 12_1 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 12_3: convolving the output of convolution layer 12_2 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 12_4: convolving the output of convolution layer 12_3 with 1 3×3 convolution kernel and applying a Sigmoid activation function to obtain an (x/8)×(y/8)×1 feature;
Convolution layer 13_1: convolving the output of convolution layer 10 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 13_2: convolving the output of convolution layer 13_1 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 13_3: convolving the output of convolution layer 13_2 with 256 3×3 convolution kernels, then applying a PReLU activation function and an adaptive average-pooling layer to obtain a 1×1×256 feature;
Convolution layer 13_4: multiplying the flattened output of convolution layer 13_3 by a 256×128 linear layer and applying a Sigmoid activation function to obtain a 1×128 feature;
Convolution layer 13_5: multiplying the output of convolution layer 13_4 by a 128×4 linear layer and applying a Sigmoid activation function to obtain a 1×4 feature;
Convolution layer 14_1: convolving the output of convolution layer 11_4 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 14_2: convolving the output of convolution layer 14_1 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 14_3: convolving the output of convolution layer 14_2 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 15_1: convolving the output of convolution layer 12_4 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 15_2: convolving the output of convolution layer 15_1 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 15_3: convolving the output of convolution layer 15_2 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 16_1: deconvolving the reshaped output of convolution layer 13_5 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/64)×(y/64)×256 feature;
Convolution layer 16_2: deconvolving the output of convolution layer 16_1 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/32)×(y/32)×128 feature;
Convolution layer 16_3: deconvolving the output of convolution layer 16_2 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/16)×(y/16)×128 feature;
Convolution layer 16_4: deconvolving the output of convolution layer 16_3 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 17: convolving the output of convolution layer 10 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Aggregation layer 1: concatenating the outputs of convolution layers 14_3, 15_3, 16_4 and 17 along the channel dimension to obtain an (x/8)×(y/8)×512 feature;
Convolution layer 18: convolving the output of aggregation layer 1 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 19: deconvolving the output of convolution layer 18 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/4)×(y/4)×128 feature;
Convolution layer 20: deconvolving the output of convolution layer 19 with 64 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/2)×(y/2)×64 feature;
Convolution layer 21: deconvolving the output of convolution layer 20 with 32 3×3 convolution kernels and applying a PReLU activation function to obtain an x×y×32 feature;
Convolution layer 22: convolving the output of convolution layer 21 with 1 1×1 convolution kernel to obtain the x×y×1 output feature.
3. The method for reconstructing a target counting network based on multi-task cooperative features according to claim 1, wherein step 3: inputting the training samples of step 1 into the convolutional network model established in step 2 and learning the network parameters with an Adam optimization strategy, specifically comprises the following steps:
step 301: the network trains its parameters in a multi-task manner, and the initial learning rate of the network is set to l;
step 302: recording the output of convolution layer 11_4 as pre_bin, the output of convolution layer 12_4 as pre_den, the output of convolution layer 13_5 as pre_level, and the output of convolution layer 22 as pre_net; based on the labels given in step 1, the parameters in the network are learned through a loss function defined over these four outputs.
4. The method for reconstructing a target counting network based on multi-task cooperative features according to claim 1, wherein
step 4: testing the deep network model; after the network has been trained in step 3, the parameters of its convolution layers are retained; the test picture is resized to 256×256, and the output pre_net of convolution layer 22 in step 2 is summed to obtain the target count of the current test picture.
CN202010430090.3A 2020-05-20 2020-05-20 Method for reconstructing target counting network based on multi-task cooperative characteristics Active CN111640092B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010430090.3A CN111640092B (en) 2020-05-20 2020-05-20 Method for reconstructing target counting network based on multi-task cooperative characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010430090.3A CN111640092B (en) 2020-05-20 2020-05-20 Method for reconstructing target counting network based on multi-task cooperative characteristics

Publications (2)

Publication Number Publication Date
CN111640092A CN111640092A (en) 2020-09-08
CN111640092B (en) 2024-01-16

Family

ID=72332117

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010430090.3A Active CN111640092B (en) 2020-05-20 2020-05-20 Method for reconstructing target counting network based on multi-task cooperative characteristics

Country Status (1)

Country Link
CN (1) CN111640092B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018028255A1 (en) * 2016-08-11 2018-02-15 深圳市未来媒体技术研究院 Image saliency detection method based on adversarial network
CN109166100A (en) * 2018-07-24 2019-01-08 中南大学 Multi-task learning method for cell count based on convolutional neural networks
CN109271960A (en) * 2018-10-08 2019-01-25 燕山大学 A kind of demographic method based on convolutional neural networks
CN110503014A (en) * 2019-08-08 2019-11-26 东南大学 Demographic method based on multiple dimensioned mask perception feedback convolutional neural networks
CN110705698A (en) * 2019-10-16 2020-01-17 南京林业大学 Target counting depth network design method based on scale self-adaptive perception

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
崔高歌 (Cui Gaoge). Crowd counting based on multi-task learning and density-aware neural networks. China Master's Theses Full-text Database, Information Science and Technology, 2019, full text. *

Also Published As

Publication number Publication date
CN111640092A (en) 2020-09-08


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant