CN111640092B - Method for reconstructing target counting network based on multi-task cooperative characteristics - Google Patents

Method for reconstructing target counting network based on multi-task cooperative characteristics

Info

Publication number
CN111640092B
CN111640092B (application CN202010430090.3A)
Authority
CN
China
Prior art keywords
convolution
layer
output
activation function
convolution layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010430090.3A
Other languages
Chinese (zh)
Other versions
CN111640092A (en)
Inventor
成锋娜
张玉言
张镜洋
周宏平
茹煜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Forestry University
Original Assignee
Nanjing Forestry University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Forestry University filed Critical Nanjing Forestry University
Priority to CN202010430090.3A priority Critical patent/CN111640092B/en
Publication of CN111640092A publication Critical patent/CN111640092A/en
Application granted granted Critical
Publication of CN111640092B publication Critical patent/CN111640092B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30242 Counting objects in image
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for reconstructing a target counting network based on multi-task cooperative features. The network adopts a multi-task strategy of mask estimation, density distribution estimation and density level estimation, which respectively learn the foreground/background, local and global context information in the data; this information is then reconstructed to reduce the differences between the tasks and enhance their complementarity, and is combined with the picture features to improve the diversity of the representation. The method reduces the difficulty of direct density regression through a progressive learning mode and has important application value in fields such as forestry, agriculture and traffic.

Description

Method for reconstructing target counting network based on multi-task cooperative characteristics
Technical Field
The invention relates to the technical field of image processing and pattern recognition, and in particular to a method for reconstructing a target counting network based on multi-task cooperative features.
Background
Object counting is an extremely important task in computer vision scene understanding and analysis: by counting the number of objects of interest, it helps people manage and guide production and daily life. For example, in the medical field, counting cells helps researchers determine how many cells have divided; in agriculture, counting the fruits on fruit trees helps growers estimate yield or assess the development of the current plants; in the traffic field, detecting the number of vehicles allows the congestion level of a road to be monitored.
Compared with target detection, this task faces significant challenges such as severe occlusion and scale change. A single task often causes the network to over-optimize for a single information source and prevents it from perceiving and extracting other, more discriminative features. This limits the adaptability of the network, especially in complex environments and tasks. The invention therefore designs, from a multi-task perspective, a target counting network that reconstructs multi-task cooperative features.
Through mask estimation, density distribution estimation and density level learning, the invention enables the network to learn the foreground/background, local and global context information of the data before density estimation. The reconstruction of this information and its combination with the picture features further improve the network's ability to perceive useful information.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
The method for reconstructing a target counting network based on multi-task cooperative features comprises the following steps:
Step 1: building training samples and labels; preprocessing the acquired training pictures and labels to facilitate training of the network. Assume the training data contain N pictures with corresponding labels (the length and width of each picture are 256), and record the pictures in the training set as {I_1, I_2, ..., I_N} and the corresponding labels as {l_1, l_2, ..., l_N}, where the label of the i-th picture is l_i = {x_1, x_2, ..., x_m}; each x_j records the abscissa and ordinate of the j-th target position, and m is the number of targets in the picture.
Step 101: converting the label l_i of the i-th picture I_i into a Gaussian density map den_i, which can be computed by the following formula:
den_i(p) = Σ_{j=1}^{m} N(p; x_j, σ²)
where p denotes a pixel coordinate of the given picture, x_j denotes the j-th labeled target position, N(p; x_j, σ²) = 1/(2πσ²)·exp(−‖p − x_j‖²/(2σ²)) is a Gaussian kernel, and σ² is the variance term. The kernel is normalized so that the Gaussian values over all pixel points sum to 1 for each target; if the Gaussian is evaluated only within a neighborhood of x_j, the values computed in that neighborhood are normalized so that the Gaussian generated for the j-th position still sums to 1. At this point the label l_i of the i-th picture is converted into den_i.
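A minimal sketch of how such a density map label can be generated from the point annotations follows; the function name make_density_map, the fixed σ and the use of scipy.ndimage.gaussian_filter are illustrative assumptions, not the patent's own implementation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_density_map(points, height=256, width=256, sigma=4.0):
    """Convert a list of (row, col) target positions into a Gaussian density map.

    Each annotated target contributes a normalized Gaussian that sums to 1,
    so density.sum() equals the number of targets m (step 101).
    """
    density = np.zeros((height, width), dtype=np.float32)
    for r, c in points:
        r, c = int(round(r)), int(round(c))
        if not (0 <= r < height and 0 <= c < width):
            continue
        impulse = np.zeros((height, width), dtype=np.float32)
        impulse[r, c] = 1.0
        kernel = gaussian_filter(impulse, sigma)
        kernel /= kernel.sum()   # renormalize so the j-th Gaussian sums to exactly 1
        density += kernel
    return density
```

Summing the resulting map recovers the number of targets m, which is the property step 4 later exploits at test time.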
Step 102: applying the operation of step 101 to the 1st to N-th pictures in turn, converting the label of each picture into a Gaussian density map; the labels of the training data thus become the training target density map labels {den_1, den_2, ..., den_N};
Step 103: binarizing the target density map labels generated in step 102 to generate mask labels, i.e. (den_i > 0) = 1, which converts every non-zero value to 1; the labels {den_1, den_2, ..., den_N} of step 102 respectively generate {bina_1, bina_2, ..., bina_N};
Step 104: performing density segmentation on the target density map labels generated in step 102 to generate density distribution labels, i.e. (den_i > θ_1) = 4 (θ_1 defaults to 0.9), (θ_1 ≥ den_i > θ_2) = 3 (θ_2 defaults to 0.6), (θ_2 ≥ den_i > θ_3) = 2 (θ_3 defaults to 0.3), (θ_3 ≥ den_i > θ_4) = 1 (θ_4 defaults to 0), and (den_i ≤ θ_4) = 0; the labels {den_1, den_2, ..., den_N} of step 102 respectively generate {distr_1, distr_2, ..., distr_N};
Step 105: converting the total count of each target density map label generated in step 102 into a density level label: with cnt_i denoting the sum of den_i, (cnt_i > δ_1) = 3 (δ_1 defaults to 150), (δ_1 ≥ cnt_i > δ_2) = 2 (δ_2 defaults to 100), (δ_2 ≥ cnt_i > δ_3) = 1 (δ_3 defaults to 50), and (cnt_i ≤ δ_3) = 0; the labels {den_1, den_2, ..., den_N} of step 102 respectively generate {level_1, level_2, ..., level_N}; a code sketch of these three auxiliary labels is given after this step;
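A minimal sketch of the auxiliary labels derived in steps 103-105; the helper names are assumptions, and the four-class density level 0-3 is an assumption consistent with the 1×4 output of convolution layer 13_5 below:

```python
import numpy as np

def make_mask_label(den):
    """Step 103: foreground/background mask, 1 wherever the density map is non-zero."""
    return (den > 0).astype(np.float32)

def make_distribution_label(den, t1=0.9, t2=0.6, t3=0.3, t4=0.0):
    """Step 104: per-pixel density-distribution class in {0, 1, 2, 3, 4}."""
    distr = np.zeros(den.shape, dtype=np.int64)
    distr[den > t4] = 1      # theta_4 < den <= theta_3
    distr[den > t3] = 2      # theta_3 < den <= theta_2
    distr[den > t2] = 3      # theta_2 < den <= theta_1
    distr[den > t1] = 4      # den > theta_1
    return distr

def make_level_label(den, d1=150.0, d2=100.0, d3=50.0):
    """Step 105: image-level density class in {0, 1, 2, 3} from the total count."""
    cnt = float(den.sum())
    if cnt > d1:
        return 3
    if cnt > d2:
        return 2
    if cnt > d3:
        return 1
    return 0
```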
Step 2: establishing a target counting network based on a multi-task cooperative reconstruction feature, wherein the specific model of the network is as follows:
Convolution layer 1: convolving the x×y×3 input image with 24 3×3 convolution kernels and applying a ReLU activation function to obtain an x×y×24 feature;
Convolution layer 2: convolving the output of convolution layer 1 with 22 3×3 convolution kernels, then applying a ReLU activation function and a 2×2 max-pooling layer to obtain an (x/2)×(y/2)×22 feature;
Convolution layer 3: convolving the output of convolution layer 2 with 41 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/2)×(y/2)×41 feature;
Convolution layer 4: convolving the output of convolution layer 3 with 51 3×3 convolution kernels, then applying a ReLU activation function and a 2×2 max-pooling layer to obtain an (x/4)×(y/4)×51 feature;
Convolution layer 5: convolving the output of convolution layer 4 with 108 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/4)×(y/4)×108 feature;
Convolution layer 6: convolving the output of convolution layer 5 with 89 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/4)×(y/4)×89 feature;
Convolution layer 7: convolving the output of convolution layer 6 with 111 3×3 convolution kernels, then applying a ReLU activation function and a 2×2 max-pooling layer to obtain an (x/8)×(y/8)×111 feature;
Convolution layer 8: convolving the output of convolution layer 7 with 184 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/8)×(y/8)×184 feature;
Convolution layer 9: convolving the output of convolution layer 8 with 276 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/8)×(y/8)×276 feature;
Convolution layer 10: convolving the output of convolution layer 9 with 228 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/8)×(y/8)×228 feature;
Convolution layer 11_1: convolving the output of convolution layer 10 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 11_2: convolving the output of convolution layer 11_1 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 11_3: convolving the output of convolution layer 11_2 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 11_4: convolving the output of convolution layer 11_3 with 1 3×3 convolution kernel and applying a Sigmoid activation function to obtain an (x/8)×(y/8)×1 feature;
Convolution layer 12_1: convolving the output of convolution layer 10 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 12_2: convolving the output of convolution layer 12_1 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 12_3: convolving the output of convolution layer 12_2 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 12_4: convolving the output of convolution layer 12_3 with 1 3×3 convolution kernel and applying a Sigmoid activation function to obtain an (x/8)×(y/8)×1 feature;
Convolution layer 13_1: convolving the output of convolution layer 10 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 13_2: convolving the output of convolution layer 13_1 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 13_3: convolving the output of convolution layer 13_2 with 256 3×3 convolution kernels, then applying a PReLU activation function and an adaptive average-pooling layer to obtain a 1×1×256 feature;
Convolution layer 13_4: multiplying the flattened output of convolution layer 13_3 by a 256×128 linear layer and applying a Sigmoid activation function to obtain a 1×128 feature;
Convolution layer 13_5: multiplying the output of convolution layer 13_4 by a 128×4 linear layer and applying a Sigmoid activation function to obtain a 1×4 feature;
Convolution layer 14_1: convolving the output of convolution layer 11_4 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 14_2: convolving the output of convolution layer 14_1 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 14_3: convolving the output of convolution layer 14_2 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 15_1: convolving the output of convolution layer 12_4 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 15_2: convolving the output of convolution layer 15_1 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 15_3: convolving the output of convolution layer 15_2 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 16_1: deconvolving the reshaped output of convolution layer 13_5 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/64)×(y/64)×256 feature;
Convolution layer 16_2: deconvolving the output of convolution layer 16_1 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/32)×(y/32)×128 feature;
Convolution layer 16_3: deconvolving the output of convolution layer 16_2 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/16)×(y/16)×128 feature;
Convolution layer 16_4: deconvolving the output of convolution layer 16_3 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 17: convolving the output of convolution layer 10 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Aggregation layer 1: concatenating the outputs of convolution layers 14_3, 15_3, 16_4 and 17 along the channel dimension to obtain an (x/8)×(y/8)×512 feature;
Convolution layer 18: convolving the output of aggregation layer 1 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 19: deconvolving the output of convolution layer 18 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/4)×(y/4)×128 feature;
Convolution layer 20: deconvolving the output of convolution layer 19 with 64 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/2)×(y/2)×64 feature;
Convolution layer 21: deconvolving the output of convolution layer 20 with 32 3×3 convolution kernels and applying a PReLU activation function to obtain an x×y×32 feature;
Convolution layer 22: convolving the output of convolution layer 21 with 1 1×1 convolution kernel to obtain the x×y×1 output feature;
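To make the three task branches concrete, a partial PyTorch sketch of the heads that read the output of convolution layer 10 follows. It is an illustration, not the authors' released code; the paddings (chosen to preserve spatial size) and the TaskHeads class name are assumptions:

```python
import torch
import torch.nn as nn

class TaskHeads(nn.Module):
    """Mask head (layers 11_x), density-distribution head (12_x) and density-level head (13_x)."""
    def __init__(self, in_ch=228):
        super().__init__()
        def block(cin, cout):
            # 3x3 convolution + PReLU; padding=1 keeps the (x/8) x (y/8) spatial size
            return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.PReLU())
        # convolution layers 11_1-11_4: foreground/background mask estimation
        self.mask_head = nn.Sequential(
            block(in_ch, 256), block(256, 256), block(256, 256),
            nn.Conv2d(256, 1, 3, padding=1), nn.Sigmoid())
        # convolution layers 12_1-12_4: density distribution estimation
        self.distr_head = nn.Sequential(
            block(in_ch, 256), block(256, 256), block(256, 256),
            nn.Conv2d(256, 1, 3, padding=1), nn.Sigmoid())
        # convolution layers 13_1-13_3 + adaptive average pooling: density level features
        self.level_conv = nn.Sequential(
            block(in_ch, 256), block(256, 256), block(256, 256),
            nn.AdaptiveAvgPool2d(1))                      # -> N x 256 x 1 x 1
        # convolution layers 13_4 and 13_5: linear layers with Sigmoid activations
        self.level_fc = nn.Sequential(
            nn.Linear(256, 128), nn.Sigmoid(),
            nn.Linear(128, 4), nn.Sigmoid())

    def forward(self, feat10):
        pre_bin = self.mask_head(feat10)                  # N x 1 x (x/8) x (y/8)
        pre_den = self.distr_head(feat10)                 # N x 1 x (x/8) x (y/8)
        pooled = self.level_conv(feat10).flatten(1)       # N x 256
        pre_level = self.level_fc(pooled)                 # N x 4
        return pre_bin, pre_den, pre_level
```

Here feat10 stands for the (x/8)×(y/8)×228 output of convolution layer 10; the reconstruction branches 14_x-16_x, the skip branch 17, aggregation layer 1 and the decoder 18-22 would be attached to these outputs in the same fashion.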
Step 3: inputting the training samples from step 1 into the convolutional network model established in step 2 and learning the network parameters with an Adam optimization strategy, specifically as follows:
Step 301: the network designed by the invention trains its parameters in a multi-task manner, and the initial learning rate of the network is set to l;
Step 302: recording the output of convolution layer 11_4 as pre_bin, the output of convolution layer 12_4 as pre_den, the output of convolution layer 13_5 as pre_level, and the output of convolution layer 22 as pre_net; based on the labels given in step 1, the parameters in the network are learned through a loss function defined over these four outputs.
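A minimal training-step sketch is given below; the particular combination of loss terms (binary cross-entropy for the mask, mean squared error for the density distribution, assumed normalized to [0, 1] and downsampled to the 1/8-resolution grid, cross-entropy for the density level, and mean squared error for the final density map, with equal weights) is an assumption, not the loss prescribed by the patent:

```python
import torch
import torch.nn.functional as F

def training_step(model, optimizer, image, bina, distr, level, den):
    """One Adam update; model(image) is assumed to return (pre_bin, pre_den, pre_level, pre_net)."""
    pre_bin, pre_den, pre_level, pre_net = model(image)
    loss = (F.binary_cross_entropy(pre_bin, bina)      # mask estimation
            + F.mse_loss(pre_den, distr)               # density distribution estimation
            + F.cross_entropy(pre_level, level)        # density level classification
            + F.mse_loss(pre_net, den))                # final density map regression
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# optimizer = torch.optim.Adam(model.parameters(), lr=initial_lr)  # "l" in step 301
```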
Step 4: testing a deep network model; after the network is trained in the step 3, parameters of a convolution layer of the network are reserved; changing the size of the test picture to 256×256, and summing the outputs pre_net of the convolutional layer 22 in step 2 to obtain the target number of the current test picture.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention designs a task-cooperative information extraction module: through mask estimation, density distribution estimation and density level learning, the network learns foreground/background, local and global context information in the data before density estimation.
2. The invention designs a task feature reconstruction module: a plain multi-task model ignores the differences and complementarity between tasks, whereas the invention enhances the collaborative interaction between tasks through feature reconstruction.
3. The method reduces the difficulty of density map estimation through a progressive learning mode and enhances the robustness of the network.
Drawings
FIG. 1 is a block diagram of a deep network model according to the present invention.
Detailed Description
The embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings; obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention.
Example 1: referring to FIG. 1, a method for reconstructing a target counting network based on multi-task cooperative features comprises the following steps:
Step 1: building training samples and labels; preprocessing the acquired training pictures and labels to facilitate training of the network. Assume the training data contain N pictures with corresponding labels (the length and width of each picture are 256), and record the pictures in the training set as {I_1, I_2, ..., I_N} and the corresponding labels as {l_1, l_2, ..., l_N}, where the label of the i-th picture is l_i = {x_1, x_2, ..., x_m}; each x_j records the abscissa and ordinate of the j-th target position, and m is the number of targets in the picture.
Step 101: converting the label l_i of the i-th picture I_i into a Gaussian density map den_i, which can be computed by the following formula:
den_i(p) = Σ_{j=1}^{m} N(p; x_j, σ²)
where p denotes a pixel coordinate of the given picture, x_j denotes the j-th labeled target position, N(p; x_j, σ²) = 1/(2πσ²)·exp(−‖p − x_j‖²/(2σ²)) is a Gaussian kernel, and σ² is the variance term. The kernel is normalized so that the Gaussian values over all pixel points sum to 1 for each target; if the Gaussian is evaluated only within a neighborhood of x_j, the values computed in that neighborhood are normalized so that the Gaussian generated for the j-th position still sums to 1. At this point the label l_i of the i-th picture is converted into den_i.
Step 102: applying the operation of step 101 to the 1st to N-th pictures in turn, converting the label of each picture into a Gaussian density map; the labels of the training data thus become the training target density map labels {den_1, den_2, ..., den_N};
Step 103: binarizing the target density map labels generated in step 102 to generate mask labels, i.e. (den_i > 0) = 1, which converts every non-zero value to 1; the labels {den_1, den_2, ..., den_N} of step 102 respectively generate {bina_1, bina_2, ..., bina_N};
Step 104: performing density segmentation on the target density map labels generated in step 102 to generate density distribution labels, i.e. (den_i > θ_1) = 4 (θ_1 defaults to 0.9), (θ_1 ≥ den_i > θ_2) = 3 (θ_2 defaults to 0.6), (θ_2 ≥ den_i > θ_3) = 2 (θ_3 defaults to 0.3), (θ_3 ≥ den_i > θ_4) = 1 (θ_4 defaults to 0), and (den_i ≤ θ_4) = 0; the labels {den_1, den_2, ..., den_N} of step 102 respectively generate {distr_1, distr_2, ..., distr_N};
Step 105: converting the total count of each target density map label generated in step 102 into a density level label: with cnt_i denoting the sum of den_i, (cnt_i > δ_1) = 3 (δ_1 defaults to 150), (δ_1 ≥ cnt_i > δ_2) = 2 (δ_2 defaults to 100), (δ_2 ≥ cnt_i > δ_3) = 1 (δ_3 defaults to 50), and (cnt_i ≤ δ_3) = 0; the labels {den_1, den_2, ..., den_N} of step 102 respectively generate {level_1, level_2, ..., level_N};
Step 2: and establishing a target counting network based on the multi-task cooperative reconstruction characteristic. Specific model of network:
Convolution layer 1: convolving the x×y×3 input image with 24 3×3 convolution kernels and applying a ReLU activation function to obtain an x×y×24 feature;
Convolution layer 2: convolving the output of convolution layer 1 with 22 3×3 convolution kernels, then applying a ReLU activation function and a 2×2 max-pooling layer to obtain an (x/2)×(y/2)×22 feature;
Convolution layer 3: convolving the output of convolution layer 2 with 41 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/2)×(y/2)×41 feature;
Convolution layer 4: convolving the output of convolution layer 3 with 51 3×3 convolution kernels, then applying a ReLU activation function and a 2×2 max-pooling layer to obtain an (x/4)×(y/4)×51 feature;
Convolution layer 5: convolving the output of convolution layer 4 with 108 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/4)×(y/4)×108 feature;
Convolution layer 6: convolving the output of convolution layer 5 with 89 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/4)×(y/4)×89 feature;
Convolution layer 7: convolving the output of convolution layer 6 with 111 3×3 convolution kernels, then applying a ReLU activation function and a 2×2 max-pooling layer to obtain an (x/8)×(y/8)×111 feature;
Convolution layer 8: convolving the output of convolution layer 7 with 184 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/8)×(y/8)×184 feature;
Convolution layer 9: convolving the output of convolution layer 8 with 276 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/8)×(y/8)×276 feature;
Convolution layer 10: convolving the output of convolution layer 9 with 228 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/8)×(y/8)×228 feature;
Convolution layer 11_1: convolving the output of convolution layer 10 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 11_2: convolving the output of convolution layer 11_1 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 11_3: convolving the output of convolution layer 11_2 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 11_4: convolving the output of convolution layer 11_3 with 1 3×3 convolution kernel and applying a Sigmoid activation function to obtain an (x/8)×(y/8)×1 feature;
Convolution layer 12_1: convolving the output of convolution layer 10 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 12_2: convolving the output of convolution layer 12_1 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 12_3: convolving the output of convolution layer 12_2 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 12_4: convolving the output of convolution layer 12_3 with 1 3×3 convolution kernel and applying a Sigmoid activation function to obtain an (x/8)×(y/8)×1 feature;
Convolution layer 13_1: convolving the output of convolution layer 10 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 13_2: convolving the output of convolution layer 13_1 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 13_3: convolving the output of convolution layer 13_2 with 256 3×3 convolution kernels, then applying a PReLU activation function and an adaptive average-pooling layer to obtain a 1×1×256 feature;
Convolution layer 13_4: multiplying the flattened output of convolution layer 13_3 by a 256×128 linear layer and applying a Sigmoid activation function to obtain a 1×128 feature;
Convolution layer 13_5: multiplying the output of convolution layer 13_4 by a 128×4 linear layer and applying a Sigmoid activation function to obtain a 1×4 feature;
Convolution layer 14_1: convolving the output of convolution layer 11_4 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 14_2: convolving the output of convolution layer 14_1 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 14_3: convolving the output of convolution layer 14_2 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 15_1: convolving the output of convolution layer 12_4 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 15_2: convolving the output of convolution layer 15_1 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 15_3: convolving the output of convolution layer 15_2 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 16_1: deconvolving the reshaped output of convolution layer 13_5 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/64)×(y/64)×256 feature;
Convolution layer 16_2: deconvolving the output of convolution layer 16_1 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/32)×(y/32)×128 feature;
Convolution layer 16_3: deconvolving the output of convolution layer 16_2 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/16)×(y/16)×128 feature;
Convolution layer 16_4: deconvolving the output of convolution layer 16_3 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 17: convolving the output of convolution layer 10 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Aggregation layer 1: concatenating the outputs of convolution layers 14_3, 15_3, 16_4 and 17 along the channel dimension to obtain an (x/8)×(y/8)×512 feature;
Convolution layer 18: convolving the output of aggregation layer 1 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 19: deconvolving the output of convolution layer 18 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/4)×(y/4)×128 feature;
Convolution layer 20: deconvolving the output of convolution layer 19 with 64 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/2)×(y/2)×64 feature;
Convolution layer 21: deconvolving the output of convolution layer 20 with 32 3×3 convolution kernels and applying a PReLU activation function to obtain an x×y×32 feature;
Convolution layer 22: convolving the output of convolution layer 21 with 1 1×1 convolution kernel to obtain the x×y×1 output feature;
Step 3: inputting the training samples from step 1 into the convolutional network model established in step 2 and learning the network parameters with an Adam optimization strategy, specifically as follows:
Step 301: the network designed by the invention trains its parameters in a multi-task manner, and the initial learning rate of the network is set to l;
Step 302: recording the output of convolution layer 11_4 as pre_bin, the output of convolution layer 12_4 as pre_den, the output of convolution layer 13_5 as pre_level, and the output of convolution layer 22 as pre_net; based on the labels given in step 1, the parameters in the network are learned through a loss function defined over these four outputs.
Step 4: testing the deep network model; after the network has been trained in step 3, the parameters of its convolution layers are retained; the test picture is resized to 256×256, and the output pre_net of convolution layer 22 in step 2 is summed to obtain the target count of the current test picture.
The foregoing is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art, who is within the scope of the present invention, should make equivalent substitutions or modifications according to the technical scheme of the present invention and the inventive concept thereof, and should be covered by the scope of the present invention.

Claims (4)

1. A method for reconstructing a target count network based on a multitasking cooperative feature, the method comprising the steps of:
step 1: building training samples and labels; preprocessing the acquired training pictures and labels so as to facilitate training of a network;
step 2: establishing a target counting network based on the multi-task cooperative reconstruction characteristic;
step 3: inputting the training sample in the step 1 into the convolutional network model established in the step 2, carrying out parameter learning on the network through an Adam optimization strategy,
step 4: testing a deep network model;
wherein, step 1: building training samples and labels; preprocessing the acquired training pictures and labels so as to facilitate training of the network; specifically: assuming the training data contain N pictures with corresponding labels (the length and width of each picture are 256), recording the pictures in the training set as {I_1, I_2, ..., I_N} and the corresponding labels as {l_1, l_2, ..., l_N}, where the label of the i-th picture is l_i = {x_1, x_2, ..., x_m}, each x_j giving the abscissa and ordinate of the j-th target position, and m representing the number of targets in the picture;
step 101: converting the label l_i of the i-th picture I_i into a Gaussian density map den_i, computed by the following formula:
den_i(p) = Σ_{j=1}^{m} N(p; x_j, σ²)
wherein p denotes a pixel coordinate of the given picture, x_j denotes the j-th labeled target position, N(p; x_j, σ²) = 1/(2πσ²)·exp(−‖p − x_j‖²/(2σ²)) is a Gaussian kernel, and σ² is the variance term; the kernel is normalized so that the Gaussian values over all pixel points sum to 1 for each target; if the Gaussian is evaluated only within a neighborhood of x_j, the values computed in that neighborhood are normalized so that the Gaussian generated for the j-th position still sums to 1; at this time the label l_i of the i-th picture is converted into den_i;
step 102: applying the operation of step 101 to the 1st to N-th pictures in turn, converting the label of each picture into a Gaussian density map; the labels of the training data thus become the training target density map labels {den_1, den_2, ..., den_N};
step 103: binarizing the target density map labels generated in step 102 to generate mask labels, i.e. (den_i > 0) = 1, which converts every non-zero value to 1; the labels {den_1, den_2, ..., den_N} of step 102 respectively generate {bina_1, bina_2, ..., bina_N};
step 104: performing density segmentation on the target density map labels generated in step 102 to generate density distribution labels, i.e. (den_i > θ_1) = 4 (θ_1 defaults to 0.9), (θ_1 ≥ den_i > θ_2) = 3 (θ_2 defaults to 0.6), (θ_2 ≥ den_i > θ_3) = 2 (θ_3 defaults to 0.3), (θ_3 ≥ den_i > θ_4) = 1 (θ_4 defaults to 0), and (den_i ≤ θ_4) = 0; the labels {den_1, den_2, ..., den_N} of step 102 respectively generate {distr_1, distr_2, ..., distr_N};
step 105: converting the total count of each target density map label generated in step 102 into a density level label: with cnt_i denoting the sum of den_i, (cnt_i > δ_1) = 3 (δ_1 defaults to 150), (δ_1 ≥ cnt_i > δ_2) = 2 (δ_2 defaults to 100), (δ_2 ≥ cnt_i > δ_3) = 1 (δ_3 defaults to 50), and (cnt_i ≤ δ_3) = 0; the labels {den_1, den_2, ..., den_N} of step 102 respectively generate {level_1, level_2, ..., level_N}.
2. The method for reconstructing a target-counting network based on a multi-tasking collaborative feature according to claim 1,
step 2: establishing a target counting network based on the multi-task cooperative reconstruction characteristic; the specific model of the network is as follows:
Convolution layer 1: convolving the x×y×3 input image with 24 3×3 convolution kernels and applying a ReLU activation function to obtain an x×y×24 feature;
Convolution layer 2: convolving the output of convolution layer 1 with 22 3×3 convolution kernels, then applying a ReLU activation function and a 2×2 max-pooling layer to obtain an (x/2)×(y/2)×22 feature;
Convolution layer 3: convolving the output of convolution layer 2 with 41 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/2)×(y/2)×41 feature;
Convolution layer 4: convolving the output of convolution layer 3 with 51 3×3 convolution kernels, then applying a ReLU activation function and a 2×2 max-pooling layer to obtain an (x/4)×(y/4)×51 feature;
Convolution layer 5: convolving the output of convolution layer 4 with 108 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/4)×(y/4)×108 feature;
Convolution layer 6: convolving the output of convolution layer 5 with 89 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/4)×(y/4)×89 feature;
Convolution layer 7: convolving the output of convolution layer 6 with 111 3×3 convolution kernels, then applying a ReLU activation function and a 2×2 max-pooling layer to obtain an (x/8)×(y/8)×111 feature;
Convolution layer 8: convolving the output of convolution layer 7 with 184 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/8)×(y/8)×184 feature;
Convolution layer 9: convolving the output of convolution layer 8 with 276 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/8)×(y/8)×276 feature;
Convolution layer 10: convolving the output of convolution layer 9 with 228 3×3 convolution kernels and applying a ReLU activation function to obtain an (x/8)×(y/8)×228 feature;
Convolution layer 11_1: convolving the output of convolution layer 10 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 11_2: convolving the output of convolution layer 11_1 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 11_3: convolving the output of convolution layer 11_2 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 11_4: convolving the output of convolution layer 11_3 with 1 3×3 convolution kernel and applying a Sigmoid activation function to obtain an (x/8)×(y/8)×1 feature;
Convolution layer 12_1: convolving the output of convolution layer 10 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 12_2: convolving the output of convolution layer 12_1 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 12_3: convolving the output of convolution layer 12_2 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 12_4: convolving the output of convolution layer 12_3 with 1 3×3 convolution kernel and applying a Sigmoid activation function to obtain an (x/8)×(y/8)×1 feature;
Convolution layer 13_1: convolving the output of convolution layer 10 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 13_2: convolving the output of convolution layer 13_1 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 13_3: convolving the output of convolution layer 13_2 with 256 3×3 convolution kernels, then applying a PReLU activation function and an adaptive average-pooling layer to obtain a 1×1×256 feature;
Convolution layer 13_4: multiplying the flattened output of convolution layer 13_3 by a 256×128 linear layer and applying a Sigmoid activation function to obtain a 1×128 feature;
Convolution layer 13_5: multiplying the output of convolution layer 13_4 by a 128×4 linear layer and applying a Sigmoid activation function to obtain a 1×4 feature;
Convolution layer 14_1: convolving the output of convolution layer 11_4 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 14_2: convolving the output of convolution layer 14_1 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 14_3: convolving the output of convolution layer 14_2 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 15_1: convolving the output of convolution layer 12_4 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 15_2: convolving the output of convolution layer 15_1 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 15_3: convolving the output of convolution layer 15_2 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 16_1: deconvolving the reshaped output of convolution layer 13_5 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/64)×(y/64)×256 feature;
Convolution layer 16_2: deconvolving the output of convolution layer 16_1 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/32)×(y/32)×128 feature;
Convolution layer 16_3: deconvolving the output of convolution layer 16_2 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/16)×(y/16)×128 feature;
Convolution layer 16_4: deconvolving the output of convolution layer 16_3 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Convolution layer 17: convolving the output of convolution layer 10 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×128 feature;
Aggregation layer 1: concatenating the outputs of convolution layers 14_3, 15_3, 16_4 and 17 along the channel dimension to obtain an (x/8)×(y/8)×512 feature;
Convolution layer 18: convolving the output of aggregation layer 1 with 256 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/8)×(y/8)×256 feature;
Convolution layer 19: deconvolving the output of convolution layer 18 with 128 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/4)×(y/4)×128 feature;
Convolution layer 20: deconvolving the output of convolution layer 19 with 64 3×3 convolution kernels and applying a PReLU activation function to obtain an (x/2)×(y/2)×64 feature;
Convolution layer 21: deconvolving the output of convolution layer 20 with 32 3×3 convolution kernels and applying a PReLU activation function to obtain an x×y×32 feature;
Convolution layer 22: convolving the output of convolution layer 21 with 1 1×1 convolution kernel to obtain the x×y×1 output feature.
3. The method for reconstructing a target counting network based on multi-task cooperative features according to claim 1, wherein step 3: inputting the training samples of step 1 into the convolutional network model established in step 2 and learning the network parameters with an Adam optimization strategy, specifically comprises the following steps:
step 301: the network trains its parameters in a multi-task manner, and the initial learning rate of the network is set to l;
step 302: recording the output of convolution layer 11_4 as pre_bin, the output of convolution layer 12_4 as pre_den, the output of convolution layer 13_5 as pre_level, and the output of convolution layer 22 as pre_net; based on the labels given in step 1, the parameters in the network are learned through a loss function defined over these four outputs.
4. The method for reconstructing a target counting network based on multi-task cooperative features according to claim 1, wherein
step 4: testing the deep network model; after the network has been trained in step 3, the parameters of its convolution layers are retained; the test picture is resized to 256×256, and the output pre_net of convolution layer 22 in step 2 is summed to obtain the target count of the current test picture.
CN202010430090.3A 2020-05-20 2020-05-20 Method for reconstructing target counting network based on multi-task cooperative characteristics Active CN111640092B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010430090.3A CN111640092B (en) 2020-05-20 2020-05-20 Method for reconstructing target counting network based on multi-task cooperative characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010430090.3A CN111640092B (en) 2020-05-20 2020-05-20 Method for reconstructing target counting network based on multi-task cooperative characteristics

Publications (2)

Publication Number Publication Date
CN111640092A CN111640092A (en) 2020-09-08
CN111640092B (en) 2024-01-16

Family

ID=72332117

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010430090.3A Active CN111640092B (en) 2020-05-20 2020-05-20 Method for reconstructing target counting network based on multi-task cooperative characteristics

Country Status (1)

Country Link
CN (1) CN111640092B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018028255A1 (en) * 2016-08-11 2018-02-15 深圳市未来媒体技术研究院 Image saliency detection method based on adversarial network
CN109166100A (en) * 2018-07-24 2019-01-08 中南大学 Multi-task learning method for cell count based on convolutional neural networks
CN109271960A (en) * 2018-10-08 2019-01-25 燕山大学 A kind of demographic method based on convolutional neural networks
CN110503014A (en) * 2019-08-08 2019-11-26 东南大学 Demographic method based on multiple dimensioned mask perception feedback convolutional neural networks
CN110705698A (en) * 2019-10-16 2020-01-17 南京林业大学 Target counting depth network design method based on scale self-adaptive perception

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
崔高歌 (Cui Gaoge). Crowd counting based on multi-task learning and density-aware neural networks. China Master's Theses Full-text Database, Information Science and Technology, 2019, full text. *

Also Published As

Publication number Publication date
CN111640092A (en) 2020-09-08


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant