CN112966600B - Self-adaptive multi-scale context aggregation method for crowded population counting - Google Patents
- Publication number: CN112966600B
- Application number: CN202110242403.7A
- Authority
- CN
- China
- Prior art keywords: scale, representing, context, feature map, resolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06V20/53 — Image or video recognition or understanding; scenes; context or environment of the image; surveillance or monitoring of activities; recognition of crowd images, e.g. recognition of crowd congestion
- G06F18/211 — Pattern recognition; analysing; design or setup of recognition systems or techniques; selection of the most significant subset of features
- G06F18/253 — Pattern recognition; analysing; fusion techniques of extracted features
- G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
Abstract
The invention provides an adaptive multi-scale context aggregation method for crowd counting, which comprises the following steps: input a sample picture into a backbone network and extract a feature map whose resolution is j times that of the input image; feed the extracted feature map into several cascaded multi-scale context aggregation modules, which extract and adaptively aggregate multi-scale context information to obtain multi-scale context features; process the generated multi-scale context features with a convolution layer to generate a density map; and integrate (sum) over the density map to obtain the predicted head count. The method effectively extracts multi-scale information, addresses the non-uniform size of human heads, adaptively selects and aggregates useful context information through a channel attention mechanism to avoid information redundancy, achieves more accurate density estimation in crowded scenes, and is more robust.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to an adaptive multi-scale context aggregation method for crowd counting in crowded scenes.
Background
Crowd counting is a basic task of computer-vision-based crowd analysis, aimed at automatically estimating crowd conditions.
However, in crowd scenarios the task often faces challenging factors such as severe occlusion, scale variation, and diverse crowd distributions. In very crowded scenes in particular, estimating crowd density is difficult because foreground and background objects are visually similar and head scales vary.
Networks that directly aggregate context features of different scales already exist, but not all features are useful for the final crowd count, and direct aggregation creates information redundancy that degrades the performance of the counting network.
Disclosure of Invention
In view of the drawbacks of the prior art, an object of the present invention is to provide an adaptive multi-scale context aggregation method for crowd counting in crowded scenes.
The invention provides an adaptive multi-scale context aggregation method for crowd counting, which comprises the following steps:
step 1: input a sample picture into a backbone network and extract a feature map whose resolution is j times that of the input image;
step 2: feed the extracted feature map into several cascaded multi-scale context aggregation modules, which extract and adaptively aggregate multi-scale context information to obtain multi-scale context features; an up-sampling layer follows each multi-scale context aggregation module to convert the multi-scale context features into a higher-resolution feature map;
step 3: process the generated multi-scale context features with a convolution layer to generate a density map;
step 4: compute a loss function between the generated density map and the ground-truth density map, and optimize the network parameters;
step 5: integrate (sum) over the generated density map to obtain the predicted head count.
Optionally, step 4 includes:
generating a ground-truth density map of the crowd by Gaussian-kernel convolution from the picture annotated with head points, where the density map is computed as:

$$F(x) = \sum_{i=1}^{N} \delta(x - x_i) * G_\sigma(x)$$

where $F(x)$ denotes the ground-truth density map, $x_i$ is the pixel at a head, $G_\sigma$ is the Gaussian kernel, $\delta(\cdot)$ is the Dirac function, $\sigma$ is the standard deviation, $N$ is the total number of people in the picture, and $x$ is a pixel of the picture.
Optionally, step 2 includes:
the multi-scale context aggregation module adaptively selects small-scale context features and aggregates them with large-scale context features; the multi-scale context aggregation module comprises several branches of dilated (hole) convolutions with different dilation rates;
$X_i^j$ denotes the feature extracted by the dilated convolution of the $i$-th scale, where $i$ is the dilation rate of the convolution kernel, the superscript $j$ indicates a resolution $j$ times that of the input image, and $r$ is the reduction rate of the backbone network; the resolution of $X_i^j$ is $j$ times that of the original feature map; $W \times H$ is the resolution of the image, $C$ is the number of channels, and $R^{jW \times jH \times C}$ is the set of all feature maps at $j$ times the resolution;
the feature maps extracted by the dilated convolutions are input into a channel attention module, which adaptively selects the useful context information in $X_i^j$ with a selection function $f$ and outputs a feature map $Y_j \in R^{jW \times jH \times C}$ that aggregates the context information, where $Y_j$ is defined as:

$$Y_j = f\big(\cdots f\big(f(X_1^j) \oplus X_2^j\big) \oplus X_3^j \cdots\big) \oplus X_n^j$$

where $Y_j$ is the feature map at $j$ times the resolution extracted by the aggregation module, $\oplus$ denotes element-by-element summation, $X_1^j, X_2^j, X_3^j, \ldots, X_n^j$ are the feature maps extracted at scales $1, 2, 3, \ldots, n$, and $j$ indicates that the resolution is $j$ times that of the input picture.
Optionally, adaptively selecting the useful context information in $X_i^j$ with the selection function $f$ includes:
pooling each context feature with a global spatial average pooling layer and outputting the feature information $F_{avg}(X_i^j)$;
processing the feature information $F_{avg}$ with a bottleneck structure consisting of two fully connected layers, and normalizing the output features to $(0, 1)$ with a sigmoid function, where the adaptive output coefficient is computed as:

$$\alpha_i^j = \mathrm{Sigmoid}\big(W_2 \cdot \mathrm{ReLU}\big(W_1 \cdot F_{avg}(X_i^j)\big)\big)$$

where $W_1$ and $W_2$ are the weight coefficients of the two fully connected layers; the first fully connected layer is followed by a ReLU function and the second by a Sigmoid function, and $F_{avg}(X_i^j)$ is the output of $X_i^j$ after the average pooling layer;
adding a residual connection between the input and output of the channel attention mechanism, the resulting selection function is defined as:

$$f(X_i^j) = X_i^j \oplus \big(\alpha_i^j \otimes X_i^j\big)$$

where $f(X_i^j)$ is the output of the $i$-th channel attention mechanism module, $X_i^j$ is the feature map extracted by the dilated convolution of the $i$-th scale, and $\alpha_i^j$ is the adaptive coefficient of the $i$-th channel attention mechanism module.
Compared with the prior art, the invention has the following beneficial effects:
the self-adaptive multi-scale context aggregation method for crowded counting effectively extracts multi-scale information, solves the problem of non-uniform size of the head of a person, adaptively selects and aggregates useful context information through a channel attention mechanism, avoids redundancy of information, can have more accurate density estimation in crowded scenes, and has higher robustness.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:
fig. 1 is a schematic diagram of an adaptive multi-scale context aggregation method for crowd counting according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the present invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the inventive concept. These are all within the scope of the present invention.
The invention provides an adaptive multi-scale context aggregation method for crowd counting, used for crowd density estimation in crowded scenes. The method proceeds as follows: a picture is input; feature information is first extracted by a backbone network, and the extracted feature map is then fed into several cascaded multi-scale context aggregation modules. Each module first extracts multi-scale information with convolution kernels of different dilation rates, then adaptively selects channel context feature information through a channel attention mechanism and aggregates it. After each multi-scale context aggregation module, the feature map is converted by upsampling into a higher-resolution feature map; finally, an estimated density map is output through a 1×1 convolution kernel, and the predicted head count is obtained by integral summation. The method effectively extracts multi-scale information with several convolution kernels of different dilation rates, addresses the non-uniform size of human heads, adaptively selects and aggregates useful context information through a channel attention mechanism to avoid information redundancy, achieves more accurate density estimation in crowded scenes, and is more robust.
Fig. 1 is a schematic diagram of an adaptive multi-scale context aggregation method for crowd counting according to an embodiment of the present invention. As shown in fig. 1, the method may include the following steps:
step S1: and inputting the sample picture into a backbone network, and extracting a feature map with the size i times of the resolution of the original image.
Step S2: the extracted feature images are input into a plurality of self-adaptive multi-scale context aggregation modules in a cascading mode, multi-scale context information is extracted and self-adaptively aggregated, and an up-sampling layer is arranged behind each module and used for converting multi-scale context features into feature images with higher resolution.
Step S3: process the generated multi-scale context features with a 1×1 convolution layer to generate a density map.
Step S4: calculating a loss function between the generated density map and the true value density map, and optimizing network parameters;
step S5: and integrating and summing the density map to obtain the predicted number of people.
In this embodiment, a ground-truth density map of the crowd is generated by Gaussian-kernel convolution from the picture annotated with head points. Denoting the pixel at a head by $x_i$ and the Gaussian kernel by $G_\sigma$, the ground-truth density map can be expressed as:

$$F(x) = \sum_{i=1}^{N} \delta(x - x_i) * G_\sigma(x)$$

where $F(x)$ denotes the ground-truth density map, $x_i$ is the pixel at a head, $G_\sigma$ is the Gaussian kernel, $\delta(\cdot)$ is the Dirac function, $\sigma$ is the standard deviation, $N$ is the total number of people in the picture, and $x$ is a pixel of the picture.
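As a minimal illustration of this ground-truth construction (a NumPy sketch, not the patent's own code; the image size, σ, and 3σ truncation radius below are assumptions), each annotated head is convolved with a normalized Gaussian so the map integrates to the head count:

```python
import numpy as np

def gaussian_density_map(shape, head_points, sigma=4.0):
    """Ground-truth density map: place a normalised Gaussian blob at each
    annotated head pixel, so the whole map sums to the head count N."""
    h, w = shape
    density = np.zeros((h, w), dtype=np.float64)
    r = int(3 * sigma)  # truncate the kernel at 3 sigma
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    kernel = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()  # each head contributes exactly 1 to the integral
    for px, py in head_points:
        px, py = int(px), int(py)
        # Clip the kernel window at the image border.
        y0, y1 = max(py - r, 0), min(py + r + 1, h)
        x0, x1 = max(px - r, 0), min(px + r + 1, w)
        density[y0:y1, x0:x1] += kernel[y0 - py + r:y1 - py + r,
                                        x0 - px + r:x1 - px + r]
    return density

# Two annotated heads -> the density map integrates to ~2.
gt = gaussian_density_map((96, 128), [(40, 50), (90, 20)], sigma=4.0)
```

Summing such a map recovers the annotated head count, which is exactly the readout used in step S5.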
Specifically, the adaptive multi-scale context aggregation module in step S2, shown in fig. 1, adaptively selects reliable small-scale context features and aggregates them with large-scale context features. The specific operation is as follows:
the multi-scale context aggregation module comprises several branches of dilated convolutions with different dilation rates; $X_i^j$ denotes the feature extracted by the dilated convolution of the $i$-th scale, where $i$ is the dilation rate of the convolution kernel, the superscript $j$ indicates a resolution $j$ times that of the input image, $r$ is the reduction rate of the backbone network, $W \times H$ is the resolution of the image, $C$ is the number of channels, and $R^{jW \times jH \times C}$ is the set of all feature maps at $j$ times the resolution. The feature maps extracted by the dilated convolutions are then input into a channel attention module (CA), which uses the selection function $f$ to adaptively select the useful context information in $X_i^j$, and finally outputs the feature map $Y_j \in R^{jW \times jH \times C}$ that aggregates the context information, defined as:

$$Y_j = f\big(\cdots f\big(f(X_1^j) \oplus X_2^j\big) \oplus X_3^j \cdots\big) \oplus X_n^j$$

where $Y_j$ is the feature map at $j$ times the resolution extracted by the aggregation module, $\oplus$ denotes element-by-element summation, and $X_1^j, \ldots, X_n^j$ are the feature maps extracted at scales $1, \ldots, n$.
Illustratively, the selection function $f$ uses a channel attention mechanism to aggregate multi-scale context information; the specific operation is:
each feature is first pooled by a global spatial average pooling layer (denoted $F_{avg}$), the feature is then processed by a bottleneck structure consisting of two fully connected layers, and finally the output feature is normalized to $(0, 1)$ by a sigmoid function. The adaptive output coefficient can be expressed as:

$$\alpha_i^j = \mathrm{Sigmoid}\big(W_2 \cdot \mathrm{ReLU}\big(W_1 \cdot F_{avg}(X_i^j)\big)\big)$$

where $W_1$ and $W_2$ are the weight coefficients of the two fully connected layers (the first followed by a ReLU function, the second by a Sigmoid function), and $F_{avg}(X_i^j)$ is the output of $X_i^j$ after the average pooling layer.
Furthermore, for better optimization, a residual connection is added between the input and output of the channel attention mechanism, and the final selection function is defined as:

$$f(X_i^j) = X_i^j \oplus \big(\alpha_i^j \otimes X_i^j\big)$$
compared with the existing counting, the embodiment adopts a plurality of convolutions with different void ratios to extract multi-scale information, and self-adaptively selects and aggregates the multi-scale context information through a channel attention mechanism, so that good performance is shown in crowded scenes, and the accuracy of crowd counting is improved.
The technical scheme of the invention is described in more detail below with reference to a specific embodiment. Given the pixel values and the head annotations of a picture, the ground-truth density map of the picture is obtained by Gaussian convolution:

$$F(x) = \sum_{i=1}^{N} \delta(x - x_i) * G_\sigma(x)$$

where $x_i$ denotes a pixel with a head, $x$ denotes any pixel, $G_\sigma$ is the Gaussian kernel, $\delta(\cdot)$ is the Dirac function, $\sigma$ is the standard deviation, and $N$ is the total number of people in the picture.
The complex nonlinear mapping from the input image to the crowd estimated density map is then learned by a multi-scale context aggregation network, as follows:
the first ten layers of VGG-16 are selected as a backbone network, the pictures are input into the backbone network, the characteristic information is extracted, and the size of the characteristic diagram is 1\8 of the input image.
The extracted feature map is convolved with a 3×3 convolution kernel, and the feature information is then sent to the multi-scale context aggregation module. Features of different scales are first extracted through several branches of dilated convolutions with different dilation rates; the feature of the $i$-th scale is denoted $X_i^j$, with $n$ scales in total.
The feature information $X_1^j$ adaptively aggregates multi-scale context information through the attention module: context information is first extracted by a global spatial average pooling layer, the feature is then processed by a bottleneck structure consisting of two fully connected layers, and the output feature is finally normalized to $(0, 1)$ by a sigmoid function, giving the adaptive output coefficient

$$\alpha_1^j = \mathrm{Sigmoid}\big(W_2 \cdot \mathrm{ReLU}\big(W_1 \cdot F_{avg}(X_1^j)\big)\big).$$

Finally, the input and output of the channel attention mechanism are directly connected by a residual, so the final output is

$$f(X_1^j) = X_1^j \oplus \big(\alpha_1^j \otimes X_1^j\big).$$
will beMulti-scale contextual profile selected by attention mechanisms->And the 2 nd scale information->Pixel-by-pixel summing is performed, which can be expressed as: />
Extracting the extractThe feature information is sent to the channel attention mechanism to adaptively select the context information, and the context information and the feature information of the 3 rd scale are subjected to pixel summation, and the like, so that the feature mapping which aggregates the multi-scale context information is finally obtained:
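The iterative select-and-sum procedure above can be sketched scale by scale (NumPy; `select_fn` stands in for the channel-attention module, here exercised with a toy identity function rather than learned weights):

```python
import numpy as np

def aggregate(scale_features, select_fn):
    """Aggregate n scale features: attention-select the running feature,
    sum it pixel-wise with the next scale, and repeat up to scale n."""
    y = scale_features[0]
    for x_next in scale_features[1:]:
        y = select_fn(y) + x_next  # f(Y) ⊕ X_{i+1}
    return y

# Toy select function: with the identity, aggregation reduces to a plain sum.
feats = [np.full((4, 4), float(v)) for v in (1, 2, 3)]
y = aggregate(feats, select_fn=lambda x: x)
```

With a real attention module as `select_fn`, each intermediate result is re-weighted per channel before the next scale is added, which is what suppresses redundant context.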
After the multi-scale context information is extracted by a multi-scale context aggregation module, it is converted by upsampling into a higher-resolution feature map, which is then sent to the next multi-scale context aggregation module for feature extraction in the same way. Three multi-scale context aggregation modules are applied in sequence, and finally the estimated density map is output through a 1×1 convolution kernel. The loss function $L(\theta)$ is computed as:

$$L(\theta) = \frac{1}{2N} \sum_{i=1}^{N} \left\| F(I_i; \theta) - F_i \right\|_2^2$$

where $F(I_i; \theta)$ is the density map output by the network for image $I_i$, $F_i$ is the ground-truth density map, and $\theta$ are the parameters the network needs to optimize; the network continuously optimizes $\theta$ by gradient descent to find the parameter values that minimize the loss.
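A sketch of this training objective and of the final count readout (NumPy; the toy prediction and ground-truth maps below are illustrative data, and the 1/(2N) scaling follows the loss above):

```python
import numpy as np

def density_loss(pred_maps, gt_maps):
    """L(theta) = 1/(2N) * sum_i || F(I_i; theta) - F_i ||_2^2 over N images."""
    n = len(pred_maps)
    return sum(((p - g) ** 2).sum() for p, g in zip(pred_maps, gt_maps)) / (2 * n)

def predicted_count(density_map):
    """Step 5: integrating (summing) the density map gives the head count."""
    return float(density_map.sum())

# Toy batch of two 4x4 maps.
gt = [np.zeros((4, 4)), np.ones((4, 4))]
pred = [np.zeros((4, 4)), np.ones((4, 4)) * 1.5]
loss = density_loss(pred, gt)      # (0 + 16 * 0.25) / (2 * 2) = 1.0
count = predicted_count(pred[1])   # 1.5 * 16 = 24.0
```

In practice the loss is minimized by gradient descent over θ, and the count is read off the converged network's density map.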
It should be noted that, the steps in the adaptive multi-scale context aggregation method for crowd counting provided in the present invention may be implemented by using corresponding modules, devices, units, etc. in the adaptive multi-scale context aggregation system for crowd counting, and those skilled in the art may refer to the technical scheme of the system to implement the step flow of the method, that is, the embodiment in the system may be understood as a preferred embodiment for implementing the method, which is not repeated herein.
Those skilled in the art will appreciate that the invention provides a system and its individual devices that can be implemented entirely by logic programming of method steps, in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, etc., in addition to the system and its individual devices being implemented in pure computer readable program code. Therefore, the system and various devices thereof provided by the present invention may be considered as a hardware component, and the devices included therein for implementing various functions may also be considered as structures within the hardware component; means for achieving the various functions may also be considered as being either a software module that implements the method or a structure within a hardware component.
The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that various changes or modifications may be made by those skilled in the art within the scope of the appended claims without affecting the spirit of the invention. The embodiments of the present application and features in the embodiments may be combined with each other arbitrarily without conflict.
Claims (2)
1. A method for adaptive multi-scale context aggregation for crowd counting, comprising:
step 1: inputting a sample picture into a backbone network, and extracting a feature map with the size j times of the resolution of an input image;
step 2: inputting the extracted feature map into a plurality of multi-scale context aggregation modules in a cascading mode, extracting and adaptively aggregating multi-scale context information to obtain multi-scale context features; an up-sampling layer is arranged behind each multi-scale context aggregation module and is used for converting multi-scale context characteristics into a characteristic diagram with higher resolution;
step 3: performing convolution layer processing on the generated multi-scale context characteristics to generate a density map;
step 4: calculating a loss function between the generated density map and the true value density map, and optimizing network parameters;
step 5: integrating and summing the generated density maps to obtain the number of predicted people;
the step 2 comprises the following steps:
the multi-scale context aggregation module adaptively selects small-scale context features and aggregates the small-scale context features with large-scale context features; the multi-scale context aggregation module comprises a plurality of branches with different void convolutions and different void ratios;
by usingTo represent features extracted by the i-th scale of the hole convolution; wherein i represents the void fraction of the convolution kernel, < >>Representing the resolution as j times the resolution of the input image, r representing the reduction rate of the backbone network, +.>The method comprises the steps of representing a feature map extracted by cavity convolution of an ith scale, wherein the resolution of the feature map is j times of that of the original feature map; w x H represents the resolution of the image, C represents the number of channels of the image, and R represents the set of all feature maps of j times the resolution;
the feature diagram extracted by the cavity convolution is input into a channel attention module, and the channel attention module adopts self-adaptive selection of a selection function fUseful context feature information in the document, and outputs a feature map Y in which the context information is aggregated j ∈R jW×jH×C Wherein Y is j The definition is as follows:
Y j a feature map representing j times resolution extracted by the aggregation module, the tie representing element-by-element summation,representing the extraction of a feature map of scale 1, < ->Representing the extraction of a feature map of scale 2, < ->Representing the extraction of a feature map of scale 3, < ->Representing extracting a feature map of an nth scale, j representing a resolution j times that of the input picture;
said adaptively selecting the useful context information in $X_i^j$ with the selection function $f$ comprises:
pooling each context feature with a global spatial average pooling layer and outputting the feature information $F_{avg}(X_i^j)$;
processing the feature information $F_{avg}$ with a bottleneck structure consisting of two fully connected layers, and normalizing the output features to $(0, 1)$ with a sigmoid function, where the adaptive output coefficient is computed as:

$$\alpha_i^j = \mathrm{Sigmoid}\big(W_2 \cdot \mathrm{ReLU}\big(W_1 \cdot F_{avg}(X_i^j)\big)\big)$$

where $W_1$ and $W_2$ are the weight coefficients of the two fully connected layers, the first fully connected layer being followed by a ReLU function and the second by a Sigmoid function, and $F_{avg}(X_i^j)$ is the output of $X_i^j$ after the average pooling layer;
adding a residual connection between the input and output of the channel attention mechanism, the resulting selection function is defined as:

$$f(X_i^j) = X_i^j \oplus \big(\alpha_i^j \otimes X_i^j\big)$$

where $f(X_i^j)$ is the output of the $i$-th channel attention mechanism module, $X_i^j$ is the feature map extracted by the dilated convolution of the $i$-th scale, and $\alpha_i^j$ is the adaptive coefficient of the $i$-th channel attention mechanism module.
2. The adaptive multi-scale context aggregation method for crowd counting according to claim 1, wherein the step 4 comprises:
generating a ground-truth density map of the crowd by Gaussian-kernel convolution from the picture annotated with head points, where the density map is computed as:

$$F(x) = \sum_{i=1}^{N} \delta(x - x_i) * G_\sigma(x)$$

where $F(x)$ denotes the ground-truth density map, $x_i$ is the pixel at a head, $G_\sigma$ is the Gaussian kernel, $\delta(\cdot)$ is the Dirac function, $\sigma$ is the standard deviation, $N$ is the total number of people in the picture, and $x$ is a pixel of the picture.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110242403.7A CN112966600B (en) | 2021-03-04 | 2021-03-04 | Self-adaptive multi-scale context aggregation method for crowded population counting |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112966600A CN112966600A (en) | 2021-06-15 |
CN112966600B true CN112966600B (en) | 2024-04-16 |
Family
ID=76277443
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110242403.7A Active CN112966600B (en) | 2021-03-04 | 2021-03-04 | Self-adaptive multi-scale context aggregation method for crowded population counting |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112966600B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114120233B (en) * | 2021-11-29 | 2024-04-16 | 上海应用技术大学 | Training method of lightweight pyramid cavity convolution aggregation network for crowd counting |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110263849A (en) * | 2019-06-19 | 2019-09-20 | 合肥工业大学 | A kind of crowd density estimation method based on multiple dimensioned attention mechanism |
CN111242036A (en) * | 2020-01-14 | 2020-06-05 | 西安建筑科技大学 | Crowd counting method based on encoding-decoding structure multi-scale convolutional neural network |
WO2020169043A1 (en) * | 2019-02-21 | 2020-08-27 | 苏州大学 | Dense crowd counting method, apparatus and device, and storage medium |
CN111709290A (en) * | 2020-05-18 | 2020-09-25 | 杭州电子科技大学 | Crowd counting method based on coding and decoding-jumping connection scale pyramid network |
CN112132023A (en) * | 2020-09-22 | 2020-12-25 | 上海应用技术大学 | Crowd counting method based on multi-scale context enhanced network |
Non-Patent Citations (1)
Title |
---|
Crowd density estimation based on multi-level feature fusion; Chen Peng; Tang Yiping; Wang Liran; He Xia; Journal of Image and Graphics (Issue 08); full text *
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |