CN111488834A - Crowd counting method based on multi-level feature fusion - Google Patents
Crowd counting method based on multi-level feature fusion
- Publication number
- CN111488834A (application CN202010284030.5A)
- Authority
- CN
- China
- Prior art keywords
- crowd
- feature
- convolution
- layer
- density map
- Prior art date
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention relates to a crowd counting method based on multi-level feature fusion, comprising the following steps: preprocess the acquired crowd images and generate the corresponding crowd density maps from the annotation information; construct a multi-level feature fusion crowd counting network; initialize the network weight parameters; input the preprocessed crowd images and density maps into the network to complete forward propagation; compute the loss between the forward-propagation result and the true density map and update the model parameters; iterate the forward propagation and parameter updates for a specified number of times; and obtain the crowd density map to derive the estimated number of people. The method overcomes the problem of crowd scale variation in the crowd counting task and counts crowds more accurately.
Description
Technical Field
The invention relates to the field of image crowd counting and deep learning, in particular to a crowd counting method based on deep learning.
Background
Crowd counting is an important problem in image processing and computer vision. Its aim is to automatically generate a crowd density map from crowd images and to estimate the number of people in the scene. Crowd counting is widely applied in traffic scheduling, security prevention and control, city management, and other fields.
Traditional crowd counting methods require complex preprocessing of crowd images and manually designed, hand-extracted human-body features, and the features must be re-extracted in cross-scene settings, so their adaptability is poor. In recent years, the successful application of convolutional neural networks has brought major breakthroughs to the crowd counting task. Zhang et al. [1] proposed a convolutional neural network model suitable for crowd counting that is trained end to end without foreground segmentation or hand-crafted feature extraction; it obtains high-level features through multiple convolutional layers and improves cross-scene counting performance. However, crowd scales differ greatly across crowded scenes, and even within a single image the density and distribution of the crowd vary with distance from the camera, so this method is less accurate in scenes with large crowd-scale differences.
To address crowd scale variation, existing research has focused mainly on extracting features at multiple scales to reduce the influence of scale changes. Zhang et al. [2] proposed a multi-branch convolutional neural network in which each branch is composed of convolution kernels of a different size, so that the different branches extract features at different scales. Cao et al. [3] proposed a scale-aware network that addresses scale variation with feature extraction modules built from convolution kernels of different sizes. All of these methods handle crowd scale variation by extracting multi-scale features with convolution kernels of different sizes. However, crowd scale varies continuously within an image, while convolution kernels of different sizes can only extract crowd features at discrete scales, ignoring crowds at the scales in between. The problem of crowd-scale differences across scenes therefore remains incompletely solved.
References:
1. C. Zhang, H. Li, X. Wang, and X. Yang. Cross-Scene Crowd Counting via Deep Convolutional Neural Networks[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, 833-841.
2. Y. Zhang, D. Zhou, S. Chen, et al. Single-Image Crowd Counting via Multi-Column Convolutional Neural Network[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, 589-597.
3. X. Cao, Z. Wang, Y. Zhao, and F. Su. Scale Aggregation Network for Accurate and Efficient Crowd Counting[C]. European Conference on Computer Vision, 2018, 734-750.
Disclosure of Invention
The invention provides a crowd counting method based on multi-level feature fusion, which aims to solve the problem of crowd scale difference in different scenes in the prior art. The method mainly comprises the following steps:
step S1: preprocessing the acquired crowd image, and generating a corresponding crowd density map by using the labeling information;
step S2: constructing a multi-level feature fused crowd counting network;
step S3: initializing a network weight parameter;
step S4: inputting the crowd images and crowd density maps preprocessed in step S1 into the network to complete forward propagation;
step S5: calculating the loss between the forward-propagation result of step S4 and the true density map, and updating the model parameters;
step S6: iterating steps S4, S5 a specified number of times;
step S7: and acquiring a crowd density map to obtain the estimated number of people.
Compared with current methods that handle crowd scale variation with multi-branch, multi-size convolution kernels, the invention provides a method based on multi-level feature fusion. In the VGG16 feature extractor contained in the network, the shallow output features carry the spatial and texture information of the crowd, while the high-level output features carry its semantic information: the shallow features describe where the crowd is located, and the high-level features supply the specific details of the crowd features. By fusing low-level and high-level features, the method effectively handles crowd scale variation and overcomes the limitation that multi-branch, multi-size-kernel methods can extract crowd features only at discrete scales. The proposed method is more accurate than existing methods.
Drawings
Fig. 1 is a flowchart of a crowd counting method based on multi-level feature fusion according to the present invention.
Fig. 2 is a diagram of a crowd counting network structure based on multi-level feature fusion according to the present invention.
Fig. 3 is a structural diagram of a channel domain attention module of a crowd counting network based on multi-level feature fusion according to the present invention.
Detailed Description
Fig. 1 is a flowchart of the crowd counting method based on multi-level feature fusion according to the present invention. The method proceeds as follows: preprocess the acquired crowd images and generate the corresponding crowd density maps from the annotation information; construct the multi-level feature fusion crowd counting network; initialize the network weight parameters; input the preprocessed crowd images and density maps into the network to complete forward propagation; compute the loss between the forward-propagation result and the true density map and update the model parameters; iterate the forward propagation and parameter updates for a specified number of times; and obtain the crowd density map to derive the estimated number of people. The implementation details of each step are as follows:
Step S1: preprocess the acquired crowd images and generate the corresponding crowd density maps from the annotation information, as follows:
Step S11: center the acquired crowd image. Specifically, subtract the per-channel mean from the elements of the R, G, and B channels of the image and divide by the per-channel standard deviation, where the means of the R, G, and B channels are (0.485, 0.456, 0.406) and the standard deviations are (0.229, 0.224, 0.225).
Step S12: generate a position matrix from the provided annotation information. Specifically, create an all-zero matrix with the same resolution as the corresponding image, and set the matrix element at each coordinate given by the annotation to 1.
Step S13: randomly crop the centered crowd image and the corresponding position matrix into fixed-size image blocks and matrices; in this embodiment the crop size is 400 × 400.
Step S14: generate the crowd density map by convolving the position matrix with a Gaussian kernel. Specifically, generate two one-dimensional Gaussian convolution kernels with μ = 15 and σ = 4, transpose one and multiply it with the other to obtain a two-dimensional Gaussian convolution kernel, and convolve this kernel with the elements of value 1 in the position matrix.
Step S15: downsample the density map generated in step S14 to 200 × 200 resolution. Specifically, convolve the density map with a 2 × 2 convolution kernel whose parameters are all 1, using a stride of 2.
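Steps S12 to S15 can be sketched as follows. This is an illustrative NumPy/SciPy implementation, not the patent's code: it approximates the separable Gaussian of step S14 with scipy's `gaussian_filter` (σ = 4 as stated; the normalization of the kernel is an assumption), and writes the 2 × 2 all-ones, stride-2 convolution of step S15 as a block sum, which preserves the total count:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_density_map(head_coords, height, width, sigma=4.0):
    """Steps S12 and S14: build the 0/1 position matrix from the annotated
    head coordinates, then spread each 1 with a Gaussian kernel."""
    pos = np.zeros((height, width), dtype=np.float32)
    for x, y in head_coords:  # annotation gives (x, y) head positions
        if 0 <= int(y) < height and 0 <= int(x) < width:
            pos[int(y), int(x)] = 1.0
    # gaussian_filter applies a separable, normalized Gaussian, so the map
    # still sums to (approximately) the number of annotated heads.
    return gaussian_filter(pos, sigma=sigma)

def downsample_half(density):
    """Step S15: a 2x2 kernel of ones at stride 2, written as a 2x2 block
    sum; a 400x400 map becomes 200x200 with the same total count."""
    h, w = density.shape
    d = density[: h // 2 * 2, : w // 2 * 2]
    return d.reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))
```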
Step S2: construct the multi-level feature fusion crowd counting network, as shown in Fig. 2, as follows:
Step S21: build a VGG16 network without its fully connected layers.
Step S22: build the channel-domain attention module, as shown in Fig. 3. Specifically, build a global average pooling layer over the channel domain that pools the input feature X into a 1 × 1 × C feature; add two fully connected layers after the pooling layer, with C/4 and C neurons respectively; build a Sigmoid activation layer after the two fully connected layers; and multiply the activation-layer output element-wise with the input feature X to obtain the output of the channel-domain attention module.
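A minimal PyTorch sketch of the step S22 module (class and variable names are ours; the patent specifies the pooling, the two fully connected layers with C/4 and C neurons, the Sigmoid, and the element-wise product, and names no activation between the two fully connected layers, so none is inserted here):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Step S22: global average pooling to 1x1xC, two fully connected layers
    (C -> C/4 -> C), a Sigmoid, then element-wise reweighting of the input."""
    def __init__(self, channels: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // 4),
            nn.Linear(channels // 4, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # element-wise multiplication with the input feature X
```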
Step S23: fuse the output features X50 and X40 of the fifth and fourth layers of the VGG16 network built in step S21. Apply an upsampling operation to X50 (all upsampling layers in the invention use a magnification factor of 2), concatenate the upsampled feature with the fourth-layer output X40 on the channel domain, feed the concatenated feature into a channel-domain attention module, and feed the module's output into a convolution block consisting of two 3 × 3 convolution layers with 256 channels, obtaining the block's output feature X41.
Step S24: fuse the output features X40 and X30 of the fourth and third layers of the VGG16 network built in step S21 with the feature X41 obtained in step S23. Upsample X40 and concatenate the result with X30 on the channel domain, then feed the concatenated feature into a convolution block consisting of two 3 × 3 convolution layers with 128 channels to obtain feature X31. Upsample X41 to obtain feature X32. Concatenate X31 and X32 on the channel domain, feed the concatenated feature into a channel-domain attention module, and feed the module's output into a convolution block consisting of two 3 × 3 convolution layers with 128 channels, obtaining the block's output feature X33.
Step S25: fuse the output features X30 and X20 of the third and second layers of the VGG16 network built in step S21 with the features X31 and X33 obtained in step S24. Upsample X30 and concatenate the result with X20 on the channel domain, then feed the concatenated feature into a convolution block consisting of two 3 × 3 convolution layers with 64 channels to obtain feature X21. Upsample X31 to obtain feature X22; concatenate X21 and X22 on the channel domain and feed the result into a convolution block consisting of two 3 × 3 convolution layers with 64 channels, obtaining the output feature X23. Upsample X33 to obtain feature X24; concatenate X23 and X24 on the channel domain, feed the concatenated feature into a channel-domain attention module, feed the module's output into a convolution block consisting of two 3 × 3 convolution layers with 64 channels and one 3 × 3 convolution layer with 32 channels, and feed the block's output into a 1 × 1 convolution layer with 1 channel. This completes the construction of the multi-level feature fusion crowd counting network.
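The recurring pattern in steps S23 to S25 (upsample the deeper feature by a factor of 2, concatenate with the shallower feature on the channel domain, apply channel attention, then a block of 3 × 3 convolutions) can be sketched as one reusable PyTorch module. The bilinear interpolation mode, the ReLU activations, and the VGG16 stage widths in the example call are assumptions; the patent fixes only the structure just described:

```python
import torch
import torch.nn as nn
# ChannelAttention is the module from the step S22 sketch above.

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    # Two 3x3 convolutions, as used throughout steps S23-S25 (ReLU assumed).
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    )

class FuseStage(nn.Module):
    """One fusion stage: upsample the deep feature x2, concatenate with the
    shallow feature on the channel domain, channel attention, conv block."""
    def __init__(self, deep_ch: int, shallow_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.att = ChannelAttention(deep_ch + shallow_ch)
        self.conv = conv_block(deep_ch + shallow_ch, out_ch)

    def forward(self, deep: torch.Tensor, shallow: torch.Tensor) -> torch.Tensor:
        x = torch.cat([self.up(deep), shallow], dim=1)
        return self.conv(self.att(x))

# Step S23, for example: the fifth and fourth VGG16 stages both output 512
# channels, so X41 = FuseStage(512, 512, 256)(X50, X40).
```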
Step S3: initialize the network weight parameters. Specifically, for the crowd counting network obtained in step S2, the feature extractor VGG16 is initialized with the ImageNet classification weights of VGG16 without its fully connected layers; all other convolution layers and fully connected layers are initialized from a normal distribution with μ = 0 and σ = 0.01.
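A sketch of step S3 in PyTorch, assuming a recent torchvision supplies the ImageNet-pretrained VGG16 weights (the `decoder` name for the non-pretrained part is illustrative):

```python
import torch.nn as nn
from torchvision import models

# Feature extractor: ImageNet-pretrained VGG16 without the fully connected
# layers (torchvision's .features already excludes the classifier).
backbone = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features

def init_from_normal(m: nn.Module) -> None:
    """Initialize the remaining convolution / fully connected layers from a
    normal distribution with mu = 0 and sigma = 0.01, as in step S3."""
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.normal_(m.weight, mean=0.0, std=0.01)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

# decoder.apply(init_from_normal)  # apply to every non-pretrained layer
```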
Step S4: input the crowd images and crowd density maps preprocessed in step S1 into the network and complete forward propagation.
Step S5: compute the loss between the forward-propagation result of step S4 and the true density maps fed to the network, and update the model parameters, as follows:
step S51 calculating mean square error loss L of the result of forward propagation and the true density mapMSEThe concrete mode is as follows:
where N represents the number of samples of input data that are propagated forward at one time, where N is 8 in the present invention,a density map representing the current ith data forward propagation computation,representing the true density map of the current ith datum.
Step S52: use the loss $L_{MSE}$ computed in step S51 to update the model parameters by stochastic gradient descent.
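Steps S4, S51, and S52 amount to one standard training iteration. A hedged PyTorch sketch (the `model`, `images`, `gt_density`, and `optimizer` objects come from the steps above; the learning rate is not given in the patent and is an assumption):

```python
import torch
import torch.nn.functional as F

def train_step(model, images, gt_density, optimizer):
    """One iteration of steps S4-S5: forward propagation, mean squared error
    loss against the true density maps, and a stochastic gradient descent
    parameter update."""
    pred = model(images)                 # images: (N, 3, 400, 400) with N = 8
    loss = F.mse_loss(pred, gt_density)  # gt_density: (N, 1, 200, 200)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# optimizer = torch.optim.SGD(model.parameters(), lr=1e-6)  # lr is assumed
```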
Step S6: iterate steps S4 and S5 for a specified number of times; the number of iterations is 50.
Step S7: obtain the crowd density map and derive the estimated number of people. Specifically, the number of people in the crowd image is obtained by summing all pixels of the crowd density map computed by the model.
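Step S7 reduces to a pixel sum over the predicted density map. A small sketch (the `model` and `image` tensors are assumed to come from the steps above):

```python
import torch

@torch.no_grad()
def estimate_count(model, image: torch.Tensor) -> float:
    """Step S7: sum every pixel of the predicted crowd density map to obtain
    the estimated number of people in the crowd image."""
    model.eval()
    density = model(image.unsqueeze(0))  # add a batch dimension: (1, 1, H, W)
    return float(density.sum().item())
```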
Claims (1)
1. A crowd counting method based on multi-level feature fusion is characterized by specifically comprising the following steps:
step S1: preprocessing the acquired crowd image, and generating a corresponding crowd density map by using the labeling information, wherein the specific mode is as follows:
step S11: centering the acquired crowd image, specifically, subtracting the per-channel mean from the elements of the R, G, and B channels of the image and dividing by the per-channel standard deviation, wherein the means of the R, G, and B channels are (0.485, 0.456, 0.406) and the standard deviations are (0.229, 0.224, 0.225);
step S12: generating a position matrix from the provided annotation information, specifically, creating an all-zero matrix with the same resolution as the corresponding image and setting the matrix element at each coordinate given by the annotation to 1;
step S13: randomly cropping the centered crowd image and the corresponding position matrix into fixed-size image blocks and matrices, wherein in the specific embodiment of the invention the crop size is 400 × 400;
step S14: generating the corresponding crowd density map by convolving the position matrix with a Gaussian kernel, specifically, generating two one-dimensional Gaussian convolution kernels with μ = 15 and σ = 4, transposing one of them and multiplying it with the other to obtain a two-dimensional Gaussian convolution kernel, and convolving this kernel with the elements of value 1 in the position matrix to generate the crowd density map;
step S15: downsampling the density map generated in step S14 to 200 × 200 resolution, specifically, convolving the density map with a 2 × 2 convolution kernel whose parameters are all 1 at a stride of 2;
step S2: constructing the multi-level feature fusion crowd counting network, specifically comprising the following steps:
step S21: building a VGG16 network without its fully connected layers;
step S22: building a channel-domain attention module, specifically, building a global average pooling layer over the channel domain that pools the input feature X into a 1 × 1 × C feature, adding two fully connected layers after the pooling layer with C/4 and C neurons respectively, building a Sigmoid activation layer after the two fully connected layers, and multiplying the activation-layer output element-wise with the input feature X to obtain the output of the channel-domain attention module;
step S23: fusing the output features X50 and X40 of the fifth and fourth layers of the VGG16 network built in step S21, specifically, applying an upsampling operation to X50 (all upsampling layers in the invention use a magnification factor of 2), concatenating the upsampled feature with X40 on the channel domain, feeding the concatenated feature into a channel-domain attention module, and feeding the module output into a convolution block consisting of two 3 × 3 convolution layers with 256 channels to obtain the output feature X41 of the convolution block;
step S24: fusing the output features X40 and X30 of the fourth and third layers of the VGG16 network built in step S21 with the feature X41 obtained in step S23, specifically, upsampling X40 and concatenating the result with X30 on the channel domain, feeding the concatenated feature into a convolution block consisting of two 3 × 3 convolution layers with 128 channels to obtain feature X31, upsampling X41 to obtain feature X32, concatenating X31 and X32 on the channel domain, feeding the concatenated feature into a channel-domain attention module, and feeding the module output into a convolution block consisting of two 3 × 3 convolution layers with 128 channels to obtain the output feature X33 of the convolution block;
step S25: fusing the output features X30 and X20 of the third and second layers of the VGG16 network built in step S21 with the features X31 and X33 obtained in step S24, specifically, upsampling X30 and concatenating the result with X20 on the channel domain, feeding the concatenated feature into a convolution block consisting of two 3 × 3 convolution layers with 64 channels to obtain feature X21; upsampling X31 to obtain feature X22, concatenating X21 and X22 on the channel domain and feeding the result into a convolution block consisting of two 3 × 3 convolution layers with 64 channels to obtain the output feature X23; upsampling X33 to obtain feature X24, concatenating X23 and X24 on the channel domain, feeding the concatenated feature into a channel-domain attention module, feeding the module output into a convolution block consisting of two 3 × 3 convolution layers with 64 channels and one 3 × 3 convolution layer with 32 channels, and feeding the block output into a 1 × 1 convolution layer with 1 channel, thereby completing the construction of the multi-level feature fusion crowd counting network;
step S3: initializing the network weight parameters, specifically, for the crowd counting network obtained in step S2, initializing the feature extractor VGG16 with the ImageNet classification weights of VGG16 without its fully connected layers, and initializing all other convolution layers and fully connected layers from a normal distribution with μ = 0 and σ = 0.01;
step S4: inputting the crowd images and crowd density maps preprocessed in step S1 into the network to complete forward propagation;
step S5: computing the loss between the forward-propagation result of step S4 and the true density maps fed to the network, and updating the model parameters, specifically:
step S51: computing the mean squared error loss $L_{MSE}$ between the forward-propagation result and the true density map:

$L_{MSE} = \frac{1}{N}\sum_{i=1}^{N} \left\| \hat{D}_i - D_i \right\|_2^2$

where $N$ is the number of input samples propagated forward at one time ($N$ = 8 in the invention), $\hat{D}_i$ is the density map computed by forward propagation for the current i-th sample, and $D_i$ is the true density map of the current i-th sample;
step S52: using the loss $L_{MSE}$ computed in step S51 to update the model parameters by stochastic gradient descent;
step S6: iterating steps S4 and S5 for a specified number of times, wherein the number of iterations is 50;
step S7: obtaining the crowd density map and deriving the estimated number of people, specifically, summing all pixels of the crowd density map computed by the model to obtain the number of people in the crowd image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010284030.5A CN111488834B (en) | 2020-04-13 | 2020-04-13 | Crowd counting method based on multi-level feature fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111488834A (en) | 2020-08-04
CN111488834B (en) | 2023-07-04
Family
ID=71792806
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010284030.5A Active CN111488834B (en) | 2020-04-13 | 2020-04-13 | Crowd counting method based on multi-level feature fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111488834B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107301387A (en) * | 2017-06-16 | 2017-10-27 | 华南理工大学 | A kind of image Dense crowd method of counting based on deep learning |
CN109271960A (en) * | 2018-10-08 | 2019-01-25 | 燕山大学 | A kind of demographic method based on convolutional neural networks |
CN109598220A (en) * | 2018-11-26 | 2019-04-09 | 山东大学 | A kind of demographic method based on the polynary multiple dimensioned convolution of input |
CN109903339A (en) * | 2019-03-26 | 2019-06-18 | 南京邮电大学 | A kind of video group personage's position finding and detection method based on multidimensional fusion feature |
CN110705344A (en) * | 2019-08-21 | 2020-01-17 | 中山大学 | Crowd counting model based on deep learning and implementation method thereof |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112801340A (en) * | 2020-12-16 | 2021-05-14 | 北京交通大学 | Crowd density prediction method based on multilevel city information unit portrait |
CN112801340B (en) * | 2020-12-16 | 2024-04-26 | 北京交通大学 | Crowd density prediction method based on multi-level city information unit portraits |
Also Published As
Publication number | Publication date |
---|---|
CN111488834B (en) | 2023-07-04 |
Similar Documents
Publication | Title |
---|---|---|
CN112541503B (en) | Real-time semantic segmentation method based on context attention mechanism and information fusion | |
CN113344806A (en) | Image defogging method and system based on global feature fusion attention network | |
CN111815665B (en) | Single image crowd counting method based on depth information and scale perception information | |
CN112396607A (en) | Streetscape image semantic segmentation method for deformable convolution fusion enhancement | |
CN112967218A (en) | Multi-scale image restoration system based on wire frame and edge structure | |
CN113449735B (en) | Semantic segmentation method and device for super-pixel segmentation | |
CN111833360B (en) | Image processing method, device, equipment and computer readable storage medium | |
CN107506792B (en) | Semi-supervised salient object detection method | |
CN111640116B (en) | Aerial photography graph building segmentation method and device based on deep convolutional residual error network | |
CN108921850B (en) | Image local feature extraction method based on image segmentation technology | |
CN116258757A (en) | Monocular image depth estimation method based on multi-scale cross attention | |
CN113269224A (en) | Scene image classification method, system and storage medium | |
CN114048822A (en) | Attention mechanism feature fusion segmentation method for image | |
CN112837320B (en) | Remote sensing image semantic segmentation method based on parallel hole convolution | |
CN116797787A (en) | Remote sensing image semantic segmentation method based on cross-modal fusion and graph neural network | |
CN115482518A (en) | Extensible multitask visual perception method for traffic scene | |
CN111476133A (en) | Unmanned driving-oriented foreground and background codec network target extraction method | |
CN114943893A (en) | Feature enhancement network for land coverage classification | |
CN114926734A (en) | Solid waste detection device and method based on feature aggregation and attention fusion | |
CN115471718A (en) | Construction and detection method of lightweight significance target detection model based on multi-scale learning | |
CN111488834B (en) | Crowd counting method based on multi-level feature fusion | |
CN115049945A (en) | Method and device for extracting lodging area of wheat based on unmanned aerial vehicle image | |
CN116543165B (en) | Remote sensing image fruit tree segmentation method based on dual-channel composite depth network | |
CN111275076B (en) | Image significance detection method based on feature selection and feature fusion | |
CN113553949A (en) | Tailing pond semantic segmentation method based on photogrammetric data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |