CN112966600B - Self-adaptive multi-scale context aggregation method for crowded population counting - Google Patents

Self-adaptive multi-scale context aggregation method for crowded population counting

Info

Publication number
CN112966600B
CN112966600B CN202110242403.7A CN202110242403A
Authority
CN
China
Prior art keywords
scale
representing
context
feature map
resolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110242403.7A
Other languages
Chinese (zh)
Other versions
CN112966600A (en)
Inventor
赵怀林
梁兰军
张亚妮
周方波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Institute of Technology
Original Assignee
Shanghai Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Institute of Technology filed Critical Shanghai Institute of Technology
Priority to CN202110242403.7A
Publication of CN112966600A
Application granted
Publication of CN112966600B
Legal status: Active

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/211 Selection of the most significant subset of features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods


Abstract

The invention provides an adaptive multi-scale context aggregation method for crowd counting, comprising the following steps: inputting a sample picture into a backbone network and extracting a feature map whose resolution is j times that of the input image; feeding the extracted feature map into several multi-scale context aggregation modules in cascade, which extract and adaptively aggregate multi-scale context information to obtain multi-scale context features; processing the generated multi-scale context features with a convolution layer to generate a density map; and integrating (summing) the density map to obtain the predicted head count. The method effectively extracts multi-scale information, alleviates the problem of non-uniform head sizes, adaptively selects and aggregates useful context information through a channel attention mechanism to avoid information redundancy, yields more accurate density estimates in crowded scenes, and is more robust.

Description

Self-adaptive multi-scale context aggregation method for crowded population counting
Technical Field
The invention relates to the technical field of data processing, and in particular to an adaptive multi-scale context aggregation method for crowd counting in crowded scenes.
Background
Crowd counting is a basic task of computer-vision-based crowd analysis, which aims to automatically estimate crowd conditions.
In crowd scenes, however, the task often faces challenging factors such as severe occlusion, scale variation, and diverse crowd distributions; in very crowded scenes especially, estimating crowd density is difficult because foreground and background objects are visually similar and head scales vary.
Networks that directly aggregate context features at different scales already exist, but not all features are useful for the final count, and direct aggregation introduces information redundancy that degrades the performance of the counting network.
Disclosure of Invention
In view of the drawbacks of the prior art, an object of the present invention is to provide an adaptive multi-scale context aggregation method for crowd counting in crowded scenes.
The invention provides an adaptive multi-scale context aggregation method for crowd counting, comprising the following steps:
step 1: inputting a sample picture into a backbone network and extracting a feature map whose resolution is j times that of the input image;
step 2: feeding the extracted feature map into several multi-scale context aggregation modules in cascade, which extract and adaptively aggregate multi-scale context information to obtain multi-scale context features; an up-sampling layer follows each multi-scale context aggregation module to convert the multi-scale context features into a higher-resolution feature map;
step 3: processing the generated multi-scale context features with a convolution layer to generate a density map;
step 4: calculating a loss function between the generated density map and the true-value density map, and optimizing network parameters;
step 5: integrating (summing) the generated density map to obtain the predicted head count.
Optionally, the step 4 includes:
generating a true-value density map of the crowd by Gaussian-kernel convolution from the picture with head mark points, where the density map is computed as:

F(x) = Σ_{i=1}^{N} δ(x − x_i) * G_σ(x)

where F(x) is the true-value density map, x_i the pixel at a person's head, G_σ the Gaussian kernel, δ(·) the Dirac function, σ the standard deviation, N the total number of people in the picture, and x a pixel of the picture.
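The ground-truth density map described above can be sketched in NumPy. The kernel size and σ below are illustrative choices, not values fixed by the patent; kernels clipped at the image border are renormalized so that each annotated head still contributes exactly one to the integral.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """2-D Gaussian kernel, normalized so that it sums to 1."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()

def density_map(shape, head_points, sigma=4.0, ksize=15):
    """F(x) = sum_i delta(x - x_i) * G_sigma(x): place a Gaussian at each
    annotated head point; the map integrates to the number of heads."""
    h, w = shape
    dense = np.zeros((h, w), dtype=np.float64)
    kernel = gaussian_kernel(ksize, sigma)
    r = ksize // 2
    for (y, x) in head_points:
        # clip the kernel window at the image border
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        x0, x1 = max(0, x - r), min(w, x + r + 1)
        ky0, ky1 = y0 - (y - r), ksize - ((y + r + 1) - y1)
        kx0, kx1 = x0 - (x - r), ksize - ((x + r + 1) - x1)
        patch = kernel[ky0:ky1, kx0:kx1]
        # renormalize the clipped kernel so each head still contributes 1
        dense[y0:y1, x0:x1] += patch / patch.sum()
    return dense
```

Summing the resulting map therefore recovers the annotated head count exactly, which is what makes the integral-summation prediction in step 5 consistent with the ground truth.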
Optionally, the step 2 includes:
the multi-scale context aggregation module adaptively selects small-scale context features and aggregates them with large-scale context features; the multi-scale context aggregation module comprises several parallel branches of dilated convolution with different dilation rates;
y_i^j denotes the feature extracted by the dilated convolution of the i-th scale, where i is the dilation rate of the convolution kernel, the superscript j indicates that the resolution is j times that of the input image, and r is the reduction rate of the backbone network; y_i^j ∈ R^{jW×jH×C}, where W×H is the image resolution, C the number of channels, and R^{jW×jH×C} the set of all feature maps at j times the resolution;
the feature maps extracted by the dilated convolutions are input into a channel attention module, which uses a selection function f to adaptively select the useful context feature information in y_i^j and outputs a feature map Y_j ∈ R^{jW×jH×C} that aggregates the context information, where Y_j is defined as:

Y_j = f(⋯ f(f(y_1^j) ⊕ y_2^j) ⊕ ⋯ ⊕ y_{n−1}^j) ⊕ y_n^j

where Y_j is the feature map at j times the resolution extracted by the aggregation module, ⊕ denotes element-wise summation, y_1^j, …, y_n^j are the feature maps extracted at scales 1 through n, and j indicates the resolution is j times that of the input picture.
Optionally, the adaptive selection using the selection function f includes:
pooling each context feature through a global spatial average-pooling layer and outputting the feature information F_avg^i;
processing the feature information F_avg^i with a bottleneck structure composed of two fully connected layers, and normalizing the output to (0, 1) with a sigmoid function; the adaptive output coefficient is computed as:

α_i = Sigmoid(W_2 · ReLU(W_1 · F_avg^i))

where W_1 and W_2 are the weight matrices of the two fully connected layers, a ReLU follows the first fully connected layer, a Sigmoid follows the second, and F_avg^i is the output of y_i^j after the average-pooling layer;
adding a residual connection between the input and output of the channel attention mechanism, the resulting selection function is defined as:

f(y_i^j) = y_i^j ⊕ (α_i ⊗ y_i^j)

where f(y_i^j) is the output of the i-th channel attention module, y_i^j the feature map extracted by the dilated convolution of the i-th scale, and α_i the adaptive coefficient of the i-th channel attention module.
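The selection function can be sketched in NumPy as follows. The weight shapes and the bottleneck reduction are assumptions (the text does not fix them), and the per-channel coefficient is applied by broadcasting.

```python
import numpy as np

def selection(y, w1, w2):
    """Channel-attention selection with residual: f(y) = y + alpha * y.
    y: feature map of shape (C, H, W); w1: (C_r, C) and w2: (C, C_r) are
    the two fully connected layers of the bottleneck (C_r is assumed)."""
    f_avg = y.mean(axis=(1, 2))                    # global spatial average pool -> (C,)
    hidden = np.maximum(w1 @ f_avg, 0.0)           # first FC + ReLU
    alpha = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # second FC + sigmoid, in (0, 1)
    return y + alpha[:, None, None] * y            # residual connection
```

The residual term guarantees that even a channel assigned a near-zero coefficient is not erased entirely, which is the stated motivation for adding the connection.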
Compared with the prior art, the invention has the following beneficial effects:
the self-adaptive multi-scale context aggregation method for crowded counting effectively extracts multi-scale information, solves the problem of non-uniform size of the head of a person, adaptively selects and aggregates useful context information through a channel attention mechanism, avoids redundancy of information, can have more accurate density estimation in crowded scenes, and has higher robustness.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:
fig. 1 is a schematic diagram of an adaptive multi-scale context aggregation method for crowd counting according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the present invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the inventive concept. These are all within the scope of the present invention.
The invention provides an adaptive multi-scale context aggregation method for crowd counting, used for crowd density estimation in crowded scenes. The method mainly comprises the following steps: a picture is input; feature information is first extracted through a backbone network, and the extracted feature map is then fed into several multi-scale context aggregation modules in cascade. Each module first extracts multi-scale information with convolution kernels of different dilation rates, then adaptively selects channel context feature information through a channel attention mechanism and aggregates it. After each multi-scale context aggregation module, the feature map is upsampled to a higher resolution; finally, an estimated density map is output through a 1×1 convolution kernel, and the predicted head count is obtained by integral summation. By using several convolution kernels with different dilation rates, the method effectively extracts multi-scale information and alleviates the problem of non-uniform head sizes; by adaptively selecting and aggregating useful context information through a channel attention mechanism, it avoids information redundancy, yields more accurate density estimates in crowded scenes, and is more robust.
Fig. 1 is a schematic diagram of an adaptive multi-scale context aggregation method for crowd counting according to an embodiment of the present invention, as shown in fig. 1, may include the following steps:
step S1: and inputting the sample picture into a backbone network, and extracting a feature map with the size i times of the resolution of the original image.
Step S2: the extracted feature images are input into a plurality of self-adaptive multi-scale context aggregation modules in a cascading mode, multi-scale context information is extracted and self-adaptively aggregated, and an up-sampling layer is arranged behind each module and used for converting multi-scale context features into feature images with higher resolution.
Step S3: the generated multi-scale context features are processed by a 1×1 convolution layer to generate a density map.
Step S4: calculating a loss function between the generated density map and the true value density map, and optimizing network parameters;
step S5: and integrating and summing the density map to obtain the predicted number of people.
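Steps S3 and S5 can be sketched together in NumPy: a 1×1 convolution is a per-pixel linear projection across channels, and the predicted count is the discrete integral (sum) of the resulting single-channel map. The projection weights below are illustrative, not values from the patent.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: a per-pixel linear map across channels.
    x: (C_in, H, W), w: (C_out, C_in) -> output of shape (C_out, H, W)."""
    return np.tensordot(w, x, axes=([1], [0]))

def predicted_count(features, w):
    """Step S3 + S5: project features to a single-channel density map
    with a 1x1 convolution (C_out = 1), then sum it to get the count."""
    density = conv1x1(features, w)
    return float(density.sum())
```

In practice the count would be rounded to the nearest integer; the sum itself is the raw network prediction.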
In this embodiment, a true-value density map of the crowd is generated by Gaussian-kernel convolution from the picture with head mark points; the pixel at a head is denoted x_i and the Gaussian kernel G_σ, so the true-value density map can be expressed as:

F(x) = Σ_{i=1}^{N} δ(x − x_i) * G_σ(x)

where F(x) is the true-value density map, δ(·) the Dirac function, σ the standard deviation, N the total number of people in the picture, and x a pixel of the picture.
Specifically, the adaptive multi-scale context aggregation module in step S2, shown in fig. 1, adaptively selects reliable small-scale context features and aggregates them with large-scale context features. The specific operation is as follows:
the multi-scale context aggregation module comprises several parallel branches of dilated convolution with different dilation rates; y_i^j denotes the feature extracted by the dilated convolution of the i-th scale, where i is the dilation rate of the convolution kernel, the superscript j indicates that the resolution is j times that of the input image, and r is the reduction rate of the backbone network; y_i^j ∈ R^{jW×jH×C}, where W×H is the image resolution, C the number of channels, and R^{jW×jH×C} the set of all feature maps at j times the resolution. The feature maps extracted by the dilated convolutions are then input to a channel attention module (CA), which uses a selection function f to adaptively select the useful context feature information in y_i^j and finally outputs the feature map Y_j ∈ R^{jW×jH×C} that aggregates the context information, defined as:

Y_j = f(⋯ f(f(y_1^j) ⊕ y_2^j) ⊕ ⋯ ⊕ y_{n−1}^j) ⊕ y_n^j

where Y_j is the feature map at j times the resolution extracted by the aggregation module, ⊕ denotes element-wise summation, y_1^j, …, y_n^j are the feature maps extracted at scales 1 through n, and j indicates the resolution is j times that of the input picture.
Illustratively, the selection function f employs a channel attention mechanism to aggregate the multi-scale context information. The specific operation is:
each feature is first pooled by a global spatial average-pooling layer (denoted F_avg), then processed by a bottleneck structure consisting of two fully connected layers, and finally the output is normalized to (0, 1) by a sigmoid function. The adaptive output coefficient can be expressed as:

α_i = Sigmoid(W_2 · ReLU(W_1 · F_avg^i))

where W_1 and W_2 are the weight matrices of the two fully connected layers, a ReLU follows the first fully connected layer, a Sigmoid follows the second, and F_avg^i is the output of y_i^j after the average-pooling layer.
Furthermore, for better optimization, a residual connection is added between the input and output of the channel attention mechanism, and the final selection function is defined as:

f(y_i^j) = y_i^j ⊕ (α_i ⊗ y_i^j)
compared with the existing counting, the embodiment adopts a plurality of convolutions with different void ratios to extract multi-scale information, and self-adaptively selects and aggregates the multi-scale context information through a channel attention mechanism, so that good performance is shown in crowded scenes, and the accuracy of crowd counting is improved.
The technical scheme of the invention is described in more detail below with reference to specific embodiments. Given the pixel values and labels of a picture, the true-value density map of the picture is obtained by Gaussian convolution, which can be expressed as:

F(x) = Σ_{i=1}^{N} δ(x − x_i) * G_σ(x)

where x_i denotes a pixel with a human head, x any pixel, G_σ the Gaussian kernel, δ(·) the Dirac function, σ the standard deviation, and N the total number of people in the picture.
The complex nonlinear mapping from the input image to the crowd estimated density map is then learned by a multi-scale context aggregation network, as follows:
the first ten layers of VGG-16 are selected as a backbone network, the pictures are input into the backbone network, the characteristic information is extracted, and the size of the characteristic diagram is 1\8 of the input image.
The extracted feature map is first convolved with a 3×3 convolution kernel, and the feature information is then sent to the multi-scale context aggregation module. Features at different scales are first extracted through several branches of dilated convolution with different dilation rates; the feature at each scale is denoted y_i^j, with n scales in total.
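The dilated (atrous) branches can be sketched in NumPy. The naive loop below is for clarity rather than efficiency; a 3×3 kernel at dilation rate r covers a (2r+1)×(2r+1) receptive field with the same nine weights, which is how the parallel branches see different scales.

```python
import numpy as np

def dilated_conv2d(x, kernel, rate):
    """'Same'-padded 2-D convolution with dilation `rate`, assuming a
    square odd-sized kernel. x: (H, W) single-channel input."""
    kh, kw = kernel.shape
    pad = rate * (kh // 2)
    xp = np.pad(x, pad)                 # zero-pad so output matches input size
    h, w = x.shape
    out = np.zeros_like(x, dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            # each kernel tap samples the input at a stride of `rate`
            out += kernel[i, j] * xp[i * rate:i * rate + h, j * rate:j * rate + w]
    return out
```

Running the same input through several rates (e.g. 1, 2, 3) yields the scale-indexed features y_1^j, y_2^j, y_3^j that the aggregation module consumes.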
The feature y_1^j is then passed through the attention module, which adaptively aggregates multi-scale context information: the context information is first extracted through a global spatial average-pooling layer, the features are then processed by a bottleneck structure formed by two fully connected layers, and finally the output is normalized to (0, 1) by a sigmoid function. The adaptive output coefficient can be expressed as:

α_1 = Sigmoid(W_2 · ReLU(W_1 · F_avg^1))

Finally, the input and output of the channel attention mechanism are connected directly with a residual, giving the final output:

f(y_1^j) = y_1^j ⊕ (α_1 ⊗ y_1^j)
will beMulti-scale contextual profile selected by attention mechanisms->And the 2 nd scale information->Pixel-by-pixel summing is performed, which can be expressed as: />
Extracting the extractThe feature information is sent to the channel attention mechanism to adaptively select the context information, and the context information and the feature information of the 3 rd scale are subjected to pixel summation, and the like, so that the feature mapping which aggregates the multi-scale context information is finally obtained:
After the multi-scale context information is extracted by a multi-scale context aggregation module, it is converted by upsampling into a higher-resolution feature map, which is then sent to the next multi-scale context aggregation module for feature extraction in the same way. After the three multi-scale context aggregation modules have been processed in sequence, the estimated density map is output through a 1×1 convolution kernel and the loss function L(θ) is computed:

L(θ) = 1/(2M) · Σ_{i=1}^{M} ‖F(I_i; θ) − F_i‖²₂

where F(I_i; θ) is the density map output by the network for training picture I_i, F_i is the corresponding true-value density map, M is the number of training pictures, and θ is the set of parameters the network must optimize; the network continually optimizes θ by gradient descent to find the parameter values that minimize the loss.
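A NumPy sketch of this Euclidean loss; the 1/(2M) batch normalization is an assumption consistent with the pixel-wise L2 loss commonly used for density-map regression, since the garbled original does not show the constant.

```python
import numpy as np

def mse_loss(pred_maps, gt_maps):
    """L(theta) = 1/(2M) * sum_i ||F(I_i; theta) - F_i||_2^2 over a
    batch of M predicted density maps and their ground-truth maps."""
    m = len(pred_maps)
    total = sum(np.sum((p - g) ** 2) for p, g in zip(pred_maps, gt_maps))
    return total / (2.0 * m)
```

In a training loop this scalar would be differentiated with respect to θ (here implicit in `pred_maps`) and minimized by gradient descent.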
It should be noted that, the steps in the adaptive multi-scale context aggregation method for crowd counting provided in the present invention may be implemented by using corresponding modules, devices, units, etc. in the adaptive multi-scale context aggregation system for crowd counting, and those skilled in the art may refer to the technical scheme of the system to implement the step flow of the method, that is, the embodiment in the system may be understood as a preferred embodiment for implementing the method, which is not repeated herein.
Those skilled in the art will appreciate that the invention provides a system and its individual devices that can be implemented entirely by logic programming of method steps, in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, etc., in addition to the system and its individual devices being implemented in pure computer readable program code. Therefore, the system and various devices thereof provided by the present invention may be considered as a hardware component, and the devices included therein for implementing various functions may also be considered as structures within the hardware component; means for achieving the various functions may also be considered as being either a software module that implements the method or a structure within a hardware component.
The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that various changes or modifications may be made by those skilled in the art within the scope of the appended claims without affecting the spirit of the invention. The embodiments of the present application and features in the embodiments may be combined with each other arbitrarily without conflict.

Claims (2)

1. A method for adaptive multi-scale context aggregation for crowd counting, comprising:
step 1: inputting a sample picture into a backbone network and extracting a feature map whose resolution is j times that of the input image;
step 2: feeding the extracted feature map into several multi-scale context aggregation modules in cascade, which extract and adaptively aggregate multi-scale context information to obtain multi-scale context features; an up-sampling layer follows each multi-scale context aggregation module to convert the multi-scale context features into a higher-resolution feature map;
step 3: processing the generated multi-scale context features with a convolution layer to generate a density map;
step 4: calculating a loss function between the generated density map and the true-value density map, and optimizing network parameters;
step 5: integrating (summing) the generated density map to obtain the predicted head count;
the step 2 comprises the following steps:
the multi-scale context aggregation module adaptively selects small-scale context features and aggregates them with large-scale context features; the multi-scale context aggregation module comprises several parallel branches of dilated convolution with different dilation rates;
y_i^j denotes the feature extracted by the dilated convolution of the i-th scale, where i is the dilation rate of the convolution kernel, the superscript j indicates that the resolution is j times that of the input image, and r is the reduction rate of the backbone network; y_i^j ∈ R^{jW×jH×C}, where W×H is the image resolution, C the number of channels, and R^{jW×jH×C} the set of all feature maps at j times the resolution;
the feature maps extracted by the dilated convolutions are input into a channel attention module, which uses a selection function f to adaptively select the useful context feature information in y_i^j and outputs a feature map Y_j ∈ R^{jW×jH×C} that aggregates the context information, where Y_j is defined as:

Y_j = f(⋯ f(f(y_1^j) ⊕ y_2^j) ⊕ ⋯ ⊕ y_{n−1}^j) ⊕ y_n^j

where Y_j is the feature map at j times the resolution extracted by the aggregation module, ⊕ denotes element-wise summation, y_1^j, …, y_n^j are the feature maps extracted at scales 1 through n, and j indicates the resolution is j times that of the input picture;
said adaptive selection using the selection function f including:
pooling each context feature through a global spatial average-pooling layer and outputting the feature information F_avg^i;
processing the feature information F_avg^i with a bottleneck structure composed of two fully connected layers, and normalizing the output to (0, 1) with a sigmoid function, the adaptive output coefficient being computed as:

α_i = Sigmoid(W_2 · ReLU(W_1 · F_avg^i))

where W_1 and W_2 are the weight matrices of the two fully connected layers, a ReLU follows the first fully connected layer, a Sigmoid follows the second, and F_avg^i is the output of y_i^j after the average-pooling layer;
adding a residual connection between the input and output of the channel attention mechanism, the resulting selection function being defined as:

f(y_i^j) = y_i^j ⊕ (α_i ⊗ y_i^j)

where f(y_i^j) is the output of the i-th channel attention module, y_i^j the feature map extracted by the dilated convolution of the i-th scale, and α_i the adaptive coefficient of the i-th channel attention module.
2. The adaptive multi-scale context aggregation method for crowd counting according to claim 1, wherein the step 4 comprises:
generating a true-value density map of the crowd by Gaussian-kernel convolution from the picture with head mark points, the density map being computed as:

F(x) = Σ_{i=1}^{N} δ(x − x_i) * G_σ(x)

where F(x) is the true-value density map, x_i the pixel at a person's head, G_σ the Gaussian kernel, δ(·) the Dirac function, σ the standard deviation, N the total number of people in the picture, and x a pixel of the picture.
CN202110242403.7A 2021-03-04 2021-03-04 Self-adaptive multi-scale context aggregation method for crowded population counting Active CN112966600B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110242403.7A CN112966600B (en) 2021-03-04 2021-03-04 Self-adaptive multi-scale context aggregation method for crowded population counting


Publications (2)

Publication Number Publication Date
CN112966600A CN112966600A (en) 2021-06-15
CN112966600B true CN112966600B (en) 2024-04-16

Family

ID=76277443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110242403.7A Active CN112966600B (en) 2021-03-04 2021-03-04 Self-adaptive multi-scale context aggregation method for crowded population counting

Country Status (1)

Country Link
CN (1) CN112966600B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114120233B (en) * 2021-11-29 2024-04-16 上海应用技术大学 Training method of lightweight pyramid cavity convolution aggregation network for crowd counting

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263849A (en) * 2019-06-19 2019-09-20 合肥工业大学 A kind of crowd density estimation method based on multiple dimensioned attention mechanism
CN111242036A (en) * 2020-01-14 2020-06-05 西安建筑科技大学 Crowd counting method based on encoding-decoding structure multi-scale convolutional neural network
WO2020169043A1 (en) * 2019-02-21 2020-08-27 苏州大学 Dense crowd counting method, apparatus and device, and storage medium
CN111709290A (en) * 2020-05-18 2020-09-25 杭州电子科技大学 Crowd counting method based on coding and decoding-jumping connection scale pyramid network
CN112132023A (en) * 2020-09-22 2020-12-25 上海应用技术大学 Crowd counting method based on multi-scale context enhanced network


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Crowd density estimation based on multi-level feature fusion; Chen Peng; Tang Yiping; Wang Liran; He Xia; Journal of Image and Graphics (Issue 08); full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant