CN112784685A - Crowd counting method and system based on multi-scale guiding attention mechanism network - Google Patents

Publication number
CN112784685A
Authority
CN
China
Prior art keywords
attention
scale
crowd
feature
guiding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011580568.7A
Other languages
Chinese (zh)
Other versions
CN112784685B (en)
Inventor
吕蕾
顾玲玉
谢锦阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Normal University
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Normal University
Priority to CN202011580568.7A
Publication of CN112784685A
Application granted
Publication of CN112784685B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a crowd counting method and system based on a multi-scale guiding attention mechanism network. The method comprises: acquiring image data to be identified; performing multi-scale feature extraction on the acquired image data to obtain a plurality of feature maps, and fusing all the feature maps to obtain a multi-scale fusion feature map; inputting the feature map of each scale together with the multi-scale fusion feature map into a preset guiding attention mechanism model to obtain attention feature maps at different scales; and fusing the attention feature maps at all scales, performing density regression on the fused feature map to obtain a crowd density map, and obtaining the crowd count from the crowd density map. By adopting a multi-scale guiding attention mechanism, the method and system capture richer multi-scale contextual feature information, integrate local features with their corresponding global dependencies, adaptively highlight important channel information, and greatly improve crowd counting accuracy.

Description

Crowd counting method and system based on multi-scale guiding attention mechanism network
Technical Field
The disclosure relates to the technical field of computer vision image processing, in particular to a crowd counting method and system based on a multi-scale guiding attention mechanism network.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
With the advance of technology, people's quality of life has gradually improved. People often take part in large events, where accompanying safety hazards such as crowding and stampedes pose a serious threat to life and property. Security measures in places with high-density crowds are therefore key to safeguarding life and property. Research on crowd counting has accordingly become increasingly active: if the crowd density of the current scene can be estimated accurately and rapid changes in the crowd detected in time, public transport scheduling can be optimized and corresponding security measures arranged, effectively reducing or avoiding such incidents.
In recent years, computer-vision-based crowd counting has made tremendous progress. The purpose of crowd counting is to predict the number of people present in an image. Algorithms developed for crowd counting have a variety of applications, such as video and traffic monitoring, agricultural monitoring (plant counting), cell counting, scene understanding, city planning and environmental surveys. The computer vision field has approached this task in various ways: early work counted based on the output of body or head detectors, or learned a mapping from global or local image features to predicted counts. However, these methods are only suitable for relatively sparse crowds. In crowded scenes, crowd counting remains challenging because of scale variation, occlusion, changing viewing angles, background clutter, and so on.
The inventors have found that some current convolutional neural network (CNN) based methods attempt to solve these problems with varying degrees of success. Although CNNs have advanced crowd counting, these models still have drawbacks. First, multi-scale methods introduce information redundancy because similar low-level features are extracted repeatedly at multiple scales; and although methods such as pyramid pooling and atrous (dilated) convolution pyramids may help capture objects at different scales, they treat the contextual dependency of all image regions as homogeneous and non-adaptive, ignoring the dependency between local feature representations and contextual information. Second, long-range feature dependencies cannot be extracted efficiently, so the crowd cannot be counted accurately.
Disclosure of Invention
To overcome the shortcomings of the prior art, the present disclosure provides a crowd counting method and system based on a multi-scale guiding attention mechanism network. A multi-scale guiding attention mechanism is adopted to capture richer multi-scale contextual feature information, which overcomes the limitations of conventional convolutional neural network structures, integrates local features with their corresponding global dependencies, and adaptively highlights important channel information. Meanwhile, the additional losses between different modules lead the guiding attention mechanism to ignore irrelevant information and, by emphasizing relevant feature associations, to focus on the crowd regions of the image, which greatly improves crowd counting accuracy.
To achieve the above purpose, the present disclosure adopts the following technical solutions:
A first aspect of the present disclosure provides a crowd counting method based on a multi-scale guiding attention mechanism network.
A crowd counting method based on a multi-scale attention-guiding mechanism network comprises the following steps:
acquiring image data to be identified;
performing multi-scale feature extraction on the acquired image data to obtain a plurality of feature maps, and fusing all the feature maps to obtain a multi-scale fusion feature map;
inputting the acquired feature map of each scale and the multi-scale fusion feature map into a preset attention guiding mechanism model to obtain attention feature maps under different scales;
and fusing the attention feature maps at all scales, performing density regression on the fused feature map to obtain a crowd density map, and obtaining the crowd count from the crowd density map.
In some possible implementations, in the guiding attention mechanism model, weighted attention feature maps at different scales are obtained according to the spatial attention and the channel attention.
In some possible implementations, different loss functions are set to guide the attention mechanism model to self-adjust, during training, the feature information that needs attention.
As a further limitation, according to the obtained feature maps and the multi-scale fusion feature map, and in combination with the encoder-decoder and the attention mechanism module of the guiding attention mechanism model, a first attention loss function is obtained at each scale, and the first attention loss functions at all scales are added to obtain a combined guiding loss.
As a further limitation, the output of the encoder-decoder is guided to be consistent or nearly consistent with its input features; the reconstructed feature map and the input feature map are combined to obtain a second attention loss function at each scale, and the second attention loss functions at all scales are added to obtain a combined reconstruction loss.
In some possible implementations, a concatenation operation is performed on the obtained feature maps, followed by a convolution operation, to generate the multi-scale fusion feature map.
In some possible implementations, the pixel values of the crowd density map are summed to obtain the final crowd count value.
A second aspect of the present disclosure provides a crowd counting system based on a multi-scale guided attention mechanism network.
A crowd counting system based on a multi-scale attention-directing mechanism network, comprising:
an image acquisition module configured to: acquiring image data to be identified;
a multi-scale feature extraction module configured to: performing multi-scale feature extraction on the acquired image data to obtain a plurality of feature maps, and fusing all the feature maps to obtain a multi-scale fusion feature map;
a guiding attention mechanism module configured to: inputting the acquired feature map of each scale and the multi-scale fusion feature map into a preset guiding attention mechanism model to obtain attention feature maps at different scales;
a crowd counting module configured to: fusing the attention feature maps at all scales, performing density regression on the fused feature map to obtain a crowd density map, and obtaining the crowd count from the crowd density map.
A third aspect of the present disclosure provides a computer-readable storage medium on which a program is stored, wherein the program, when executed by a processor, implements the steps of the crowd counting method based on a multi-scale guiding attention mechanism network according to the first aspect of the present disclosure.
A fourth aspect of the present disclosure provides an electronic device comprising a memory, a processor, and a program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the crowd counting method based on a multi-scale guiding attention mechanism network according to the first aspect of the present disclosure.
Compared with the prior art, the beneficial effect of this disclosure is:
1. The method, system, medium or electronic device of the present disclosure adopts a multi-scale guiding attention mechanism to capture richer multi-scale contextual feature information, thereby overcoming the limitations of existing convolutional neural network structures; it can integrate local features with their corresponding global dependencies and adaptively highlight important channel information.
2. In the method, system, medium or electronic device of the present disclosure, the additional losses between different modules lead the guiding attention mechanism to ignore irrelevant information and, by emphasizing relevant feature associations, to focus on the crowd regions of the image, which greatly improves crowd counting accuracy.
Advantages of additional aspects of the disclosure will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the disclosure.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.
Fig. 1 is a schematic flowchart of a crowd counting method based on a multi-scale guiding attention mechanism network according to embodiment 1 of the present disclosure.
Fig. 2 is a schematic diagram of a counting method based on a multi-scale guiding attention mechanism network provided in embodiment 1 of the present disclosure.
Fig. 3 is a schematic diagram of a module for guiding attention according to embodiment 1 of the present disclosure.
Detailed Description
The present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
Example 1:
As shown in Fig. 1, Fig. 2 and Fig. 3, Embodiment 1 of the present disclosure provides a crowd counting method based on a multi-scale guiding attention mechanism network, in which the multi-scale guiding attention mechanism network is used for crowd counting.
The multi-scale guiding attention mechanism network comprises a multi-scale feature information extraction module, an attention mechanism module and a guiding attention mechanism module.
First, the multi-scale feature information extraction module receives context information under different receptive fields. Low-level features focus on local information, while high-level features encode global information; the multi-scale approach encourages the attention maps generated under different receptive fields to encode different semantic information. Then, at each scale, the guiding attention mechanism module gradually removes noisy regions and emphasizes regions that carry crowd semantics. The attention mechanism module comprises two independent attention mechanisms that handle feature dependencies over space and over channels respectively; they extract broader and richer context information and strengthen the dependencies between channels in the feature map, thereby reducing interference from background regions.
Specifically, the method comprises the following steps:
S1: Multi-scale feature information extraction
The whole network is an improvement on VGG16. Feature maps $F_0$, $F_1$, $F_2$ and $F_3$ at different scales are generated by Conv1, Conv2, Conv3 and Conv4 and are upsampled to the same size by bilinear interpolation, giving $F'_s$. The resulting $F'_0$, $F'_1$, $F'_2$, $F'_3$ are concatenated and then convolved to generate the multi-scale fusion feature map $F_{MS}$:
$$F_{MS} = \mathrm{conv}([F'_0, F'_1, F'_2, F'_3])$$
Because the multi-scale feature $F_{MS}$ fuses features of different scales, the low-level and high-level features in the extracted multi-scale feature information complement each other, so the extracted multi-scale context information is richer.
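The fusion step of S1 can be illustrated with a minimal sketch. It is an illustration under simplifying assumptions, not the implementation of the disclosure: each scale is reduced to a single-channel 2-D map, the convolution applied after concatenation is assumed to be a 1×1 convolution with hypothetical weights, and the names `bilinear_resize` and `fuse_scales` are invented for this example.

```python
def bilinear_resize(fmap, out_h, out_w):
    """Resize a 2-D map (list of rows) by bilinear interpolation.

    Corner pixels of the input are mapped onto corner pixels of the output.
    """
    in_h, in_w = len(fmap), len(fmap[0])
    out = []
    for y in range(out_h):
        # Fractional source row for this output row.
        sy = y * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        wy = sy - y0
        row = []
        for x in range(out_w):
            sx = x * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            wx = sx - x0
            top = fmap[y0][x0] * (1 - wx) + fmap[y0][x1] * wx
            bottom = fmap[y1][x0] * (1 - wx) + fmap[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def fuse_scales(feature_maps, weights, out_h, out_w):
    """Upsample every map to (out_h, out_w), stack them as channels,
    and mix the channels with an assumed 1x1 convolution (weighted sum)."""
    resized = [bilinear_resize(f, out_h, out_w) for f in feature_maps]
    return [
        [sum(w * r[y][x] for w, r in zip(weights, resized)) for x in range(out_w)]
        for y in range(out_h)
    ]


# Three toy "scales" with different spatial sizes (hypothetical values).
f0 = [[1.0] * 4 for _ in range(4)]
f1 = [[2.0, 2.0], [2.0, 2.0]]
f2 = [[3.0]]
f_ms = fuse_scales([f0, f1, f2], [0.5, 0.25, 0.25], 4, 4)
```

In a real network each scale would carry many channels and the convolution weights would be learned; the sketch only shows the order of operations: resize, concatenate, convolve.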
S2: attention mechanism
The attention mechanism module explicitly establishes a spatial attention mechanism and a channel attention mechanism, and features of each position are extracted by comparison with all other positions.
For the input features
Figure RE-GDA0002974321530000071
Channel attention is computed first. Since each channel focuses on different features, the channels that focus on the crowd need to be highlighted; the maximum and the mean are computed together to obtain soft attention:
$$c(F_i) = \delta_1(F_i) + \delta_2(F_i)$$
Here $\delta$ is a softmax normalization, and when dealing with low-level features each response can be regarded as a crowd detector. $\delta_2$ returns only a single response, concentrating on one distinguishable part and ignoring the others, whereas $\delta_1$ encourages the detector to treat all positions equally on average, which inevitably introduces noise. For this reason, this embodiment computes $c(F_i)$ to make a soft attention selection.
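One possible reading of the channel attention formula is sketched below. The text does not spell out how the mean and maximum are taken, so the sketch assumes a global mean and a global maximum per channel, each normalized by softmax across channels and then summed; the function names are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list of numbers."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def channel_attention(feat):
    """feat: list of C channels, each an H x W list of rows.

    Returns one weight per channel: softmax(channel means) + softmax(channel maxima).
    This pairing of mean and max terms is an assumption for illustration.
    """
    means = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]
    maxes = [max(max(row) for row in ch) for ch in feat]
    mean_term = softmax(means)  # averages over all positions
    max_term = softmax(maxes)   # driven by the single strongest response
    return [a + b for a, b in zip(mean_term, max_term)]


# Two toy channels: the second responds more strongly everywhere.
weights = channel_attention([
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [1.0, 1.0]],
])
```

Because each softmax sums to 1, the channel weights in this sketch always sum to 2, and the more strongly responding channel receives the larger weight.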
Spatial attention is then computed. It includes two terms: the first term is similar to the channel attention, in that the spatial mean matrix is calculated and normalized using softmax; the second term, LPPool, captures the similarity of local blocks. The number of channels is reduced to 1 by pointwise convolution, and average pooling (2 × 2) is used to obtain a representative value for each block, which ensures that the attention feature of each pixel is computed both locally and globally:
$$s(F_i) = \delta(F_i) + \sigma(\mathrm{LPPool}(F_i))$$
where $\sigma$ is the sigmoid function. Note that softmax is used for the spatial mean term and sigmoid for the local attention term: softmax normalizes the responses over all positions so that they sum to 1, whereas sigmoid maps each local response to (0, 1) independently of the others.
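The two-term structure of the spatial attention can be sketched as follows. Several details are assumptions made only for illustration: the global term is a softmax over all positions of the channel-wise mean map, the pointwise convolution uses hypothetical weights `pw_weights`, the local term uses plain non-overlapping 2 × 2 average pooling, and each pooled value is shared by the pixels of its block so that both terms have the same size.

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list of numbers."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def spatial_attention(feat, pw_weights):
    """feat: list of C channels, each H x W. Returns an H x W attention map."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    # Global term: mean over channels at each pixel, softmax over all positions.
    mean_map = [sum(feat[k][y][x] for k in range(c)) / c
                for y in range(h) for x in range(w)]
    global_term = softmax(mean_map)
    # Local term, step 1: pointwise (1x1) convolution down to one channel.
    reduced = [[sum(pw_weights[k] * feat[k][y][x] for k in range(c))
                for x in range(w)] for y in range(h)]
    attn = []
    for y in range(h):
        row = []
        for x in range(w):
            # Local term, step 2: average of the 2x2 block containing (y, x).
            by, bx = (y // 2) * 2, (x // 2) * 2
            block = [reduced[j][i]
                     for j in range(by, min(by + 2, h))
                     for i in range(bx, min(bx + 2, w))]
            local = sigmoid(sum(block) / len(block))
            row.append(global_term[y * w + x] + local)
        attn.append(row)
    return attn


# An all-zero 2-channel 2x2 input gives 1/4 (softmax) + 1/2 (sigmoid) everywhere.
attn = spatial_attention([[[0.0, 0.0], [0.0, 0.0]],
                          [[0.0, 0.0], [0.0, 0.0]]], [0.5, 0.5])
```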
Finally, the attention-weighted features are calculated:
Figure RE-GDA0002974321530000072
Figure RE-GDA0002974321530000073
where the product is pixel-wise multiplication, $I$ is a matrix whose entries are all 1, and $c(F_i)$ and $s(F_i)$ are the channel attention and the spatial attention, respectively.
S3: attention guide mechanism module
In the guiding attention mechanism module, the attention mechanism proposed in S2 is directly used to guide the model to self-adjust the feature information needing attention in training by setting different loss.
Inputting the feature map of each scale and the multi-scale fusion feature map into an attention mechanism module to generate an attention feature map on one hand, and entering a coder-decoder on the other hand, and calculating a first attention loss:
Figure RE-GDA0002974321530000081
where $E_i(\cdot)$ is the encoded representation of the i-th encoder-decoder network,
Figure RE-GDA0002974321530000082
denotes the attention feature generated by the i-th attention mechanism module, and M is the number of iterations. It should be noted that
Figure RE-GDA0002974321530000083
semantically guides the input features of the attention mechanism module. Specifically, the feature map reconstructed by the first encoder-decoder and the attention feature generated by the first attention mechanism module are combined by matrix multiplication to generate
Figure RE-GDA0002974321530000084
Furthermore, to ensure that the reconstructed features correspond to the features at the input of the attention mechanism module, that is, to guide the output of the encoder-decoder to closely match its input features, the second attention loss is calculated:
Figure RE-GDA0002974321530000085
wherein
Figure RE-GDA0002974321530000086
is the reconstructed feature map, i.e. $E_i(F)$ of the i-th encoder-decoder network.
Since the guiding attention mechanism module is applied at multiple scales, the combined guiding loss over all modules is:
Figure RE-GDA0002974321530000087
Likewise, the combined reconstruction loss becomes:
Figure RE-GDA0002974321530000088
where $L_{Rec1}$ and $L_{Rec2}$ are the reconstruction losses of the encoder-decoder structures in the first and second parts of the guiding attention mechanism module.
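The summation over scales described here can be sketched as follows. The per-scale loss formulas are not reproduced in this text, so the sketch takes the per-scale loss values as given numbers; the function name `combine_losses`, and the assumption that the two reconstruction terms are simply added at each scale before summing over scales, are illustrative only.

```python
def combine_losses(guided_per_scale, rec1_per_scale, rec2_per_scale):
    """Add up per-scale loss values into the combined guiding loss and
    the combined reconstruction loss.

    Each argument is a list with one (already computed) loss value per scale.
    """
    total_guided = sum(guided_per_scale)
    # Assumed form: both reconstruction terms contribute at every scale.
    total_rec = sum(r1 + r2 for r1, r2 in zip(rec1_per_scale, rec2_per_scale))
    return total_guided, total_rec


# Hypothetical loss values for four scales.
l_guided, l_rec = combine_losses(
    [0.5, 0.25, 0.125, 0.125],
    [0.5, 0.5, 0.5, 0.5],
    [0.25, 0.25, 0.25, 0.25],
)
```

In training, these combined terms would be added to the main density regression objective; the weighting between them is not specified in this text.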
The generated multi-scale feature map $F_{MS}$ is concatenated with each of $F'_0$, $F'_1$, $F'_2$, $F'_3$; each result is convolved and input into the guiding attention mechanism model to obtain the attention feature maps $A_0$, $A_1$, $A_2$, $A_3$ at the different scales:
$$A_s = \mathrm{AttMod}_s(\mathrm{conv}([F'_s, F_{MS}]))$$
where $A_s$ denotes the attention features and $\mathrm{AttMod}_s$ denotes the guiding attention mechanism module at scale $s$. Through the guiding attention mechanism, the additional losses between different modules make the attention mechanism ignore irrelevant information and, by emphasizing crowd-related features, focus on the crowd regions of the image.
S4: regression density map
4 characteristic maps A for guiding attention mechanism module output0,A1,A2,A3And performing fusion, and performing density regression on the fused feature map to obtain a high-quality crowd density map.
S5: population count
And accumulating and summing the density image pixel values to obtain a final numerical value of the crowd count, wherein the specific formula is as follows:
Figure RE-GDA0002974321530000091
where C is the final estimated number of people, H is the height of the density map, W is the width of the density map, P isijIs the pixel value at coordinate (i, j) of the entire density map.
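The counting step amounts to summing every pixel of the density map. A minimal sketch, with a hypothetical function name and toy values:

```python
def crowd_count(density_map):
    """Sum all pixel values P_ij of an H x W density map (list of rows)."""
    return sum(sum(row) for row in density_map)


# Toy 2 x 3 density map (hypothetical values).
density = [
    [0.0, 0.25, 0.25],
    [0.5, 1.0, 0.0],
]
print(crowd_count(density))  # prints 2.0
```

The result is generally a non-integer estimate; whether and how it is rounded is not specified in this text.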
Example 2:
Embodiment 2 of the present disclosure provides a crowd counting system based on a multi-scale guiding attention mechanism network, including:
an image acquisition module configured to: acquiring image data to be identified;
a multi-scale feature extraction module configured to: performing multi-scale feature extraction on the acquired image data to obtain a plurality of feature maps, and fusing all the feature maps to obtain a multi-scale fusion feature map;
a guiding attention mechanism module configured to: inputting the acquired feature map of each scale and the multi-scale fusion feature map into a preset guiding attention mechanism model to obtain attention feature maps at different scales;
a crowd counting module configured to: fusing the attention feature maps at all scales, performing density regression on the fused feature map to obtain a crowd density map, and obtaining the crowd count from the crowd density map.
The working method of the system is the same as the crowd counting method based on the multi-scale guiding attention mechanism network provided in embodiment 1, and details are not repeated here.
Example 3:
Embodiment 3 of the present disclosure provides a computer-readable storage medium on which a program is stored. When executed by a processor, the program implements the steps of the crowd counting method based on the multi-scale guiding attention mechanism network according to Embodiment 1 of the present disclosure, where the steps are:
acquiring image data to be identified;
performing multi-scale feature extraction on the acquired image data to obtain a plurality of feature maps, and fusing all the feature maps to obtain a multi-scale fusion feature map;
inputting the acquired feature map of each scale and the multi-scale fusion feature map into a preset attention guiding mechanism model to obtain attention feature maps under different scales;
and fusing the attention feature maps at all scales, performing density regression on the fused feature map to obtain a crowd density map, and obtaining the crowd count from the crowd density map.
The detailed steps are the same as those of the crowd counting method based on the multi-scale guiding attention mechanism network provided in Embodiment 1, and are not repeated here.
Example 4:
Embodiment 4 of the present disclosure provides an electronic device, which includes a memory, a processor, and a program stored in the memory and executable on the processor. When executing the program, the processor implements the steps of the crowd counting method based on the multi-scale guiding attention mechanism network according to Embodiment 1 of the present disclosure, where the steps are:
acquiring image data to be identified;
performing multi-scale feature extraction on the acquired image data to obtain a plurality of feature maps, and fusing all the feature maps to obtain a multi-scale fusion feature map;
inputting the acquired feature map of each scale and the multi-scale fusion feature map into a preset attention guiding mechanism model to obtain attention feature maps under different scales;
and fusing the attention feature maps at all scales, performing density regression on the fused feature map to obtain a crowd density map, and obtaining the crowd count from the crowd density map.
The detailed steps are the same as those of the crowd counting method based on the multi-scale guiding attention mechanism network provided in Embodiment 1, and are not repeated here.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (10)

1. A crowd counting method based on a multi-scale guiding attention mechanism network, characterized in that the method comprises the following steps:
acquiring image data to be identified;
performing multi-scale feature extraction on the acquired image data to obtain a plurality of feature maps, and fusing all the feature maps to obtain a multi-scale fusion feature map;
inputting the acquired feature map of each scale and the multi-scale fusion feature map into a preset attention guiding mechanism model to obtain attention feature maps under different scales;
and fusing the attention feature maps at all scales, performing density regression on the fused feature map to obtain a crowd density map, and obtaining the crowd count from the crowd density map.
2. The crowd counting method based on the multi-scale attention-guiding mechanism network as claimed in claim 1, wherein:
in the guiding attention mechanism model, weighted attention feature maps under different scales are obtained according to the space attention and the channel attention.
3. The crowd counting method based on the multi-scale attention-guiding mechanism network as claimed in claim 1, wherein:
by setting different loss functions, the attention mechanism model is guided to self-adjust the feature information needing attention in training.
4. The crowd counting method based on the multi-scale attention-guiding mechanism network as claimed in claim 3, wherein:
according to the obtained feature maps and the multi-scale fusion feature map, an encoder-decoder and an attention mechanism module of the guiding attention mechanism model are combined to obtain a first attention loss function at each scale, and the first attention loss functions at all scales are added to obtain a combined guiding loss.
5. The crowd counting method based on the multi-scale attention-guiding mechanism network as claimed in claim 3, wherein:
the output of the encoder-decoder is guided to be consistent or nearly consistent with its input features; the reconstructed feature map and the input feature map are combined to obtain a second attention loss function at each scale, and the second attention loss functions at all scales are added to obtain a combined reconstruction loss.
6. The crowd counting method based on the multi-scale attention-guiding mechanism network as claimed in claim 1, wherein:
a concatenation (concat) operation is performed on the obtained feature maps, followed by a convolution operation, to generate the multi-scale fusion feature map.
7. The crowd counting method based on the multi-scale attention-guiding mechanism network as claimed in claim 1, wherein:
the pixel values of the crowd density map are summed to obtain the final crowd count.
8. A crowd counting system based on a multi-scale attention-guiding mechanism network, characterized in that the system comprises:
an image acquisition module configured to: acquire image data to be identified;
a multi-scale feature extraction module configured to: perform multi-scale feature extraction on the acquired image data to obtain a plurality of feature maps, and fuse all the feature maps to obtain a multi-scale fusion feature map;
a guiding attention mechanism module configured to: input the feature map of each scale and the multi-scale fusion feature map into a preset guiding attention mechanism model to obtain attention feature maps at different scales; and
a crowd counting module configured to: fuse the attention feature maps at all scales, perform density regression on the fused feature map to obtain a crowd density map, and obtain the crowd count from the crowd density map.
9. A computer-readable storage medium on which a program is stored, wherein the program, when executed by a processor, carries out the steps of the crowd counting method based on a multi-scale attention-guiding mechanism network according to any one of claims 1-7.
10. An electronic device comprising a memory, a processor, and a program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the crowd counting method based on a multi-scale attention-guiding mechanism network according to any one of claims 1-7.
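
For illustration only, and not as part of the claims, three of the claimed operations can be sketched in plain Python: fusion by concatenation followed by convolution (claim 6), the combined reconstruction loss summed over scales (claim 5), and the count obtained by summing the density map (claim 7). The sketch rests on assumptions that the claims do not state: each scale contributes a single-channel map already resized to a common size, the convolution is a 1x1 convolution, and the per-scale loss is a mean squared error. The function names `fuse_concat_conv1x1`, `combined_reconstruction_loss`, and `count_from_density` are hypothetical.

```python
from typing import List

FeatureMap = List[List[float]]  # one channel, H rows by W columns


def fuse_concat_conv1x1(maps: List[FeatureMap], weights: List[float],
                        bias: float = 0.0) -> FeatureMap:
    """Stack same-sized maps as channels, then apply a 1x1 convolution.

    Assumption: a 1x1 kernel, so each output pixel is a weighted sum of
    the channel values at that pixel plus a bias.
    """
    if len(maps) != len(weights):
        raise ValueError("one weight per concatenated channel is required")
    height, width = len(maps[0]), len(maps[0][0])
    fused = [[bias] * width for _ in range(height)]
    for channel, w in zip(maps, weights):
        for y in range(height):
            for x in range(width):
                fused[y][x] += w * channel[y][x]
    return fused


def combined_reconstruction_loss(inputs: List[FeatureMap],
                                 reconstructions: List[FeatureMap]) -> float:
    """Add the per-scale losses between input and reconstructed maps.

    Assumption: the per-scale loss is the mean squared error.
    """
    total = 0.0
    for original, rebuilt in zip(inputs, reconstructions):
        squared = [(a - b) ** 2
                   for row_a, row_b in zip(original, rebuilt)
                   for a, b in zip(row_a, row_b)]
        total += sum(squared) / len(squared)
    return total


def count_from_density(density: FeatureMap) -> float:
    """Sum every pixel of the density map to obtain the crowd count."""
    return sum(sum(row) for row in density)


# Two toy 2x2 maps standing in for features at two scales.
scale_a = [[0.5, 0.0], [0.25, 0.25]]
scale_b = [[0.5, 1.0], [0.75, 0.75]]
fused = fuse_concat_conv1x1([scale_a, scale_b], weights=[1.0, 1.0])
print(count_from_density(fused))  # prints 4.0
```

In a trained network the convolution weights would be learned, the maps would carry many channels, and the density map would come from the density regression step rather than directly from the fused features; the toy values here only make the arithmetic easy to follow.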
CN202011580568.7A 2020-12-28 2020-12-28 Crowd counting method and system based on multi-scale guiding attention mechanism network Active CN112784685B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011580568.7A CN112784685B (en) 2020-12-28 2020-12-28 Crowd counting method and system based on multi-scale guiding attention mechanism network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011580568.7A CN112784685B (en) 2020-12-28 2020-12-28 Crowd counting method and system based on multi-scale guiding attention mechanism network

Publications (2)

Publication Number Publication Date
CN112784685A true CN112784685A (en) 2021-05-11
CN112784685B CN112784685B (en) 2022-08-26

Family

ID=75752918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011580568.7A Active CN112784685B (en) 2020-12-28 2020-12-28 Crowd counting method and system based on multi-scale guiding attention mechanism network

Country Status (1)

Country Link
CN (1) CN112784685B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543695A (en) * 2018-10-26 2019-03-29 复旦大学 General density people counting method based on multiple dimensioned deep learning
CN110263849A (en) * 2019-06-19 2019-09-20 合肥工业大学 A kind of crowd density estimation method based on multiple dimensioned attention mechanism
US20200074186A1 (en) * 2018-08-28 2020-03-05 Beihang University Dense crowd counting method and apparatus
CN110929869A (en) * 2019-12-05 2020-03-27 同盾控股有限公司 Attention model training method, device, equipment and storage medium
CN111242036A (en) * 2020-01-14 2020-06-05 西安建筑科技大学 Crowd counting method based on encoding-decoding structure multi-scale convolutional neural network
CN111832414A (en) * 2020-06-09 2020-10-27 天津大学 Animal counting method based on graph regular optical flow attention network
CN111860162A (en) * 2020-06-17 2020-10-30 上海交通大学 Video crowd counting system and method
WO2020222985A1 (en) * 2019-04-30 2020-11-05 The Trustees Of Dartmouth College System and method for attention-based classification of high-resolution microscopy images
CN112132023A (en) * 2020-09-22 2020-12-25 上海应用技术大学 Crowd counting method based on multi-scale context enhanced network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张友梅 (Zhang Youmei): "基于注意力卷积神经网络的人群计数算法研究" (Research on Crowd Counting Algorithms Based on Attention Convolutional Neural Networks) *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113538402A (en) * 2021-07-29 2021-10-22 燕山大学 Crowd counting method and system based on density estimation
CN113642319A (en) * 2021-07-29 2021-11-12 北京百度网讯科技有限公司 Text processing method and device, electronic equipment and storage medium
CN113642319B (en) * 2021-07-29 2022-11-29 北京百度网讯科技有限公司 Text processing method and device, electronic equipment and storage medium
CN114120245A (en) * 2021-12-15 2022-03-01 平安科技(深圳)有限公司 Crowd image analysis method, device and equipment based on deep neural network
CN114241411A (en) * 2021-12-15 2022-03-25 平安科技(深圳)有限公司 Counting model processing method and device based on target detection and computer equipment
CN114241411B (en) * 2021-12-15 2024-04-09 平安科技(深圳)有限公司 Counting model processing method and device based on target detection and computer equipment
CN114898284A (en) * 2022-04-08 2022-08-12 西北工业大学 Crowd counting method based on feature pyramid local difference attention mechanism
CN114898284B (en) * 2022-04-08 2024-03-12 西北工业大学 Crowd counting method based on feature pyramid local difference attention mechanism
CN114511636A (en) * 2022-04-20 2022-05-17 科大天工智能装备技术(天津)有限公司 Fruit counting method and system based on double-filtering attention module
CN114758206B (en) * 2022-06-13 2022-10-28 武汉珈鹰智能科技有限公司 Steel truss structure abnormity detection method and device
CN114758206A (en) * 2022-06-13 2022-07-15 武汉珈鹰智能科技有限公司 Steel truss structure abnormity detection method and device
CN117253184A (en) * 2023-08-25 2023-12-19 燕山大学 Foggy day image crowd counting method guided by foggy priori frequency domain attention characterization
CN117253184B (en) * 2023-08-25 2024-05-17 燕山大学 Foggy day image crowd counting method guided by foggy priori frequency domain attention characterization

Also Published As

Publication number Publication date
CN112784685B (en) 2022-08-26

Similar Documents

Publication Publication Date Title
CN112784685B (en) Crowd counting method and system based on multi-scale guiding attention mechanism network
WO2021093468A1 (en) Video classification method and apparatus, model training method and apparatus, device and storage medium
CN107967451B (en) Method for counting crowd of still image
CN108805015B (en) Crowd abnormity detection method for weighted convolution self-coding long-short term memory network
Yoon et al. Predictively encoded graph convolutional network for noise-robust skeleton-based action recognition
Tan et al. Fine-grained classification via hierarchical bilinear pooling with aggregated slack mask
CN112597941A (en) Face recognition method and device and electronic equipment
Shamsian et al. Learning object permanence from video
CN112329685A (en) Method for detecting crowd abnormal behaviors through fusion type convolutional neural network
CN115829171B (en) Pedestrian track prediction method combining space-time information and social interaction characteristics
CN113177640B (en) Discrete asynchronous event data enhancement method
CN115455130B (en) Fusion method of social media data and movement track data
Munir et al. LDNet: End-to-end lane marking detection approach using a dynamic vision sensor
CN109214253A (en) A kind of video frame detection method and device
CN115690557A (en) Construction safety early warning method and device based on attention mechanism neural network
CN111738074B (en) Pedestrian attribute identification method, system and device based on weak supervision learning
Chen et al. Modeling social interaction and intention for pedestrian trajectory prediction
CN115294563A (en) 3D point cloud analysis method and device based on Transformer and capable of enhancing local semantic learning ability
CN115830449A (en) Remote sensing target detection method with explicit contour guidance and spatial variation context enhancement
CN114742112A (en) Object association method and device and electronic equipment
Jiang et al. A compatible detector based on improved YOLOv5 for hydropower device detection in AR inspection system
Kong et al. A multi-context representation approach with multi-task learning for object counting
CN117058235A (en) Visual positioning method crossing various indoor scenes
Wang et al. Multi-scale dense and attention mechanism for image semantic segmentation based on improved DeepLabv3+
CN116994114A (en) Lightweight household small target detection model construction method based on improved YOLOv8

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant