CN112270213A - Improved HRnet based on attention mechanism - Google Patents

Info

Publication number
CN112270213A
Authority
CN
China
Prior art keywords
channel
attention mechanism
pooling
hrnet
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011084171.9A
Other languages
Chinese (zh)
Inventor
王聪
乔元风
蒋伟
柯钦瑜
黄勇
李紫薇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xuanwei Beijing Biotechnology Co ltd
Original Assignee
Xuanwei Beijing Biotechnology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xuanwei Beijing Biotechnology Co ltd
Priority to CN202011084171.9A
Publication of CN112270213A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

An improved HRnet model based on an attention mechanism, characterized in that: when F is the input feature map, an attention mechanism module is added, and the attention mechanism module performs the following 2 operations:
[Formula image: the 2 operations of the attention mechanism module]
The invention adopting this technical scheme has the following beneficial effects: an attention mechanism model is added to the original HRnet model, so that the improved HRnet can be used to detect human posture during cardiopulmonary resuscitation (CPR) chest compressions and provides an accurate backbone network for instance segmentation models, such as those of the manikin's chest and head, in CPR medical examination, improving the detection precision of the model.

Description

Improved HRnet based on attention mechanism
Technical Field
The invention relates to an improved algorithm, in particular to an improved HRnet model based on an attention mechanism.
Background
Sudden cardiac arrest seriously threatens human life and health. High-quality cardiopulmonary resuscitation (CPR) can markedly improve patient survival and is an important means of saving lives. The American Heart Association (AHA) and the International Liaison Committee on Resuscitation (ILCOR) place high-quality CPR at the core of resuscitation [1]. At present, CPR training and assessment is conventionally carried out on a medical manikin, with an examiner making the judgment. This approach has several disadvantages: the examiner's judgment is subjective rather than objective; during assessment, the examinee's specific compression depth, frequency and the like depend on the quality of the manikin and are difficult for the examiner to judge; and during training, an instructor must supervise and accompany the trainee at all times so that the trainee can correct and improve their technique, which consumes a large amount of labor for training and examination.
In the prior art, because chest compression is a dynamic process, whether the examinee's compression posture is acceptable cannot be judged from the captured compression image alone, which makes automatic judgment difficult.
Meanwhile, when extracting image features, different segmentation models are needed depending on the actual situation. For each model, the volume of image data is large, and human posture recognition can only be performed well if the accuracy of the model is ensured. How to improve model accuracy is therefore an urgent problem to be solved.
Disclosure of Invention
The technical problem to be solved by the invention is how to improve the accuracy of the model; to that end, an improved HRnet model based on an attention mechanism is provided.
In order to solve the technical problems, the invention adopts the following technical scheme:
An improved HRnet model based on an attention mechanism, characterized in that: when F is the input feature map, an attention block is added, and the attention block performs the following 2 operations:
[Formula image: the 2 operations of the attention block]
where the first operator [formula image] denotes attention extraction along the channel dimension, i.e. building the channel attention mechanism model, and
the second operator [formula image] denotes attention extraction along the spatial dimension, i.e. building the spatial attention mechanism model.
The channel attention mechanism model is as follows. The original feature map X_in is passed through two convolutions with kernel sizes 3×3 and 5×5 to obtain feature maps U and V, which are then added to obtain feature map F. F fuses information from multiple receptive fields and has shape [C, H, W], where C is the number of channels, H the height and W the width. The average and the maximum are then taken along the H and W dimensions; these two pooling functions yield two one-dimensional vectors, which are added element-wise. The resulting channel information is a 1×1×C one-dimensional vector that represents the importance of each channel's information. A linear transformation maps this vector from the original C dimensions to Z dimensions, and 2 separate linear transformations then map the Z-dimensional vector back to the original C dimensions, completing information extraction along the channel dimension. Softmax is then used for normalization, so that each channel has a score representing its importance, which acts as a mask. The 2 masks so obtained are multiplied by the corresponding feature maps U and V to obtain feature maps U' and V'. Finally, U' and V' are added for information fusion to obtain the final output X_out.
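The channel attention steps can be written compactly as follows. This is a sketch rather than the patent's own notation: the symbols s, z, W_z, W_a, W_b, a and b are introduced here for illustration, and the Softmax is assumed to be taken across the two branches for each channel, which the text does not state explicitly.

```latex
\begin{aligned}
U &= \mathrm{Conv}_{3\times 3}(X_{in}), \qquad
V = \mathrm{Conv}_{5\times 5}(X_{in}), \qquad
F = U + V, \\
s &= \mathrm{AvgPool}_{H,W}(F) + \mathrm{MaxPool}_{H,W}(F) \in \mathbb{R}^{C}, \\
z &= W_z\, s \in \mathbb{R}^{Z}, \\
(a_c,\ b_c) &= \mathrm{Softmax}\bigl((W_a z)_c,\ (W_b z)_c\bigr), \qquad c = 1, \dots, C, \\
X_{out} &= a \odot U + b \odot V .
\end{aligned}
```

Here a and b play the role of the 2 masks, and the two products a ⊙ U and b ⊙ V correspond to U' and V'.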
The spatial attention mechanism model is as follows. The original feature map X_in is passed through a pooling feature stage comprising 3 pooling layers: average pooling, max pooling and strip pooling. A 1×1 convolution is applied to the pooled features to reduce the channel dimension, giving a feature map with 1 channel. This feature map is passed through a Sigmoid function and multiplied element-wise with the input feature map X_in to obtain the output X_out.
The invention adopting this technical scheme has the following beneficial effects: an attention mechanism model is added to the original HRnet model, so that the improved HRnet can be used to detect human posture during CPR chest compressions and provides an accurate backbone network for instance segmentation models, such as those of the manikin's chest and head, in CPR medical examination, improving the detection precision of the model.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is the original HRnet model.
FIG. 2 is a diagram of an improved HRnet model according to the present invention.
FIG. 3 is a schematic diagram of an embodiment of the channel attention mechanism of the present invention.
FIG. 4 is a model diagram of a spatial attention mechanism.
Fig. 5 is an overall structure diagram of the improved HRnet.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same technical meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it should be further understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
In the present invention, terms such as "fixedly connected", "connected", and the like are to be understood in a broad sense, and mean either a fixed connection or an integrally connected or detachable connection; may be directly connected or indirectly connected through an intermediate. The specific meanings of the above terms in the present invention can be determined according to specific situations by persons skilled in the relevant scientific or technical field, and are not to be construed as limiting the present invention.
HRNet is used to detect human posture during CPR chest compressions and serves as the backbone network for instance segmentation models, such as those of the manikin's chest and head, in CPR medical assessment. To improve the accuracy of these models, HRNet is optimized and improved.
As shown in FIG. 1, the original HRnet has 4 stages, and the 2nd, 3rd and 4th stages all consist of repeated modularized multi-resolution blocks. Before each multi-resolution block there is a transition layer, in which an additional feature map appears, whereas no additional feature map appears within the multi-resolution block itself (multi-resolution group convolution + multi-resolution convolution). The invention improves and optimizes HRnet to raise detection precision: an attention block is added between the multi-resolution group convolution and the multi-resolution convolution, to improve the feature expression capability of the network model. Attention not only tells the network model what to attend to, but also strengthens the representation of specific regions. The structure is shown in FIG. 2, and the overall framework follows CBAM: Convolutional Block Attention Module.
In FIG. 2, an attention mechanism is introduced in both the channel and spatial dimensions. When F is the input feature map, an attention block is added, and the attention block performs the following 2 operations:
[Formula image: the 2 operations of the attention block]
the output is F',
[formula image] denotes attention extraction along the channel dimension, i.e. building the channel attention mechanism model, and
[formula image] denotes attention extraction along the spatial dimension, i.e. building the spatial attention mechanism model.
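The formula images for these 2 operations are not reproduced in this text. Because the description names CBAM as the reference framework, one plausible reading, stated here as an assumption and not confirmed by the text, is CBAM's sequential refinement. The symbols M_c and M_s are borrowed from the CBAM paper to denote the channel and spatial attention maps, and ⊗ denotes element-wise multiplication with broadcasting:

```latex
\begin{aligned}
F'  &= M_c(F) \otimes F, \\
F'' &= M_s(F') \otimes F'.
\end{aligned}
```

Under this reading, the channel attention map is applied first and the spatial attention map is applied to its result.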
The channel attention mechanism model is shown in FIG. 3. The original feature map X_in is passed through two convolutions with kernel sizes 3×3 and 5×5 to obtain feature maps U and V, which are then added to obtain feature map F. F fuses information from multiple receptive fields and has shape [C, H, W], where C is the number of channels, H the height and W the width. The average and the maximum are then taken along the H and W dimensions; these two pooling functions yield two one-dimensional vectors in total. During gradient back-propagation, global average pooling feeds gradient back to every pixel of feature map F, whereas global max pooling feeds gradient back only to the location of maximum response in F, so it can serve as a complement to global average pooling. The two vectors are then added element-wise, and the resulting channel information is a 1×1×C one-dimensional vector that represents the importance of each channel's information.
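The pooling step can be illustrated with a minimal, dependency-free sketch. The function name channel_descriptor and the toy values are hypothetical; plain nested lists stand in for a [C, H, W] tensor so the example runs without any deep learning library.

```python
def channel_descriptor(feature_map):
    """Return one value per channel: global average plus global max.

    feature_map is a nested list shaped [C][H][W].
    """
    descriptor = []
    for channel in feature_map:
        values = [v for row in channel for v in row]  # flatten H x W
        avg = sum(values) / len(values)               # global average pooling
        peak = max(values)                            # global max pooling
        descriptor.append(avg + peak)                 # element-wise addition
    return descriptor


# Toy feature map F with C=2, H=2, W=2 (illustrative values only).
F = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
s = channel_descriptor(F)
print(s)  # [6.5, 10.0]
```

The second channel has a single strong response: its average (2.0) is diluted by the zeros while its max (8.0) is not, which shows why the two pooled vectors complement each other.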
Next, a linear transformation maps the original C dimensions to Z dimensions, and 2 separate linear transformations then map the Z dimensions back to the original C dimensions, completing information extraction along the channel dimension. Softmax is then used for normalization, so that each channel has a score representing its importance, which acts as a mask. The 2 masks so obtained are multiplied by the corresponding feature maps U and V to obtain feature maps U' and V'. U' and V' are then added for information fusion to obtain the final output X_out. Compared with the original feature map X_in, X_out has undergone information extraction and fuses information from multiple receptive fields.
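A minimal sketch of the mask computation and fusion follows. All function names and weight values are hypothetical, and the linear maps are bias-free for brevity. The text does not say along which axis Softmax is applied; the sketch assumes it is taken across the two branches for each channel, so that the two masks sum to 1 per channel.

```python
import math


def linear(weights, vec):
    """Apply a bias-free linear map; weights is shaped [out_dim][in_dim]."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def branch_masks(s, w_reduce, w_u, w_v):
    """Compute per-channel masks for the U and V branches.

    s: channel descriptor of length C.
    w_reduce: [Z][C] weights mapping C -> Z.
    w_u, w_v: [C][Z] weights mapping Z -> C, one set per branch.
    Assumption: Softmax runs across the two branches for each channel.
    """
    z = linear(w_reduce, s)
    mask_u, mask_v = [], []
    for lu, lv in zip(linear(w_u, z), linear(w_v, z)):
        m = max(lu, lv)  # subtract the max for numerical stability
        eu, ev = math.exp(lu - m), math.exp(lv - m)
        mask_u.append(eu / (eu + ev))
        mask_v.append(ev / (eu + ev))
    return mask_u, mask_v


def scale_channels(feature_map, mask):
    """Multiply every value in channel c by mask[c]."""
    return [[[mask[c] * v for v in row] for row in ch]
            for c, ch in enumerate(feature_map)]


def fuse(u, v, mask_u, mask_v):
    """Return X_out = U' + V' with U' = mask_u * U and V' = mask_v * V."""
    u_s, v_s = scale_channels(u, mask_u), scale_channels(v, mask_v)
    return [[[a + b for a, b in zip(ru, rv)] for ru, rv in zip(cu, cv)]
            for cu, cv in zip(u_s, v_s)]


# Toy setup: C=2 channels, Z=1, one pixel per channel (illustrative values).
U = [[[2.0]], [[2.0]]]
V = [[[4.0]], [[4.0]]]
mask_u, mask_v = branch_masks(
    s=[1.0, 1.0],
    w_reduce=[[0.5, 0.5]],
    w_u=[[0.0], [1.0]],
    w_v=[[0.0], [-1.0]],
)
x_out = fuse(U, V, mask_u, mask_v)
print(x_out[0][0][0])  # 3.0
```

In channel 0 both branches receive equal logits, so the output is the plain average of U and V; in channel 1 the U branch has the larger logit, so the output is pulled toward U.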
Considering the long-range correlation between human joint points, the spatial attention mechanism model needs to capture long-range context information effectively. The overall structure of this attention mechanism model is shown in FIG. 4:
The original feature map X_in is passed through a Pooling Feature stage comprising 3 pooling layers: average pooling, max pooling and strip pooling. Strip pooling follows the paper "Strip Pooling: Rethinking Spatial Pooling for Scene Parsing" and mainly addresses long-range dependencies between targets. A 1×1 convolution is applied to the Pooling Feature to reduce the channel dimension, giving a feature map with 1 channel. This feature map is passed through a Sigmoid function and multiplied element-wise with the input feature map X_in to obtain the output X_out.
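The spatial attention path can be sketched as below. Several details are assumptions because the text does not specify them: average and max pooling are taken across channels (as in CBAM), strip pooling averages the channel-mean map along each row and each column and sums the two strips, and the 1×1 convolution is modeled as a weighted sum of the three pooled maps. The function name and weights are hypothetical.

```python
import math


def spatial_attention(x, w_avg, w_max, w_strip, bias=0.0):
    """Gate x (shaped [C][H][W]) with a single-channel spatial map.

    Assumptions: avg/max pooling run across channels; strip pooling sums
    a row-wise and a column-wise average of the channel-mean map; the
    1x1 convolution is a weighted sum of the three pooled maps.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    avg = [[sum(x[c][i][j] for c in range(C)) / C for j in range(W)]
           for i in range(H)]
    mx = [[max(x[c][i][j] for c in range(C)) for j in range(W)]
          for i in range(H)]
    row_strip = [sum(avg[i]) / W for i in range(H)]  # one value per row
    col_strip = [sum(avg[i][j] for i in range(H)) / H
                 for j in range(W)]                  # one value per column
    gate = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            strip = row_strip[i] + col_strip[j]
            logit = (w_avg * avg[i][j] + w_max * mx[i][j]
                     + w_strip * strip + bias)       # 1x1 conv to 1 channel
            gate[i][j] = 1.0 / (1.0 + math.exp(-logit))  # Sigmoid
    out = [[[x[c][i][j] * gate[i][j] for j in range(W)] for i in range(H)]
           for c in range(C)]
    return out, gate


# Toy input: C=2, H=2, W=2, with activity only at pixel (0, 0).
x = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[3.0, 0.0], [0.0, 0.0]],
]
out, gate = spatial_attention(x, w_avg=1.0, w_max=1.0, w_strip=1.0)
print(gate[1][1])  # 0.5
```

Pixel (0, 1) is inactive but shares a row with the active pixel, so its gate is higher than that of pixel (1, 1); this is the long-range context that the strip pooling branch contributes in this sketch.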
The overall structure of the improved HRNet is shown in FIG. 5:
The channel maps and the attention block are connected directly, without passing through the upsample and strided conv modules.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, this is not intended to limit the scope of the present invention; those skilled in the art should understand that various modifications and variations can be made, without inventive effort, on the basis of the technical solution of the present invention.

Claims (3)

1. An improved HRnet based on attention mechanism, characterized in that: when F is the input feature map, an attention mechanism module is added, and the attention mechanism module performs the following 2 operations:
[Formula image: the 2 operations of the attention mechanism module]
where the first operator [formula image] denotes attention extraction along the channel dimension, i.e. building the channel attention mechanism model, and
the second operator [formula image] denotes attention extraction along the spatial dimension, i.e. building the spatial attention mechanism model.
2. The improved HRnet based on attention mechanism according to claim 1, wherein the channel attention mechanism model is as follows: the original feature map X_in is passed through two convolutions with kernel sizes 3×3 and 5×5 to obtain feature maps U and V, which are then added to obtain feature map F; F fuses information from multiple receptive fields and has shape [C, H, W], where C is the number of channels, H the height and W the width; the average and the maximum are then taken along the H and W dimensions, and these two pooling functions yield two one-dimensional vectors; the two one-dimensional vectors are added element-wise, and the resulting channel information is a 1×1×C one-dimensional vector that represents the importance of each channel's information; a linear transformation maps the 1×1×C vector from the original C dimensions to Z dimensions, and 2 separate linear transformations then map the Z-dimensional vector back to the original C dimensions, completing information extraction along the channel dimension; Softmax is then used for normalization, so that each channel has a score representing its importance, which acts as a mask; the 2 masks so obtained are multiplied by the corresponding feature maps U and V to obtain feature maps U' and V'; U' and V' are then added for information fusion to obtain the final output X_out.
3. The improved HRnet based on attention mechanism according to claim 1, wherein the spatial attention mechanism model is as follows: the original feature map X_in is passed through a pooling feature stage comprising 3 pooling layers, namely average pooling, max pooling and strip pooling; a 1×1 convolution is applied to the pooled features to reduce the channel dimension, giving a feature map with 1 channel; this feature map is passed through a Sigmoid function and multiplied element-wise with the input feature map X_in to obtain the output X_out.
CN202011084171.9A 2020-10-12 2020-10-12 Improved HRnet based on attention mechanism Pending CN112270213A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011084171.9A CN112270213A (en) 2020-10-12 2020-10-12 Improved HRnet based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011084171.9A CN112270213A (en) 2020-10-12 2020-10-12 Improved HRnet based on attention mechanism

Publications (1)

Publication Number Publication Date
CN112270213A true CN112270213A (en) 2021-01-26

Family

ID=74338520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011084171.9A Pending CN112270213A (en) 2020-10-12 2020-10-12 Improved HRnet based on attention mechanism

Country Status (1)

Country Link
CN (1) CN112270213A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110610129A (en) * 2019-08-05 2019-12-24 华中科技大学 Deep learning face recognition system and method based on self-attention mechanism
CN111476184A (en) * 2020-04-13 2020-07-31 河南理工大学 Human body key point detection method based on double-attention machine system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
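Sanghyun Woo et al.: "CBAM: Convolutional Block Attention Module", ECCV 2018: Computer Vision – ECCV 2018 *
He Kai et al.: "Fine-grained image classification algorithm based on multi-scale feature fusion and repeated attention mechanism", Journal of Tianjin University (Science and Technology) *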

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113034545A (en) * 2021-03-26 2021-06-25 河海大学 Vehicle tracking method based on CenterNet multi-target tracking algorithm
CN112734757A (en) * 2021-03-29 2021-04-30 成都成电金盘健康数据技术有限公司 Spine X-ray image cobb angle measuring method
CN112734757B (en) * 2021-03-29 2021-06-25 成都成电金盘健康数据技术有限公司 Spine X-ray image cobb angle measuring method
CN115100545A (en) * 2022-08-29 2022-09-23 东南大学 Target detection method for small parts of failed satellite under low illumination

Similar Documents

Publication Publication Date Title
CN112270213A (en) Improved HRnet based on attention mechanism
CN109410261B (en) Monocular image depth estimation method based on pyramid pooling module
CN112052886A (en) Human body action attitude intelligent estimation method and device based on convolutional neural network
CN112434655B (en) Gait recognition method based on adaptive confidence map convolution network
CN109166130A (en) A kind of image processing method and image processing apparatus
Leclerc et al. LU-Net: a multistage attention network to improve the robustness of segmentation of left ventricular structures in 2-D echocardiography
CN110827304B (en) Traditional Chinese medicine tongue image positioning method and system based on deep convolution network and level set method
CN106204779A (en) The check class attendance method learnt based on plurality of human faces data collection strategy and the degree of depth
CN112580515B (en) Lightweight face key point detection method based on Gaussian heat map regression
CN110060286B (en) Monocular depth estimation method
CN110838140A (en) Ultrasound and nuclear magnetic image registration fusion method and device based on hybrid supervised learning
CN112541433B (en) Two-stage human eye pupil accurate positioning method based on attention mechanism
CN112001122A (en) Non-contact physiological signal measuring method based on end-to-end generation countermeasure network
CN105139000A (en) Face recognition method and device enabling glasses trace removal
CN112597847B (en) Face pose estimation method and device, electronic equipment and storage medium
CN112149613B (en) Action pre-estimation evaluation method based on improved LSTM model
CN116631064A (en) 3D human body posture estimation method based on complementary enhancement of key points and grid vertexes
CN109559278A (en) Super resolution image reconstruction method and system based on multiple features study
CN113505719A (en) Gait recognition model compression system and method based on local-integral joint knowledge distillation algorithm
CN112183419A (en) Micro-expression classification method based on optical flow generation network and reordering
CN117409002A (en) Visual identification detection system for wounds and detection method thereof
CN112200065B (en) Micro-expression classification method based on action amplification and self-adaptive attention area selection
CN113810683A (en) No-reference evaluation method for objectively evaluating underwater video quality
CN117351957A (en) Lip language image recognition method and device based on visual tracking
CN116758220A (en) Single-view three-dimensional point cloud reconstruction method based on conditional diffusion probability model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210126
