CN115359571A - Online cross-channel interactive parallel distillation framework pose estimation method and device - Google Patents
Online cross-channel interactive parallel distillation framework pose estimation method and device
- Publication number: CN115359571A
- Application number: CN202211061531.2A
- Authority: CN (China)
- Prior art keywords: channel, feature map, feature, network, cross
- Prior art date: 2022-09-01
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V40/20 — Recognition of biometric, human-related or animal-related patterns in image or video data; movements or behaviour, e.g. gesture recognition
- G06N3/02, G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
- G06V10/462 — Extraction of image or video features; salient features, e.g. scale invariant feature transforms [SIFT]
- G06V10/806 — Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82 — Image or video recognition or understanding using neural networks
- G06V20/41 — Higher-level, semantic clustering, classification or understanding of video scenes
- G06V20/46 — Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- G06V2201/07 — Indexing scheme relating to image or video recognition or understanding; target detection
Abstract
The invention relates to the technical field of artificial intelligence and computer vision, and provides an online cross-channel interactive parallel distillation architecture pose estimation method. First, a video acquisition device captures an external video stream, which is segmented into frames and fed into a feature extraction network for feature extraction. The extracted features are passed to a YOLOv5 object detection model, which detects the position of the target human body in each frame and marks a detection box to obtain the target human body's feature data. The target human body feature data are then passed to the Faster-Pose pose detection model to obtain human key-point feature information, which is mapped into a feature map through a linear transformation, yielding a feature map annotated with the human key-point labels. Compared with the prior art, the method and device take the correlation between channel feature information and spatial feature information into account, improving the expressive power of the required feature information.
Description
Technical Field
The invention relates to the technical field of artificial intelligence and computer vision, and in particular provides an online cross-channel interactive parallel distillation architecture pose estimation method and device.
Background
Human pose estimation is an important topic in artificial-intelligence computer vision. Estimating the behaviour of people in a scene enables better human-computer interaction. At present, human pose estimation is commonly used for detecting workers' unsafe operations, in the security field, and in VR wearable devices. The accuracy with which the object detection model in a human pose estimation algorithm extracts the human detection box is crucial to the accuracy and stability of human key-point localization.
Existing attention models do not consider the correlation between channel feature information and spatial feature information, so their accuracy is limited.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a highly practical online cross-channel interactive parallel distillation framework pose estimation method.
The invention further provides a reasonably designed, safe and practical online cross-channel interactive parallel distillation framework pose estimation device.
The technical solution adopted by the invention to solve this technical problem is as follows:
An online cross-channel interactive parallel distillation framework pose estimation method: first, a video acquisition device captures an external video stream, which is segmented into frames and input into a feature extraction network for feature extraction;
the extracted features are passed to a YOLOv5 object detection model, which detects the position of the target human body in each frame and marks a detection box to obtain the target human body's feature data;
the target human body feature data are passed to the Faster-Pose pose detection model to obtain human key-point feature information; the key-point feature information is then mapped into a feature map through a linear transformation, yielding a feature map annotated with the human key-point labels.
Further, the feature extraction network is designed as a CSP structure and introduces a cross-channel interactive attention mechanism that combines channel attention and spatial attention; in the channel attention model, a covariance matrix is used to compute the pairwise similarity of the feature map's channels, and channels with high similarity are fused;
in the spatial attention, a second-order finite difference method is used to compute feature-map pixel-value differences and the pixel gradient direction.
Further, if the computed covariance of two channels is negative, the channels are negatively correlated; if it is 0, the two channels are independent and uncorrelated; and if it is positive, the two channels are positively correlated and are used for feature fusion;
first, the mean value of each channel is calculated as shown in formula (1):
The variance for each channel is calculated as shown in equation (2):
the covariance between channels C1, C2 is calculated as shown in equation (3):
By analogy, all pairwise channel-similarity covariance values are recorded as $\mathrm{Cov}_{i,k}$; covariance is computed only between different channels, and positively correlated channels are fused pixel by pixel according to the covariance value.
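A minimal sketch of this channel-similarity step is shown below, assuming a single feature map of shape (C, H, W); the normalized-weight fusion rule at the end is an illustrative choice, not the patent's exact rule:

```python
import torch

def fuse_correlated_channels(x: torch.Tensor) -> torch.Tensor:
    """x: feature map of shape (C, H, W). Blends each channel pixel-by-pixel
    with its positively correlated channels, weighted by covariance."""
    C, H, W = x.shape
    flat = x.reshape(C, -1)                     # (C, H*W)
    mean = flat.mean(dim=1, keepdim=True)       # formula (1): per-channel mean
    centered = flat - mean
    cov = centered @ centered.t() / (H * W)     # formula (3): pairwise channel covariance
    cov.fill_diagonal_(0)                       # covariance only between different channels
    pos = cov.clamp(min=0)                      # keep positively correlated pairs only
    weights = pos / (pos.sum(dim=1, keepdim=True) + 1e-6)
    fused = flat + weights @ flat               # pixel-by-pixel fusion with correlated channels
    return fused.reshape(C, H, W)

# Usage: fused = fuse_correlated_channels(torch.randn(64, 56, 56))
```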
Further, the extracted features are passed to the YOLOv5 object detection model, which detects the position of the target human body in each frame and marks detection boxes to obtain the target human body's feature data;
the YOLOv5 object detection model is trained on the public MSCOCO 2017 dataset; samples are randomly drawn from the MSCOCO 2017 dataset at a preset ratio, and data-enhancement preprocessing is applied to the sample data.
Preferably, the data-enhancement modes include rotating the image at multiple angles in 30-degree steps; randomly masking the image with probability P and setting the pixel values under the mask to 0; flipping the image vertically and horizontally; applying distortions of varying degrees to the image; and applying colour perturbation to the image.
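A hedged sketch of these augmentations using torchvision follows; the mask probability P, the jitter strengths and the distortion scale are free parameters, not values taken from the patent:

```python
import random
import torchvision.transforms as T
import torchvision.transforms.functional as TF
from PIL import Image

def augment(img: Image.Image, p_mask: float = 0.5) -> Image.Image:
    # multi-angle rotation in 30-degree steps
    img = TF.rotate(img, angle=random.choice(range(0, 360, 30)))
    # vertical / horizontal flips
    if random.random() < 0.5:
        img = TF.hflip(img)
    if random.random() < 0.5:
        img = TF.vflip(img)
    # distortion of varying degree (perspective warp as a stand-in)
    img = T.RandomPerspective(distortion_scale=0.3, p=0.5)(img)
    # colour perturbation
    img = T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1)(img)
    # random rectangular mask with pixel value 0, applied with probability P
    tensor = T.ToTensor()(img)
    if random.random() < p_mask:
        tensor = T.RandomErasing(p=1.0, value=0)(tensor)
    return T.ToPILImage()(tensor)
```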
Further, the channel attention model obtains a channel feature probability matrix using a SoftMax function, and the spatial attention model obtains a spatial feature probability matrix using the SoftMax function;
each probability matrix is fused with the original feature map by element-wise product, adding weight information to the feature map.
Further, a depth-wise convolution is used to extract features from each channel of the feature map, obtaining a per-channel feature-value matrix, and cross-channel feature fusion is performed.
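A minimal sketch of how these two probability matrices can re-weight the feature map, followed by a depth-wise convolution; the global-average pooling used to form the logits and the 3x3 kernel size are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def cross_channel_weighting(x: torch.Tensor) -> torch.Tensor:
    """x: (N, C, H, W). Applies channel and spatial SoftMax probability
    matrices by product, then a depth-wise convolution per channel."""
    n, c, h, w = x.shape
    # channel feature probability matrix: one probability per channel
    chan_prob = F.softmax(x.mean(dim=(2, 3)), dim=1).view(n, c, 1, 1)
    # spatial feature probability matrix: one probability per position
    spat_prob = F.softmax(x.mean(dim=1).view(n, -1), dim=1).view(n, 1, h, w)
    # fuse each probability matrix with the original feature map by product
    weighted = x * chan_prob * spat_prob
    # depth-wise convolution: groups == channels, one filter per channel
    # (in a real module this layer would be created once in __init__)
    dw = nn.Conv2d(c, c, kernel_size=3, padding=1, groups=c)
    return dw(weighted)
```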
Furthermore, an online parallel knowledge distillation method is applied in the Faster-Pose pose detection model. The online parallel knowledge distillation method retains a Teacher-Student knowledge distillation framework in its network structure: the Teacher network consists of 8 Hourglass feature extraction modules, and the Student network consists of 4 Hourglass feature extraction modules;
the Teacher network is trained on the MSCOCO 2017 dataset and the Student network on a labelled subset of the data; during training, KL divergence is used to compute the loss between the Teacher and Student network feature maps, the Teacher and Student feature-map information is fused according to channel similarity, and the Teacher and Student networks are trained in parallel;
during inference, the Teacher network is removed and the Student network performs inference directly; a cross-channel interactive attention mechanism is introduced into the Faster-Pose pose detection model and assigns different weight information to the feature maps in the Teacher network; the computation of the Teacher and Student network feature maps is shown in formula (4):
where the two terms denote the feature map extracted by the second Hourglass module of the Teacher network and the feature map extracted by the first Hourglass module of the Student network, respectively;
the total feature-map loss is shown in formula (5):
the final loss function of the Faster-Pose pose estimation model is shown in formula (6):
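The published text references formulas (4)-(6) without reproducing them. One plausible concretization, consistent with the description above (the stage pairing, the attention operator $A(\cdot)$, the SoftMax normalization $\sigma$ and the weight $\lambda$ are assumptions), is:

```latex
% Formula (4), assumed form: attention-weighted Teacher vs. Student feature maps
F_i^{T} = A\!\left(H_{2i}^{T}\right), \qquad F_i^{S} = H_i^{S}, \qquad i = 1,\dots,4

% Formula (5), assumed form: total feature-map loss over the paired stages
L_{\mathrm{fm}} = \sum_{i=1}^{4} \mathrm{KL}\!\left(\sigma\!\left(F_i^{S}\right)\,\middle\|\,\sigma\!\left(F_i^{T}\right)\right)

% Formula (6), assumed form: final Faster-Pose loss
L = L_{\mathrm{pose}} + \lambda\, L_{\mathrm{fm}}
```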
Further, the human key-point heatmap data output by the Faster-Pose pose detection model are mapped back to the original feature map using linear interpolation, and the pixel offset introduced in the mapping process is corrected with trilinear interpolation.
An online cross-channel interactive parallel distillation framework pose estimation device comprises: at least one memory and at least one processor;
the at least one memory stores a machine-readable program;
the at least one processor is configured to invoke the machine-readable program to perform the online cross-channel interactive parallel distillation architecture pose estimation method.
Compared with the prior art, the online cross-channel interactive parallel distillation architecture pose estimation method and device provided by the invention have the following notable beneficial effects:
The invention provides a cross-channel interactive attention mechanism and a new pose detection model, Faster-Pose. In the feature extraction stage, channel attention detects which channels of the feature map contain the required feature expression, and spatial attention detects at which positions of the feature map the required feature information lies. The invention fuses the feature information extracted by spatial attention with that extracted by channel attention; by considering the correlation between channel feature information and spatial feature information, it improves the expressive power of the required feature information.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of the online cross-channel interactive parallel distillation architecture pose estimation method;
FIG. 2 is a schematic diagram of the pose estimation algorithm model framework in the online cross-channel interactive parallel distillation architecture pose estimation method;
FIG. 3 is a schematic diagram of the preprocessing flow for the MSCOCO 2017 dataset in the online cross-channel interactive parallel distillation architecture pose estimation method;
FIG. 4 is a schematic diagram of the YOLOv5 object detection algorithm framework in the online cross-channel interactive parallel distillation architecture pose estimation method;
FIG. 5 is a schematic diagram of the C-CIAM attention mechanism framework in the online cross-channel interactive parallel distillation architecture pose estimation method;
FIG. 6 is a schematic diagram of the Faster-Pose pose detection model in the online cross-channel interactive parallel distillation architecture pose estimation method.
Detailed Description
The present invention will be described in further detail with reference to specific embodiments, for a better understanding of its technical solutions. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from the given embodiments without creative effort fall within the scope of protection of the present invention.
A preferred embodiment is given below:
As shown in FIGS. 1 to 6, in the online cross-channel interactive parallel distillation architecture pose estimation method of this embodiment, a video acquisition device first captures an external video stream, segments it into frames, and inputs the frames into a feature extraction network for feature extraction;
the extracted features are passed to a YOLOv5 object detection model, which detects the position of the target human body in each frame and marks detection boxes to obtain the target human body's feature data;
the target human body feature data are passed to the Faster-Pose pose detection model to obtain human key-point feature information; the key-point feature information is then mapped into a feature map through a linear transformation, yielding a feature map annotated with the human key-point labels.
The video acquisition device in this embodiment is a 4-channel 2D camera; the video stream is segmented into frames at 60 frames per second and input into the feature extraction network for feature extraction.
The feature extraction network adopts a CSP structure modelled on the CSPNet network and introduces a cross-channel interactive attention mechanism. The cross-channel interactive attention mechanism combines channel attention and spatial attention: in the channel attention model, a covariance matrix is used to compute the pairwise similarity of the feature map's channels, and channels with high similarity are fused; in the spatial attention, a second-order finite difference method is used to compute feature-map pixel-value differences and the pixel gradient direction.
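A hedged sketch of this second-order finite-difference computation follows; a single-channel map is assumed, and the replication padding at the borders is an assumption:

```python
import torch
import torch.nn.functional as F

def second_order_diff(x: torch.Tensor):
    """x: (N, 1, H, W). Returns second-order pixel-value differences and the
    first-order pixel gradient direction."""
    xp = F.pad(x, (1, 1, 1, 1), mode="replicate")
    # central first-order differences along x and y
    gx = (xp[..., 1:-1, 2:] - xp[..., 1:-1, :-2]) / 2.0
    gy = (xp[..., 2:, 1:-1] - xp[..., :-2, 1:-1]) / 2.0
    # second-order central differences: f(i+1) - 2 f(i) + f(i-1)
    gxx = xp[..., 1:-1, 2:] - 2 * x + xp[..., 1:-1, :-2]
    gyy = xp[..., 2:, 1:-1] - 2 * x + xp[..., :-2, 1:-1]
    direction = torch.atan2(gy, gx)   # pixel gradient direction
    return gxx + gyy, direction       # Laplacian-style response, gradient direction
```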
If the computed covariance of two channels is negative, the channels are negatively correlated; if it is 0, the two channels are independent of each other; and if it is positive, the two channels are positively correlated and are used for feature fusion.
First, the mean of each channel is calculated, as shown in formula (1):

$$\mu_{C} = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W} x_{C}(i,j) \tag{1}$$

The variance of each channel is calculated as shown in formula (2):

$$\sigma_{C}^{2} = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\bigl(x_{C}(i,j)-\mu_{C}\bigr)^{2} \tag{2}$$

The variances of all channels are recorded as $\sigma_{C_1}^{2},\dots,\sigma_{C_n}^{2}$; the covariance between channels $C_1$ and $C_2$ is calculated as shown in formula (3):

$$\operatorname{Cov}(C_1,C_2)=\frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\bigl(x_{C_1}(i,j)-\mu_{C_1}\bigr)\bigl(x_{C_2}(i,j)-\mu_{C_2}\bigr) \tag{3}$$
By analogy, all pairwise channel-similarity covariance values are recorded as $\mathrm{Cov}_{i,k}$; covariance is computed only between different channels, and positively correlated channels are fused pixel by pixel according to the covariance value.
The extracted features are passed to the YOLOv5 object detection model, which detects the position of the target human body in each frame and marks a detection box to obtain the target human body's feature data. The YOLOv5 object detection model is trained on the public MSCOCO 2017 dataset: samples are randomly drawn from it at a preset ratio, and data-enhancement preprocessing is applied to the sample data.
The channel attention model obtains a channel feature probability matrix using a SoftMax function, and the spatial attention model obtains a spatial feature probability matrix using the SoftMax function. Each probability matrix is fused with the original feature map by element-wise product, adding weight information to the feature map.
Features are extracted from each channel of the feature map with a depth-wise convolution to obtain a per-channel feature-value matrix; the probability matrices are then each fused with the original feature map by product, adding weight information to the feature map.
The data-enhancement modes include rotating the image at multiple angles in 30-degree steps; randomly masking the image with probability P and setting the pixel values under the mask to 0; flipping the image vertically and horizontally; applying distortions of varying degrees to the image; and applying colour perturbation to the image.
The Faster-Pose pose detection model improves on the existing FastPose pose detection model and introduces a new distillation method: online parallel knowledge distillation (Online Parallel Distillation). The online parallel knowledge distillation method retains a Teacher-Student knowledge distillation framework in its network structure: the Teacher network consists of 8 Hourglass feature extraction modules, and the Student network consists of 4 Hourglass feature extraction modules. The Teacher network is trained on the MSCOCO 2017 dataset and the Student network on a labelled subset of the data. During training, KL (Kullback-Leibler) divergence is used to compute the loss between the Teacher and Student network feature maps, the Teacher and Student feature-map information is fused according to channel similarity, and the Teacher and Student networks are trained in parallel. During inference, the Teacher network is removed and the Student network performs inference directly. A cross-channel interactive attention mechanism is introduced into the Faster-Pose pose detection model and assigns different weight information to the feature maps in the Teacher network. The computation of the Teacher and Student network feature maps is shown in formula (4):
where the two terms denote the feature map extracted by the second Hourglass module of the Teacher network and the feature map extracted by the first Hourglass module of the Student network, respectively.
The total feature-map loss is shown in formula (5):
The final loss function of the Faster-Pose pose estimation model is shown in formula (6):
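A minimal sketch of one such parallel training step follows; both networks are assumed to return a list of per-stage Hourglass feature maps plus output heatmaps, and the pairing of Teacher stages 2, 4, 6, 8 with Student stages 1-4, the MSE task loss and the weight lam are assumptions, not the patent's exact formulas:

```python
import torch
import torch.nn.functional as F

def online_distillation_step(teacher, student, images, target_heatmaps,
                             opt_t, opt_s, lam: float = 0.5) -> float:
    """One step in which Teacher and Student are optimized in parallel."""
    t_feats, t_out = teacher(images)   # 8 Hourglass stages
    s_feats, s_out = student(images)   # 4 Hourglass stages

    # task loss for both networks against ground-truth heatmaps
    loss_task = F.mse_loss(t_out, target_heatmaps) + F.mse_loss(s_out, target_heatmaps)

    # KL divergence between paired Teacher/Student feature maps
    loss_kl = images.new_zeros(())
    for tf, sf in zip(t_feats[1::2], s_feats):   # Teacher stages 2,4,6,8 vs Student 1..4
        log_p_s = F.log_softmax(sf.flatten(1), dim=1)
        p_t = F.softmax(tf.flatten(1), dim=1)
        loss_kl = loss_kl + F.kl_div(log_p_s, p_t, reduction="batchmean")

    loss = loss_task + lam * loss_kl
    opt_t.zero_grad(); opt_s.zero_grad()
    loss.backward()
    opt_t.step(); opt_s.step()
    return float(loss)
```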
The human key-point heatmap data output by the Faster-Pose pose detection model are mapped back to the original feature map using linear interpolation, and the pixel offset introduced in the mapping process is corrected with trilinear interpolation.
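A minimal sketch of the heatmap-to-image mapping is given below; bilinear upsampling and argmax decoding stand in for the linear/trilinear scheme described, as common conventions rather than the patent's exact correction:

```python
import torch
import torch.nn.functional as F

def decode_keypoints(heatmaps: torch.Tensor, orig_h: int, orig_w: int) -> torch.Tensor:
    """heatmaps: (K, h, w), one map per key point.
    Returns (K, 2) (x, y) coordinates in the original image."""
    k, h, w = heatmaps.shape
    # map heatmaps to the original resolution with (bi)linear interpolation
    up = F.interpolate(heatmaps.unsqueeze(0), size=(orig_h, orig_w),
                       mode="bilinear", align_corners=False).squeeze(0)
    idx = up.reshape(k, -1).argmax(dim=1)
    ys = torch.div(idx, orig_w, rounding_mode="floor").float()
    xs = (idx % orig_w).float()
    return torch.stack([xs, ys], dim=1)
```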
Based on the above method, the online cross-channel interactive parallel distillation architecture pose estimation device in this embodiment comprises: at least one memory and at least one processor;
the at least one memory stores a machine-readable program;
the at least one processor is configured to invoke the machine-readable program to perform the online cross-channel interactive parallel distillation architecture pose estimation method.
The memory in this embodiment is 512 GB, the processor is an 8-core CPU, and the device additionally requires an NVIDIA graphics card of RTX 2080 Ti class or better.
The method and device fully consider the relationship between channel attention and spatial attention and fuse the two kinds of features according to the designed channel fusion rule. The advantage is that the distribution of features can be determined by the channel attention model, and combining it with the positional information of the target features in the spatial dimension further strengthens the representation of target features in the spatial dimension.
The method takes full account of the fact that when the extracted feature map has few channels but large spatial extent, the channel features easily generalize poorly while the spatial features are sensitive and hard to learn; computing the feature map's pixel-value differences and pixel gradient direction in the spatial attention model with a second-order finite difference method therefore improves the localization of the target position in the spatial dimension.
The original FastPose pose estimation model is improved and a new pose estimation model, Faster-Pose, is proposed. A new mode of information interaction and fusion between the Teacher network and the Student network is designed, and a new loss function is proposed.
The above embodiments are only specific examples; the scope of protection of the invention includes but is not limited to them, and any suitable change or substitution made by one of ordinary skill in the art, consistent with the claims of this online cross-channel interactive parallel distillation architecture pose estimation method and device, shall fall within the scope of protection of the invention.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.
Claims (10)
1. An online cross-channel interactive parallel distillation architecture pose estimation method, characterized in that, first, a video acquisition device captures an external video stream, which is segmented into frames and input into a feature extraction network for feature extraction;
the extracted features are passed to a YOLOv5 object detection model, which detects the position of the target human body in each frame and marks a detection box to obtain the target human body's feature data;
the target human body feature data are passed to the Faster-Pose pose detection model to obtain human key-point feature information; the key-point feature information is then mapped into a feature map through a linear transformation, yielding a feature map annotated with the human key-point labels.
2. The online cross-channel interactive parallel distillation framework pose estimation method of claim 1, characterized in that the feature extraction network is designed as a CSP structure and introduces a cross-channel interactive attention mechanism that combines channel attention and spatial attention; in the channel attention model, a covariance matrix is used to compute the pairwise similarity of the feature map's channels, and channels with high similarity are fused;
in the spatial attention, a second-order finite difference method is used to compute feature-map pixel-value differences and the pixel gradient direction.
3. The online cross-channel interactive parallel distillation framework pose estimation method of claim 2, characterized in that a negative computed covariance of two channels indicates negative correlation, a value of 0 indicates that the two channels are independent and uncorrelated, and a positive value indicates that the two channels are positively correlated and are used for feature fusion;
first, the mean of each channel is calculated, as shown in formula (1):

$$\mu_{C} = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W} x_{C}(i,j) \tag{1}$$

the variance of each channel is calculated as shown in formula (2):

$$\sigma_{C}^{2} = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\bigl(x_{C}(i,j)-\mu_{C}\bigr)^{2} \tag{2}$$

the covariance between channels $C_1$ and $C_2$ is calculated as shown in formula (3):

$$\operatorname{Cov}(C_1,C_2)=\frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\bigl(x_{C_1}(i,j)-\mu_{C_1}\bigr)\bigl(x_{C_2}(i,j)-\mu_{C_2}\bigr) \tag{3}$$
by analogy, all pairwise channel-similarity covariance values are recorded as $\mathrm{Cov}_{i,k}$; covariance is computed only between different channels, and positively correlated channels are fused pixel by pixel according to the covariance value.
4. The online cross-channel interactive parallel distillation architecture pose estimation method of claim 3, characterized in that the extracted features are passed to the YOLOv5 object detection model, which detects the position of the target human body in each frame and marks detection boxes to obtain the target human body's feature data;
the YOLOv5 object detection model is trained on the public MSCOCO 2017 dataset; samples are randomly drawn from the MSCOCO 2017 dataset at a preset ratio, and data-enhancement preprocessing is applied to the sample data.
5. The online cross-channel interactive parallel distillation framework pose estimation method of claim 4, characterized in that the data-enhancement modes include rotating the image at multiple angles in 30-degree steps; randomly masking the image with probability P and setting the pixel values under the mask to 0; flipping the image vertically and horizontally; applying distortions of varying degrees to the image; and applying colour perturbation to the image.
6. The online cross-channel interactive parallel distillation framework pose estimation method of claim 5, characterized in that the channel attention model obtains a channel feature probability matrix using a SoftMax function, and the spatial attention model obtains a spatial feature probability matrix using the SoftMax function;
each probability matrix is fused with the original feature map by element-wise product, adding weight information to the feature map.
7. The online cross-channel interactive parallel distillation architecture pose estimation method of claim 6, characterized in that a depth-wise convolution is used to extract features from each channel of the feature map, obtaining a per-channel feature-value matrix, and cross-channel feature fusion is performed.
8. The online cross-channel interactive parallel distillation architecture pose estimation method of claim 7, characterized in that an online parallel knowledge distillation method is applied in the Faster-Pose pose detection model; the online parallel knowledge distillation method retains a Teacher-Student knowledge distillation framework in its network structure, the Teacher network consisting of 8 Hourglass feature extraction modules and the Student network of 4 Hourglass feature extraction modules;
the Teacher network is trained on the MSCOCO 2017 dataset and the Student network on a labelled subset of the data; during training, KL divergence is used to compute the loss between the Teacher and Student network feature maps, the Teacher and Student feature-map information is fused according to channel similarity, and the Teacher and Student networks are trained in parallel;
during inference, the Teacher network is removed and the Student network performs inference directly; a cross-channel interactive attention mechanism is introduced into the Faster-Pose pose detection model and assigns different weight information to the feature maps in the Teacher network; the computation of the Teacher and Student network feature maps is shown in formula (4):
where the two terms denote the feature map extracted by the second Hourglass module of the Teacher network and the feature map extracted by the first Hourglass module of the Student network, respectively;
the total feature-map loss is shown in formula (5):
the final loss function of the Faster-Pose pose estimation model is shown in formula (6):
9. The online cross-channel interactive parallel distillation framework pose estimation method of claim 8, characterized in that the human key-point heatmap data output by the Faster-Pose pose detection model are mapped back to the original feature map using linear interpolation, and the pixel offset introduced in the mapping process is corrected with trilinear interpolation.
10. An online cross-channel interactive parallel distillation framework pose estimation device, characterized by comprising: at least one memory and at least one processor;
the at least one memory storing a machine-readable program;
the at least one processor being configured to invoke the machine-readable program to perform the method of any one of claims 1 to 9.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202211061531.2A | 2022-09-01 | 2022-09-01 | Online cross-channel interactive parallel distillation framework pose estimation method and device |
Publications (1)

| Publication Number | Publication Date |
| --- | --- |
| CN115359571A | 2022-11-18 |
Family
- ID: 84004623

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| CN202211061531.2A | Online cross-channel interactive parallel distillation framework pose estimation method and device | 2022-09-01 | 2022-09-01 |

Country Status (1)

| Country | Link |
| --- | --- |
| CN | CN115359571A (en), pending |
Cited By (2)

| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| CN116091496A | 2023-04-07 | 2023-05-09 | 菲特(天津)检测技术有限公司 | Defect detection method and device based on improved Faster-RCNN |
| CN116091496B | 2023-04-07 | 2023-11-24 | 菲特(天津)检测技术有限公司 | Defect detection method and device based on improved Faster-RCNN |
Legal Events

| Date | Code | Title | Description |
| --- | --- | --- | --- |
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |