CN115359571A - Online cross-channel interactive parallel distillation framework pose estimation method and device - Google Patents

Online cross-channel interactive parallel distillation framework pose estimation method and device

Info

Publication number
CN115359571A
Authority
CN
China
Prior art keywords
channel
feature map
feature
network
cross
Prior art date
Legal status
Pending
Application number
CN202211061531.2A
Other languages
Chinese (zh)
Inventor
郑群
杨义坤
孙桂刚
李超
朱宪
Current Assignee
Xiamen Information Technology Application and Innovation Research Institute Co., Ltd.
Original Assignee
Xiamen Information Technology Application and Innovation Research Institute Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Xiamen Information Technology Application and Innovation Research Institute Co., Ltd.
Priority to CN202211061531.2A
Publication of CN115359571A
Current legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462: Salient features, e.g. scale invariant feature transforms [SIFT]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07: Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of artificial intelligence and computer vision, and in particular provides an online cross-channel interactive parallel distillation framework pose estimation method. First, a video acquisition device acquires an external video stream, which is segmented into frames and input into a feature extraction network for feature extraction. The extracted features are passed to a YOLOV5 target detection model, which detects the position of the target human body in each frame of image and marks a detection frame to obtain feature data of the target human body. The target human body feature data are passed to the pose detection model Faster-Pose to obtain human key point feature information, which is mapped into a feature map through linear transformation to obtain a feature map with human key point labels. Compared with the prior art, the method and device take the correlation between channel feature information and spatial feature information into account, improving the expressive power of the required feature information.

Description

Online cross-channel interactive parallel distillation framework pose estimation method and device
Technical Field
The invention relates to the technical field of artificial intelligence and computer vision, and in particular provides an online cross-channel interactive parallel distillation framework pose estimation method and device.
Background
Human pose estimation is an important technical field in artificial intelligence and computer vision. By estimating the behavior of people in a scene, human-computer interaction can be realized more effectively. At present, human pose estimation is commonly used for detecting workers' violations of operating procedures, in the security field, and in VR wearable devices. The accuracy with which the target detection model in a human pose estimation algorithm extracts the human detection frame is crucial to the accuracy and stability of human key point localization.
Existing attention models do not consider the correlation between channel feature information and spatial feature information, so their accuracy is limited.
Disclosure of Invention
Aiming at the deficiencies of the prior art, the invention provides a highly practical online cross-channel interactive parallel distillation framework pose estimation method.
The invention further provides an online cross-channel interactive parallel distillation framework pose estimation device, which is reasonably designed, safe, and applicable.
The technical solution adopted by the invention to solve this technical problem is as follows:
a method for estimating attitude of an online cross-channel interactive parallel distillation framework comprises the steps that firstly, a video acquisition device acquires external video streams, and the video streams are segmented into frames and input into a feature extraction network for feature extraction;
the extracted features are transmitted to a YOLOV5 target detection model, the position of the target human body in each frame of image is detected, and a detection frame is marked to obtain feature data of the target human body;
transmitting the target human body feature data to a posture detection model Faster-Pose to obtain human body key point feature information; and mapping the obtained characteristic information of the human body key points into a characteristic diagram through linear transformation to obtain the characteristic diagram with the human body key point labels.
Further, the feature extraction network is designed as a CSP structure and a cross-channel interactive attention mechanism is introduced; the mechanism combines channel attention and spatial attention, a covariance matrix is used in the channel attention model to compute the pairwise similarity of the feature map channels, and channels with high similarity are fused;
in the spatial attention, a second-order finite difference method is used to compute the feature map pixel value differences and pixel gradient directions.
Further, if the computed covariance of two channels is negative, they are negatively correlated; if it is 0, the two channels are independent and uncorrelated; and if it is positive, the two channels are positively correlated and are used for feature fusion;
first, the mean of each channel is calculated as shown in formula (1):

$$\mu_c = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W} x_c(i,j) \tag{1}$$

and the means of all channels are recorded as $\mu = (\mu_1, \mu_2, \dots, \mu_C)$.

The variance of each channel is calculated as shown in formula (2):

$$\sigma_c^2 = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\bigl(x_c(i,j) - \mu_c\bigr)^2 \tag{2}$$

and the variances of all channels are recorded as $\sigma^2 = (\sigma_1^2, \sigma_2^2, \dots, \sigma_C^2)$.

The covariance between channels $C_1$ and $C_2$ is calculated as shown in formula (3):

$$\operatorname{Cov}(C_1, C_2) = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\bigl(x_1(i,j)-\mu_1\bigr)\bigl(x_2(i,j)-\mu_2\bigr) \tag{3}$$

By analogy, all pairwise channel similarity covariance values are recorded as $\operatorname{Cov}_{i,k}$; covariance is computed only between different channels, and positively correlated channels are fused pixel by pixel according to the covariance value.
Further, the extracted features are passed to a YOLOV5 target detection model, which detects the position of the target human body in each frame of image and marks detection frames to obtain feature data of the target human body;
the YOLOV5 target detection model is trained with the public MSCOCO 2017 data set, which is randomly sampled in a preset proportion, and data enhancement preprocessing is applied to the sample data.
Preferably, the data enhancement includes rotating the image through multiple angles in 30-degree increments, randomly masking the image with probability P and setting the pixel values under the mask to 0, flipping the image vertically and horizontally, applying distortions of varying degree to the image, and perturbing the image colors.
Further, the channel attention model obtains a channel feature probability matrix using a SoftMax function, and the spatial attention model obtains a spatial feature probability matrix using a SoftMax function;
each probability matrix is fused with the original feature map by element-wise product, adding weight information to the feature map.
Further, feature extraction is performed on each channel of the feature map using a Depth-Wise method to obtain a feature value matrix for each channel, and cross-channel feature fusion is then carried out.
Furthermore, an online parallel knowledge distillation method is applied in the pose detection model Faster-Pose. On the network structure, the method adopts a Teacher-Student knowledge distillation framework, in which the Teacher network consists of 8 Hourglass feature extraction modules and the Student network of 4 Hourglass feature extraction modules;
the Teacher network is trained with the MSCOCO 2017 data set and the Student network with a labeled subset of the data; during training, KL divergence is used to compute the loss between the Teacher and Student network feature maps, the information of the Teacher and Student feature maps is fused according to channel similarity, and the Teacher and Student networks are trained in parallel;
during inference the Teacher network is removed and the Student network infers directly; a cross-channel interactive attention mechanism is introduced into the Faster-Pose pose detection model and assigns different weight information to the feature maps in the Teacher network; the loss between the Teacher and Student network feature maps is calculated as shown in formula (4):

$$L_{fm}^{(i)} = \operatorname{KL}\!\left(F_T^{(2i)} \,\middle\|\, F_S^{(i)}\right) \tag{4}$$

where $F_T^{(2i)}$ and $F_S^{(i)}$ respectively denote the feature map extracted by the 2i-th Hourglass module of the Teacher network and the feature map extracted by the i-th Hourglass module of the Student network;

the total feature map loss is shown in formula (5):

$$L_{fm} = \sum_{i=1}^{4} L_{fm}^{(i)} \tag{5}$$

and the final loss function of the Faster-Pose pose estimation model is shown in formula (6):

$$L = \alpha L_S + \lambda L_{fm} \tag{6}$$

where $L_S$ is the Student network model loss, and α and λ are hyper-parameters to be learned.
Further, the human key point Heat Map data output by the Faster-Pose pose detection model are mapped onto the original feature map using a linear interpolation method, and pixel point offsets arising during the mapping are corrected with trilinear interpolation.
An online cross-channel interactive parallel distillation framework pose estimation device comprises: at least one memory and at least one processor;
the at least one memory to store a machine readable program;
the at least one processor is configured to invoke the machine readable program to perform the online cross-channel interactive parallel distillation framework pose estimation method.
Compared with the prior art, the online cross-channel interactive parallel distillation framework pose estimation method and device provided by the invention have the following outstanding beneficial effects:
The invention provides a cross-channel interactive attention mechanism and a new pose detection model, Faster-Pose. In the feature extraction stage, channel attention detects which channels of the feature map contain the required feature expression, and spatial attention detects where in the feature map the required feature information is located. The invention fuses the feature information extracted by spatial attention with the feature information extracted by channel attention; by considering the correlation between channel feature information and spatial feature information, the expressive power of the required feature information is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic flow chart of the online cross-channel interactive parallel distillation framework pose estimation method;
FIG. 2 is a schematic diagram of the pose estimation algorithm model framework in the online cross-channel interactive parallel distillation framework pose estimation method;
FIG. 3 is a schematic diagram of the MSCOCO 2017 data set preprocessing flow in the online cross-channel interactive parallel distillation framework pose estimation method;
FIG. 4 is a schematic diagram of the YOLOV5 target detection algorithm framework in the online cross-channel interactive parallel distillation framework pose estimation method;
FIG. 5 is a schematic diagram of the C-CIAM attention mechanism framework in the online cross-channel interactive parallel distillation framework pose estimation method;
FIG. 6 is a schematic diagram of the Faster-Pose pose detection model in the online cross-channel interactive parallel distillation framework pose estimation method.
Detailed Description
The present invention will be described in further detail with reference to specific embodiments in order to better understand the technical solutions of the present invention. It should be apparent that the described embodiments are only some embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A preferred embodiment is given below:
as shown in fig. 1 to 6, in the method for estimating the attitude of the online cross-channel interactive parallel distillation architecture in the embodiment, first, a video acquisition device acquires an external video stream, divides the video stream into frames, and inputs the frames into a feature extraction network for feature extraction;
the extracted features are transmitted to a YOLOV5 target detection model, the position of the target human body in each frame of image is detected, and detection frames are marked to obtain feature data of the target human body;
transmitting the target human body feature data to a posture detection model Faster-Pose to obtain human body key point feature information; and mapping the obtained characteristic information of the human key points into a characteristic diagram through linear transformation to obtain the characteristic diagram with the human key point labels.
The video acquisition device in this embodiment is a 4-channel 2D camera; the video stream is segmented into frames at 60 frames per second and input into the feature extraction network for feature extraction.
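A minimal sketch of this frame-acquisition step is given below, assuming an OpenCV capture source; the device index, the generator interface, and the way downstream code consumes the frames are illustrative assumptions, with only the 60 frames-per-second target taken from the embodiment.

```python
# Illustrative sketch: split a camera video stream into frames for the
# feature extraction network. The device index and generator interface
# are assumptions; only the 60 fps target comes from the embodiment.
import cv2

def stream_frames(device=0, fps=60):
    cap = cv2.VideoCapture(device)
    cap.set(cv2.CAP_PROP_FPS, fps)  # request the embodiment's 60 frames/s
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            yield frame  # one BGR frame, ready for feature extraction
    finally:
        cap.release()
```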
The feature extraction network adopts a CSP structure modeled on the CSPNet network and introduces a cross-channel interactive attention mechanism that combines channel attention and spatial attention. In the channel attention model, a covariance matrix is used to compute the pairwise similarity of the feature map channels, and channels with high similarity are fused. In the spatial attention, a second-order finite difference method is used to compute the feature map pixel value differences and pixel gradient directions.
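The spatial-attention statistics can be illustrated with a short NumPy sketch; plain central differences are assumed here, since the exact discretization in the patent's figures is not reproduced, so this function is only a stand-in.

```python
# Hedged sketch of the spatial-attention statistics: first-order central
# differences give the pixel gradient direction, and second-order central
# differences give the pixel value difference term. The exact formulation
# in the patent is an image and is not reproduced here.
import numpy as np

def spatial_stats(fmap):
    """fmap: (H, W) single-channel feature map."""
    gy, gx = np.gradient(fmap)            # first-order differences
    direction = np.arctan2(gy, gx)        # pixel gradient direction
    # Second-order central differences along each axis.
    d2x = fmap[:, 2:] - 2 * fmap[:, 1:-1] + fmap[:, :-2]
    d2y = fmap[2:, :] - 2 * fmap[1:-1, :] + fmap[:-2, :]
    return direction, d2x, d2y
```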
If the computed covariance of two channels is negative, they are negatively correlated; if it is 0, the two channels are independent of each other; and if it is positive, the two channels are positively correlated and are used for feature fusion.
First, the mean of each channel is calculated as shown in formula (1):

$$\mu_c = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W} x_c(i,j) \tag{1}$$

and the means of all channels are recorded as $\mu = (\mu_1, \mu_2, \dots, \mu_C)$.

The variance of each channel is calculated as shown in formula (2):

$$\sigma_c^2 = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\bigl(x_c(i,j) - \mu_c\bigr)^2 \tag{2}$$

and the variances of all channels are recorded as $\sigma^2 = (\sigma_1^2, \sigma_2^2, \dots, \sigma_C^2)$.

The covariance between channels $C_1$ and $C_2$ is calculated as shown in formula (3):

$$\operatorname{Cov}(C_1, C_2) = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\bigl(x_1(i,j)-\mu_1\bigr)\bigl(x_2(i,j)-\mu_2\bigr) \tag{3}$$

By analogy, all pairwise channel similarity covariance values are recorded as $\operatorname{Cov}_{i,k}$; covariance is computed only between different channels, and positively correlated channels are fused pixel by pixel according to the covariance value.
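Formulas (1) to (3) can be sketched as follows; the fusion rule for a positively correlated pair (simple pixel-wise averaging) is an assumption, since the patent only states that positively correlated channels are fused pixel by pixel according to the covariance value.

```python
# Hedged sketch of formulas (1)-(3): per-channel mean and variance, pairwise
# channel covariance, and pixel-wise fusion of positively correlated channels.
# Averaging a correlated pair is an assumed fusion rule, not the patent's.
import numpy as np

def channel_covariance_fusion(x):
    """x: (C, H, W) feature map."""
    c, h, w = x.shape
    flat = x.reshape(c, -1)                    # one row per channel
    mu = flat.mean(axis=1)                     # formula (1): channel means
    var = flat.var(axis=1)                     # formula (2): channel variances
    centered = flat - mu[:, None]
    cov = centered @ centered.T / (h * w)      # formula (3): Cov between channels
    fused = x.copy()
    for i in range(c):
        for k in range(i + 1, c):              # covariance only between different channels
            if cov[i, k] > 0:                  # positive correlation -> fuse
                merged = (fused[i] + fused[k]) / 2
                fused[i], fused[k] = merged, merged
    return fused, cov, var
```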
The extracted features are passed to the YOLOV5 target detection model, which detects the position of the target human body in each frame of image and marks detection frames to obtain feature data of the target human body. The YOLOV5 target detection model is trained with the public MSCOCO 2017 data set, which is randomly sampled in a preset proportion; data enhancement preprocessing is applied to the sample data.
The channel attention model obtains a channel feature probability matrix using a SoftMax function, and the spatial attention model obtains a spatial feature probability matrix using a SoftMax function. Feature extraction is performed on each channel of the feature map using a Depth-Wise method to obtain a feature value matrix for each channel; each probability matrix is then fused with the original feature map by element-wise product, adding weight information to the feature map.
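A PyTorch sketch of this weighting step follows; the random depthwise filter stands in for the learned Depth-Wise convolution, and deriving the channel and spatial scores from the depthwise features is an assumption rather than the patent's exact recipe.

```python
# Hedged sketch of the attention weighting: a depthwise (groups=C) convolution
# extracts per-channel features, SoftMax produces channel and spatial
# probability matrices, and both are multiplied back onto the feature map.
import torch
import torch.nn.functional as F

def attention_weighting(x):
    """x: (N, C, H, W) feature map."""
    n, c, h, w = x.shape
    dw_weight = torch.randn(c, 1, 3, 3)                  # assumed stand-in for learned filters
    feats = F.conv2d(x, dw_weight, padding=1, groups=c)  # Depth-Wise feature extraction
    # Channel feature probability matrix: SoftMax over the C channel scores.
    p_channel = F.softmax(feats.mean(dim=(2, 3)), dim=1).view(n, c, 1, 1)
    # Spatial feature probability matrix: SoftMax over the H*W positions.
    p_spatial = F.softmax(feats.mean(dim=1).view(n, -1), dim=1).view(n, 1, h, w)
    # Product fusion: add weight information to the original feature map.
    return x * p_channel * p_spatial
```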
The data enhancement includes: rotating the image through multiple angles in 30-degree increments; randomly masking the image with probability P, with the pixel values under the mask set to 0; flipping the image vertically and horizontally; applying distortions of varying degree to the image; and perturbing the image colors.
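The augmentation steps listed above might look like the following sketch; the mask size, jitter range, and flip probabilities are illustrative values, and the distortion step is omitted.

```python
# Hedged sketch of the data enhancement: 30-degree rotation steps, random
# masking with probability P (masked pixels set to 0), vertical/horizontal
# flips, and a simple color disturbance. Parameter values are illustrative.
import random
import numpy as np
import cv2

def augment(img, p_mask=0.3):
    h, w = img.shape[:2]
    # Multi-angle rotation in 30-degree increments.
    angle = 30 * random.randint(0, 11)
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    img = cv2.warpAffine(img, m, (w, h))
    # Random rectangular mask with probability P; pixels under the mask -> 0.
    if random.random() < p_mask:
        x0, y0 = random.randrange(w // 2), random.randrange(h // 2)
        img[y0:y0 + h // 4, x0:x0 + w // 4] = 0
    # Up-down and left-right flips.
    if random.random() < 0.5:
        img = cv2.flip(img, 0)   # vertical flip
    if random.random() < 0.5:
        img = cv2.flip(img, 1)   # horizontal flip
    # Color disturbance: jitter each channel's intensity.
    jitter = np.random.uniform(0.8, 1.2, size=3)
    return np.clip(img * jitter, 0, 255).astype(np.uint8)
```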
The Faster-Pose pose detection model improves the existing FastPose pose detection model and proposes a new distillation method, online parallel knowledge distillation (Online Parallel Distillation). On the network structure, the method adopts a Teacher-Student knowledge distillation framework, in which the Teacher network consists of 8 Hourglass feature extraction modules and the Student network of 4 Hourglass feature extraction modules. The Teacher network is trained with the MSCOCO 2017 data set and the Student network with a labeled subset of the data. During training, KL (Kullback-Leibler) divergence is used to compute the loss between the Teacher and Student network feature maps, the information of the Teacher and Student feature maps is fused according to channel similarity, and the Teacher and Student networks are trained in parallel. During inference the Teacher network is removed and the Student network infers directly. A cross-channel interactive attention mechanism is introduced into the Faster-Pose pose detection model and assigns different weight information to the feature maps in the Teacher network. The loss between the Teacher and Student network feature maps is calculated as shown in formula (4):

$$L_{fm}^{(i)} = \operatorname{KL}\!\left(F_T^{(2i)} \,\middle\|\, F_S^{(i)}\right) \tag{4}$$

where $F_T^{(2i)}$ and $F_S^{(i)}$ respectively denote the feature map extracted by the 2i-th Hourglass module of the Teacher network and the feature map extracted by the i-th Hourglass module of the Student network.

The total feature map loss is shown in formula (5):

$$L_{fm} = \sum_{i=1}^{4} L_{fm}^{(i)} \tag{5}$$

The final loss function of the Faster-Pose pose estimation model is shown in formula (6):

$$L = \alpha L_S + \lambda L_{fm} \tag{6}$$

where $L_S$ is the Student network model loss, and α and λ are hyper-parameters to be learned.
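A sketch of this loss computation as reconstructed above follows; pairing the 2i-th Teacher Hourglass stage with the i-th Student stage and the SoftMax normalization applied before the KL term are both assumptions, since the original equations are image placeholders.

```python
# Hedged sketch of formulas (4)-(6): KL divergence between paired Teacher and
# Student feature maps, summed over the Student's four Hourglass stages, then
# combined with the Student pose loss via the weights alpha and lambda.
# The 2i -> i stage pairing and the SoftMax normalization are assumptions.
import torch
import torch.nn.functional as F

def feature_kl(f_t, f_s):
    # Formula (4): KL(F_T || F_S) over spatially flattened, normalized maps.
    log_p_s = F.log_softmax(f_s.flatten(1), dim=1)
    p_t = F.softmax(f_t.flatten(1), dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean")

def total_loss(teacher_maps, student_maps, student_pose_loss, alpha, lam):
    # Formula (5): total feature map loss over the paired stages
    # (teacher_maps has 8 entries, student_maps has 4; index 2i+1 picks
    # Teacher stages 2, 4, 6, 8 under the assumed pairing).
    l_fm = sum(feature_kl(teacher_maps[2 * i + 1], student_maps[i])
               for i in range(len(student_maps)))
    # Formula (6): final Faster-Pose loss.
    return alpha * student_pose_loss + lam * l_fm
```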
The human key point Heat Map data output by the Faster-Pose pose detection model are mapped onto the original feature map using a linear interpolation method, and pixel point offsets arising during the mapping are corrected with trilinear interpolation.
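The read-out step might be sketched as below, assuming the key points are taken as heat map peaks; bilinear resizing stands in for the linear interpolation, and the trilinear offset correction is not reproduced.

```python
# Hedged sketch of the Heat Map read-out: each key point map is rescaled to
# the original image size by bilinear interpolation and its peak is taken as
# the key point location. The trilinear offset correction is omitted.
import numpy as np
import cv2

def heatmap_to_keypoints(heatmaps, orig_w, orig_h):
    """heatmaps: (K, h, w) float32, one map per human key point."""
    points = []
    for hm in heatmaps:
        up = cv2.resize(hm, (orig_w, orig_h), interpolation=cv2.INTER_LINEAR)
        y, x = np.unravel_index(np.argmax(up), up.shape)
        points.append((x, y, up[y, x]))   # (x, y, confidence)
    return points
```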
Based on the above method, the online cross-channel interactive parallel distillation framework pose estimation device in this embodiment includes: at least one memory and at least one processor;
the at least one memory to store a machine readable program;
the at least one processor is configured to invoke the machine readable program to perform the online cross-channel interactive parallel distillation framework pose estimation method.
In this embodiment the memory is 512 GB, the processor is an 8-core CPU, and the apparatus further requires an NVIDIA graphics card of RTX 2080 Ti or above.
The method and device fully consider the relationship between channel attention and spatial attention and fuse the two kinds of features according to the designed channel fusion rule. The advantage is that the distribution of features can be determined through the channel attention model, and combining this with the position information of target features in the spatial dimension further strengthens the characterization of target features in space.
The method also accounts for the fact that when the extracted feature map has few channels and large spatial extent, the channel features tend to generalize poorly while the spatial features are sensitive and hard to learn; computing the feature map pixel value differences and pixel gradient directions with a second-order finite difference method in the spatial attention model improves the localization of the target position in the spatial dimension.
The original FastPose pose estimation model is improved and a new pose estimation model, Faster-Pose, is proposed. A new mode of information interaction and fusion between the Teacher and Student networks is designed, and a new loss function is proposed.
The above embodiments are only specific examples; the protection scope of the present invention includes but is not limited to these embodiments, and any suitable change or substitution that accords with the claims of the online cross-channel interactive parallel distillation framework pose estimation method and device of the present invention and is made by one of ordinary skill in the art shall fall within the protection scope of the present invention.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that various changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. An online cross-channel interactive parallel distillation framework pose estimation method, characterized in that, first, a video acquisition device acquires an external video stream, which is segmented into frames and input into a feature extraction network for feature extraction;
the extracted features are passed to a YOLOV5 target detection model, which detects the position of the target human body in each frame of image and marks a detection frame to obtain feature data of the target human body;
the target human body feature data are passed to the pose detection model Faster-Pose to obtain human key point feature information, which is mapped into a feature map through linear transformation to obtain a feature map with human key point labels.
2. The online cross-channel interactive parallel distillation framework pose estimation method according to claim 1, characterized in that the feature extraction network is designed as a CSP structure and a cross-channel interactive attention mechanism is introduced; the mechanism combines channel attention and spatial attention, a covariance matrix is used in the channel attention model to compute the pairwise similarity of the feature map channels, and channels with high similarity are fused;
in the spatial attention, a second-order finite difference method is used to compute the feature map pixel value differences and pixel gradient directions.
3. The online cross-channel interactive parallel distillation framework pose estimation method according to claim 2, characterized in that if the computed covariance of two channels is negative, they are negatively correlated; if it is 0, the two channels are independent and uncorrelated; and if it is positive, the two channels are positively correlated and are used for feature fusion;
first, the mean of each channel is calculated as shown in formula (1):

$$\mu_c = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W} x_c(i,j) \tag{1}$$

and the means of all channels are recorded as $\mu = (\mu_1, \mu_2, \dots, \mu_C)$;

the variance of each channel is calculated as shown in formula (2):

$$\sigma_c^2 = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\bigl(x_c(i,j) - \mu_c\bigr)^2 \tag{2}$$

and the variances of all channels are recorded as $\sigma^2 = (\sigma_1^2, \sigma_2^2, \dots, \sigma_C^2)$;

the covariance between channels $C_1$ and $C_2$ is calculated as shown in formula (3):

$$\operatorname{Cov}(C_1, C_2) = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\bigl(x_1(i,j)-\mu_1\bigr)\bigl(x_2(i,j)-\mu_2\bigr) \tag{3}$$

by analogy, all pairwise channel similarity covariance values are recorded as $\operatorname{Cov}_{i,k}$; covariance is computed only between different channels, and positively correlated channels are fused pixel by pixel according to the covariance value.
4. The online cross-channel interactive parallel distillation framework pose estimation method according to claim 3, characterized in that the extracted features are passed to the YOLOV5 target detection model, which detects the position of the target human body in each frame of image and marks detection frames to obtain feature data of the target human body;
the YOLOV5 target detection model is trained with the public MSCOCO 2017 data set, which is randomly sampled in a preset proportion, and data enhancement preprocessing is applied to the sample data.
5. The online cross-channel interactive parallel distillation framework pose estimation method according to claim 4, characterized in that the data enhancement includes rotating the image through multiple angles in 30-degree increments, randomly masking the image with probability P and setting the pixel values under the mask to 0, flipping the image vertically and horizontally, applying distortions of varying degree to the image, and perturbing the image colors.
6. The online cross-channel interactive parallel distillation framework pose estimation method according to claim 5, characterized in that the channel attention model obtains a channel feature probability matrix using a SoftMax function, and the spatial attention model obtains a spatial feature probability matrix using a SoftMax function;
each probability matrix is fused with the original feature map by element-wise product, adding weight information to the feature map.
7. The online cross-channel interactive parallel distillation framework pose estimation method according to claim 6, characterized in that feature extraction is performed on each channel of the feature map using a Depth-Wise method to obtain a feature value matrix for each channel, and cross-channel feature fusion is then carried out.
8. The online cross-channel interactive parallel distillation framework pose estimation method according to claim 7, characterized in that an online parallel knowledge distillation method is applied in the pose detection model Faster-Pose; on the network structure, the method adopts a Teacher-Student knowledge distillation framework, the Teacher network consisting of 8 Hourglass feature extraction modules and the Student network of 4 Hourglass feature extraction modules;
the Teacher network is trained with the MSCOCO 2017 data set and the Student network with a labeled subset of the data; during training, KL divergence is used to compute the loss between the Teacher and Student network feature maps, the information of the Teacher and Student feature maps is fused according to channel similarity, and the Teacher and Student networks are trained in parallel;
during inference the Teacher network is removed and the Student network infers directly; a cross-channel interactive attention mechanism is introduced into the Faster-Pose pose detection model and assigns different weight information to the feature maps in the Teacher network; the loss between the Teacher and Student network feature maps is calculated as shown in formula (4):

$$L_{fm}^{(i)} = \operatorname{KL}\!\left(F_T^{(2i)} \,\middle\|\, F_S^{(i)}\right) \tag{4}$$

where $F_T^{(2i)}$ and $F_S^{(i)}$ respectively denote the feature map extracted by the 2i-th Hourglass module of the Teacher network and the feature map extracted by the i-th Hourglass module of the Student network;

the total feature map loss is shown in formula (5):

$$L_{fm} = \sum_{i=1}^{4} L_{fm}^{(i)} \tag{5}$$

and the final loss function of the Faster-Pose pose estimation model is shown in formula (6):

$$L = \alpha L_S + \lambda L_{fm} \tag{6}$$

where $L_S$ is the Student network model loss, and α and λ are hyper-parameters to be learned.
9. The online cross-channel interactive parallel distillation framework pose estimation method according to claim 8, characterized in that the human key point Heat Map data output by the Faster-Pose pose detection model are mapped onto the original feature map using a linear interpolation method, and pixel point offsets arising during the mapping are corrected with trilinear interpolation.
10. An online cross-channel interactive parallel distillation framework pose estimation device, characterized by comprising: at least one memory and at least one processor;
the at least one memory to store a machine readable program;
the at least one processor, configured to invoke the machine readable program to perform the method of any of claims 1 to 9.
CN202211061531.2A 2022-09-01 2022-09-01 Online cross-channel interactive parallel distillation framework pose estimation method and device Pending CN115359571A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211061531.2A CN115359571A (en) Online cross-channel interactive parallel distillation framework pose estimation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211061531.2A CN115359571A (en) Online cross-channel interactive parallel distillation framework pose estimation method and device

Publications (1)

Publication Number Publication Date
CN115359571A (en) 2022-11-18

Family

ID=84004623

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211061531.2A Pending CN115359571A (en) 2022-09-01 2022-09-01 Online cross-channel interactive parallel distillation framework attitude estimation method and device

Country Status (1)

Country Link
CN (1) CN115359571A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116091496A (en) * 2023-04-07 2023-05-09 菲特(天津)检测技术有限公司 Defect detection method and device based on improved Faster-RCNN
CN116091496B (en) * 2023-04-07 2023-11-24 菲特(天津)检测技术有限公司 Defect detection method and device based on improved Faster-RCNN

Similar Documents

Publication Publication Date Title
CN110097568B (en) Video object detection and segmentation method based on space-time dual-branch network
CN111209810B (en) Boundary frame segmentation supervision deep neural network architecture for accurately detecting pedestrians in real time through visible light and infrared images
CN109886121B (en) Human face key point positioning method for shielding robustness
CN111709409B (en) Face living body detection method, device, equipment and medium
US8345962B2 (en) Transfer learning methods and systems for feed-forward visual recognition systems
CN105138998B (en) Pedestrian based on the adaptive sub-space learning algorithm in visual angle recognition methods and system again
CN110853033B (en) Video detection method and device based on inter-frame similarity
CN114782694B (en) Unsupervised anomaly detection method, system, device and storage medium
CN111091109A (en) Method, system and equipment for predicting age and gender based on face image
CN110796018A (en) Hand motion recognition method based on depth image and color image
CN112818969A (en) Knowledge distillation-based face pose estimation method and system
CN111931603B (en) Human body action recognition system and method of double-flow convolution network based on competitive network
CN117636134B (en) Panoramic image quality evaluation method and system based on hierarchical moving window attention
CN111950457A (en) Oil field safety production image identification method and system
CN113312973A (en) Method and system for extracting features of gesture recognition key points
CN114693607A (en) Method and system for detecting tampered video based on multi-domain block feature marker point registration
CN107292272A (en) A kind of method and system of the recognition of face in the video of real-time Transmission
CN115359571A (en) Online cross-channel interactive parallel distillation framework pose estimation method and device
CN112464775A (en) Video target re-identification method based on multi-branch network
CN115393928A (en) Face recognition method and device based on depth separable convolution and additive angle interval loss
CN113221812A (en) Training method of face key point detection model and face key point detection method
CN113947801B (en) Face recognition method and device and electronic equipment
CN114445649A (en) Method for detecting RGB-D single image shadow by multi-scale super-pixel fusion
CN114639132A (en) Feature extraction model processing method, device and equipment in face recognition scene
Chen et al. Siamese network algorithm based on multi-scale channel attention fusion and multi-scale depth-wise cross correlation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination