CN116359846A - Dynamic millimeter wave radar point cloud human body analysis method based on joint learning - Google Patents


Info

Publication number
CN116359846A
CN116359846A
Authority
CN
China
Prior art keywords
task
human body
features
analysis
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310236507.6A
Other languages
Chinese (zh)
Inventor
王帅
梅洛瑜
曹东江
史瑞签
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202310236507.6A priority Critical patent/CN116359846A/en
Publication of CN116359846A publication Critical patent/CN116359846A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/02Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S13/00Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S13/88Radar or analogous systems specially adapted for specific applications
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Electromagnetism (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of Internet of Things sensing and designs a dynamic millimeter wave radar point cloud human body analysis method based on joint learning, in particular human body analysis based on millimeter wave sensing. At present, human-centered millimeter wave sensing mostly focuses on scenes such as action recognition and pose estimation, but these tasks cannot acquire the semantic information of the millimeter wave point cloud, i.e. the body part corresponding to each radar point cannot be distinguished, so a dynamic millimeter wave radar point cloud human body analysis scheme is needed. The method comprises the following steps: the millimeter wave point cloud data are first clustered; a multi-task learning model then jointly executes the human body analysis and pose estimation tasks to extract features; multi-task feature fusion is then performed through a non-local network, so that the final output is a point cloud annotated with semantic tags.

Description

Dynamic millimeter wave radar point cloud human body analysis method based on joint learning
Technical Field
The invention relates to the technical field of Internet of Things sensing, in particular to a dynamic millimeter wave radar point cloud human body analysis method based on joint learning.
Background
Perception and understanding of human activity play an increasingly important role in human-centric intelligent applications. Traditional methods adopt cameras or body-contact sensors, which are easily affected by harsh environments and raise privacy concerns. For human perception, research using millimeter wave radar has been on the rise in recent years, and its effectiveness in gesture and activity recognition, pose estimation, identity recognition and the like has been demonstrated. However, these tasks cannot explicitly acquire the semantic information of the millimeter wave point cloud, i.e. it is difficult to distinguish the body part corresponding to each radar point, and thus to realize human body analysis.
In human body sensing applications, fine-grained body part information is frequently required, and the lack of semantic information greatly limits millimeter wave radar from becoming an enabling technology for human-body computing in daily life. Meanwhile, semantic information, used as an additional input channel, makes human perception tasks more robust. Various computer vision tasks have demonstrated that including semantic information as input can significantly improve the accuracy of pose estimation, activity recognition and person identification; this benefit is even more prominent for millimeter wave radar, because millimeter wave point clouds are inherently of lower quality than images from vision sensors. Therefore, a technical scheme is needed to realize the dynamic millimeter wave radar point cloud human body analysis task and obtain a point cloud tagged with body part semantic information.
The sparse nature of millimeter wave point clouds makes feature extraction challenging: due to the single chip and small antenna size, millimeter wave point clouds are extremely sparse, which makes it difficult to perceive the detailed structure of the human body from the point cloud even with the naked eye. Extracting features containing human body structural information (such as posture) using existing deep neural network models is challenging, which directly affects the human body analysis task.
Specular reflection causes the loss of body parts in millimeter wave point cloud data: because low-cost millimeter wave radar is limited by its small antenna aperture, most human body reflection signals do not return to the sensor. This specular reflection causes body parts to be missing from the point cloud and ultimately leads to erroneous analysis results.
Disclosure of Invention
In order to solve the above problems, the invention discloses a dynamic millimeter wave radar point cloud human body analysis method based on joint learning. In this method, the millimeter wave point cloud data are first clustered; human body analysis and pose estimation tasks are then executed jointly by a multi-task learning model for feature extraction; multi-task feature fusion is then performed through a non-local network, so that a point cloud annotated with semantic tags is finally output.
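The clustering pre-processing mentioned above can be sketched as follows. The patent does not name a specific clustering algorithm, so this is a minimal illustrative sketch using a simple radius-based connected-components grouping (a DBSCAN-like heuristic) over hypothetical 3-D radar points; the radius value and the point data are assumptions, not the patented procedure.

```python
import numpy as np

def cluster_points(points: np.ndarray, radius: float = 0.5) -> np.ndarray:
    """Label each 3-D point with a cluster id; points within `radius`
    of each other (transitively) share a label."""
    n = len(points)
    labels = -np.ones(n, dtype=int)
    current = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        # breadth-first expansion of the cluster seeded at point i
        labels[i] = current
        frontier = [i]
        while frontier:
            j = frontier.pop()
            d = np.linalg.norm(points - points[j], axis=1)
            for k in np.where((d < radius) & (labels == -1))[0]:
                labels[k] = current
                frontier.append(k)
        current += 1
    return labels

# Two well-separated groups of points yield two clusters.
pts = np.array([[0, 0, 0], [0.1, 0, 0], [5, 5, 0], [5.2, 5, 0.1]])
print(cluster_points(pts))  # → [0 0 1 1]
```

In a real pipeline each resulting cluster (one subject) would be passed on to the multi-task feature extraction module.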
The specific technical scheme is as follows:
step 1: the sparsity of the millimeter wave point cloud is addressed by a multi-task feature extraction module; a multi-task learning model is adopted to jointly execute the main task of human body analysis and the auxiliary task of pose estimation; the auxiliary task can effectively guide the human body analysis network to extract high-level structural features representing the posture of the subject; because of the strong correlation between human body posture and human body analysis, the posture-related features help improve the accuracy and robustness of the analysis network in predicting semantic tags; for the human body analysis and pose estimation tasks, the multi-task learning model extracts the corresponding features in parallel;
step 2: the problem of body parts missing from the point cloud data due to specular reflection is addressed by a multi-task feature fusion module; taking inspiration from the non-local network (NLN), a multi-task feature fusion method is designed that combines intra-task and inter-task attention mechanisms and aggregates the spatio-temporal features of the subject from a global perspective;
step 3: in the offline training stage of the system, the invention adopts a Kinect system to obtain the ground-truth human body analysis tags and pose estimation tags.
Further, in step 1, the multi-task learning model may be specifically divided into a point module, a frame module and a feature aggregation module; the input of the model is a frame sequence of length s, where each frame contains n points and each point has d feature dimensions;
step 1.1, extracting analysis characteristics of a human body;
in the point module, for any radar point p_{i,t} in the point set C_t of the frame at time t, a multi-layer perceptron (MLP) is used to obtain a high-dimensional feature representation of the point, i.e. the point feature e^H_{i,t}; the formula is as follows:

e^H_{i,t} = MLP(p_{i,t}; θ_e)

where θ_e denotes the learnable parameters of the MLP, and the superscript H denotes the human body analysis task.
In the frame module, the point feature e^H_{i,t} of each radar point p_{i,t} is first encoded into a higher-dimensional feature representation h^H_{i,t}; the formula is as follows:

h^H_{i,t} = MLP(e^H_{i,t}; θ_h)

where θ_h denotes the learnable parameters of the MLP;
step 1.2: extracting human body posture characteristics;
for the extraction of human body posture features, a slightly different network architecture is used than for human body analysis feature extraction; specifically, the frame feature g^P_t of any frame is processed using a long short-term memory network (LSTM); the formula is as follows:

s^P_t = LSTM(g^P_t, s^P_{t-1}; θ_r)

where θ_r denotes the parameters of the LSTM.
Finally, s^P_t is concatenated with the point feature e^P_{i,t} associated with the human posture task to obtain the feature vector f^P_{i,t} of each point in the frame under the pose estimation task; the formula is as follows:

f^P_{i,t} = [s^P_t, e^P_{i,t}]
further, in the step 1.1, the point features h^H_{i,t} of all points in the frame are aggregated into a frame feature g^H_t to extract the global information of the frame; the formula is as follows:

g^H_t = A({h^H_{i,t}}_{i=1}^{N}; θ_a)

where N is the number of points contained in the frame at time t, A(·) denotes the attention function, and θ_a denotes the learnable parameters of the attention function;
finally, the frame feature g^H_t is concatenated with the point feature e^H_{i,t} to obtain the feature vector f^H_{i,t} of each point in the frame under the human body analysis task; the formula is as follows:

f^H_{i,t} = [g^H_t, e^H_{i,t}]
further, the step 2 performs the tasks of human body analysis and posture estimation by using two parallel NLNs respectively; the method comprises the following steps:
step 2.1: an intra-task attention mechanism, for human body analysis tasks, analyzing NLN with a series of analysis features as input, and executing task self-attention to aggregate the features of different frames; this will generate a global context for classifying the body parts in each frame to solve the problem of losing body parts due to specular reflection in the local frames;
step 2.2: an inter-task attention mechanism; in order to integrate the features in the human body analysis and gesture estimation tasks, the correlation between the analysis features and the gesture features needs to be found, and an inter-task attention mechanism is adopted to calculate the space-time correlation between the analysis features and the gesture features; for analysis tasks, the method inputs the gesture estimation features into an analysis NLN, and firstly analyzes a feature matrix Z H And gesture feature matrix Z P Performing linear transformation, then performing dot product and normalization on the result to finally obtain an inter-task attention matrix a of the human body analysis task H→P
Step 2.3 feature polymerization: the present invention fuses human body analytic features and pose estimation features in all frames to predict body parts at points in a particular frame using intra-task and inter-task attention matrices.
Step 2.4: outputting a model; human body analytic characteristic Y H And pose estimation feature Y P The three-dimensional human body part classification information and the human skeleton key point position information are output finally after being processed by a multi-layer perceptron and a fully-connected neural network respectively.
Further, step 2.1 proceeds as follows: for the analysis task, the point features f^H_{i,t} of all points in the whole time sequence are first stacked into a feature matrix Z^H; Z^H is then linearly transformed to obtain the embedding vectors φ(Z^H) and ψ(Z^H). Further, in order to estimate the spatio-temporal correlation between the points of each group of frames, the dot product of the embeddings is taken and normalized by a nonlinear function, yielding the intra-task attention matrix a^H under the human body analysis task. The formula is as follows:

a^H = σ(φ(Z^H) ψ(Z^H)^T)

where σ denotes the nonlinear function, and W^H_φ and W^H_ψ denote the parameters of the linear transformations.
Similarly, in the pose estimation task, the same processing yields the corresponding intra-task attention matrix a^P; the formula is as follows:

a^P = σ(φ(Z^P) ψ(Z^P)^T)

where σ denotes the nonlinear function, W^P_φ and W^P_ψ denote the parameters of the linear transformations, and Z^P is the feature matrix obtained by stacking the point features f^P_{i,t} of all points in the whole time sequence under the pose estimation task.
Further, the specific formula of step 2.2 is as follows:

a^{H→P} = σ(φ'(Z^H) ψ'(Z^P)^T)

where W'_φ and W'_ψ denote the weight parameters of the linear transformations.
Similarly, the human body analysis features are fed into the pose estimation NLN to obtain the inter-task attention matrix a^{P→H} of the pose estimation task; the formula is as follows:

a^{P→H} = σ(φ'(Z^P) ψ'(Z^H)^T)

where W'_φ and W'_ψ denote the weight parameters of the linear transformations.
Further, step 2.3 proceeds as follows: for the human body analysis task, the feature matrix Z^H is first linearly transformed into g_1(Z^H) and g_2(Z^H), which are then multiplied by the intra-task attention matrix a^H and the inter-task attention matrix a^{H→P} respectively to obtain the intra-task and inter-task features; in this way the weighted sum of all features is computed according to their correlation with the current frame. Finally, the intra-task and inter-task features are concatenated, and the result is added element-wise to the original features Z^H, generating the final aggregated human body analysis features Y^H; the formula is as follows:

Y^H = Z^H + [a^H g_1(Z^H), a^{H→P} g_2(Z^H)]

where g_1 and g_2 are linear transformations with parameters W^H_{g1} and W^H_{g2} respectively, and [·,·] denotes concatenation;
for the attitude estimation task, the aggregated attitude estimation feature Y is finally obtained in the same way as the method P The method comprises the steps of carrying out a first treatment on the surface of the The formula is as follows:
Figure BDA00041225052600000611
wherein the method comprises the steps of
Figure BDA00041225052600000612
And->
Figure BDA00041225052600000613
Respectively linear transformation parameters.
Further, the following should be noted for step 3: the Kinect system is only used in the offline training stage and is not required in the inference stage; for the human body analysis task, cross-entropy loss is adopted to minimize the error between the predicted and true body part class of each point; the formula is as follows:

L_H = -(1/N) Σ_{n=1}^{N} Σ_{k=1}^{K} y_{n,k} log(ŷ_{n,k})

where N is the number of points, K is the number of semantic tag classes, y_{n,k} is an indicator that outputs 0 or 1, with y_{n,k} = 1 meaning that sample n belongs to class k, and ŷ_{n,k} is the predicted probability that sample n belongs to class k;
for the attitude estimation task, a mean square error is adopted to minimize the error between the predicted position and the actual position of the skeleton joint; the formula is as follows:
Figure BDA0004122505260000075
wherein I II the L2-paradigm is represented,
Figure BDA0004122505260000076
and p m Respectively representing a predicted value and an actual value of the bone joint, wherein M represents the number of selected bone joint points; the network architecture is trained end to end, and the overall supervision function of the system is as follows:
L=γL H +βL P
where γ and β are hyper-parameters.
The invention has the following beneficial effects:
The invention designs a dynamic millimeter wave radar point cloud human body analysis method based on joint learning, which can generate a point cloud annotated with semantic tags and solves the problem that current human-centered millimeter wave sensing cannot acquire the semantic information of the millimeter wave point cloud. The method achieves an accuracy of about 92% and a mean IoU of 84%, and the predicted semantic tags can improve the performance of two downstream tasks (pose estimation and action recognition) by about 18% and 6% respectively.
Drawings
Fig. 1: the overall structure of the system is schematically shown.
Fig. 2: the multi-task feature extraction module is structured schematically.
Fig. 3: the multi-task feature fusion module is structurally schematic.
Fig. 4: system accuracy in different scenarios.
Detailed Description
The present invention is further illustrated in the following drawings and detailed description, which are to be understood as being merely illustrative of the invention and not limiting the scope of the invention. It should be noted that the words "front", "rear", "left", "right", "upper" and "lower" used in the following description refer to directions in the drawings, and the words "inner" and "outer" refer to directions toward or away from, respectively, the geometric center of a particular component.
As shown in fig. 1, the dynamic millimeter wave radar point cloud human body analysis method based on joint learning of this embodiment specifically comprises the following steps:
as shown in fig. 2, step 1: solving sparsity of millimeter wave point cloud through' multitasking feature extraction module
Aiming at the characteristic problem that the existing deep neural network model is difficult to extract the information containing the human body structure, the invention adopts the multi-task learning model to jointly execute human body analysis (main task) and gesture estimation (auxiliary task). The auxiliary task may effectively direct the human analytic network to extract high-level structural features representative of the subject's posture. Because of the strong correlation between the human body posture and the human body analysis, the human body posture correlation characteristics are beneficial to improving the accuracy and the robustness of the analysis network in predicting the semantic tags. For human analysis and gesture estimation tasks, the multi-task learning model extracts corresponding features in parallel, specifically, the multi-task learning model can be divided into a point module, a frame module and a feature aggregation module, the input of the model is a frame sequence with the length of s, each frame comprises n points, and each point comprises d feature dimensions.
Step 1.1: extracting analysis characteristics of a human body;
in the point module, for any radar point p_{i,t} in the point set C_t of the frame at time t, a multi-layer perceptron (MLP) is used to obtain a high-dimensional feature representation of the point, i.e. the point feature e^H_{i,t}. The formula is as follows:

e^H_{i,t} = MLP(p_{i,t}; θ_e)

where θ_e denotes the learnable parameters of the MLP, and the superscript H denotes the human body analysis task.
In the frame module, the point feature e^H_{i,t} of each radar point p_{i,t} is first encoded into a higher-dimensional feature representation h^H_{i,t}. The formula is as follows:

h^H_{i,t} = MLP(e^H_{i,t}; θ_h)

where θ_h denotes the learnable parameters of the MLP.
Further, the point features h^H_{i,t} of all points in the frame are aggregated into a frame feature g^H_t to extract the global information of the frame. The formula is as follows:

g^H_t = A({h^H_{i,t}}_{i=1}^{N}; θ_a)

where N is the number of points contained in the frame at time t, A(·) denotes the attention function, and θ_a denotes the learnable parameters of the attention function.
Finally, the frame feature g^H_t is concatenated with the point feature e^H_{i,t} to obtain the feature vector f^H_{i,t} of each point in the frame under the human body analysis task. The formula is as follows:

f^H_{i,t} = [g^H_t, e^H_{i,t}]
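The point module, frame module and attention aggregation described above can be sketched numerically as follows. This is a minimal illustration, not the patented network: the single-hidden-layer ReLU perceptron, the softmax-weighted sum used as the attention function A(·), and all dimensions and weights are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w, b):
    # single-hidden-layer perceptron with ReLU, standing in for MLP(.; theta)
    return np.maximum(x @ w + b, 0.0)

n, d, d_e, d_h = 8, 5, 16, 32           # points per frame and feature dims (illustrative)
frame = rng.normal(size=(n, d))         # one radar frame C_t with n points

# point module: e^H_{i,t} = MLP(p_{i,t}; theta_e)
w_e = rng.normal(size=(d, d_e)) * 0.1
e = mlp(frame, w_e, np.zeros(d_e))

# frame module: h^H_{i,t} = MLP(e^H_{i,t}; theta_h)
w_h = rng.normal(size=(d_e, d_h)) * 0.1
h = mlp(e, w_h, np.zeros(d_h))

# feature aggregation: g^H_t = A({h^H_{i,t}}; theta_a) as a softmax-weighted sum
w_a = rng.normal(size=(d_h, 1))
score = h @ w_a
alpha = np.exp(score - score.max())
alpha /= alpha.sum()
g = (alpha * h).sum(axis=0)             # frame feature, shape (d_h,)

# concatenation: f^H_{i,t} = [g^H_t, e^H_{i,t}] for every point in the frame
f = np.concatenate([np.broadcast_to(g, (n, d_h)), e], axis=1)
print(f.shape)
```

Each point thus carries both its own embedding and the global frame context, which is what the parsing branch consumes downstream.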
step 1.2: extracting human body posture characteristics;
for the extraction of human body posture features, a slightly different network architecture is used than for human body analysis feature extraction. Specifically, the frame feature g^P_t of any frame is processed using a long short-term memory network (LSTM). The formula is as follows:

s^P_t = LSTM(g^P_t, s^P_{t-1}; θ_r)

where θ_r denotes the parameters of the LSTM.
Finally, s^P_t is concatenated with the point feature e^P_{i,t} associated with the human posture task to obtain the feature vector f^P_{i,t} of each point in the frame under the pose estimation task. The formula is as follows:

f^P_{i,t} = [s^P_t, e^P_{i,t}]
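The temporal processing of the pose branch can be sketched with a standard LSTM cell run over a sequence of frame features. The gate equations below are the textbook LSTM formulation; the dimensions, weight scales and state naming are illustrative assumptions rather than the patented configuration.

```python
import numpy as np

rng = np.random.default_rng(1)

def lstm_step(x, h, c, W, U, b):
    """One step of a standard LSTM cell (input, forget, output and cell gates)."""
    z = x @ W + h @ U + b
    d = h.shape[-1]
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    i, f, o, g = sig(z[:d]), sig(z[d:2*d]), sig(z[2*d:3*d]), np.tanh(z[3*d:])
    c = f * c + i * g
    return o * np.tanh(c), c

s_len, d_g, d_s = 6, 16, 24             # sequence length, frame-feature dim, state dim
frames = rng.normal(size=(s_len, d_g))  # the frame features g^P_t for t = 1..s
W = rng.normal(size=(d_g, 4 * d_s)) * 0.1
U = rng.normal(size=(d_s, 4 * d_s)) * 0.1
b = np.zeros(4 * d_s)

h = np.zeros(d_s)
c = np.zeros(d_s)
states = []
for t in range(s_len):                  # recurrence over the frame sequence
    h, c = lstm_step(frames[t], h, c, W, U, b)
    states.append(h)
print(np.stack(states).shape)
```

Each per-frame state would then be concatenated with the per-point pose features to form f^P_{i,t}, mirroring the parsing branch.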
as shown in fig. 3, step 2: solving the problem of missing body parts in point cloud data caused by specular reflection through a multi-task feature fusion module
In order to solve the problem of missing body parts in point cloud data caused by specular reflection, the invention obtains inspiration from a non-local network (NLN), and designs a multi-task feature fusion method which combines intra-task attention and inter-task attention mechanisms and realizes the aggregation of space-time features of a main body from a global angle. Specifically, the method uses two parallel NLNs to perform human body parsing and pose estimation tasks, respectively.
Step 2.1: intra-task attention mechanism;
For the human body analysis task, the analysis NLN takes a sequence of analysis features as input and executes intra-task self-attention to aggregate the features of different frames. This generates a global context for classifying the body parts in each frame, solving the problem of body parts lost to specular reflection in individual frames. More specifically, for the analysis task, the point features f^H_{i,t} of all points in the whole time sequence are first stacked into a feature matrix Z^H; Z^H is then linearly transformed to obtain the embedding vectors φ(Z^H) and ψ(Z^H). Further, in order to estimate the spatio-temporal correlation between the points of each group of frames, the dot product of the embeddings is taken and normalized by a nonlinear function (such as softmax), yielding the intra-task attention matrix a^H under the human body analysis task. The formula is as follows:

a^H = σ(φ(Z^H) ψ(Z^H)^T)

where σ denotes the nonlinear function, and W^H_φ and W^H_ψ denote the parameters of the linear transformations.
Similarly, in the pose estimation task, the same processing yields the corresponding intra-task attention matrix a^P. The formula is as follows:

a^P = σ(φ(Z^P) ψ(Z^P)^T)

where σ denotes the nonlinear function, W^P_φ and W^P_ψ denote the parameters of the linear transformations, and Z^P is the feature matrix obtained by stacking the point features f^P_{i,t} of all points in the whole time sequence under the pose estimation task.
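The intra-task attention computation above amounts to embedding the stacked feature matrix twice and taking a row-normalized dot product. A minimal numpy sketch, with illustrative dimensions and randomly initialized transforms standing in for the learned W_φ and W_ψ:

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

m, d, d_k = 10, 32, 16                  # m = s*n stacked point features of dim d
Z_H = rng.normal(size=(m, d))           # feature matrix for the parsing task

W_phi = rng.normal(size=(d, d_k)) * 0.1 # embedding transforms (learned in training)
W_psi = rng.normal(size=(d, d_k)) * 0.1

# a^H = softmax(phi(Z^H) psi(Z^H)^T): pairwise spatio-temporal correlations
a_H = softmax((Z_H @ W_phi) @ (Z_H @ W_psi).T, axis=-1)
print(a_H.shape)
```

Each row of a^H sums to one, so row i holds the attention that point feature i pays to every other point feature across the whole sequence.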
Step 2.2: inter-task attention mechanism;
In order to integrate the features of the human body analysis and pose estimation tasks, the correlation between analysis features and posture features must be found, so an inter-task attention mechanism is adopted to calculate the spatio-temporal correlation between them. For the analysis task, the method feeds the pose estimation features into the analysis NLN: the analysis feature matrix Z^H and the posture feature matrix Z^P are first linearly transformed, then the dot product of the results is taken and normalized, finally yielding the inter-task attention matrix a^{H→P} of the human body analysis task. The formula is as follows:

a^{H→P} = σ(φ'(Z^H) ψ'(Z^P)^T)

where W'_φ and W'_ψ denote the weight parameters of the linear transformations.
Similarly, the human body analysis features are fed into the pose estimation NLN to obtain the inter-task attention matrix a^{P→H} of the pose estimation task. The formula is as follows:

a^{P→H} = σ(φ'(Z^P) ψ'(Z^H)^T)

where W'_φ and W'_ψ denote the weight parameters of the linear transformations.
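The inter-task attention differs from the intra-task case only in that the two embeddings come from different feature matrices, so the resulting matrix is generally rectangular. A minimal sketch under assumed dimensions (deliberately unequal here, to make the row/column roles visible):

```python
import numpy as np

rng = np.random.default_rng(4)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

m_h, m_p, d, d_k = 10, 12, 32, 16
Z_H = rng.normal(size=(m_h, d))         # stacked parsing features
Z_P = rng.normal(size=(m_p, d))         # stacked pose features
W_1 = rng.normal(size=(d, d_k)) * 0.1   # the transforms phi', psi' (learned)
W_2 = rng.normal(size=(d, d_k)) * 0.1

# a^{H->P} = softmax(phi'(Z^H) psi'(Z^P)^T): row i gives the attention that
# parsing feature i pays to every pose feature
a_HP = softmax((Z_H @ W_1) @ (Z_P @ W_2).T, axis=-1)
print(a_HP.shape)
```

In the model itself both matrices stack the same s×n points, so a^{H→P} and a^{P→H} end up square; the rectangular shapes here are only for illustration.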
Step 2.3: feature aggregation;
Using the intra-task and inter-task attention matrices, the invention fuses the human body analysis features and pose estimation features of all frames to predict the body parts of the points in a particular frame.
Specifically, for the human body analysis task, the feature matrix Z^H is first linearly transformed into g_1(Z^H) and g_2(Z^H), which are then multiplied by the intra-task attention matrix a^H and the inter-task attention matrix a^{H→P} respectively to obtain the intra-task and inter-task features; in this way the weighted sum of all features is computed according to their correlation with the current frame. Finally, the intra-task and inter-task features are concatenated, and the result is added element-wise to the original features Z^H, generating the final aggregated human body analysis features Y^H. The formula is as follows:

Y^H = Z^H + [a^H g_1(Z^H), a^{H→P} g_2(Z^H)]

where g_1 and g_2 are linear transformations with parameters W^H_{g1} and W^H_{g2} respectively, and [·,·] denotes concatenation.
For the pose estimation task, the aggregated pose estimation features Y^P are obtained in the same way. The formula is as follows:

Y^P = Z^P + [a^P g_1(Z^P), a^{P→H} g_2(Z^P)]

where g_1 and g_2 are linear transformations with parameters W^P_{g1} and W^P_{g2} respectively.
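The aggregation step for the parsing branch can be sketched as follows. All dimensions and weights are illustrative; the embedding transforms inside the attention matrices are omitted for brevity, and g_1/g_2 are assumed to map to half the feature dimension so that the concatenation matches the residual addition (the patent text does not state how the dimensions are reconciled, so this is one plausible reading).

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

m, d = 10, 32
Z_H = rng.normal(size=(m, d))           # stacked parsing features
Z_P = rng.normal(size=(m, d))           # stacked pose features

# attention matrices (embedding transforms omitted here for brevity)
a_H  = softmax(Z_H @ Z_H.T, axis=-1)    # intra-task attention, (m, m)
a_HP = softmax(Z_H @ Z_P.T, axis=-1)    # inter-task attention, (m, m)

# g1, g2 map to d/2 so the concatenation matches d for the residual add
W_g1 = rng.normal(size=(d, d // 2)) * 0.1
W_g2 = rng.normal(size=(d, d // 2)) * 0.1

intra = a_H  @ (Z_H @ W_g1)             # attention-weighted intra-task features
inter = a_HP @ (Z_H @ W_g2)             # inter-task features (values from Z^H, per the text)
Y_H = Z_H + np.concatenate([intra, inter], axis=1)   # residual connection
print(Y_H.shape)
```

The pose branch Y^P is computed symmetrically with a^P and a^{P→H}.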
Step 2.4: model output;
The human body analysis feature Y^H and the pose estimation feature Y^P are processed by a multi-layer perceptron (MLP) and a fully connected neural network (FC) respectively; the final outputs are the body part classification of each point and the positions of the human skeleton key points.
Step 3: multi-task supervision
In the offline training stage of the system, the Kinect system is adopted to obtain the ground-truth human body analysis tags and pose estimation tags. It should be noted that the Kinect system is only used in the offline training stage and is not required in the inference stage. For the human body analysis task, the invention adopts cross-entropy loss to minimize the error between the predicted and true body part class of each point. The formula is as follows:

L_H = -(1/N) Σ_{n=1}^{N} Σ_{k=1}^{K} y_{n,k} log(ŷ_{n,k})

where N is the number of points, K is the number of semantic tag classes, y_{n,k} is an indicator that outputs 0 or 1, with y_{n,k} = 1 meaning that sample n belongs to class k, and ŷ_{n,k} is the predicted probability that sample n belongs to class k.
For the pose estimation task, the present invention employs Mean Square Error (MSE) to minimize the error between the predicted and actual positions of the skeletal joints. The formula is as follows:
L_P = (1/M) Σ_{m=1}^{M} ‖p̂_m − p_m‖²

where ‖·‖ denotes the L2 norm, p̂_m and p_m represent the predicted and actual positions of skeletal joint m respectively, and M represents the number of selected skeletal joint points.
The network architecture designed by the invention is trained end to end, and the overall supervision function of the system is as follows:
L=γL H +βL P
where γ and β are hyper-parameters.
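A minimal NumPy sketch of the pose loss L_P and the overall supervision L = γL_H + βL_P (the joint coordinates and hyper-parameter values below are illustrative assumptions):

```python
import numpy as np

def pose_loss(pred, true):
    """L_P: mean squared L2 error over M skeletal joints; pred, true: (M, 3)."""
    return np.mean(np.sum((pred - true) ** 2, axis=-1))

def total_loss(L_H, L_P, gamma=1.0, beta=0.5):
    """Overall supervision L = gamma * L_H + beta * L_P (values assumed)."""
    return gamma * L_H + beta * L_P

# toy example: two joints, each predicted 1 unit off along the z axis
pred = np.array([[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]])
true = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
L_P = pose_loss(pred, true)    # squared error of 1.0 per joint -> 1.0
L = total_loss(0.3, L_P)       # 1.0 * 0.3 + 0.5 * 1.0 = 0.8
```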
Fig. 4 shows the system accuracy in different scenarios.
Example 1: emergency rescue
In emergency rescue, rescue teams often need close coordination, for example accurately handing tools to teammates. However, in scenes such as fire rescue, heavy smoke makes it difficult for conventional camera-based imaging devices to work normally, whereas millimeter wave-based imaging remains robust in harsh environments. By adding the extra body-part semantic labels as additional input, the present invention improves the accuracy with which the radar identifies a person's hands, thereby helping rescuers complete tool hand-overs accurately in a smoke-filled environment.
Example 2: motion recognition
In some scenarios, human motion must be perceived accurately. In a nursing home, for example, millimeter wave devices used for health monitoring need to reliably recognize a fall by an elderly person. By adding body-part semantic information to the point cloud, the present invention improves the action recognition performance of millimeter wave devices.
Example 3: identity recognition
Since cameras may violate privacy, in recent years more and more millimeter wave devices have been deployed in private settings such as warehouses and offices to replace cameras for monitoring. Unlike camera imaging, which contains rich semantic information, millimeter wave imaging is inherently sparse, which is a disadvantage for identifying people. By adding body-part semantic information to the point cloud, the present invention improves the person identification performance of millimeter wave devices.
Example 4: autopilot
In the field of automatic driving, helping the vehicle understand and recognize pedestrian actions is of great significance. The method can be applied to vehicle-mounted millimeter wave radar devices to improve an autonomous vehicle's recognition of pedestrian actions, so that it can anticipate and respond in time to emergencies.
The technical means of the present invention are not limited to those disclosed in the above embodiments, but also include technical solutions formed by any combination of the above technical features.

Claims (8)

1. A dynamic millimeter wave radar point cloud human body analysis method based on joint learning, characterized by comprising the following steps:
step 1: the sparsity of the millimeter wave point cloud is addressed by a multi-task feature extraction module; a multi-task learning model is adopted to jointly execute the main task of human body analysis and the auxiliary task of pose estimation; the auxiliary task can effectively guide the human body analysis network to extract high-level structural features representing the subject's pose; because of the strong correlation between human pose and human body analysis, the pose-related features help to improve the accuracy and robustness of the analysis network in predicting semantic labels; for the human body analysis and pose estimation tasks, the multi-task learning model extracts the corresponding features in parallel;
step 2: the problem of body parts missing from the point cloud data due to specular reflection is addressed by a multi-task feature fusion module; inspired by non-local networks (NLN), a multi-task feature fusion method is designed that combines intra-task and inter-task attention mechanisms to aggregate the subject's spatio-temporal features from a global perspective;
step 3: in the offline training stage of the system, a Kinect system is adopted to obtain the ground-truth human body analysis labels and pose estimation labels.
2. The dynamic millimeter wave radar point cloud human body analysis method based on joint learning according to claim 1, characterized in that: in step 1, the multi-task learning model is divided into a point module, a frame module and a feature aggregation module, wherein the input of the model is a frame sequence of length s, each frame comprises n points, and each point comprises d feature dimensions;
step 1.1: extracting human body analysis features;
at the point module, for any radar point p_{i,t} in the point set C_t of the frame corresponding to time t, a multi-layer perceptron (MLP) is used to obtain a high-dimensional feature representation of the point, namely the point feature f^H_{i,t}; the formula is as follows:

f^H_{i,t} = MLP(p_{i,t}; θ_e)

where θ_e denotes the learnable parameters of the MLP and the superscript H denotes the human body analysis task;
at the frame module, the point feature f^H_{i,t} of each radar point p_{i,t} is first encoded into a higher-dimensional feature representation h^H_{i,t}; the formula is as follows:

h^H_{i,t} = MLP(f^H_{i,t}; θ_h)

where θ_h denotes the learnable parameters of the MLP;
step 1.2: extracting human body pose features;
for the extraction of human body pose features, a slightly different network architecture is used than for the extraction of human body analysis features; specifically, the frame feature g^P_t of any frame is processed with a long short-term memory (LSTM) network; the formula is as follows:

r^P_t = LSTM(g^P_t; θ_r)

where θ_r denotes the parameters of the LSTM;
finally, r^P_t is concatenated with the point feature h^P_{i,t} associated with the human pose task to obtain the feature vector z^P_{i,t} of each point in the frame under the pose estimation task; the formula is as follows:

z^P_{i,t} = concat(h^P_{i,t}, r^P_t)
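A minimal NumPy sketch of this point-module / frame-module / LSTM pipeline (all dimensions, weight initializations, the attention-style pooling, and the single-cell LSTM are illustrative assumptions, not the claimed architecture):

```python
import numpy as np

rng = np.random.default_rng(1)
S, N, D, H = 4, 8, 5, 16   # assumed: frames, points per frame, input dims, hidden size

def mlp(x, W1, b1, W2, b2):            # shared per-point MLP
    return np.maximum(x @ W1 + b1, 0) @ W2 + b2

W1, b1 = rng.normal(scale=0.1, size=(D, H)), np.zeros(H)
W2, b2 = rng.normal(scale=0.1, size=(H, H)), np.zeros(H)

points = rng.normal(size=(S, N, D))
point_feat = mlp(points, W1, b1, W2, b2)           # point features, (S, N, H)

# frame feature: attention-weighted pooling over the points of each frame
w_a = rng.normal(scale=0.1, size=(H,))
scores = point_feat @ w_a                          # (S, N)
scores = np.exp(scores - scores.max(axis=1, keepdims=True))
scores /= scores.sum(axis=1, keepdims=True)
frame_feat = (scores[..., None] * point_feat).sum(axis=1)   # (S, H)

# minimal single-layer LSTM over the frame sequence
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

Wx = rng.normal(scale=0.1, size=(H, 4 * H))
Wh = rng.normal(scale=0.1, size=(H, 4 * H))
h = c = np.zeros(H)
temporal = []
for t in range(S):
    g = frame_feat[t] @ Wx + h @ Wh
    i, f, o, u = np.split(g, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(u)
    h = sigmoid(o) * np.tanh(c)
    temporal.append(h)
temporal = np.stack(temporal)                      # (S, H)

# concatenate the frame-level temporal feature back to every point
z_P = np.concatenate(
    [point_feat, np.repeat(temporal[:, None, :], N, axis=1)], axis=-1)
```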
3. The dynamic millimeter wave radar point cloud human body analysis method based on joint learning according to claim 2, characterized in that: further, in step 1.1, the point features h^H_{i,t} of all points in the frame are aggregated into a frame feature g^H_t to extract the global information of the frame; the formula is as follows:

g^H_t = A(h^H_{1,t}, …, h^H_{N,t}; θ_a)

where N is the number of points contained in the frame corresponding to time t, A(·) denotes the attention function, and θ_a denotes the learnable parameters of the attention function;
finally, the frame feature g^H_t is concatenated to the point feature h^H_{i,t} to obtain the feature vector z^H_{i,t} of each point in the frame under the human body analysis task; the formula is as follows:

z^H_{i,t} = concat(h^H_{i,t}, g^H_t)
4. The dynamic millimeter wave radar point cloud human body analysis method based on joint learning according to claim 1, characterized in that: in step 2, two parallel NLNs respectively execute the human body analysis and pose estimation tasks; the method comprises the following steps:
step 2.1: an intra-task attention mechanism; for the human body analysis task, the analysis NLN takes a series of analysis features as input and performs intra-task self-attention to aggregate the features of different frames; this generates a global context for classifying the body parts in each frame, solving the problem of body parts lost due to specular reflection in local frames;
step 2.2: an inter-task attention mechanism; in order to integrate the features of the human body analysis and pose estimation tasks, the correlation between the analysis features and the pose features must be found, and an inter-task attention mechanism is adopted to calculate the spatio-temporal correlation between them; for the analysis task, the method inputs the pose estimation features into the analysis NLN, first performs linear transformations on the analysis feature matrix Z_H and the pose feature matrix Z_P, then performs dot product and normalization on the result, and finally obtains the inter-task attention matrix a_{H→P} of the human body analysis task;
step 2.3: feature aggregation; using the intra-task and inter-task attention matrices, the human body analysis features and pose estimation features in all frames are fused to predict the body parts at the points of a particular frame;
step 2.4: model output; the human body analysis feature Y_H and the pose estimation feature Y_P are processed by a multi-layer perceptron and a fully connected neural network respectively, and the final outputs are the body-part classification information and the human skeleton key-point position information.
5. The dynamic millimeter wave radar point cloud human body analysis method based on joint learning according to claim 4, characterized in that step 2.1 is specifically as follows: for the analysis task, the point features z^H_{i,t} of all points in the whole time sequence are first stacked into a feature matrix Z_H, and Z_H is then linearly transformed to obtain the embedding vectors θ(Z_H) and φ(Z_H); further, in order to estimate the spatio-temporal correlation between the points of each group of frames, dot product and normalization are applied to the embedding vectors through a nonlinear function to obtain the intra-task attention matrix a_H under the human body analysis task; the formula is as follows:

a_H = σ(θ(Z_H) · φ(Z_H)^T)

where σ denotes a nonlinear normalization function, and θ(·) and φ(·) denote linear transformations with learnable parameters;
similarly, in the pose estimation task, the same processing procedure is performed to obtain the corresponding intra-task attention matrix a_P; the formula is as follows:

a_P = σ(θ(Z_P) · φ(Z_P)^T)

where σ denotes a nonlinear normalization function, θ(·) and φ(·) denote linear transformations with learnable parameters, and Z_P denotes the feature matrix obtained by stacking the point features z^P_{i,t} of all points in the whole time sequence under the pose estimation task.
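The intra-task attention of this claim can be sketched as follows (a NumPy illustration under the assumption that σ is a row-wise softmax and that θ, φ are plain matrix multiplications; the shapes are arbitrary):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def intra_task_attention(Z, W_q, W_k):
    """a = softmax((Z W_q)(Z W_k)^T): pairwise space-time correlation
    between the stacked point features of a single task."""
    return softmax((Z @ W_q) @ (Z @ W_k).T)

rng = np.random.default_rng(0)
T, D = 5, 8                         # assumed: stacked features x channels
Z_H = rng.normal(size=(T, D))       # human body analysis feature matrix
a_H = intra_task_attention(Z_H,
                           rng.normal(size=(D, D)),
                           rng.normal(size=(D, D)))
```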
6. The dynamic millimeter wave radar point cloud human body analysis method based on joint learning according to claim 4, characterized in that the specific formula of step 2.2 is as follows:

a_{H→P} = σ(θ(Z_H) · φ(Z_P)^T)

where θ(·) and φ(·) denote linear transformations with learnable weight parameters;
similarly, the human body analysis features are input into the pose estimation NLN to obtain the inter-task attention matrix a_{P→H} of the pose estimation task; the formula is as follows:

a_{P→H} = σ(θ(Z_P) · φ(Z_H)^T)

where θ(·) and φ(·) denote linear transformations with learnable weight parameters.
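The inter-task attention of this claim differs from the intra-task case only in that the query and key come from different tasks; a hedged NumPy sketch (σ assumed to be a row-wise softmax, linear maps assumed to be plain matrix multiplications):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def inter_task_attention(Z_src, Z_other, W_q, W_k):
    """a_{src->other} = softmax((Z_src W_q)(Z_other W_k)^T): space-time
    correlation between the feature matrices of two different tasks."""
    return softmax((Z_src @ W_q) @ (Z_other @ W_k).T)

rng = np.random.default_rng(0)
T, D = 5, 8
Z_H = rng.normal(size=(T, D))       # analysis features
Z_P = rng.normal(size=(T, D))       # pose features
a_HP = inter_task_attention(Z_H, Z_P, rng.normal(size=(D, D)), rng.normal(size=(D, D)))
a_PH = inter_task_attention(Z_P, Z_H, rng.normal(size=(D, D)), rng.normal(size=(D, D)))
```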
7. The dynamic millimeter wave radar point cloud human body analysis method based on joint learning according to claim 4, characterized in that step 2.3 is specifically as follows: for the human body analysis task, the feature matrix Z_H is first linearly transformed into g_1(Z_H) and g_2(Z_H) respectively, which are then multiplied by the intra-task attention matrix a_H and the inter-task attention matrix a_{H→P} respectively to obtain the intra-task features and the inter-task features, computing the weighted sum of all features according to their correlation with the current frame; finally, the intra-task features and inter-task features are concatenated, and the result is added element-wise to the original features Z_H to generate the final aggregated human body analysis features Y_H; the formula is as follows:

Y_H = Z_H + concat(a_H · g_1(Z_H), a_{H→P} · g_2(Z_H))

where g_1(·) and g_2(·) are linear transformations;
for the pose estimation task, the aggregated pose estimation features Y_P are finally obtained in the same way; the formula is as follows:

Y_P = Z_P + concat(a_P · g_1(Z_P), a_{P→H} · g_2(Z_P))

where g_1(·) and g_2(·) are linear transformations.
8. The dynamic millimeter wave radar point cloud human body analysis method based on joint learning according to claim 1, characterized in that in step 3: the Kinect system is used only in the offline training stage and is not required in the inference stage; for the human body analysis task, a cross-entropy loss is adopted to minimize the error between the predicted and true body-part classification of each point; the formula is as follows:

L_H = −(1/N) Σ_{n=1}^{N} Σ_{k=1}^{K} y_{n,k} · log(ŷ_{n,k})

where N represents the number of points, K is the number of semantic label classes, y_{n,k} is an indicator function that outputs 0 or 1 (y_{n,k} = 1 indicates that sample n belongs to class k), and ŷ_{n,k} is the predicted probability that sample n belongs to class k;
for the pose estimation task, the mean squared error is adopted to minimize the error between the predicted and actual positions of the skeletal joints; the formula is as follows:

L_P = (1/M) Σ_{m=1}^{M} ‖p̂_m − p_m‖²

where ‖·‖ denotes the L2 norm, p̂_m and p_m represent the predicted and actual positions of skeletal joint m respectively, and M represents the number of selected skeletal joint points; the network architecture is trained end to end, and the overall supervision function of the system is as follows:

L = γL_H + βL_P

where γ and β are hyper-parameters.
CN202310236507.6A 2023-03-13 2023-03-13 Dynamic millimeter wave radar point cloud human body analysis method based on joint learning Pending CN116359846A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310236507.6A CN116359846A (en) Dynamic millimeter wave radar point cloud human body analysis method based on joint learning

Publications (1)

Publication Number Publication Date
CN116359846A true CN116359846A (en) 2023-06-30

Family

ID=86939780

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310236507.6A Pending CN116359846A (en) Dynamic millimeter wave radar point cloud human body analysis method based on joint learning

Country Status (1)

Country Link
CN (1) CN116359846A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116630394A (en) * 2023-07-25 2023-08-22 山东中科先进技术有限公司 Multi-mode target object attitude estimation method and system based on three-dimensional modeling constraint
CN116630394B (en) * 2023-07-25 2023-10-20 山东中科先进技术有限公司 Multi-mode target object attitude estimation method and system based on three-dimensional modeling constraint


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination