CN114924249B - Millimeter wave radar-based human body posture estimation method and device and electronic equipment - Google Patents



Publication number: CN114924249B
Authority: CN (China)
Legal status: Active (as listed by Google Patents; not a legal conclusion)
Application number: CN202210859873.2A
Other languages: Chinese (zh)
Other versions: CN114924249A (en)
Inventors: 陈彦, 解春阳, 张东恒, 张冬, 孙启彬, 吴曼青
Current assignee: University of Science and Technology of China (USTC)
Original assignee: University of Science and Technology of China (USTC)
Application filed by University of Science and Technology of China (USTC)
Priority to CN202210859873.2A
Publication of CN114924249A
Application granted
Publication of CN114924249B

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/02 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00
    • G01S7/41 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00 using analysis of echo signal for target characterisation; Target signature; Target cross-section
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/103 Detecting, measuring or recording devices for testing the shape, pattern, colour, size or movement of the body or parts thereof, for diagnostic purposes
    • A61B5/11 Measuring movement of the entire body or parts thereof, e.g. head or hand tremor, mobility of a limb
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S13/00 Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S13/88 Radar or analogous systems specially adapted for specific applications
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features

Abstract

The invention provides a millimeter wave radar-based human body posture estimation method and device, and electronic equipment. The method comprises: acquiring a horizontal projection information set and a vertical projection information set related to a radar signal set reflected by a human body at a first moment; performing feature fusion on the two sets to obtain a feature vector corresponding to the radar signal set; converting the feature vector into a feature sub-vector sequence; performing a first encoding process on a target feature sub-vector sequence that includes position coding information to obtain a feature encoding sub-vector sequence; and performing a second encoding process on a feature encoding vector, determined from the feature encoding sub-vector sequence, to obtain the three-dimensional posture of the human body at the first moment.

Description

Millimeter wave radar-based human body posture estimation method and device and electronic equipment
Technical Field
The invention relates to the fields of computing and signal processing, and in particular to a millimeter wave radar-based human body posture estimation method and device, and electronic equipment.
Background
Human body posture estimation has high application value in smart homes, enterprise security, virtual reality, medical care, and other areas. With the development of deep learning, three-dimensional human body posture estimation from images or videos has made great progress. However, capturing human body postures with a camera still has limitations. First, camera-based pose estimation does not adapt well to complex real-world scenes involving occlusion, dim lighting, motion blur, and the like. Second, privacy concerns cannot be ignored: in fields such as smart homes and medical care in particular, deploying cameras at scale raises privacy problems.
For these reasons, more and more researchers are turning to wireless sensing, i.e., sensing and learning human activities from radio signals, which better preserve privacy, can penetrate some obstacles, and are unaffected by lighting conditions. Millimeter wave radar-based wireless sensing is particularly valuable because its radio frequency signals offer higher spatial resolution at a reasonable hardware cost.
Disclosure of Invention
In view of this, the invention provides a method and a device for estimating a human body posture based on a millimeter wave radar, and an electronic device.
One aspect of the present invention provides a millimeter wave radar-based human body posture estimation method, including: acquiring a horizontal projection information set and a vertical projection information set related to a radar signal set reflected by a human body at a first moment; performing feature fusion on the horizontal projection information set and the vertical projection information set to obtain a feature vector corresponding to the radar signal set; converting the feature vector into a feature sub-vector sequence, where the feature sub-vector sequence includes a plurality of feature sub-vectors corresponding to a plurality of pieces of human body bone point information, the plurality of feature sub-vectors include at least one first mask vector, each feature sub-vector corresponds to one piece of position coding information, and the position coding information represents the relative position information of the human body bone point information; performing a first encoding process on a target feature sub-vector sequence including the position coding information to obtain a feature encoding sub-vector sequence, where the feature encoding sub-vector sequence includes a plurality of feature encoding sub-vectors corresponding to the plurality of pieces of human body bone point information, each feature encoding sub-vector includes position representation information corresponding to its human body bone point information, and the position representation information represents the absolute position information of that human body bone point information; and performing a second encoding process on a feature encoding vector to obtain the three-dimensional posture of the human body at the first moment, where the feature encoding vector is determined according to the feature encoding sub-vector sequence.
Another aspect of the present invention provides a millimeter wave radar-based human body posture estimation device, including: an acquisition module, configured to acquire a horizontal projection information set and a vertical projection information set related to a radar signal set reflected by a human body at a first moment; a first obtaining module, configured to perform feature fusion on the horizontal projection information set and the vertical projection information set to obtain a feature vector corresponding to the radar signal set; a conversion module, configured to convert the feature vector into a feature sub-vector sequence, where the feature sub-vector sequence includes a plurality of feature sub-vectors corresponding to a plurality of pieces of human body bone point information, each feature sub-vector corresponds to one piece of position coding information, and the position coding information represents the relative position information of the human body bone point information; a second obtaining module, configured to perform a first encoding process on a target feature sub-vector sequence including the position coding information to obtain a feature encoding sub-vector sequence, where the feature encoding sub-vector sequence includes a plurality of feature encoding sub-vectors corresponding to the plurality of pieces of human body bone point information, each feature encoding sub-vector includes position representation information corresponding to its human body bone point information, and the position representation information represents the absolute position information of that human body bone point information; and a third obtaining module, configured to perform a second encoding process on a feature encoding vector to obtain the three-dimensional posture of the human body at the first moment, where the feature encoding vector is determined according to the feature encoding sub-vector sequence.
Another aspect of the present invention provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the millimeter wave radar-based human body posture estimation method according to the present disclosure.
According to the embodiments of the invention, performing the first encoding process on a feature sub-vector sequence that includes the first mask vector yields feature encoding sub-vectors for all of the human body bone point information, so the positions of bone points not captured by the radar can be recovered. This addresses the technical problem that a radar signal acquired at a single moment cannot capture all information about the human body, and yields more complete bone point positions. Combined with the second encoding process, which produces the three-dimensional posture of the human body at that moment, this improves the efficiency and accuracy of posture estimation.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following description of embodiments of the present invention with reference to the accompanying drawings, in which:
Fig. 1 schematically shows an exemplary system architecture to which the millimeter wave radar-based human body posture estimation method may be applied, according to an embodiment of the present invention;
Fig. 2 schematically shows a flowchart of the millimeter wave radar-based human body posture estimation method according to an embodiment of the present invention;
Fig. 3 schematically shows the extraction of horizontal projection information and vertical projection information according to an embodiment of the present invention;
Fig. 4 schematically shows a feature fusion module according to an embodiment of the present invention;
Fig. 5A schematically shows a spatial attention module according to an embodiment of the present invention;
Fig. 5B schematically shows a temporal attention module according to an embodiment of the present invention;
Fig. 5C schematically shows a multi-head self-attention encoder according to an embodiment of the present invention;
Fig. 6 schematically shows a millimeter wave radar-based human body posture estimation system according to an embodiment of the present invention;
Fig. 7 schematically shows a millimeter wave radar-based human body posture estimation device according to an embodiment of the present invention; and
Fig. 8 schematically shows a block diagram of a computer system suitable for implementing the above method according to an embodiment of the present invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. It is to be understood that such description is merely illustrative and not intended to limit the scope of the present invention. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs, unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B, and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). Where a convention analogous to "at least one of A, B, or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).
The existing research work shows that radio frequency signals generated by the millimeter wave radar carry a large amount of information and can be used for estimating two-dimensional or three-dimensional human body postures.
In the course of conceiving the present invention, the inventors found that existing research has not fully exploited the characteristics of radio frequency signals in wireless sensing, which limits further performance gains. First, radio frequency signals reflected by the human body have different distributions across channels and feature scales, which makes feature fusion harder. Second, because the human body acts as a specular reflector for radio frequency signals, some body parts do not reflect the signal back toward the radar's receiving antennas. The millimeter wave radar therefore cannot capture all information about the human body in a single transmission and reception, so the captured information is sparse and incomplete. These characteristics make radio frequency-based human body posture estimation a challenging task.
Embodiments of the invention provide a millimeter wave radar-based human body posture estimation method and device, and electronic equipment. The method includes: acquiring a horizontal projection information set and a vertical projection information set related to a radar signal set reflected by a human body at a first moment; performing feature fusion on the horizontal projection information set and the vertical projection information set to obtain a feature vector corresponding to the radar signal set; converting the feature vector into a feature sub-vector sequence, where the feature sub-vector sequence includes a plurality of feature sub-vectors corresponding to a plurality of pieces of human body bone point information, the plurality of feature sub-vectors include at least one first mask vector, each feature sub-vector corresponds to one piece of position coding information, and the position coding information represents the relative position information of the human body bone point information; performing a first encoding process on a target feature sub-vector sequence including the position coding information to obtain a feature encoding sub-vector sequence, where the feature encoding sub-vector sequence includes a plurality of feature encoding sub-vectors corresponding to the plurality of pieces of human body bone point information, each feature encoding sub-vector includes position representation information corresponding to its human body bone point information, and the position representation information represents the absolute position information of that human body bone point information; and performing a second encoding process on a feature encoding vector to obtain the three-dimensional posture of the human body at the first moment, where the feature encoding vector is determined according to the feature encoding sub-vector sequence.
Fig. 1 schematically shows an exemplary system architecture 100 to which the millimeter wave radar-based human body posture estimation method may be applied, according to an embodiment of the present invention. It should be noted that Fig. 1 is only an example of a system architecture to which embodiments of the present invention may be applied, provided to help those skilled in the art understand the technical content of the present invention; it does not mean that embodiments of the present invention cannot be applied to other devices, systems, environments, or scenarios.
As shown in FIG. 1, a system architecture 100 according to this embodiment may include a first millimeter wave radar 101, a second millimeter wave radar 102, a network 104, and a server 105. Network 104 is used to provide a medium of communication links between first millimeter-wave radar 101, second millimeter-wave radar 102, and server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth. In certain embodiments, the system architecture 100 may also include a multi-camera system 103.
First millimeter-wave radar 101, second millimeter-wave radar 102 may interact with server 105 through network 104 to receive or send messages, etc. First millimeter-wave radar 101, second millimeter-wave radar 102 may be various devices having radar signal transceiving functions, alternative devices of which may include, but are not limited to, microwave radars, photoelectric radars, and the like.
Server 105 may be a server that provides various services, such as a background management server (for example only) that provides support for radar signals collected by first millimeter wave radar 101 and second millimeter wave radar 102. The background management server can analyze and process the received information such as the radar signal and the like, and feed back the processing result.
It should be noted that the method for estimating the human body posture based on the millimeter wave radar provided by the embodiment of the present invention may be generally executed by the server 105. Accordingly, the body posture estimation device based on millimeter wave radar provided by the embodiment of the invention can be generally arranged in the server 105. The body posture estimation method based on the millimeter wave radar provided by the embodiment of the invention can also be executed by a server or a server cluster which is different from the server 105 and can be communicated with the first millimeter wave radar 101, the second millimeter wave radar 102 and/or the server 105. Accordingly, the body posture estimation device based on millimeter wave radar provided by the embodiment of the present invention may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the first millimeter wave radar 101, the second millimeter wave radar 102 and/or the server 105.
For example, the first millimeter wave radar 101 and the second millimeter wave radar 102 may generate and receive radar signals. The server 105 may obtain a horizontal projection information set and a vertical projection information set related to a radar signal set reflected by a human body at a first time, and locally perform the method for estimating the posture of the human body based on the millimeter wave radar provided in the embodiment of the present invention, or send the horizontal projection information set and the vertical projection information set to another terminal device, a server, or a server cluster, and perform the method for estimating the posture of the human body based on the millimeter wave radar provided in the embodiment of the present invention by another terminal device, a server, or a server cluster that receives the horizontal projection information set and the vertical projection information set.
It should be understood that the number of millimeter wave radars, multi-camera systems, networks, and servers in fig. 1 are merely illustrative. There may be any number of millimeter wave radars, multi-camera systems, networks, and servers, as desired for implementation.
Fig. 2 schematically shows a flowchart of a human body posture estimation method based on millimeter wave radar according to an embodiment of the present invention.
As shown in FIG. 2, the method includes operations S201 to S205.
In operation S201, a set of horizontal projection information and a set of vertical projection information related to a set of radar signals reflected by a human body at a first time are acquired.
According to an embodiment of the present invention, the first moment may be any moment. The radar signal set may include a plurality of radar signals reflected by the human body. For example, based on the basic principle of millimeter wave radar, FMCW (Frequency Modulated Continuous Wave) modulation and a linear antenna array may be used for signal transmission and reception. The antennas emit a radar signal; when the signal meets a human body in space it is reflected, and the reflected signal, also called the echo signal, is received. By combining FMCW with the antenna array, the radar signal reflected from a spatial location $x$ at time $t$ can be represented as shown in equation (1).

$$P(x, t) = \sum_{k=1}^{K} \sum_{i=1}^{N} s_{k,i}(t)\, e^{\,j 2\pi \frac{d_k(x)}{\lambda_i}} \qquad (1)$$

In equation (1), $s_{k,i}(t)$ denotes the $i$-th echo signal sample received on the $k$-th antenna at time $t$, $\lambda_i$ denotes the wavelength of that echo signal, $d_k(x)$ denotes the round-trip distance from the $k$-th antenna to location $x$, $N$ denotes the total number of echo samples received on each antenna, $K$ denotes the total number of antennas, and $j$ is the imaginary unit.

By decomposing the radar signal representation $P(x, t)$ along the horizontal and vertical directions, the horizontal projection information and the vertical projection information corresponding to the radar signal can be obtained.
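Equation (1) has the form of a delay-and-sum beamformer: each echo sample $s_{k,i}(t)$ is phase-compensated by the round-trip distance $d_k(x)$ to a candidate location and the results are summed, so energy adds coherently only where a reflector actually is. The following is a minimal sketch in plain Python, not the patented implementation. It assumes co-located transmit and receive antennas (so the round-trip distance is twice the one-way distance), an idealised noise-free point reflector, an assumed 77 to 81 GHz chirp, and hypothetical helper names (`beamform`, `simulate_point_echo`, `project`). The projection step, taking the maximum magnitude over height for the horizontal view and over depth for the vertical view, is also an assumption, since the text does not say how the decomposition is performed.

```python
import cmath
import math

C = 3e8  # speed of light in m/s

def round_trip(antenna, point):
    # Assumption: TX and RX are co-located, so d_k(x) = 2 * ||x - antenna_k||.
    return 2.0 * math.dist(antenna, point)

def beamform(samples, wavelengths, antennas, point):
    """Eq. (1): sum over antennas k and samples i of s_{k,i}(t) * exp(j*2*pi*d_k(x)/lambda_i)."""
    total = 0j
    for k, antenna in enumerate(antennas):
        d = round_trip(antenna, point)
        for i, lam in enumerate(wavelengths):
            total += samples[k][i] * cmath.exp(2j * math.pi * d / lam)
    return total

def simulate_point_echo(antennas, wavelengths, reflector):
    # Idealised, noise-free echo of a single point reflector: s_{k,i} = exp(-j*2*pi*d_k/lambda_i).
    return [[cmath.exp(-2j * math.pi * round_trip(a, reflector) / lam) for lam in wavelengths]
            for a in antennas]

def project(samples, wavelengths, antennas, xs, ys, zs):
    """Hypothetical decomposition: max |P| over height z (horizontal) and over depth y (vertical)."""
    power = {(x, y, z): abs(beamform(samples, wavelengths, antennas, (x, y, z)))
             for x in xs for y in ys for z in zs}
    horizontal = {(x, y): max(power[(x, y, z)] for z in zs) for x in xs for y in ys}
    vertical = {(x, z): max(power[(x, y, z)] for y in ys) for x in xs for z in zs}
    return horizontal, vertical

# Usage: 64 samples across an assumed 77-81 GHz chirp, 4 antennas spaced half a wavelength along x.
freqs = [77e9 + n * 4e9 / 63 for n in range(64)]
wavelengths = [C / f for f in freqs]
antennas = [((k - 1.5) * wavelengths[0] / 2, 0.0, 0.0) for k in range(4)]
target = (0.0, 1.5, 0.0)  # x: lateral, y: depth, z: height (metres)
samples = simulate_point_echo(antennas, wavelengths, target)
peak = abs(beamform(samples, wavelengths, antennas, target))
horizontal, vertical = project(samples, wavelengths, antennas,
                               [-0.2, 0.0, 0.2], [1.2, 1.5, 1.8], [-0.2, 0.0, 0.2])
```

At the reflector every term has unit magnitude and zero phase, so the peak equals K·N = 256, the upper bound of the coherent sum. The single linear array along x is purely illustrative and resolves only range and angle relative to its own axis.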
In operation S202, feature fusion is performed on the horizontal projection information set and the vertical projection information set to obtain a feature vector corresponding to the radar signal set.
According to an embodiment of the present invention, after the horizontal projection information set and the vertical projection information set are acquired, fine-grained features in the horizontal direction and fine-grained features in the vertical direction may be extracted from the respective radar signals based on the horizontal projection information and the vertical projection information associated with each radar signal. By performing feature fusion on the fine-grained features in the horizontal direction and the fine-grained features in the vertical direction, a feature vector in which the features of the horizontal projection information and the features of the vertical projection information are fused can be obtained.
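The text delegates the fusion operator to the feature fusion module of Fig. 4, which this section does not detail. As a stand-in, the sketch below uses the simplest form of feature-level fusion: concatenating the flattened horizontal and vertical features and applying one linear projection. The function names and the toy weights are hypothetical; in a trained network the weights would be learned.

```python
from typing import List, Sequence

def flatten(feature_map: Sequence[Sequence[float]]) -> List[float]:
    """Row-major flattening of a 2-D feature map."""
    return [value for row in feature_map for value in row]

def fuse(h_feat, v_feat, weights, bias) -> List[float]:
    """Hypothetical fusion: concatenate horizontal and vertical features, then compute W x + b."""
    x = flatten(h_feat) + flatten(v_feat)
    if any(len(row) != len(x) for row in weights) or len(weights) != len(bias):
        raise ValueError("weights must have len(bias) rows of len(x) columns")
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

# Usage with toy 2x2 maps: row 0 sums every feature, row 1 contrasts one horizontal and one vertical cell.
fused = fuse([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]],
             weights=[[1.0] * 8, [1.0, 0, 0, 0, -1.0, 0, 0, 0]], bias=[0.0, 0.5])
```

Concatenation keeps both views intact and lets the projection weigh them; more elaborate modules (for example attention-based weighting of channels) would replace the single linear layer.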
In operation S203, the feature vector is converted into a feature sub-vector sequence, where the feature sub-vector sequence includes a plurality of feature sub-vectors corresponding to a plurality of pieces of human bone point information, the plurality of feature sub-vectors includes at least one first mask vector, each feature sub-vector corresponds to one piece of position coding information, and the position coding information represents relative position information of the human bone point information.
According to the embodiment of the present invention, because the radar captures only the signal reflected by specific parts of the human body at a time, the feature sub-vector sequence obtained by converting the feature vector may include non-mask feature sub-vectors corresponding to a first part of the human body bone point information (bone points on the captured parts of the body) and mask-form feature sub-vectors, i.e., first mask vectors, corresponding to a second part of the bone point information (bone points on the other parts of the body). The other parts are those not captured by the radar at that moment. A first mask vector may be a zero vector or another vector determined by other mask information.
According to embodiments of the present invention, the relative position information may include position information determined from custom coding information. For example, the three-dimensional posture of the human body may be determined from 10 to 16 human body skeleton points; the plurality of pieces of human body bone point information may then include information corresponding to these skeleton points, and the position coding information of the corresponding feature sub-vectors may be expressed as the indices 1 to 16.
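The conversion in operation S203 can be sketched as follows. This is a minimal illustration under stated assumptions: the fused feature vector splits evenly into one sub-vector per skeletal point, a zero vector serves as the first mask vector, and the 1-based joint index (1 to 16) is turned into position coding information with the standard Transformer sinusoidal encoding, which is a swapped-in choice rather than the patent's stated method. Which joints count as uncaptured is passed in explicitly, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Set

def split_into_subvectors(feature: Sequence[float], num_joints: int) -> List[List[float]]:
    """Split the fused feature vector into one equal-length sub-vector per skeletal point."""
    if len(feature) % num_joints != 0:
        raise ValueError("feature length must be divisible by num_joints")
    d = len(feature) // num_joints
    return [list(feature[j * d:(j + 1) * d]) for j in range(num_joints)]

def position_code(index: int, dim: int) -> List[float]:
    """Sinusoidal encoding of a 1-based joint index (standard Transformer form, assumed here)."""
    return [math.sin(index / 10000 ** (i / dim)) if i % 2 == 0
            else math.cos(index / 10000 ** ((i - 1) / dim)) for i in range(dim)]

def build_target_sequence(feature: Sequence[float], num_joints: int,
                          hidden: Set[int]) -> List[List[float]]:
    """Zero out (first mask vector) the sub-vectors of joints in `hidden`, then add position codes."""
    subs = split_into_subvectors(feature, num_joints)
    subs = [[0.0] * len(v) if j in hidden else v for j, v in enumerate(subs)]
    return [[x + p for x, p in zip(v, position_code(j + 1, len(v)))]
            for j, v in enumerate(subs)]

# Usage: 16 joints with 2-dimensional sub-vectors; joints at 0-based positions 3 and 7 not captured.
sequence = build_target_sequence([float(v) for v in range(32)], num_joints=16, hidden={3, 7})
```

A masked joint keeps only its position code, so the encoder still knows which skeletal point the empty slot stands for.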
In operation S204, a target feature sub-vector sequence including position coding information is subjected to a first coding process to obtain a feature coding sub-vector sequence, where the feature coding sub-vector sequence includes a plurality of feature coding sub-vectors corresponding to a plurality of pieces of human body bone point information, each feature coding sub-vector includes position characterizing information corresponding to the piece of human body bone point information corresponding to the feature coding sub-vector, and the position characterizing information characterizes absolute position information of the piece of human body bone point information.
According to an embodiment of the present invention, the first encoding process can learn the non-local correlation between the first part of the human skeletal point information (on the specific captured parts of the body) and the second part (on the other parts of the body). On this basis, performing the first encoding process on feature sub-vectors that include the mask vector yields feature encoding sub-vectors corresponding to all of the human skeletal point information, i.e., both the first part and the second part, for example the information of all 10 to 16 skeletal points. Each feature encoding sub-vector can be regarded as an encoding result that includes the absolute position information of a human skeleton point; the absolute position information described in the embodiments of the present invention may include the position of the skeleton point in world coordinates and its position in a predefined coordinate system.
In operation S205, a second encoding process is performed on the feature encoding vector to obtain a three-dimensional posture of the human body at the first time, where the feature encoding vector is determined according to the feature encoding sub-vector sequence.
According to an embodiment of the present invention, the second encoding process may process a feature encoding vector including position representation information of a plurality of human body bone points to obtain absolute position information of each of the plurality of human body bone points, and may determine a three-dimensional posture of the human body according to the absolute position information.
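The text states that the feature encoding vector is determined from the feature encoding sub-vector sequence but does not say how, nor which output layer turns the second encoder's result into coordinates. The sketch below assumes plain concatenation and a hypothetical linear head with 3·J outputs read as (x, y, z) per skeletal point; the second encoder itself is omitted and the weights are toy values.

```python
from typing import List, Sequence, Tuple

def feature_encoding_vector(encoded: Sequence[Sequence[float]]) -> List[float]:
    """Assumed: concatenate the J feature encoding sub-vectors in joint order."""
    return [x for sub in encoded for x in sub]

def regress_joints(vector: Sequence[float], weights: Sequence[Sequence[float]],
                   bias: Sequence[float]) -> List[Tuple[float, ...]]:
    """Hypothetical linear head: 3*J outputs grouped as (x, y, z) per skeletal point."""
    if len(weights) != len(bias) or len(weights) % 3 != 0:
        raise ValueError("need one bias per row and three rows (x, y, z) per joint")
    out = [sum(w * v for w, v in zip(row, vector)) + b for row, b in zip(weights, bias)]
    return [tuple(out[i:i + 3]) for i in range(0, len(out), 3)]

# Usage: two joints with 2-D sub-vectors; the head copies features into x and y and sets z from the bias.
vector = feature_encoding_vector([[1.0, 2.0], [3.0, 4.0]])
rows = [[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 0, 0],
        [0, 0, 1.0, 0], [0, 0, 0, 1.0], [0, 0, 0, 0]]
pose = regress_joints(vector, rows, [0.0, 0.0, 0.5, 0.0, 0.0, -0.5])
```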
According to an embodiment of the present invention, the network employed by the procedure of the first encoding process and the procedure of the second encoding process may include multi-head self-attention encoders of the same structure or different structures.
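For reference, the core of a multi-head self-attention encoder, scaled dot-product attention split across heads, can be sketched in plain Python. This is the generic textbook mechanism, not the specific encoder of Fig. 5C: the residual connections, layer normalisation and feed-forward sublayer of a full encoder layer are omitted, and the projection matrices would be learned. The usage shows how a masked token can be filled in: with identity projections, a zero (masked) query scores every key equally, so its output is an average built from the other tokens' values.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]

def matmul(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row: Sequence[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    return matmul([softmax(r) for r in scores], v)

def multi_head_self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix,
                              wo: Matrix, num_heads: int) -> Matrix:
    """x holds one token per skeletal point; all weights are d_model x d_model."""
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    dh = d_model // num_heads
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    heads = []
    for h in range(num_heads):
        cols = slice(h * dh, (h + 1) * dh)  # this head's slice of the projected features
        heads.append(attention([r[cols] for r in q], [r[cols] for r in k], [r[cols] for r in v]))
    # Concatenate the heads token by token, then apply the output projection.
    concat = [[val for head in heads for val in head[t]] for t in range(len(x))]
    return matmul(concat, wo)

# Usage: two captured joints and one masked (zero) joint, identity projections, two heads.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
tokens = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
out = multi_head_self_attention(tokens, identity, identity, identity, identity, num_heads=2)
```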
According to the embodiments of the invention, performing the first encoding process on a feature sub-vector sequence that includes the first mask vector yields feature encoding sub-vectors for all of the human body bone point information, so the positions of bone points not captured by the radar can be recovered. This addresses the technical problem that a radar signal acquired at a single moment cannot capture all information about the human body, and yields more complete bone point positions. Combined with the second encoding process, which produces the three-dimensional posture of the human body at that moment, this improves the efficiency and accuracy of posture estimation.
The method shown in fig. 2 is further described below with reference to specific examples.
According to the embodiment of the invention, when human body posture estimation is performed based on the millimeter wave radar, a feature encoding vector sequence corresponding to the human body at a plurality of consecutive second moments may first be obtained, in the same way as the feature encoding vector for the first moment is determined. The feature encoding vector sequence comprises a plurality of feature encoding vectors corresponding to the second moments, and these include at least one second mask vector. A third encoding process is then performed on the feature encoding vector sequence to obtain a three-dimensional pose sequence, which comprises the three-dimensional poses of the human body at the second moments.
According to the embodiment of the present invention, the plurality of consecutive second moments may include the first moment, or may be other consecutive moments unrelated to it. At some second moments a radar signal is obtained and the corresponding feature encoding vector can be determined. At others no radar signal may be obtained, for example because of environmental interference; the corresponding feature encoding vector is then empty, and the feature encoding vector sequence contains a second mask vector at that position. The second mask vector may be a zero vector or another vector determined by other mask information.
According to the embodiment of the present invention, the third encoding process can learn the temporal correlation between the consecutive second moments. On this basis, performing the third encoding process on the feature encoding vector sequence for the second moments yields the three-dimensional pose sequence for those moments.
Through the embodiment of the invention, the three-dimensional pose sequence for the second moments can be determined from a feature encoding vector sequence that contains second mask vectors, so that accurate and continuous three-dimensional human poses are obtained and the validity and completeness of the estimation result are improved.
According to an embodiment of the present invention, acquiring the horizontal projection information set and the vertical projection information set related to the set of radar signals reflected by the human body at the first moment may include the following. For each radar signal in the set, acquire the horizontal round-trip distance information and the vertical round-trip distance information between the antennas that transmit and receive the radar signal and a reflection point of the radar signal in space. Then determine the horizontal projection information related to the radar signal from the echo signal information, the wavelength of the echo signal, and the horizontal round-trip distance information; and determine the vertical projection information related to the radar signal from the echo signal information, the wavelength of the echo signal, and the vertical round-trip distance information.
According to an embodiment of the present invention, the antenna that transmits the radar signal and the antenna that receives it may be the same antenna or different antennas. The horizontal round-trip distance information may represent the horizontal distance along the path from the transmitting antenna to the reflection point and back to the receiving antenna, and the vertical round-trip distance information may represent the vertical distance along the same path. Once the horizontal and vertical round-trip distance information has been obtained, the horizontal projection information reflected from a given position in space at time t can be expressed as formula (2), and the vertical projection information reflected from a given position in space at time t can be expressed as formula (3).
[Formula (2): horizontal projection information; image not recoverable]

[Formula (3): vertical projection information; image not recoverable]

In formula (2), the round-trip distance term represents the horizontal round-trip distance from the antenna to the given position. In formula (3), the corresponding term represents the vertical round-trip distance from the antenna to the given position. The other parameters in formulas (2) and (3) are defined in the description of formula (1) and are not repeated here.
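Because formulas (1) to (3) survive only as images, their exact notation is not reproduced here. A common way to form such a projection, assumed here rather than taken from the source, is coherent backprojection: the echo sample of each antenna pair k at each frequency step i is multiplied by exp(j·2π·d_k/λ_i), where d_k is the horizontal (or vertical) round-trip distance to the candidate position, and the products are summed. The sketch below computes a horizontal projection over a grid of (x, y) positions under that assumption; the antenna layout, wavelengths, and grid are hypothetical. The vertical projection would follow by passing vertical-plane positions and antenna coordinates instead.

```python
import numpy as np


def horizontal_projection(echo, wavelengths, tx_pos, rx_pos, grid_xy):
    """Assumed backprojection: P(g) = |sum_k sum_i s[k, i] * exp(j*2*pi*d_k(g) / lambda_i)|.

    echo:        complex array (K, I), echo sample of antenna pair k at frequency step i
    wavelengths: array (I,), wavelength of each frequency step
    tx_pos:      array (K, 2), horizontal position of the transmitting antenna of each pair
    rx_pos:      array (K, 2), horizontal position of the receiving antenna of each pair
    grid_xy:     array (G, 2), horizontal positions at which the projection is evaluated
    """
    # Horizontal round-trip distance TX -> grid point -> RX, shape (K, G).
    d = (np.linalg.norm(grid_xy[None, :, :] - tx_pos[:, None, :], axis=-1)
         + np.linalg.norm(grid_xy[None, :, :] - rx_pos[:, None, :], axis=-1))
    # Phase that compensates the propagation delay, shape (K, I, G).
    steering = np.exp(1j * 2 * np.pi * d[:, None, :] / wavelengths[None, :, None])
    # Coherent sum over antenna pairs and frequency steps; magnitude per grid point.
    return np.abs(np.einsum("ki,kig->g", echo, steering))


# Usage: one simulated point reflector (all values hypothetical).
c = 3e8
wavelengths = c / np.linspace(77e9, 81e9, 16)
K = 8
tx_pos = np.zeros((K, 2))
rx_pos = np.stack([np.arange(K) * 0.002, np.zeros(K)], axis=1)
xs, ys = np.meshgrid(np.linspace(-1, 1, 21), np.linspace(0.5, 3.0, 26), indexing="ij")
grid = np.stack([xs.ravel(), ys.ravel()], axis=1)
target = grid[150]
d_true = np.linalg.norm(target - tx_pos, axis=1) + np.linalg.norm(target - rx_pos, axis=1)
echo = np.exp(-1j * 2 * np.pi * d_true[:, None] / wavelengths[None, :])
heatmap = horizontal_projection(echo, wavelengths, tx_pos, rx_pos, grid)
print(grid[np.argmax(heatmap)])  # the grid point of the simulated reflector
```

At the true reflector position every phase term cancels, so the sum reaches its maximum of K × I; elsewhere the phases disagree and the magnitude drops.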
Through the embodiment of the invention, determining the round-trip distance information separately in the horizontal and vertical directions yields the decomposed horizontal projection information and vertical projection information directly. Converting the radar signal from a four-dimensional tensor into horizontal and vertical projection information held as three-dimensional tensors, and processing the two in parallel, effectively reduces the time and space complexity of data processing. The processing can also be implemented on various machine learning platforms, which further improves efficiency.
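The size of the saving can be illustrated with simple arithmetic. Assuming, purely for illustration, a grid of nx × ny × nz cells per frame and that the vertical projection uses the (x, z) plane, a full three-dimensional frame has nx·ny·nz cells while the two projections together have nx·ny + nx·nz. The 64-cell grid below is hypothetical.

```python
def cells_per_frame(nx, ny, nz):
    """Cells per frame: full 3-D volume vs. horizontal (x, y) plus vertical (x, z) planes."""
    volume = nx * ny * nz             # one time slice of the 4-D (x, y, z, t) tensor
    projections = nx * ny + nx * nz   # one time slice each of the (x, y, t) and (x, z, t) tensors
    return volume, projections


volume, projections = cells_per_frame(64, 64, 64)
print(volume, projections, volume // projections)  # 262144 8192 32
```

Since a backprojection pass evaluates every cell, its per-frame cost shrinks by roughly the same factor under these assumptions.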
It should be noted that severe multipath interference in indoor environments makes it impossible to obtain the radar signals reflected by the human body accurately from the raw radar signals. To address this, the method relies on the fact that the radar signal reflected by the human body changes over time as the body moves, whereas the multipath components caused by static objects in the environment remain unchanged. The radar signal reflected by the human body can therefore be obtained by filtering out the time-invariant components in the time domain.
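The source does not specify the time-domain filter. One simple choice, assumed here, is to subtract the per-cell temporal mean over a window of frames. This cancels every component that stays constant across the window (static multipath) and keeps the time-varying reflections from the moving body. A sketch with synthetic data:

```python
import numpy as np


def remove_static_components(frames):
    """frames: complex array (T, ...) of radar returns over T frames.

    Subtracting the mean over the time axis removes every component that is
    identical in all frames, such as multipath from static furniture and walls.
    """
    return frames - frames.mean(axis=0, keepdims=True)


rng = np.random.default_rng(0)
T = 32
static = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))   # static multipath
t = np.arange(T)[:, None, None]
moving = 0.1 * np.exp(1j * 2 * np.pi * t / T) * np.ones((1, 4, 5))       # zero mean over the window
filtered = remove_static_components(static[None] + moving)
print(np.allclose(filtered, moving))  # True
```

A running average or a high-pass filter along the time axis would serve the same purpose on streaming data.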
According to the embodiment of the invention, the entire process of acquiring the horizontal and vertical projection information sets related to the radar signal set reflected by the human body at the first moment can be performed by a dedicated signal processing module. The signal processing module processes the radar signals collected in the space at a given moment to obtain the horizontal and vertical projection information sets reflected by the human body at that moment.
According to embodiments of the invention, the radar signal set may include a plurality of radar signals from a plurality of channels. The information scales of the horizontal projection information set and the vertical projection information set are different.
Fig. 3 schematically illustrates the extraction of horizontal projection information and vertical projection information according to an embodiment of the present invention.
As shown in fig. 3, while the radar 310 transmits a radar signal that returns via the human body 320, the horizontal projection information 330 and vertical projection information 340 related to that signal can be acquired. The horizontal projection information 330 represents the radar signal in the plane parallel to the ground and provides the position of the human body in space. The vertical projection information 340 represents the radar signal in the plane perpendicular to the ground and provides position-related information for each part of the human body (for example, each human skeletal point).
According to an embodiment of the present invention, because horizontal and vertical projection information differ in channels and dimensions, performing feature fusion on the horizontal projection information set and the vertical projection information set to obtain the feature vector for the radar signal set may include the following. For the plurality of radar signals, extract channel features from the horizontal and vertical projection information sets related to the radar signal of each channel, to obtain a first fusion feature that fuses the plurality of channel features. Then perform scale feature extraction on the first fusion feature at a plurality of scales, to obtain a second fusion feature that fuses the plurality of scale features and channel features. Finally, vectorize the second fusion feature to obtain the feature vector.
According to the embodiment of the invention, obtaining the feature vector by fusing the horizontal and vertical projection information sets can be performed by a dedicated feature fusion module. The horizontal projection information set may include the horizontal projection information at each moment obtained as shown in fig. 3, and the vertical projection information set may include the vertical projection information at each moment obtained in the same way.
FIG. 4 schematically illustrates a feature fusion module according to an embodiment of the invention.
As shown in fig. 4, the feature fusion module 400 may include a channel fusion module 410 and a multi-scale feature fusion module 420. In the channel fusion module 410, two group convolutions may first be used to perform a preliminary extraction of channel features from the stacked horizontal projection information set 430 and vertical projection information set 440 of each channel, yielding a plurality of channel features 411. The channel features 411 are then fused by a residual network with an embedded SE (Squeeze-and-Excitation) module, weighted by the channel weights 412 that the SE module provides, to obtain a first fusion feature 413 that combines the channel features. In the SE module, the squeeze step compresses the features along the spatial dimensions so that each two-dimensional feature channel becomes a single real number, which to some extent has a global receptive field, and the output dimension matches the number of input feature channels; the excitation step generates, from the correlation between channels, a weight for each channel that represents its importance. Next, to account for the difference in feature scale between the horizontal and vertical directions, a deformable convolution combined with a multi-stage, multi-resolution convolutional neural network may serve as the multi-scale feature fusion module 420. It extracts scale features from the first fusion feature 413 at several different scales to obtain a second fusion feature 424 that fuses multiple channel features and multiple scale features. Finally, the second fusion feature 424 may pass through a lightweight multi-layer perceptron module 425 to generate a feature vector 450 used for predicting the human body pose.
According to an embodiment of the present invention, the horizontal projection information set 430 may include the horizontal projection information 330 of fig. 3, and the vertical projection information set 440 may include the vertical projection information 340 of fig. 3. The different scales used for scale feature extraction may include, but are not limited to, the first scale 421, the second scale 422, and the third scale 423 shown in fig. 4.
It should be noted that the feature fusion method described above is merely an exemplary embodiment and is not limiting; other feature fusion methods known in the art may be used as long as they can fuse multi-channel and multi-scale features.
Through the embodiment of the invention, information of different channels and different scales in the horizontal direction and the vertical direction can be efficiently fused.
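A minimal NumPy sketch of the squeeze-and-excitation step described for the channel fusion module 410 follows. It uses the standard SE design (global average pooling, two fully connected layers with ReLU and sigmoid, channel-wise rescaling). The reduction ratio, the random weights, and the omission of the residual unit's convolution layers are assumptions for illustration only.

```python
import numpy as np


def squeeze_excite(x, w1, w2):
    """x: feature map (C, H, W); w1: (C // r, C); w2: (C, C // r) for reduction ratio r."""
    z = x.mean(axis=(1, 2))                      # squeeze: one real number per channel
    s = np.maximum(w1 @ z, 0.0)                  # first FC layer + ReLU
    weights = 1.0 / (1.0 + np.exp(-(w2 @ s)))    # second FC layer + sigmoid -> (0, 1)
    return x * weights[:, None, None], weights   # excitation: rescale each channel


def se_residual_unit(x, w1, w2):
    """Residual unit with SE; the unit's convolution layers are omitted for brevity."""
    y, _ = squeeze_excite(x, w1, w2)
    return x + y


rng = np.random.default_rng(0)
C, r = 16, 4
x = rng.standard_normal((C, 8, 8))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
out = se_residual_unit(x, w1, w2)
print(out.shape)  # (16, 8, 8)
```

The returned weights play the role of the channel weights 412: channels judged more informative are scaled up relative to the others before fusion.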
According to the embodiment of the invention, a fully connected network may perform feature embedding on the feature vector extracted from the radar signal set collected at each moment, converting it into a plurality of feature sub-vectors that correspond to the plurality of pieces of human skeletal point information.
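Assuming the fully connected network is a single linear layer whose output is reshaped into J sub-vectors of width D, the conversion can be sketched as follows. J = 14, D = 32, and the 256-dimensional input are hypothetical sizes, and the weights are random rather than trained.

```python
import numpy as np


def embed_to_joint_tokens(feature, weight, bias, num_joints):
    """Map one feature vector (F,) to a (J, D) sequence, one sub-vector per skeletal point."""
    out = weight @ feature + bias          # (J * D,)
    return out.reshape(num_joints, -1)     # row j is the sub-vector for skeletal point j


rng = np.random.default_rng(0)
F, J, D = 256, 14, 32
feature = rng.standard_normal(F)
weight = rng.standard_normal((J * D, F)) / np.sqrt(F)
bias = np.zeros(J * D)
tokens = embed_to_joint_tokens(feature, weight, bias, J)
print(tokens.shape)  # (14, 32)
```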
According to an embodiment of the present invention, the first encoding process includes M rounds. Performing the first encoding process on the target feature sub-vector sequence that includes the position encoding information to obtain the feature encoding sub-vector sequence may include the following. In the m-th round, perform a fourth encoding process on the encoded information of the (m-1)-th round and the (m-1)-th dimension reduction result to obtain the m-th encoding result, where the encoded information of the first round includes the target feature sub-vector sequence and m is a positive integer greater than 1 and less than or equal to M. Then perform a first dimension reduction process on the m-th encoding result to obtain the m-th dimension reduction result. When the dimension of the elements in the m-th dimension reduction result equals a predefined dimension, take the m-th dimension reduction result as the feature encoding sub-vector sequence. The predefined dimension is greater than three.
According to an embodiment of the present invention, the first encoding process may consist of M rounds of the fourth encoding process and the first dimension reduction process. The fourth encoding process may be performed by a multi-head self-attention encoder, which applies a dimension-preserving feature transform, and the first dimension reduction process may be performed by a linear layer network that reduces the dimension. Since the final objective is to regress the absolute positions of the human skeletal points from the target feature sub-vector sequence, this embodiment may, for example, build a progressive dimensionality-reduction model by alternately stacking multi-head self-attention encoders and linear layer networks, gradually transforming the feature dimension into the absolute positions of the skeletal points and thereby completing pose estimation.
According to an embodiment of the present invention, the M rounds of the first encoding process described above may be performed by a spatial attention module. The spatial attention module uses an attention mechanism to establish non-local associations between human skeletal points and other parts of the body, thereby recovering body parts that the radar did not capture.
FIG. 5A schematically illustrates a spatial attention module according to an embodiment of the invention.
As shown in FIG. 5A, the spatial attention module 510 may include M first sub-modules 511, …, 51M, each of which may include a multi-head self-attention encoder and a linear layer network. The input of the spatial attention module 510 may be a target feature sub-vector sequence 501 containing as many sub-vectors as there are pieces of human skeletal point information, and the target feature sub-vector sequence 501 may include the first mask vector 505. After multiple rounds of encoding and dimension reduction, a feature encoding sub-vector sequence 502 is obtained, containing the same number of feature encoding sub-vectors as there are pieces of human skeletal point information.
It should be noted that the predefined dimension may be chosen during training as the dimension best suited to processing by the spatial attention module, and is not limited to a particular value. The value of M can be determined from the dimension of the elements in the target feature sub-vector sequence, the degree of dimension reduction performed by each of the first sub-modules 511 to 51M, and the predefined dimension.
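The alternating stack of fig. 5A can be sketched as a loop that applies a dimension-preserving attention step and then a linear layer that reduces the width, stopping once the predefined dimension is reached; the number of iterations is M. The single-head attention without learned projections, the halving schedule, the random (untrained) weights, and the predefined dimension of 32 are all assumptions chosen to show the shapes and the round count, not the patented network.

```python
import numpy as np


def attention_step(x):
    """Dimension-preserving self-attention over the J tokens, with a residual connection."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return x + attn @ x


def progressive_encode(tokens, target_dim, rng):
    """Alternate attention and width-halving linear layers until width == target_dim."""
    rounds = 0
    while tokens.shape[1] > target_dim:
        encoded = attention_step(tokens)                          # "fourth encoding": (J, d) -> (J, d)
        d_in = tokens.shape[1]
        d_out = max(target_dim, d_in // 2)
        w = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)   # untrained linear layer
        tokens = encoded @ w                                      # "first dimension reduction"
        rounds += 1
    return tokens, rounds


rng = np.random.default_rng(0)
tokens = rng.standard_normal((14, 256))   # 14 skeletal-point sub-vectors (mask vectors included)
encoded, M = progressive_encode(tokens, target_dim=32, rng=rng)
print(encoded.shape, M)  # (14, 32) 3
```

With halving, M follows directly from the widths: 256 to 128 to 64 to 32 takes three rounds, while a 100-wide input would reach 32 in two (100 to 50 to 32).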
According to the embodiment of the present invention, the third encoding process includes N rounds. Performing the third encoding process on the feature encoding vector sequence to obtain the three-dimensional pose sequence may include the following. In the n-th round, perform a fifth encoding process on the encoded information of the (n-1)-th round and the (n-1)-th dimension reduction result to obtain the n-th encoding result, where the encoded information of the first round includes the feature encoding vector sequence and n is a positive integer greater than 1 and less than or equal to N. Then perform a second dimension reduction process on the n-th encoding result to obtain the n-th dimension reduction result. When the dimension of the elements in the n-th dimension reduction result is three, determine the three-dimensional pose sequence from the n-th dimension reduction result.
According to an embodiment of the present invention, the process of the third encoding process may include processes of N rounds of the fifth encoding process and the second dimension reduction process. The fifth encoding process can be performed in conjunction with a multi-head self-attention encoder, which can implement a feature transform that is dimension-invariant. The second dimension reduction process may be performed in conjunction with a linear layer network having a dimension reduction function.
According to embodiments of the present invention, although the spatial attention module can recover human skeletal points that the radar did not capture, generating an accurate and continuous three-dimensional pose sequence remains difficult. The N rounds of the third encoding process described above can therefore be performed by a temporal attention module. The temporal attention module follows the same design idea as the spatial attention module and can likewise be built by alternately stacking multi-head self-attention encoders and linear layer networks. The main difference lies in the input: the spatial attention module performs attention encoding independently on the target feature sub-vector sequence of the radar signal set at each moment, whereas the temporal attention module performs attention encoding on the time-ordered feature encoding vector sequence, in which the features at each moment are encoded as a single vector. The temporal attention module can thus exploit correlations in the time domain to refine the predictions and recover the three-dimensional pose at moments the radar did not capture.
FIG. 5B schematically illustrates a temporal attention module according to an embodiment of the invention.
As shown in fig. 5B, the temporal attention module 520 may include N second sub-modules 521, …, 52N, each of which may include a multi-head self-attention encoder and a linear layer network. The input of the temporal attention module 520 may be a time-ordered feature encoding vector sequence 503. After multiple rounds of encoding and dimension reduction, a three-dimensional pose sequence 504 comprising multiple three-dimensional poses is obtained.
It should be noted that the value of N may be determined from the dimension of each feature encoding sub-vector output by the spatial attention module and the degree of dimension reduction performed by each of the second sub-modules 521 to 52N. The second encoding process may be the same as the third encoding process, except that the second encoding process operates on a single feature encoding vector, whereas the third encoding process operates on the time-ordered feature encoding vector sequence for the plurality of second moments.
FIG. 5C schematically illustrates a multi-head self-attention encoder according to an embodiment of the present invention.
According to an embodiment of the present invention, the spatial attention module and the temporal attention module may both use the multi-head self-attention encoder shown in fig. 5C as their basic component.
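Fig. 5C is not reproduced here. Assuming the encoder follows the standard Transformer encoder layer (multi-head scaled dot-product self-attention and a position-wise feed-forward network, each followed by a residual connection and layer normalization), a NumPy sketch is given below. The head count, widths, and random weights are hypothetical. The layer maps an (L, D) sequence to an (L, D) sequence, where the L tokens are skeletal points in the spatial module and moments in the temporal module.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def encoder_layer(x, p, num_heads):
    """x: (L, D). Post-norm Transformer encoder layer; p holds the weight matrices."""
    L, D = x.shape
    dh = D // num_heads

    def split(m):  # (L, D) -> (heads, L, dh)
        return m.reshape(L, num_heads, dh).transpose(1, 0, 2)

    q, k, v = split(x @ p["wq"]), split(x @ p["wk"]), split(x @ p["wv"])
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))   # (heads, L, L)
    heads = (attn @ v).transpose(1, 0, 2).reshape(L, D)      # concatenate heads
    x = layer_norm(x + heads @ p["wo"])                      # residual + norm
    ffn = np.maximum(x @ p["w1"], 0.0) @ p["w2"]             # position-wise FFN
    return layer_norm(x + ffn)


def init_params(D, hidden, rng):
    shapes = {"wq": (D, D), "wk": (D, D), "wv": (D, D), "wo": (D, D),
              "w1": (D, hidden), "w2": (hidden, D)}
    return {name: rng.standard_normal(s) / np.sqrt(s[0]) for name, s in shapes.items()}


rng = np.random.default_rng(0)
params = init_params(64, 128, rng)
x = rng.standard_normal((14, 64))          # e.g. 14 skeletal-point tokens
y = encoder_layer(x, params, num_heads=4)
print(y.shape)  # (14, 64)
```

Such a layer is permutation-equivariant: shuffling the input tokens only shuffles the outputs. It therefore cannot tell which token belongs to which skeletal point on its own, which is why a spatial position encoding is added to its input.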
By designing the spatial attention module and the temporal attention module, the three-dimensional pose sequence of the human body can be effectively reconstructed from radar signals in which some body parts are not captured.
It should be noted that the implementations of the first, second, and third encoding processes described above are merely exemplary and are not limiting; other methods known in the art may be used as long as the three-dimensional pose sequence can ultimately be obtained. The first encoding process and the second encoding process may be combined into a single process, and likewise the first encoding process and the third encoding process may be combined into a single process, as long as the three-dimensional pose sequence can be obtained.
According to the embodiment of the invention, the millimeter wave radar-based human body posture estimation method can be implemented by a three-dimensional pose estimation model. Specifically, the model may be used to: fuse the horizontal projection information set and the vertical projection information set into the feature vector; convert the feature vector into the feature sub-vector sequence; perform the first encoding process on the target feature sub-vector sequence to obtain the feature encoding sub-vector sequence; and perform the second encoding process on the feature encoding vector to obtain the three-dimensional pose.
According to the embodiment of the invention, the three-dimensional pose estimation model can be trained as follows. First, acquire the real three-dimensional pose sequence of the human body at a plurality of consecutive third moments; it comprises the real three-dimensional poses of the human body at those moments. Next, input the radar signal set samples reflected by the human body at those consecutive third moments into the three-dimensional pose estimation model to obtain a predicted three-dimensional pose sequence. Finally, train the model according to the real and predicted three-dimensional pose sequences.
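The source does not name the training loss. A common choice for comparing a predicted and a real three-dimensional pose sequence, assumed here, is the mean per-joint position error (MPJPE): the Euclidean distance between predicted and real skeletal points, averaged over all points and moments. The sequence length and the offset below are hypothetical.

```python
import numpy as np


def mpjpe(pred, real):
    """pred, real: (T, J, 3) pose sequences; mean Euclidean error per skeletal point per moment."""
    return float(np.linalg.norm(pred - real, axis=-1).mean())


real = np.zeros((8, 14, 3))                  # T = 8 moments, J = 14 skeletal points
pred = real + np.array([0.03, 0.04, 0.0])    # every point off by 3 cm in x and 4 cm in y
print(mpjpe(pred, real))  # approximately 0.05 (same unit as the poses, e.g. metres)
```

In a framework with automatic differentiation, the same expression would be minimized with respect to the parameters of all four modules.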
According to an embodiment of the present invention, as shown in fig. 1, the first millimeter wave radar 101 and the second millimeter wave radar 102 may be placed at specific positions in space. The horizontally placed first millimeter wave radar 101 may collect the radar signal set reflected in the horizontal direction, and the vertically placed second millimeter wave radar 102 may collect the radar signal set reflected in the vertical direction. In addition, 13 Raspberry Pi 4 boards may form a multi-camera system 103. Each Raspberry Pi is equipped with a PoE (Power over Ethernet) module, is powered through the network cable, and synchronizes its clock with a master control computer on the local area network via NTP (Network Time Protocol). Every Raspberry Pi carries a camera, and the cameras are jointly calibrated as a multi-camera system.
When data acquisition is needed, the first millimeter wave radar 101 and the second millimeter wave radar 102 may collect the radar signal sets reflected by a human body moving in the space, to serve as radar signal set samples for training the three-dimensional pose estimation model, while the multi-camera system 103 collects images of the same moving body. The multi-camera system 103 may then generate two-dimensional human skeletons from multiple viewing angles using an image-based two-dimensional human pose estimation algorithm, and combine them into the final three-dimensional human skeleton using the camera calibration matrices, thereby determining the real three-dimensional pose sequence of the moving human body. This real three-dimensional pose sequence serves as the ground-truth label for the radar signal set samples. The three-dimensional pose estimation model produces a predicted three-dimensional pose sequence from the radar signal set samples, and the model can then be trained on the real and predicted three-dimensional pose sequences.
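The source does not say how the per-view two-dimensional skeletons are combined through the calibration matrices. A standard option, assumed here, is linear (DLT) triangulation: each calibrated view with a 3 × 4 projection matrix P and an observed pixel (u, v) contributes the two rows u·p3 − p1 and v·p3 − p2 (where pi is the i-th row of P) acting on the homogeneous 3-D point, and the least-squares solution is the right singular vector with the smallest singular value. The camera parameters below are hypothetical.

```python
import numpy as np


def triangulate_dlt(projections, pixels):
    """projections: (V, 3, 4) camera matrices; pixels: (V, 2) observations of one skeletal point."""
    rows = []
    for P, (u, v) in zip(projections, pixels):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.stack(rows))
    X = vt[-1]                       # homogeneous least-squares solution of A X = 0
    return X[:3] / X[3]


def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


# Hypothetical calibrated rig: shared intrinsics, three translated cameras.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
cams = [K @ np.hstack([np.eye(3), t[:, None]])
        for t in (np.array([0.0, 0.0, 5.0]), np.array([1.0, 0.0, 5.0]), np.array([0.0, 1.0, 5.0]))]
joint = np.array([0.2, -0.1, 1.0])
pixels = np.array([project(P, joint) for P in cams])
print(triangulate_dlt(np.stack(cams), pixels))  # close to [0.2, -0.1, 1.0]
```

Applied to every skeletal point at every moment, this turns the per-view two-dimensional skeletons into the real three-dimensional pose sequence used as labels; with noisy detections, a nonlinear refinement step is often added.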
According to an embodiment of the invention, the three-dimensional pose estimation model may include a feature fusion module, a fully connected network module, a first encoder module, and a second encoder module. Training the model based on the real and predicted three-dimensional pose sequences may comprise the following. For each radar signal set sample, the horizontal projection information set sample and the vertical projection information set sample related to it are input into the feature fusion module to obtain a predicted feature vector. The predicted feature vector is input into the fully connected network module to obtain a predicted feature sub-vector sequence, which comprises a plurality of predicted feature sub-vectors corresponding to the plurality of pieces of human skeletal point information; these include at least one third mask vector, and each predicted feature sub-vector corresponds to one piece of position encoding information. The target predicted feature sub-vector sequence, which includes the position encoding information, is input into the first encoder module to obtain a predicted feature encoding sub-vector sequence. That sequence comprises a plurality of predicted feature encoding sub-vectors corresponding to the pieces of human skeletal point information; each contains predicted position characterization information for its skeletal point, which characterizes the absolute position of that skeletal point.
A plurality of predicted feature encoding vectors corresponding to the consecutive third moments are then input into the second encoder module to obtain the predicted three-dimensional pose sequence. Each predicted feature encoding vector is determined from the predicted feature encoding sub-vector sequence of its third moment, and the predicted feature encoding vectors include at least one fourth mask vector. Finally, the parameters of the feature fusion module, the fully connected network module, the first encoder module, and the second encoder module are adjusted according to the real and predicted three-dimensional pose sequences.
According to an embodiment of the invention, the feature fusion module may comprise the modules shown in fig. 4, the first encoder module may comprise the spatial attention module shown in fig. 5A, and the second encoder module may comprise the temporal attention module shown in fig. 5B.
According to the embodiment of the invention, the millimeter wave radar may capture reflected signals from only certain parts of the human body at each moment. To simulate this specular reflection behaviour of the human body with respect to radar signals, a keypoint masking method is proposed that randomly masks the feature information of some human skeletal points. In this embodiment, a masked keypoint is represented by a third mask vector; for example, the feature sub-vector 505 in fig. 5A can represent the third mask vector during training. In this way, the first encoder indirectly learns the non-local correlation between human skeletal points and other parts of the body, so that body parts not captured by the radar can be recovered and the first encoder outputs a feature encoding sub-vector sequence with more complete skeletal point positions. In addition, to strengthen the temporal attention modeling of the second encoder, a frame masking method may be used that randomly masks the feature encodings at some moments. In this embodiment, a masked frame is represented by a fourth mask vector; for example, the vector 506 in fig. 5B can represent the fourth mask vector during training. In this way, the second encoder models temporal correlation, predicts the three-dimensional poses at all moments, including the masked ones, from the feature encoding vectors of the remaining moments, and outputs an accurate and continuous three-dimensional pose sequence.
It should be noted that the third mask vector and the fourth mask vector may also be zero vectors or other vectors determined by other mask information. The first, second, third, and fourth mask vectors may be the same or different.
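Keypoint masking and frame masking can both be sketched as replacing a random subset of rows with a mask vector: the rows are skeletal-point sub-vectors (J, D) for keypoint masking and per-moment feature encoding vectors (T, D) for frame masking. The masking ratio of 0.3, the sizes, and the choice of a zero mask vector are assumptions; the source only states that mask vectors may be zero vectors or other vectors.

```python
import numpy as np


def random_row_mask(rows, ratio, mask_vector, rng):
    """Return a copy of rows (N, D) with round(ratio * N) randomly chosen rows replaced."""
    n = rows.shape[0]
    idx = rng.choice(n, size=int(round(ratio * n)), replace=False)
    masked = rows.copy()
    masked[idx] = mask_vector
    return masked, np.sort(idx)


rng = np.random.default_rng(0)
joint_tokens = rng.standard_normal((14, 32))    # one moment, 14 skeletal points
frame_vectors = rng.standard_normal((20, 64))   # 20 consecutive moments
masked_joints, hidden_joints = random_row_mask(joint_tokens, 0.3, np.zeros(32), rng)   # keypoint masking
masked_frames, hidden_frames = random_row_mask(frame_vectors, 0.3, np.zeros(64), rng)  # frame masking
print(len(hidden_joints), len(hidden_frames))  # 4 6
```

Drawing a fresh mask for every training sample forces the encoders to reconstruct each skeletal point and each moment from the others, rather than memorizing a fixed visibility pattern.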
According to the embodiment of the invention, while the first encoder learns self-attention among the masked human skeletal point features and models the non-local correlation between skeletal points and other body parts, a spatial position encoding determined by the arrangement of the skeletal points can be added to assist its learning. For example, the multi-head self-attention encoder in fig. 5A may perform this learning of self-attention among the masked skeletal point features.
Specifically, taking a posture determined from 14 human skeletal points as an example, the predicted feature vector output by the feature fusion module may be input into the fully connected network to obtain 14 predicted feature sub-vectors. Each predicted feature sub-vector is then associated with one piece of human skeletal point information, and one piece of position encoding information is assigned to each predicted feature sub-vector according to the arrangement or positional relationship of the skeletal points. For example, position encoding information 1 may be assigned to the feature sub-vector of the skeletal point representing the head, and position encoding information 2 to the feature sub-vector of the skeletal point representing the neck.
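The source assigns each skeletal point an index (1 for the head, 2 for the neck, and so on) but does not say how the index becomes a vector. Assuming the sinusoidal encoding of the original Transformer (a learnable embedding table would serve equally well), the target feature sub-vector sequence is the sum of the sub-vectors and the encodings of their indices. The 14-point order beyond head and neck is hypothetical, and index 0 in the code corresponds to position encoding information 1.

```python
import numpy as np

# Hypothetical 14-point order; only "head" first and "neck" second come from the text.
SKELETAL_POINTS = ["head", "neck", "r_shoulder", "r_elbow", "r_wrist", "l_shoulder", "l_elbow",
                   "l_wrist", "r_hip", "r_knee", "r_ankle", "l_hip", "l_knee", "l_ankle"]


def sinusoidal_encoding(num_positions, dim):
    """(num_positions, dim) table: sin on even columns, cos on odd columns."""
    pos = np.arange(num_positions)[:, None]
    freq = np.power(10000.0, -np.arange(0, dim, 2) / dim)[None, :]
    table = np.zeros((num_positions, dim))
    table[:, 0::2] = np.sin(pos * freq)
    table[:, 1::2] = np.cos(pos * freq)
    return table


rng = np.random.default_rng(0)
sub_vectors = rng.standard_normal((len(SKELETAL_POINTS), 32))   # output of the FC embedding
target_sequence = sub_vectors + sinusoidal_encoding(len(SKELETAL_POINTS), 32)
print(target_sequence.shape)  # (14, 32)
```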
According to an embodiment of the present invention, the data acquisition module may be constructed from the first millimeter wave radar 101, the second millimeter wave radar 102, and the multi-camera system 103 shown in fig. 1. Based on the data acquisition module, the signal processing module, the feature fusion module shown in fig. 4, the spatial attention module shown in fig. 5A, and the temporal attention module shown in fig. 5B, a human body posture estimation system based on the millimeter wave radar can be constructed.
Fig. 6 schematically shows a millimeter wave radar-based human body pose estimation system according to an embodiment of the invention.
As shown in fig. 6, the data acquisition module 610 includes a radar system 611 for acquiring radar signals and a multi-camera system 612 for acquiring human posture images. The signal processing module 620 may receive and process the radar signals collected by the radar system 611 to obtain horizontal projection information 621 and vertical projection information 622. The horizontal projection information 621 and the vertical projection information 622 are then input into the feature fusion module 630 and processed by its channel fusion module 631 and multi-scale feature fusion module 632 to obtain a feature vector. The spatial attention module 640 may receive a set of target feature sub-vectors 641, which are derived from the feature vector and include position coding information, and output a feature coding sub-vector sequence 643 after processing by the spatial attention encoder 642. The temporal attention module 650 may receive a set 651 of time-ordered feature coding sub-vector sequences and, after processing by the temporal attention encoder 652, output a sequence 653 of three-dimensional human postures predicted from the radar information. The spatial attention encoder 642 may perform the M rounds of the fourth encoding process and the first dimension reduction process shown in fig. 5A, and the temporal attention encoder 652 may perform the N rounds of the fifth encoding process and the second dimension reduction process shown in fig. 5B. The human posture images acquired by the multi-camera system 612 provide real three-dimensional posture labels 660, which are compared with the three-dimensional postures predicted by the temporal attention module 650 to adjust the parameters of the feature fusion module 630, the spatial attention module 640, the temporal attention module 650, and other parts of the system.
Through the embodiment of the invention, a millimeter wave radar-based three-dimensional human body posture estimation system is realized. The system collects the radar signals produced by millimeter wave radars in the horizontal and vertical directions, generates the corresponding three-dimensional human skeletons as labels, and can predict accurate three-dimensional human postures for different users in different scenes. Effective feature extraction and fusion modules are designed for the system according to the characteristics of radar signals, which is important for improving the performance of radio frequency signal-based human body posture estimation.
To verify the effectiveness of the invention in estimating three-dimensional human postures from radar signals, this embodiment collects a data set of 9 volunteers walking randomly in 10 scenes with different occlusion and lighting conditions. The volunteers differ in age, sex, clothing, and height. The data set consists of 174050 video frames with three-dimensional body pose labels and 348100 radar frames. 307280 radar frames are used as the training set and 76820 radar frames as the test set. As shown in Table 1, the method can accurately detect the coordinates of human keypoints with a millimeter wave radar to generate a three-dimensional human skeleton, with a mean error of 57.1 mm.
TABLE 1
Joint        Nose   Neck   Shoulder  Elbow  Wrist  Thigh  Knee   Ankle  Mean error
Error (mm)   57.5   37.2   49.1      64.9   68.2   46.5   58.1   65.1   57.1
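Table 1 reports errors in millimetres without defining the metric in this section. A common choice, assumed here, is the mean per-joint position error: the Euclidean distance between predicted and real positions of each skeleton point, averaged over frames, with the "Mean error" taken over all skeleton points. The helper names are hypothetical and the toy coordinates are invented for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def per_joint_errors(pred: Sequence[Sequence[Point]],
                     real: Sequence[Sequence[Point]]) -> List[float]:
    """Mean Euclidean distance for each skeleton point over all frames (same unit as input)."""
    sums = [0.0] * len(pred[0])
    for pred_frame, real_frame in zip(pred, real):
        for j, (p, r) in enumerate(zip(pred_frame, real_frame)):
            sums[j] += math.dist(p, r)
    return [s / len(pred) for s in sums]


def mean_error(pred: Sequence[Sequence[Point]], real: Sequence[Sequence[Point]]) -> float:
    """Average of the per-joint errors over all skeleton points."""
    errors = per_joint_errors(pred, real)
    return sum(errors) / len(errors)


# Two frames, two skeleton points, coordinates in millimetres (toy values).
pred = [[(0, 0, 0), (10, 0, 0)], [(0, 0, 0), (0, 0, 0)]]
real = [[(3, 4, 0), (10, 0, 0)], [(0, 0, 0), (0, 6, 8)]]
print(per_joint_errors(pred, real), mean_error(pred, real))  # [2.5, 5.0] 3.75
```

Because the mean is taken over every skeleton point (for example, left and right shoulders counted separately), it need not equal the plain average of the grouped columns in Table 1.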
In addition, the system can also localize a human body indoors. The experimental results are shown in Table 2; the mean positioning error of the invention is 3.6 cm.
TABLE 2
Axis         X axis  Y axis  Z axis  Mean positioning error
Error (cm)   2.3     1.8     1.1     3.6
Fig. 7 schematically shows a millimeter wave radar-based human body posture estimation device according to an embodiment of the present invention.
As shown in fig. 7, the millimeter wave radar-based human body posture estimation apparatus 700 includes a first acquisition module 710, a first obtaining module 720, a converting module 730, a second obtaining module 740, and a third obtaining module 750.
The first acquisition module 710 is configured to acquire a horizontal projection information set and a vertical projection information set related to a radar signal set reflected by a human body at a first moment.
The first obtaining module 720 is configured to perform feature fusion on the horizontal projection information set and the vertical projection information set to obtain a feature vector corresponding to the radar signal set.
The converting module 730 is configured to convert the feature vector into a feature sub-vector sequence, where the feature sub-vector sequence includes a plurality of feature sub-vectors corresponding to a plurality of pieces of human bone point information, the plurality of feature sub-vectors includes at least one first mask vector, each feature sub-vector corresponds to one piece of position coding information, and the position coding information represents relative position information of the human bone point information.
The second obtaining module 740 is configured to perform a first encoding process on a target feature sub-vector sequence including position encoding information to obtain a feature encoding sub-vector sequence, where the feature encoding sub-vector sequence includes a plurality of feature encoding sub-vectors corresponding to a plurality of pieces of human body bone point information, each feature encoding sub-vector includes position characterizing information corresponding to the piece of human body bone point information corresponding to the feature encoding sub-vector, and the position characterizing information characterizes absolute position information of the piece of human body bone point information.
The third obtaining module 750 is configured to perform a second encoding process on a feature encoding vector to obtain the three-dimensional posture of the human body at the first moment, where the feature encoding vector is determined from the feature encoding sub-vector sequence.
According to the embodiment of the present invention, the millimeter wave radar-based human body posture estimation apparatus 700 further includes a second acquisition module and a fourth obtaining module.
The second acquisition module is configured to acquire a feature coding vector sequence of the human body at a plurality of consecutive second moments, where the feature coding vector sequence includes a plurality of feature coding vectors corresponding to the plurality of second moments, and the plurality of feature coding vectors includes at least one second mask vector.
The fourth obtaining module is configured to perform a third encoding process on the feature coding vector sequence to obtain a three-dimensional posture sequence, where the three-dimensional posture sequence includes a plurality of three-dimensional postures of the human body at the plurality of second moments.
According to an embodiment of the present invention, the first acquisition module 710 includes an acquisition unit, a first determination unit, and a second determination unit.
The acquisition unit is configured to acquire, for each radar signal in the radar signal set, horizontal round-trip distance information and vertical round-trip distance information between the antennas that transmit and receive the radar signal and the reflection point of the radar signal in space.
The first determination unit is configured to determine the horizontal projection information related to the radar signal according to the echo signal information related to the radar signal, the wavelength information of the echo signal, and the horizontal round-trip distance information.
The second determination unit is configured to determine the vertical projection information related to the radar signal according to the echo signal information related to the radar signal, the wavelength information of the echo signal, and the vertical round-trip distance information.
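One common way to form a projection from echo signals, wavelength, and round-trip distance is coherent back-projection: each channel's echo is phase-compensated by the round-trip distance to a candidate point and the channels are summed. The sketch below assumes this technique, a narrowband single-frequency model, a 77 GHz carrier, and one transmit plus eight receive antennas; none of these specifics is stated in the document, and a practical FMCW system would also use range information. The function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def round_trip(tx: Point, rx: Point, p: Point) -> float:
    """Transmit antenna -> reflection point -> receive antenna path length (metres)."""
    return math.dist(tx, p) + math.dist(p, rx)


def back_project(echoes: Sequence[complex], pairs: Sequence[Tuple[Point, Point]],
                 grid: Sequence[Point], wavelength: float) -> List[float]:
    """Coherent back-projection of per-channel echoes onto candidate points.

    Use a horizontal grid (and horizontally spaced antennas) for the horizontal
    projection, and a vertical grid with a vertical array for the vertical one.
    """
    image = []
    for p in grid:
        acc = 0j
        for s, (tx, rx) in zip(echoes, pairs):
            # Undo the phase delay 2*pi*d/lambda accumulated along the round trip.
            acc += s * cmath.exp(2j * math.pi * round_trip(tx, rx, p) / wavelength)
        image.append(abs(acc))
    return image


wavelength = 299_792_458 / 77e9            # about 3.9 mm at the assumed 77 GHz carrier
tx = (0.0, 0.0, 0.0)
pairs = [(tx, (k * wavelength / 2, 0.0, 0.0)) for k in range(8)]   # 1 Tx, 8 Rx channels
target = (0.0, 2.0, 0.0)                   # simulated point reflector 2 m in front
echoes = [cmath.exp(-2j * math.pi * round_trip(t, r, target) / wavelength) for t, r in pairs]
grid = [target, (1.0, 2.0, 0.0)]           # the true position and an off-angle point
image = back_project(echoes, pairs, grid, wavelength)
print([round(v, 2) for v in image])
```

At the true reflector position all channel phases cancel, so the value equals the number of channels (8); at the off-angle point the phases disagree and the value is far smaller.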
According to an embodiment of the invention, the radar signal set includes a plurality of radar signals from a plurality of channels, and the horizontal projection information set and the vertical projection information set have different information scales. The first obtaining module 720 includes a first obtaining unit, a second obtaining unit, and a third obtaining unit.
The first obtaining unit is configured to, for the plurality of radar signals, extract channel features from the horizontal projection information set and the vertical projection information set related to the radar signal of each channel, to obtain a first fusion feature in which the plurality of channel features are fused.
The second obtaining unit is configured to extract scale features from the first fusion feature at a plurality of scales, to obtain a second fusion feature in which the plurality of scale features and the plurality of channel features are fused.
The third obtaining unit is configured to vectorize the second fusion feature to obtain the feature vector.
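The three fusion steps (channel fusion, multi-scale fusion, vectorization) can be illustrated with a hand-written stand-in. This is an assumption-laden sketch, not the convolutional modules 631 and 632 of the invention: channel fusion is modelled as a 1x1 weighted mix of the per-channel maps followed by ReLU, multi-scale fusion as average pooling at scales 1, 2, and 4, and, because the horizontal and vertical maps have different sizes, each is fused and flattened separately before concatenation. All names, sizes, and random weights are hypothetical.

```python
import numpy as np


def channel_fusion(maps: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Mix per-channel projection maps (C, H, W) with 1x1 weights (K, C), then apply ReLU."""
    return np.maximum(np.einsum("kc,chw->khw", weights, maps), 0.0)


def avg_pool(x: np.ndarray, s: int) -> np.ndarray:
    """Non-overlapping s x s average pooling on (K, H, W); leftover rows/columns are cropped."""
    k, h, w = x.shape
    h2, w2 = h // s, w // s
    x = x[:, :h2 * s, :w2 * s]
    return x.reshape(k, h2, s, w2, s).mean(axis=(2, 4))


def multiscale_vector(x: np.ndarray, scales=(1, 2, 4)) -> np.ndarray:
    """Pool the fused channel features at several scales and concatenate the flattened results."""
    return np.concatenate([avg_pool(x, s).ravel() for s in scales])


def fuse(horizontal, vertical, w_h, w_v, scales=(1, 2, 4)) -> np.ndarray:
    """Fuse and vectorise each projection separately, then concatenate into one feature vector."""
    return np.concatenate([multiscale_vector(channel_fusion(horizontal, w_h), scales),
                           multiscale_vector(channel_fusion(vertical, w_v), scales)])


rng = np.random.default_rng(0)
C, K = 4, 8                                    # 4 radar channels, 8 fused feature channels
horizontal = rng.standard_normal((C, 32, 32))  # horizontal projection maps
vertical = rng.standard_normal((C, 16, 24))    # vertical projection maps (different scale)
vec = fuse(horizontal, vertical, rng.standard_normal((K, C)), rng.standard_normal((K, C)))
print(vec.shape)  # (14784,)
```

Coarser scales summarize the body's overall extent while the finest scale keeps local detail; concatenating them is one simple way to retain both.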
According to an embodiment of the present invention, the first encoding process includes M rounds; the second obtaining module 740 includes a fourth obtaining unit, a fifth obtaining unit, and a third determination unit.
The fourth obtaining unit is configured to, in the m-th round, perform a fourth encoding process on the encoded information of the (m-1)-th round and the (m-1)-th dimension reduction result to obtain an m-th encoding result, where the encoded information of the first round includes the target feature sub-vector sequence, and m is an integer greater than 1 and less than or equal to M.
The fifth obtaining unit is configured to perform a first dimension reduction process on the m-th encoding result to obtain the m-th dimension reduction result.
The third determination unit is configured to determine the m-th dimension reduction result as the feature coding sub-vector sequence when the dimension of the elements in the m-th dimension reduction result equals a predefined dimension, where the predefined dimension is greater than three.
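The M-round loop of encoding followed by dimension reduction, stopping once the element dimension reaches the predefined value, can be sketched as follows. Assumptions, not taken from the document: single-head attention (fig. 5A uses a multi-head encoder), a residual connection, a linear reduction that halves the dimension, random weights in place of trained ones, and feeding only the previous reduction result into the next round (the document also mentions the previous round's encoded information, which this sketch omits). The names are hypothetical.

```python
import numpy as np


def self_attention(x: np.ndarray, wq, wk, wv) -> np.ndarray:
    """Single-head scaled dot-product self-attention over the tokens of x: (T, D)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)       # softmax over keys
    return attn @ v


def encode_with_reduction(tokens: np.ndarray, target_dim: int, rng):
    """Alternate attention encoding and linear reduction until the token dimension is target_dim.

    Returns the final tokens and the number of rounds performed.
    """
    x, rounds = tokens, 0
    while x.shape[-1] > target_dim:
        d = x.shape[-1]
        wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
        encoded = x + self_attention(x, wq, wk, wv)                   # encoding step
        new_d = max(target_dim, d // 2)                               # never undershoot
        x = encoded @ (rng.standard_normal((d, new_d)) / np.sqrt(d))  # reduction step
        rounds += 1
    return x, rounds


rng = np.random.default_rng(0)
tokens = rng.standard_normal((14, 256))   # 14 skeleton-point tokens with position codes added
out, rounds = encode_with_reduction(tokens, target_dim=32, rng=rng)
print(out.shape, rounds)  # (14, 32) 3
```

Attention runs across the 14 skeleton-point tokens, so a masked point can borrow information from every other point, which is how the non-local associations described earlier would be learned.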
According to an embodiment of the present invention, the third encoding process includes N rounds; the fourth obtaining module includes a sixth obtaining unit, a seventh obtaining unit, and a fourth determination unit.
The sixth obtaining unit is configured to, in the n-th round, perform a fifth encoding process on the encoded information of the (n-1)-th round and the (n-1)-th dimension reduction result to obtain an n-th encoding result, where the encoded information of the first round includes the feature coding vector sequence, and n is an integer greater than 1 and less than or equal to N.
The seventh obtaining unit is configured to perform a second dimension reduction process on the n-th encoding result to obtain the n-th dimension reduction result.
The fourth determination unit is configured to determine the three-dimensional posture sequence from the n-th dimension reduction result when the dimension of the elements in the n-th dimension reduction result is three.
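A sketch of the N-round temporal encoding is given below. The document does not spell out how the elements are arranged, so this is one reading under stated assumptions: each (moment, skeleton point) feature is an element, attention runs along the time axis separately for each skeleton point, and a linear reduction halves the feature dimension (floored at three) until each element is an (x, y, z) coordinate. Random weights replace trained parameters, and a zeroed frame stands in for the second mask vector. The function names are hypothetical.

```python
import numpy as np


def temporal_attention(x: np.ndarray, wq, wk, wv) -> np.ndarray:
    """Self-attention along the time axis, run independently per skeleton point. x: (J, T, D)."""
    q, k, v = x @ wq, x @ wk, x @ wv                            # (J, T, D)
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])  # (J, T, T)
    scores -= scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)                    # softmax over moments
    return attn @ v


def poses_from_sequence(seq: np.ndarray, rng) -> np.ndarray:
    """Map (T, J, D) per-moment feature encodings to a (T, J, 3) posture sequence."""
    x = np.transpose(seq, (1, 0, 2))                 # (J, T, D): attend over moments
    while x.shape[-1] > 3:
        d = x.shape[-1]
        wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
        x = x + temporal_attention(x, wq, wk, wv)                        # encoding step
        x = x @ (rng.standard_normal((d, max(3, d // 2))) / np.sqrt(d))  # reduction step
    return np.transpose(x, (1, 0, 2))                # back to (T, J, 3)


rng = np.random.default_rng(0)
seq = rng.standard_normal((10, 14, 32))   # 10 moments x 14 skeleton points x 32 features
seq[3] = 0.0                              # moment 3 replaced by a zero mask vector
poses = poses_from_sequence(seq, rng)
print(poses.shape)  # (10, 14, 3)
```

Even though moment 3 enters as all zeros, the attention step fills it with a weighted mix of the other moments, so a posture is still produced for the masked moment.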
According to the embodiment of the invention, the feature vector is obtained by using a three-dimensional posture estimation model to perform feature fusion on the horizontal projection information set and the vertical projection information set; the feature sub-vectors are obtained by using the model to convert the feature vector; the feature coding sequence is obtained by using the model to perform the first encoding process on the feature sub-vectors; and the three-dimensional posture is obtained by using the model to perform the second encoding process on the feature coding sequence. The three-dimensional posture estimation model is trained by the following modules: a third acquisition module, a fifth obtaining module, and a training module.
The third acquisition module is configured to acquire a real three-dimensional posture sequence of the human body at a plurality of consecutive third moments, where the real three-dimensional posture sequence includes a plurality of real three-dimensional postures of the human body at the plurality of third moments.
The fifth obtaining module is configured to input a plurality of radar signal set samples reflected by the human body at the plurality of consecutive third moments into the three-dimensional posture estimation model to obtain a predicted three-dimensional posture sequence.
The training module is configured to train the three-dimensional posture estimation model according to the real three-dimensional posture sequence and the predicted three-dimensional posture sequence.
According to an embodiment of the invention, the three-dimensional posture estimation model includes a feature fusion module, a fully connected network module, a first encoder module, and a second encoder module. The training module includes an eighth obtaining unit, a ninth obtaining unit, a tenth obtaining unit, an eleventh obtaining unit, and an adjusting unit.
The eighth obtaining unit is configured to, for each radar signal set sample, input the horizontal projection information set sample and the vertical projection information set sample related to that radar signal set sample into the feature fusion module to obtain a predicted feature vector.
The ninth obtaining unit is configured to input the predicted feature vector into the fully connected network module to obtain a predicted feature sub-vector sequence, where the predicted feature sub-vector sequence includes a plurality of predicted feature sub-vectors corresponding to a plurality of pieces of human bone point information, the plurality of predicted feature sub-vectors includes at least one third mask vector, and each predicted feature sub-vector corresponds to one piece of position coding information.
The tenth obtaining unit is configured to input the target predicted feature sub-vector sequence including the position coding information into the first encoder to obtain a predicted feature coding sub-vector sequence, where the predicted feature coding sub-vector sequence includes a plurality of predicted feature coding sub-vectors corresponding to the plurality of pieces of human bone point information, each predicted feature coding sub-vector includes predicted position characterizing information for its corresponding human bone point information, and the predicted position characterizing information characterizes the absolute position of that human bone point.
The eleventh obtaining unit is configured to input a sequence of predicted feature coding vectors corresponding to the plurality of consecutive third moments into the second encoder to obtain the predicted three-dimensional posture sequence, where each predicted feature coding vector is determined from the predicted feature coding sub-vector sequence at its corresponding third moment, and the plurality of predicted feature coding vectors includes at least one fourth mask vector.
The adjusting unit is configured to adjust the parameters of the feature fusion module, the fully connected network module, the first encoder module, and the second encoder module according to the real three-dimensional posture sequence and the predicted three-dimensional posture sequence.
Any of the modules, units, or at least part of the functionality of any of them according to embodiments of the invention may be implemented in one module. Any one or more of the modules and units according to the embodiments of the present invention may be implemented by being split into a plurality of modules. Any one or more of the modules, units according to the embodiments of the present invention may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by any other reasonable means of hardware or firmware by integrating or packaging the circuits, or may be implemented by any one of three implementations of software, hardware and firmware, or any suitable combination of any of them. Alternatively, one or more of the modules, units according to embodiments of the present invention may be implemented at least partly as computer program modules, which, when executed, may perform the respective functions.
For example, any plurality of the first acquisition module 710, the first obtaining module 720, the converting module 730, the second obtaining module 740, and the third obtaining module 750 may be combined and implemented in one module/unit, or any one of these modules/units may be split into a plurality of modules/units. Alternatively, at least part of the functionality of one or more of these modules/units may be combined with at least part of the functionality of other modules/units and implemented in one module/unit. According to an embodiment of the present invention, at least one of the first acquisition module 710, the first obtaining module 720, the converting module 730, the second obtaining module 740, and the third obtaining module 750 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC), or by any other reasonable manner of integrating or packaging a circuit, or by any one of the three implementations of software, hardware, and firmware, or by any suitable combination of them. Alternatively, at least one of the first acquisition module 710, the first obtaining module 720, the converting module 730, the second obtaining module 740, and the third obtaining module 750 may be at least partially implemented as a computer program module which, when executed, performs the corresponding function.
It should be noted that the millimeter wave radar-based human body posture estimation device in the embodiments of the present invention corresponds to the millimeter wave radar-based human body posture estimation method in the embodiments of the present invention; for details of the device, refer to the description of the method, which is not repeated here.
FIG. 8 schematically shows a block diagram of a computer system suitable for implementing the above described method according to an embodiment of the present invention. The computer system illustrated in FIG. 8 is only one example and should not impose any limitations on the scope of use or functionality of embodiments of the invention.
As shown in fig. 8, a computer system 800 according to an embodiment of the present invention includes a processor 801 which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The processor 801 may include, for example, a general purpose microprocessor (e.g., CPU), an instruction set processor and/or related chip sets and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 801 may also include on-board memory for caching purposes. The processor 801 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present invention.
In the RAM 803, various programs and data necessary for the operation of the system 800 are stored. The processor 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. The processor 801 performs various operations of the method flow according to the embodiment of the present invention by executing programs in the ROM 802 and/or the RAM 803. Note that the programs may also be stored in one or more memories other than the ROM 802 and RAM 803. The processor 801 may also perform various operations of method flows according to embodiments of the present invention by executing programs stored in the one or more memories.
According to an embodiment of the invention, the system 800 may also include an input/output (I/O) interface 805, which is also connected to the bus 804. The system 800 may also include one or more of the following components connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including, for example, a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as necessary, so that a computer program read from it can be installed into the storage section 808 as necessary.
According to an embodiment of the invention, the method flow according to an embodiment of the invention may be implemented as a computer software program. For example, embodiments of the invention include a computer program product comprising a computer program embodied on a computer-readable storage medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program, when executed by the processor 801, performs the above-described functions defined in the system of the embodiment of the present invention. The above described systems, devices, apparatuses, modules, units, etc. may be implemented by computer program modules according to embodiments of the present invention.
The present invention also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement a method according to an embodiment of the invention.
According to an embodiment of the present invention, the computer readable storage medium may be a non-volatile computer readable storage medium. Examples may include, but are not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
For example, according to an embodiment of the present invention, a computer-readable storage medium may include the above-described ROM 802 and/or RAM 803 and/or one or more memories other than the ROM 802 and RAM 803.
Embodiments of the present invention also include a computer program product comprising a computer program containing program code for performing the method provided by the embodiments of the present invention. When the computer program product runs on an electronic device, the program code causes the electronic device to implement the millimeter wave radar-based human body posture estimation method provided by the embodiments of the present invention.
The computer program, when executed by the processor 801, performs the above-described functions defined in the system/apparatus of the embodiment of the present invention. The above described systems, devices, modules, units, etc. may be implemented by computer program modules according to embodiments of the invention.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted in the form of a signal, distributed over a network medium, downloaded and installed via communications portion 809, and/or installed from removable media 811. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
According to embodiments of the present invention, program code for executing the computer program provided by embodiments of the present invention may be written in any combination of one or more programming languages; in particular, the computer program may be implemented using a high-level procedural and/or object-oriented programming language, and/or an assembly/machine language. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, and the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It will be appreciated by a person skilled in the art that various combinations and/or integrations of the features recited in the various embodiments and/or claims of the present invention are possible, even if such combinations or integrations are not explicitly recited in the present invention. In particular, various combinations and/or integrations of the features recited in the various embodiments and/or claims of the present invention may be made without departing from the spirit or teaching of the invention. All such combinations and/or integrations fall within the scope of the present invention.
The embodiments of the present invention have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the invention is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the invention, and these alternatives and modifications are intended to fall within the scope of the invention.

Claims (10)

1. A millimeter wave radar-based human body posture estimation method, comprising:
acquiring a horizontal projection information set and a vertical projection information set related to a radar signal set reflected by a human body at a first moment;
performing feature fusion on the horizontal projection information set and the vertical projection information set to obtain a feature vector corresponding to the radar signal set;
converting the feature vector into a feature sub-vector sequence, wherein the feature sub-vector sequence comprises a plurality of feature sub-vectors corresponding to a plurality of pieces of human body bone point information, the plurality of feature sub-vectors comprise at least one first mask vector, each feature sub-vector corresponds to one piece of position coding information, and the position coding information represents the relative position information of the human body bone point information;
performing first coding processing on a target feature sub-vector sequence including the position coding information to obtain a feature coding sub-vector sequence, wherein the feature coding sub-vector sequence includes a plurality of feature coding sub-vectors corresponding to the plurality of pieces of human body bone point information, each feature coding sub-vector includes position representation information corresponding to the human body bone point information corresponding to the feature coding sub-vector, and the position representation information represents absolute position information of the human body bone point information;
and performing second coding processing on a feature coding vector to obtain the three-dimensional posture of the human body at the first moment, wherein the feature coding vector is determined according to the feature coding sub-vector sequence.
2. The method of claim 1, wherein the acquiring a set of horizontal projection information and a set of vertical projection information related to a set of radar signals reflected by a human body at a first time comprises:
for each radar signal in the radar signal set, acquiring horizontal round-trip distance information and vertical round-trip distance information between an antenna generating and receiving the radar signal and a reflection point of the radar signal in space;
determining horizontal projection information related to the radar signal according to echo signal information related to the radar signal, wavelength information of the echo signal and the horizontal round-trip distance information;
and determining vertical projection information related to the radar signal according to the echo signal information related to the radar signal, the wavelength information of the echo signal and the vertical round-trip distance information.
3. The method of claim 1, wherein the set of radar signals includes a plurality of radar signals from a plurality of channels, the set of horizontal projection information and the set of vertical projection information having different information scales; the performing feature fusion on the horizontal projection information set and the vertical projection information set to obtain a feature vector corresponding to the radar signal set includes:
for the plurality of radar signals, extracting channel features from the horizontal projection information set and the vertical projection information set related to the radar signal of each channel, to obtain a first fusion feature in which a plurality of channel features are fused;
based on a plurality of scales, performing scale feature extraction on the first fusion feature to obtain a second fusion feature fused with the plurality of scale features and the plurality of channel features;
and vectorizing the second fusion feature to obtain the feature vector.
4. The method of claim 1, wherein the first encoding process comprises M rounds of processing; performing the first encoding processing on the target feature sub-vector sequence including the position encoding information to obtain the feature coding sub-vector sequence comprises:
in the mth round, performing fourth encoding processing on the encoded information of the (m-1)th round and the (m-1)th dimension reduction result to obtain an mth encoding result, wherein the encoded information of the first round comprises the target feature sub-vector sequence, and m is a positive integer greater than 1 and less than or equal to M;
performing first dimension reduction processing on the mth encoding result to obtain an mth dimension reduction result;
and determining the mth dimension reduction result as the feature coding sub-vector sequence when the dimension of the elements in the mth dimension reduction result is determined to be a predefined dimension, wherein the predefined dimension is greater than three.
5. The method of claim 1, further comprising:
acquiring a feature coding vector sequence corresponding to the human body at a plurality of continuous second moments, wherein the feature coding vector sequence comprises a plurality of feature coding vectors corresponding to the plurality of second moments, and the plurality of feature coding vectors comprise at least one second mask vector;
and performing third coding processing on the feature coding vector sequence to obtain a three-dimensional posture sequence, wherein the three-dimensional posture sequence comprises a plurality of three-dimensional postures corresponding to the human body at the second moments.
6. The method of claim 5, wherein the third encoding process comprises N rounds of processing; performing the third encoding processing on the feature coding vector sequence to obtain the three-dimensional posture sequence comprises:
in the nth round, performing fifth encoding processing on the encoded information of the (n-1)th round and the (n-1)th dimension reduction result to obtain an nth encoding result, wherein the encoded information of the first round comprises the feature coding vector sequence, and n is a positive integer greater than 1 and less than or equal to N;
performing second dimension reduction processing on the nth encoding result to obtain an nth dimension reduction result;
and when the dimension of the elements in the nth dimension reduction result is determined to be three, determining the three-dimensional posture sequence according to the nth dimension reduction result.
7. The method according to any one of claims 1 to 6, wherein the feature vector is obtained by performing feature fusion on the horizontal projection information set and the vertical projection information set using a three-dimensional posture estimation model, the feature sub-vector sequence is obtained by converting the feature vector using the three-dimensional posture estimation model, the feature coding sub-vector sequence is obtained by performing the first coding processing on the target feature sub-vector sequence using the three-dimensional posture estimation model, and the three-dimensional posture is obtained by performing the second coding processing on the feature coding vector using the three-dimensional posture estimation model; the three-dimensional posture estimation model is obtained by training in the following way:
acquiring a real three-dimensional posture sequence corresponding to the human body at a plurality of continuous third moments, wherein the real three-dimensional posture sequence comprises a plurality of real three-dimensional postures corresponding to the human body at the plurality of third moments;
inputting a plurality of radar signal set samples reflected by the human body at the plurality of continuous third moments into the three-dimensional posture estimation model to obtain a predicted three-dimensional posture sequence;
and training the three-dimensional posture estimation model according to the real three-dimensional posture sequence and the predicted three-dimensional posture sequence.
8. The method of claim 7, wherein the three-dimensional posture estimation model comprises a feature fusion module, a fully connected network module, a first encoder module, and a second encoder module; the training of the three-dimensional posture estimation model according to the real three-dimensional posture sequence and the predicted three-dimensional posture sequence comprises:
for each radar signal set sample, inputting a horizontal projection information set sample and a vertical projection information set sample related to the radar signal set sample into the feature fusion module to obtain a predicted feature vector;
inputting the predicted feature vector into the fully connected network module to obtain a predicted feature sub-vector sequence, wherein the predicted feature sub-vector sequence comprises a plurality of predicted feature sub-vectors corresponding to the plurality of pieces of human body bone point information, the plurality of predicted feature sub-vectors comprise at least one third mask vector, and each predicted feature sub-vector corresponds to one piece of position coding information;
inputting a target predicted feature sub-vector sequence including the position coding information into the first encoder module to obtain a predicted feature coding sub-vector sequence, wherein the predicted feature coding sub-vector sequence includes a plurality of predicted feature coding sub-vectors corresponding to the plurality of pieces of human body bone point information, each predicted feature coding sub-vector includes predicted position characterizing information corresponding to the human body bone point information of that predicted feature coding sub-vector, and the predicted position characterizing information characterizes absolute position information of the human body bone point information;
inputting a plurality of predicted feature coding vectors corresponding to the plurality of continuous third moments into the second encoder module to obtain the predicted three-dimensional posture sequence, wherein each predicted feature coding vector is determined according to the predicted feature coding sub-vector sequence of its corresponding third moment, and the plurality of predicted feature coding vectors include at least one fourth mask vector;
and adjusting parameters of the feature fusion module, the fully connected network module, the first encoder module and the second encoder module according to the real three-dimensional posture sequence and the predicted three-dimensional posture sequence.
9. A human body posture estimation device based on millimeter wave radar, comprising:
a first acquisition module, configured to acquire a horizontal projection information set and a vertical projection information set related to a radar signal set reflected by a human body at a first moment;
a first obtaining module, configured to perform feature fusion on the horizontal projection information set and the vertical projection information set to obtain a feature vector corresponding to the radar signal set;
a conversion module, configured to convert the feature vector into a feature sub-vector sequence, wherein the feature sub-vector sequence comprises a plurality of feature sub-vectors corresponding to a plurality of pieces of human body bone point information, each feature sub-vector corresponds to one piece of position coding information, and the position coding information represents the relative position information of the human body bone point information;
a second obtaining module, configured to perform a first encoding process on a target feature sub-vector sequence including the position encoding information, to obtain a feature encoding sub-vector sequence, where the feature encoding sub-vector sequence includes multiple feature encoding sub-vectors corresponding to the multiple pieces of human bone point information, each feature encoding sub-vector includes position characterizing information corresponding to the human bone point information corresponding to the feature encoding sub-vector, and the position characterizing information characterizes absolute position information of the human bone point information;
and a third obtaining module, configured to perform second encoding processing on a feature encoding vector to obtain a three-dimensional posture of the human body at the first time, where the feature encoding vector is determined according to the feature encoding sub-vector sequence.
10. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
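Claim 2 states that the horizontal (or vertical) projection information is determined from the echo signal, its wavelength and the round-trip distance between the antenna pair and a reflection point, but gives no formula. A common way to combine those three quantities is phase-compensated back-projection. The sketch below assumes that technique; the function name `projection_value` and the sample numbers are hypothetical. Horizontal and vertical projections would call the same function with the horizontal and vertical round-trip distances respectively.

```python
import cmath
import math


def projection_value(echoes, round_trip_distances, wavelength):
    """Coherently sum echo samples for one candidate reflection point.

    Hypothetical back-projection sketch (an assumption, not the patent's
    formula): each echo is phase-compensated by exp(j * 2*pi * d / wavelength),
    where d is the round-trip distance for that antenna pair.
    """
    if len(echoes) != len(round_trip_distances):
        raise ValueError("one round-trip distance is needed per echo")
    total = 0j
    for echo, distance in zip(echoes, round_trip_distances):
        total += echo * cmath.exp(1j * 2 * math.pi * distance / wavelength)
    return abs(total)


# Usage: 4 channels, wavelength of about 3.9 mm (roughly a 77 GHz carrier).
wavelength = 0.0039
distances = [2.000, 2.001, 2.002, 2.003]  # metres, hypothetical values
# Simulated echoes from a point reflector located at exactly those distances.
echoes = [cmath.exp(-1j * 2 * math.pi * d / wavelength) for d in distances]
print(projection_value(echoes, distances, wavelength))  # close to 4.0
```

When the candidate point matches the true reflector, the compensated phases align and the magnitude approaches the number of channels; a mismatched point sums to a smaller value.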
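Claim 1 converts the fused feature vector into one sub-vector per bone point, replaces at least one of them with a mask vector, and attaches position coding information. The claims do not specify the split, the mask value or the position-coding scheme, so the sketch below assumes an equal-length split, a fixed all-zero mask vector (in a trained model this would typically be learnable) and Transformer-style sinusoidal position codes. All function names are hypothetical. The same masking idea could apply to the per-moment feature coding vectors of claim 5.

```python
import math
import random


def to_subvectors(feature, num_joints):
    """Split a flat feature vector into one equal-length sub-vector per joint."""
    if len(feature) % num_joints != 0:
        raise ValueError("feature length must be divisible by num_joints")
    size = len(feature) // num_joints
    return [feature[i * size:(i + 1) * size] for i in range(num_joints)]


def sinusoidal_encoding(index, dim):
    """Sinusoidal position code for token `index` (assumed scheme)."""
    return [
        math.sin(index / 10000 ** (2 * (k // 2) / dim)) if k % 2 == 0
        else math.cos(index / 10000 ** (2 * (k // 2) / dim))
        for k in range(dim)
    ]


def mask_and_encode(subvectors, mask_vector, mask_ratio, rng):
    """Replace a random subset of sub-vectors with `mask_vector`, then add position codes."""
    n = len(subvectors)
    num_masked = max(1, int(round(mask_ratio * n)))  # at least one mask vector
    masked = set(rng.sample(range(n), num_masked))
    out = []
    for i, vec in enumerate(subvectors):
        base = mask_vector if i in masked else vec
        code = sinusoidal_encoding(i, len(base))
        out.append([b + c for b, c in zip(base, code)])
    return out, sorted(masked)


# Usage: a 16-value feature vector split across 4 hypothetical joints.
rng = random.Random(0)
joints = to_subvectors([float(x) for x in range(16)], num_joints=4)
tokens, masked_ids = mask_and_encode(joints, [0.0] * 4, mask_ratio=0.25, rng=rng)
print(masked_ids)  # index of the single masked joint
```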
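Claims 4 and 6 describe the same round structure: encode, reduce the dimension, and stop once the elements reach a target dimension (a predefined dimension greater than three in claim 4, three in claim 6). The sketch below shows only that control flow under one possible reading of the claims: the previous round's encoding output and reduction result are fed to the next encoding step, and the first round uses the input sequence for both. The `encode` and `reduce` callables are hypothetical placeholders for the unspecified layers; `toy_encode` and `halve` exist only so the example runs.

```python
def encode_until_dim(sequence, target_dim, encode, reduce, max_rounds=16):
    """Alternate encoding and dimension reduction until elements have target_dim entries.

    `encode(prev_encoded, prev_reduced)` and `reduce(encoded)` are hypothetical
    stand-ins for the encoding and dimension-reduction layers.
    """
    encoded, reduced = sequence, sequence  # first-round inputs: the input sequence
    for _ in range(max_rounds):
        encoded = encode(encoded, reduced)
        reduced = reduce(encoded)
        dim = len(reduced[0])
        if dim == target_dim:
            return reduced
        if dim < target_dim:
            raise ValueError("dimension reduction skipped past target_dim")
    raise RuntimeError("target_dim not reached within max_rounds")


def toy_encode(prev_encoded, prev_reduced):
    """Toy stand-in: shift each reduced element by the mean of its previous encoding."""
    return [
        [x + sum(e) / len(e) for x in r]
        for e, r in zip(prev_encoded, prev_reduced)
    ]


def halve(seq):
    """Toy stand-in: halve the dimension by averaging adjacent pairs."""
    return [[(v[2 * i] + v[2 * i + 1]) / 2 for i in range(len(v) // 2)] for v in seq]


# Usage: two hypothetical 32-dim joint tokens reduced to 8 dims (claim 4 style).
joints = [[1.0] * 32, [2.0] * 32]
out = encode_until_dim(joints, 8, toy_encode, halve)
print(len(out), len(out[0]))  # 2 8
```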
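Claims 7 and 8 train the model by comparing the real and predicted three-dimensional posture sequences, without naming a loss. A common choice for this comparison is mean per-joint position error (MPJPE); the sketch below assumes that metric and computes it in plain Python. The parameter update itself would be handled by a deep-learning framework and is not shown.

```python
import math


def mpjpe(real_seq, pred_seq):
    """Mean per-joint position error between two 3-D posture sequences.

    Each sequence is a list of moments; each moment is a list of (x, y, z) joints.
    """
    if len(real_seq) != len(pred_seq):
        raise ValueError("sequences must cover the same moments")
    total, count = 0.0, 0
    for real_frame, pred_frame in zip(real_seq, pred_seq):
        if len(real_frame) != len(pred_frame):
            raise ValueError("moments must contain the same joints")
        for real_joint, pred_joint in zip(real_frame, pred_frame):
            total += math.dist(real_joint, pred_joint)  # Euclidean distance
            count += 1
    return total / count


# Usage: one moment, two joints (metres, hypothetical values).
real = [[(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]]
pred = [[(0.0, 0.0, 0.1), (0.0, 1.0, 0.0)]]
print(mpjpe(real, pred))  # about 0.05
```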

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210859873.2A CN114924249B (en) 2022-07-22 2022-07-22 Millimeter wave radar-based human body posture estimation method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN114924249A CN114924249A (en) 2022-08-19
CN114924249B CN114924249B (en) 2022-10-28

Family

ID=82816254

Country Status (1)

Country Link
CN (1) CN114924249B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097008A (en) * 2019-04-30 2019-08-06 Soochow University Human motion recognition method
WO2020216316A1 (en) * 2019-04-26 2020-10-29 Zongmu Technology (Shanghai) Co., Ltd. Driver assistance system and method based on millimetre wave radar, terminal, and medium
CN113406666A (en) * 2021-05-14 2021-09-17 Sun Yat-sen University Target attitude estimation method, system and medium based on optical radar image fusion
CN113447905A (en) * 2021-06-29 2021-09-28 Xidian University Double-millimeter-wave radar human body falling detection device and detection method
CN113885022A (en) * 2021-10-27 2022-01-04 Qingdao Hisense Hitachi Air-Conditioning Systems Co., Ltd. Fall detection method and radar equipment
CN113935379A (en) * 2021-10-15 2022-01-14 University of Science and Technology of China Human body activity segmentation method and system based on millimeter wave radar signals
CN114460555A (en) * 2022-04-08 2022-05-10 Harbin Institute of Technology (Shenzhen) (Shenzhen Institute of Science and Technology Innovation, Harbin Institute of Technology) Radar echo extrapolation method and device and storage medium
CN114511873A (en) * 2021-12-16 2022-05-17 Tsinghua University Static gesture recognition method and device based on millimeter wave radar imaging

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9442189B2 (en) * 2010-10-27 2016-09-13 The Fourth Military Medical University Multichannel UWB-based radar life detector and positioning method thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
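Human tracking and identification through a millimeter wave radar; Peijun Zhao et al.; ScienceDirect; 2021-03-11; full text *
Human behavior recognition method based on multi-antenna FMCW radar; Tian Zengshan et al.; Journal of Chongqing University of Posts and Telecommunications; 2020-10-31; Vol. 32, No. 5; full text *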

Also Published As

Publication number Publication date
CN114924249A (en) 2022-08-19

Similar Documents

Publication Publication Date Title
Sengupta et al. mm-Pose: Real-time human skeletal posture estimation using mmWave radars and CNNs
EP3605394B1 (en) Method and apparatus for recognizing body movement
CN113159283B (en) Model training method based on federal transfer learning and computing node
Ding et al. Radar-based 3D human skeleton estimation by kinematic constrained learning
Divya et al. Docker-based intelligent fall detection using edge-fog cloud infrastructure
Wong et al. Multi-Bernoulli based track-before-detect with road constraints
Wen et al. Deep learning based smart radar vision system for object recognition
Lee et al. Hupr: A benchmark for human pose estimation using millimeter wave radar
CN116597336A (en) Video processing method, electronic device, storage medium, and computer program product
Zeng et al. Idea-net: Dynamic 3d point cloud interpolation via deep embedding alignment
Yuan et al. STransUNet: A siamese TransUNet-based remote sensing image change detection network
Mauro et al. Few-shot user-definable radar-based hand gesture recognition at the edge
Decourt et al. A recurrent CNN for online object detection on raw radar frames
CN114924249B (en) Millimeter wave radar-based human body posture estimation method and device and electronic equipment
Liu et al. General spiking neural network framework for the learning trajectory from a noisy mmwave radar
US9792551B1 (en) Multi-scale information dynamics for decision making
Wang et al. Skeleton-based human pose recognition using channel state information: A survey
Zhong et al. Point‐convolution‐based human skeletal pose estimation on millimetre wave frequency modulated continuous wave multiple‐input multiple‐output radar
CN111461091B (en) Universal fingerprint generation method and device, storage medium and electronic device
Yu et al. ECCNet: Efficient chained centre network for real‐time multi‐category vehicle tracking and vehicle speed estimation
Zhang et al. Lightweight network for small target fall detection based on feature fusion and dynamic convolution
Farrell et al. CoIR: Compressive Implicit Radar
Krishnendu et al. Deep Learning-Based Open Set Domain Hyperspectral Image Classification Using Dimension-Reduced Spectral Features
US20240161484A1 (en) Neural implicit function for end-to-end reconstruction of dynamic cryo-em structures
HASAN et al. Drone Tracking and Object Detection By YOLO And CNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant