WO2023213051A1 - Static human posture estimation method based on CSI signal angle-of-arrival estimation - Google Patents

Info

Publication number
WO2023213051A1
WO2023213051A1 PCT/CN2022/125127 CN2022125127W WO2023213051A1 WO 2023213051 A1 WO2023213051 A1 WO 2023213051A1 CN 2022125127 W CN2022125127 W CN 2022125127W WO 2023213051 A1 WO2023213051 A1 WO 2023213051A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
aoa
data
csi
tof
Prior art date
Application number
PCT/CN2022/125127
Other languages
English (en)
French (fr)
Inventor
肖甫
徐铭明
郭政鑫
胡海
桂林卿
盛碧云
周剑
蔡惠
Original Assignee
Nanjing University of Posts and Telecommunications (南京邮电大学)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications (南京邮电大学)
Publication of WO2023213051A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • The present invention relates to the technical field of human posture estimation, and in particular to a static human posture estimation method based on CSI signal angle-of-arrival (AoA) estimation.
  • Human posture estimation can be further divided into single-posture recognition and posture recognition based on human skeleton points. Single-posture recognition can only recognize a few fixed postures, while skeleton-point-based posture estimation outputs the positions and associations of the human skeleton points.
  • At present, human posture estimation mainly uses visual methods, RF signals, millimeter-wave radar, and similar technologies.
  • Vision-based methods analyze images captured during human activity and use machine learning algorithms to determine the posture or skeleton point positions of the people in the images; they are already widely used.
  • However, visual methods require cameras that may leak private information, so they have met with resistance.
  • RF-signal-based methods scan the target area with modulated electromagnetic waves and analyze the signal changes in the sensing area to estimate human posture.
  • However, this solution requires expensive dedicated equipment and trained personnel to deploy it, so it is subject to many restrictions.
  • Millimeter-wave-radar-based methods transmit signal beams and receive the echoes to analyze human activity in the sensing area and then estimate posture. They can estimate 3D human skeleton points and movement trajectories, but the equipment is expensive, and its relatively high transmission power may affect human health in living environments, making wide deployment difficult.
  • The purpose of this invention is to provide a static human posture estimation method based on CSI signal AoA estimation. It receives CSI data in the sensing area through widely deployed Wi-Fi devices, estimates the signal angle of arrival in the sensing area, constructs two-dimensional AoA images, and uses a teacher-student network to estimate human posture from those images, with high estimation accuracy and low cost of use.
  • A static human posture estimation method based on CSI signal AoA estimation, including the following steps:
  • Step 1: Place a receiving antenna column with a moving track in the sensing area, use a fixed transmitting antenna to send Wi-Fi data packets to the receiving antenna installed on the moving track, move the receiving antenna to multiple designated heights to collect CSI data, and collect image data synchronously;
  • Step 2: Extract the phase information from the CSI data, construct one-dimensional AoA images from it, and combine the one-dimensional AoA images from the different heights into a two-dimensional AoA image;
  • Step 3: Use the environmental noise reduction algorithm to remove environmental interference from the two-dimensional AoA image;
  • Step 4: Input the image data into the teacher network to obtain supervision data in the form of human skeleton point coordinates, and input the supervision data and the denoised two-dimensional AoA images into the student network for training;
  • Step 5: To recognize a human posture, place a receiving antenna column with a moving track in the sensing area, use a fixed transmitting antenna to send Wi-Fi data packets to the receiving antenna installed on the moving track, move the receiving antenna to multiple designated heights, and collect one CSI sample at each height. Extract features from the CSI data collected at the different heights as in steps 2 and 3, interpolate the features, input them into the student network model trained in step 4, and output the predicted human skeleton point coordinates of the target in the sensing area.
  • In step 2, for the phase data φ_{i,k} of the k-th subcarrier of the i-th receiving antenna in a single CSI data packet collected at one height, the phase error model is expressed in terms of the following quantities:
  • θ_k is the original phase; the nonlinear error is a separate term whose symbol is given in the equation image
  • f_s is the frequency spacing between subcarriers
  • δ is the propagation delay caused by multipath propagation
  • Ω is the linear phase error
  • Z is Gaussian white noise
  • In step 2, the phase data of the 56 subcarriers contained in the CSI data are extracted, and the phase data of the three receiving antennas are synchronized and error-corrected. When the receiving antenna moves to height i, the MUSIC algorithm is applied to the collected CSI signals to compute the MUSIC spectrum P_MUSIC(τ, θ) jointly estimated over ToF and AoA, where τ is the time of flight and θ is the angle of arrival. The MUSIC spectrum P_MUSIC(τ, θ) is converted into a one-dimensional AoA image of 1×181 pixels, in which the k-th pixel is computed from the spectrum.
  • CSI data are collected at 8 different heights and converted into one-dimensional AoA images, and the eight one-dimensional AoA images are combined into a complete two-dimensional AoA image: 2D AoA image = (img_1 img_2 … img_8)^T
  • In step 3, the environmental noise reduction algorithm proceeds as follows:
  • Step 3-1: Collect and analyze the P_MUSIC of the 1500 data packets captured over 15 seconds at sampling point A in a static environment, and use the distribution function F_{ToF_static}(x) = P(ToF_static ≤ x) to analyze the time range and distribution of ToF_static;
  • According to the actual ToF distribution, the ToF is divided into several segments.
  • The distribution interval of ToF is [X_min, X_max]; it is split by ToF value into L segments of unequal length, and for the k-th segment [x_kl, x_kr] the distribution is P(x_kl ≤ ToF ≤ x_kr);
  • Step 3-2: Use an exponential weighting function to compute the weighting matrix weight(ToF_static) from the ToF segments and their distribution, for each interval inter = [x_kl, x_kr].
  • In the weighting function for weight(ToF_static):
  • β is a parameter determined by the length of the ToF
  • α is the attenuation factor, which is set according to the strength of the environmental factors to be suppressed
  • Step 3-3: After collecting the P_MUSIC of a single data packet at sampling point A in a dynamic environment, analyze its time of flight ToF_dynamic and use the weight function weight() obtained in step 3-2 to compute the denoised spectrum: P_MUSIC(ToF_dynamic, AoA)′ = P_MUSIC(ToF_dynamic, AoA) × weight(ToF_dynamic)
  • In step 4, the designed neural network includes a teacher network and a student network.
  • The student network includes an input layer, residual blocks and an output layer; the input of the teacher network is an image and its output is the human skeleton point coordinates; the input of the student network is a two-dimensional AoA image of size 32×181.
  • The input layer uses a 7×7 convolution kernel with a stride of 2 and increases the number of channels to 64; four residual blocks are used, each containing 2 residual layers.
  • The first residual layer of each residual block uses a 3×3 convolution kernel with a stride of 2.
  • The second residual layer has the same structure as the first but a stride of 1; the output layer uses a flatten layer to reduce the data to one dimension and feeds it into a fully connected layer, which outputs the predicted human skeleton point coordinates.
  • In step 4, the obtained 2D AoA image ∈ R^{8×181} is interpolated with the Fourier interpolation method into 2D AoA image′ ∈ R^{32×181}.
  • The beneficial effects of this invention are as follows: it proposes a human posture estimation method based on CSI signal AoA estimation that collects CSI data with commercial Wi-Fi equipment, estimates the signal angle of arrival in the sensing area, and constructs a two-dimensional AoA image.
  • A teacher-student network then estimates static human postures from the 2D AoA images, giving the advantages of contactless sensing, high estimation accuracy and low cost.
  • Figure 1 is a schematic flow chart of a human posture estimation method based on CSI signal arrival angle estimation in an embodiment of the present invention.
  • Figure 2 is a schematic diagram of the experimental scene in the embodiment of the present invention.
  • Figure 3 is a schematic network structure diagram of a student network in an embodiment of the present invention.
  • Figure 4 is a schematic diagram of the prediction results of human posture estimation in the embodiment of the present invention.
  • Figure 5 is a schematic diagram of key skeletal points of COCO18 in an embodiment of the present invention.
  • Figure 6 shows the two skeleton point formats, COCO18 and Body10, in the embodiment of the present invention.
  • Figure 7 is a schematic diagram of the impact of the environmental noise reduction algorithm on the accuracy of skeletal point recognition in the embodiment of the present invention.
  • As shown in Figure 1, the present invention provides a static human posture estimation method based on CSI signal AoA estimation.
  • The process is as follows:
  • Step 1: As shown in Figure 2, place a receiving antenna column with a moving track in the sensing area, use a fixed transmitting antenna to send Wi-Fi data packets to the receiving antenna on the moving track, move the receiving antenna to multiple designated heights to collect CSI data, and collect image data synchronously.
  • Step 2: Collect 3000 CSI data packets at each of eight different heights, extract the phase information from the CSI data, construct one-dimensional AoA images, and combine the one-dimensional AoA images from the different heights into a two-dimensional AoA image.
  • Specifically, for the phase data φ_{i,k} of the k-th subcarrier of the i-th receiving antenna in a single CSI data packet collected at one height, the phase error model can be expressed in terms of the following quantities:
  • θ_k is the original phase; the nonlinear error is a separate term whose symbol is given in the equation image
  • f_s is the frequency spacing between subcarriers
  • δ is the propagation delay caused by multipath propagation
  • Ω is the linear phase error
  • Z is Gaussian white noise
  • When the receiving antenna moves to height i, the MUSIC algorithm is applied to the collected CSI signals to compute the MUSIC spectrum P_MUSIC(τ, θ) jointly estimated over ToF and AoA, where τ is the time of flight and θ is the angle of arrival. The MUSIC spectrum P_MUSIC(τ, θ) is then converted into a one-dimensional AoA image.
  • The image contains 1×181 pixels.
  • The k-th pixel is computed from the spectrum, where k is the angle in the spectrum.
  • Each pixel of the one-dimensional AoA image corresponds to 1 degree of angle; there are 181 pixels because 0 degrees is included.
  • CSI data are collected at 8 different heights and converted into one-dimensional AoA images, and the eight one-dimensional AoA images are combined into a complete two-dimensional AoA image: 2D AoA image = (img_1 img_2 … img_8)^T
  • Step 3: Use the environmental noise reduction algorithm to remove environmental interference from the two-dimensional AoA image:
  • Step 3-1: Collect and analyze the P_MUSIC of the 1500 data packets captured over 15 seconds at sampling point A in a static environment, and use the distribution function F_{ToF_static}(x) = P(ToF_static ≤ x) to analyze the time range and distribution of ToF_static.
  • According to the actual ToF distribution, the ToF is divided into several segments.
  • The distribution interval of ToF is [X_min, X_max]; it is split by ToF value into L segments of unequal length, and for the k-th segment [x_kl, x_kr] the distribution is P(x_kl ≤ ToF ≤ x_kr).
  • Step 3-2: Use an exponential weighting function to compute the weighting matrix weight(ToF_static) from the ToF segments and their distribution, for each interval inter = [x_kl, x_kr].
  • In the weighting function for weight(ToF_static):
  • β is a parameter determined by the length of the ToF
  • α is the attenuation factor, which is set according to the strength of the environmental factors to be suppressed.
  • Step 3-3: After collecting the P_MUSIC of a single data packet at sampling point A in a dynamic environment, analyze its time of flight ToF_dynamic and use the weight function weight() obtained in step 3-2 to compute the denoised spectrum: P_MUSIC(ToF_dynamic, AoA)′ = P_MUSIC(ToF_dynamic, AoA) × weight(ToF_dynamic)
  • Step 4: Input the image data into the teacher network to obtain supervision data in the form of human skeleton point coordinates, and input the supervision data and the denoised AoA images into the student network for training.
  • The designed neural network contains a teacher network and a student network.
  • The student network contains an input layer, residual blocks and an output layer.
  • The input of the teacher network is an image, and its output is the coordinates of the human skeleton points.
  • The input of the student network is a two-dimensional AoA image of size 32×181.
  • The input layer uses a 7×7 convolution kernel with a stride of 2 and increases the number of channels to 64; four residual blocks are used, each containing 2 residual layers.
  • The first residual layer of each residual block uses a 3×3 convolution kernel with a stride of 2.
  • The second residual layer is the same as the first but with a stride of 1; the output layer uses a flatten layer to reduce the data to one dimension and feeds it into a fully connected layer, which outputs the predicted human skeleton point coordinates.
  • Step 5: To recognize a human posture, place a receiving antenna column with a moving track in the sensing area, use a fixed transmitting antenna to send Wi-Fi data packets to the receiving antenna installed on the moving track, move the receiving antenna to multiple designated heights, and collect one CSI sample at each height. After features are extracted from the CSI data collected at the different heights as in steps 2 and 3, the features are interpolated:
  • The Fourier transform interpolation method interpolates the obtained 2D AoA image ∈ R^{8×181} into 2D AoA image′ ∈ R^{32×181}.
  • Steps 1 to 4 correspond to the training stage in machine learning.
  • During training, a large amount of data must be collected at each height to train the student network.
  • Collection takes 30 seconds per height, and collecting at all 8 heights takes 5 minutes.
  • Step 5 recognizes human posture and corresponds to the test stage in machine learning.
  • During testing, only one CSI sample needs to be collected at each height, which takes only 10 milliseconds. Including the slide rail movement time, data collection at the 8 heights can be completed within 10 seconds. In actual use, posture recognition with the trained student network model therefore requires little data collection time; if a radio frequency switch is used, the process can be reduced to under 1 second.
  • Scene 1, Laboratory A: an irregularly shaped laboratory with some experimental equipment stacked near the walls;
  • Scene 2, Laboratory B: a rectangular laboratory with several desks and iron filing cabinets against the walls;
  • Scene 3, Corridor: long and narrow, with windows on one side.
  • The data set contains 5 postures performed by 6 volunteers in each of the 3 environments: standing, crossing hands, raising both arms horizontally, raising both hands overhead, and sitting. Each posture has 8 sampling points, and 1,500 CSI samples are collected at each sampling point, for a total of 1,080,000 CSI samples and 135,000 generated 2D AoA images. In parallel, a camera captures pictures of every posture of every volunteer, and OpenPose is used to obtain the human skeleton point labels. 75% of the data is used to train the network and the remaining 25% to test it.
  • Figure 4 shows the skeleton point results of this method for five different static postures in Lab B: the first row shows the captured pictures of the subject in the five postures with the human keypoints predicted by OpenPose; the second row shows the same five postures with the keypoints predicted by this method.
  • The predictions show that this method can accurately estimate different human postures, and its results differ little from those of the vision-based method OpenPose.
  • To measure the gap between the predicted skeleton points and the annotations output by OpenPose, the Percentage of Correct Keypoints (PCK) is used.
  • L is a logical function that outputs 1 when its expression is true and 0 when it is false;
  • N is the number of test samples, and i denotes the i-th body keypoint, i ∈ COCO18 or Body10;
  • pd_i is the predicted keypoint, gt_i is the ground truth, and the Euclidean distance is taken between the predicted and true positions of keypoint i;
  • length_std is the parameter used to normalize the error, and a is the allowed range around the true value within which a predicted point counts as correct.
  • Because the volunteers differ in height and body shape, the volunteer's head length hl is used as the normalization parameter.
  • With COCO18, computed at PCK@0.5hl, the average keypoint recognition rate of this method over the three scenes is 85.5% (A: 88.4%, B: 91.8%, C: 76.3%);
  • with Body10, computed at PCK@0.5hl, the average keypoint recognition rate of this method over the three scenes is 83.5% (A: 85.4%, B: 91.8%, C: 73.4%).
  • Compared with Wi-Pose and WiSPPN, the accuracy of this method is significantly improved.
  • Table 2 lists the two skeleton point representation standards, COCO18 and Body10; see Figures 5 and 6.
  • Tables 3 and 4 give the average prediction accuracy of each keypoint under COCO18 PCK@25 and Body10 PCK@25. The results show that the average prediction accuracy over all skeleton keypoints is 88% for both COCO18 and Body10.
  • The prediction accuracy of the left and right wrists in the same environment is lower than that of the other keypoints: 79% in COCO18 and 80% in Body10. The likely reason is that the wrists are sometimes far from the torso, which makes prediction harder.
  • Figure 7 compares the prediction accuracy of this method on data processed with the environmental noise reduction algorithm against its accuracy on the raw data.
  • The results show that the environmental noise reduction algorithm improves prediction accuracy by 5% in all environments.
  • When the PCK metric is tightened from PCK@50 to PCK@10, the improvement from the environmental noise reduction algorithm grows from 2.6% to 9.5%.
  • In addition, for the cluttered corridor environment, data processed with the environmental noise reduction algorithm show a 7% improvement at PCK@25.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

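A static human posture estimation method based on CSI signal angle-of-arrival (AoA) estimation estimates the signal angle of arrival in a sensing area from CSI, constructs an image from it, and uses a teacher-student network to estimate human posture from the image. Specifically: first, a receiving antenna column with a moving track collects the CSI signals in the sensing area at different heights. Second, two-dimensional AoA image features are constructed: the MUSIC algorithm converts the CSI information into one-dimensional AoA data, the one-dimensional AoA data from the different heights are combined into a two-dimensional AoA image, and an environmental noise reduction algorithm is designed to remove static environmental factors and enhance the part that senses the human body. Finally, a teacher-student network model is built in which a vision-based teacher network supervises a student network that operates on the two-dimensional AoA images, so that the trained student network can estimate human posture from CSI alone. The method can recognize multiple static human postures and achieves high prediction accuracy at low cost.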

Description

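A static human posture estimation method based on CSI signal angle-of-arrival estimation
Technical Field
The present invention relates to the technical field of human posture estimation, and in particular to a static human posture estimation method based on CSI signal angle-of-arrival (AoA) estimation.
Background
With the rapid development of human-centered computing applications, applications oriented to human activity such as smart homes, motion-sensing games, health monitoring and activity logging urgently need easy-to-use human posture detection technology. Traditional human posture detection often requires users to wear sensors or requires cameras to be deployed, which raises the cost of use and may infringe privacy. Emerging indoor wireless sensing technology perceives human activity by means of wireless electromagnetic signals; it does not require users to wear additional sensors and is easy to deploy. Human posture estimation can be further divided into single-posture recognition and posture recognition based on human skeleton points. Single-posture recognition can only recognize a few fixed postures, while skeleton-point-based posture estimation outputs the positions and associations of the human skeleton points.
At present, human posture estimation mainly uses visual methods, RF signals, millimeter-wave radar and similar technologies. However, these technologies often require users to purchase additional hardware and have poor sensing accuracy. Vision-based methods analyze images captured during human activity and use machine learning algorithms to determine the posture or skeleton point positions of the people in the images; they are already widely used. However, as privacy awareness grows, the cameras that visual methods require may leak private information, so they have met with resistance. RF-signal-based methods scan the target area with modulated electromagnetic waves and analyze the signal changes in the sensing area to estimate human posture, but this solution requires expensive dedicated equipment and trained personnel to deploy it, so it is subject to many restrictions. Millimeter-wave-radar-based methods transmit signal beams and receive the echoes to analyze human activity in the sensing area and then estimate posture. They can estimate 3D human skeleton points and movement trajectories, but the equipment is expensive, and its relatively high transmission power may affect human health in living environments, making wide deployment difficult.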
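Summary of the Invention
The purpose of this invention is to provide a static human posture estimation method based on CSI signal AoA estimation. It receives CSI data in the sensing area through widely deployed Wi-Fi devices, estimates the signal angle of arrival in the sensing area, constructs two-dimensional AoA images, and uses a teacher-student network to estimate human posture from those images, with high estimation accuracy and low cost of use.
A static human posture estimation method based on CSI signal AoA estimation includes the following steps:
Step 1: Place a receiving antenna column with a moving track in the sensing area, use a fixed transmitting antenna to send Wi-Fi data packets to the receiving antenna installed on the moving track, move the receiving antenna to multiple designated heights to collect CSI data, and collect image data synchronously;
Step 2: Extract the phase information from the CSI data, construct one-dimensional AoA images from it, and combine the one-dimensional AoA images from the different heights into a two-dimensional AoA image;
Step 3: Use the environmental noise reduction algorithm to remove environmental interference from the two-dimensional AoA image;
Step 4: Input the image data into the teacher network to obtain supervision data in the form of human skeleton point coordinates, and input the supervision data and the denoised two-dimensional AoA images into the student network for training;
Step 5: To recognize a human posture, place a receiving antenna column with a moving track in the sensing area, use a fixed transmitting antenna to send Wi-Fi data packets to the receiving antenna installed on the moving track, move the receiving antenna to multiple designated heights, and collect one CSI sample at each height. Extract features from the CSI data collected at the different heights as in steps 2 and 3, interpolate the features, input them into the student network model trained in step 4, and output the predicted human skeleton point coordinates of the target in the sensing area.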
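Further, in step 2, for the phase data φ_{i,k} of the k-th subcarrier of the i-th receiving antenna in a single CSI data packet collected at one height, the phase error model is expressed as:
Figure PCTCN2022125127-appb-000001
where θ_k is the original phase,
Figure PCTCN2022125127-appb-000002
is the nonlinear error, f_s is the frequency spacing between subcarriers, δ is the propagation delay caused by multipath propagation, Ω is the linear phase error, and Z is Gaussian white noise;
The transmitting and receiving antennas are connected directly by cable to obtain the phase φ_k′ free of environmental interference, and linear fitting is used to solve for the nonlinear phase error
Figure PCTCN2022125127-appb-000003
the linear phase error Ω, and the synchronization phase error between the receiving antennas
Figure PCTCN2022125127-appb-000004
Further, in step 2, the phase data of the 56 subcarriers contained in the CSI data are extracted, and the phase data of the three receiving antennas are synchronized and error-corrected. When the receiving antenna moves to height i, the MUSIC algorithm is applied to the collected CSI signals to compute the MUSIC spectrum P_MUSIC(τ, θ) jointly estimated over ToF and AoA, where τ is the time of flight and θ is the angle of arrival. The MUSIC spectrum P_MUSIC(τ, θ) is converted into a one-dimensional AoA image of 1×181 pixels, in which the k-th pixel is calculated as:
Figure PCTCN2022125127-appb-000005
CSI data are collected at 8 different heights and converted into one-dimensional AoA images, and the 8 one-dimensional AoA images are combined into a complete two-dimensional AoA image:
2D AoA image = (img_1 img_2 … img_8)^T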
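Further, in step 3, the environmental noise reduction algorithm proceeds as follows:
Step 3-1: Collect and analyze the P_MUSIC of the 1500 data packets captured over 15 s at sampling point A in a static environment, and use the distribution function to analyze the time range and distribution of ToF_static:
F_{ToF_static}(x) = P(ToF_static ≤ x)
According to the actual ToF distribution, the ToF is divided into several segments. The ToF distribution interval is [X_min, X_max]; it is split by ToF value into L segments of unequal length, and for the k-th segment [x_kl, x_kr] the distribution is P(x_kl ≤ ToF ≤ x_kr);
Step 3-2: Based on the ToF segments and their distribution, use an exponential weighting function to compute the weighting matrix weight(ToF_static); for the interval inter = [x_kl, x_kr]:
Figure PCTCN2022125127-appb-000006
where β is a parameter determined by the length of the ToF, and α is the attenuation factor, set according to the strength of the environmental factors to be suppressed;
Step 3-3: After collecting the P_MUSIC of a single data packet at sampling point A in a dynamic environment, analyze its time of flight ToF_dynamic and use the weight function weight() obtained in step 3-2 to compute the denoised P′_MUSIC:
P_MUSIC(ToF_dynamic, AoA)′ = P_MUSIC(ToF_dynamic, AoA) × weight(ToF_dynamic)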
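Further, in step 4, the designed neural network contains a teacher network and a student network. The student network contains an input layer, residual blocks and an output layer. The input of the teacher network is an image, and its output is the human skeleton point coordinates. The input of the student network is a two-dimensional AoA image of size 32×181. The input layer uses a 7×7 convolution kernel with a stride of 2 and increases the number of channels to 64. Four residual blocks are used, each containing 2 residual layers: the first residual layer uses a 3×3 convolution kernel with a stride of 2, and the second has the same structure as the first with a stride of 1. The output layer uses a flatten layer to reduce the data to one dimension and feeds it into a fully connected layer, which outputs the predicted human skeleton point coordinates.
Further, in step 4, the obtained 2D AoA image ∈ R^{8×181} is interpolated with the Fourier interpolation method into 2D AoA image′ ∈ R^{32×181}.
The beneficial effects of this invention are as follows: it proposes a human posture estimation method based on CSI signal AoA estimation that collects CSI data with commercial Wi-Fi equipment, estimates the signal angle of arrival in the sensing area, constructs a two-dimensional AoA image, and uses a teacher-student network to estimate static human postures from the two-dimensional AoA images, giving the advantages of contactless sensing, high estimation accuracy and low cost.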
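Brief Description of the Drawings
Figure 1 is a schematic flow chart of the human posture estimation method based on CSI signal AoA estimation in an embodiment of the present invention.
Figure 2 is a schematic diagram of the experimental scene in an embodiment of the present invention.
Figure 3 is a schematic diagram of the network structure of the student network in an embodiment of the present invention.
Figure 4 is a schematic diagram of the prediction results of human posture estimation in an embodiment of the present invention.
Figure 5 is a schematic diagram of the COCO18 skeleton keypoints in an embodiment of the present invention.
Figure 6 shows the two skeleton point formats, COCO18 and Body10, in an embodiment of the present invention.
Figure 7 is a schematic diagram of the effect of the environmental noise reduction algorithm on skeleton point recognition accuracy in an embodiment of the present invention.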
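Detailed Description
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings.
As shown in Figure 1, the present invention provides a static human posture estimation method based on CSI signal AoA estimation, which proceeds as follows:
Step 1: As shown in Figure 2, place a receiving antenna column with a moving track in the sensing area, use a fixed transmitting antenna to send Wi-Fi data packets to the receiving antenna on the moving track, move the receiving antenna to multiple designated heights to collect CSI data, and collect image data synchronously.
Step 2: Collect 3000 CSI data packets at each of eight different heights, extract the phase information from the CSI data, construct one-dimensional AoA images, and combine the one-dimensional AoA images from the different heights into a two-dimensional AoA image.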
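Specifically, for the phase data φ_{i,k} of the k-th subcarrier of the i-th receiving antenna in a single CSI data packet collected at one height, the phase error model can be expressed as:
Figure PCTCN2022125127-appb-000007
where θ_k is the original phase,
Figure PCTCN2022125127-appb-000008
is the nonlinear error, f_s is the frequency spacing between subcarriers, δ is the propagation delay caused by multipath propagation, Ω is the linear phase error, and Z is Gaussian white noise.
The transmitting and receiving antennas are connected directly by cable to obtain the phase φ_k′ free of environmental interference, and linear fitting is used to solve for the nonlinear phase error
Figure PCTCN2022125127-appb-000009
the linear phase error Ω, and the synchronization phase error between the receiving antennas
Figure PCTCN2022125127-appb-000010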
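The cable calibration described above can be sketched in a few lines. This is only an illustration under stated assumptions: the cable-measured phases are taken to be already unwrapped, subcarriers are indexed 0 to 55, and the helper names `fit_line`, `split_phase_errors` and `sync_offset` are hypothetical. The patent gives its own expressions only as equation images, so the decomposition below (straight-line fit for the linear part, residual for the nonlinear part, mean difference for the inter-antenna offset) is one plausible reading, not the patented formula.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def split_phase_errors(cable_phase):
    """Split a cable-measured phase curve into a linear part and a residual.

    Assumption: the fitted line stands in for the linear phase error and the
    per-subcarrier residual stands in for the nonlinear error.
    """
    ks = list(range(len(cable_phase)))
    slope, intercept = fit_line(ks, cable_phase)
    residual = [p - (slope * k + intercept) for k, p in zip(ks, cable_phase)]
    return slope, intercept, residual


def sync_offset(reference_phase, other_phase):
    """Mean phase difference between two antennas (assumed sync error)."""
    diffs = [o - r for r, o in zip(reference_phase, other_phase)]
    return sum(diffs) / len(diffs)


# Toy usage: a perfectly linear phase curve over 56 subcarriers.
toy_phase = [0.1 * k + 0.5 for k in range(56)]
toy_slope, toy_intercept, toy_residual = split_phase_errors(toy_phase)
```

Subtracting the fitted line and residual from later over-the-air measurements would then leave the phase component attributable to propagation, which is the quantity the AoA estimation needs.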
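For the CSI phases of the three antennas (φ_{1,1~56}, φ_{2,1~56}, φ_{3,1~56}), when the receiving antenna moves to height i, the MUSIC algorithm is applied to the collected CSI signals to compute the MUSIC spectrum P_MUSIC(τ, θ) jointly estimated over ToF and AoA, where τ is the time of flight and θ is the angle of arrival. The MUSIC spectrum P_MUSIC(τ, θ) is converted into a one-dimensional AoA image of 1×181 pixels, in which the k-th pixel is calculated as:
Figure PCTCN2022125127-appb-000011
Here k in the formula is the angle in the spectrum. Each pixel of the one-dimensional AoA image corresponds to 1 degree of angle; there are 181 pixels because 0 degrees is included.
CSI data are collected at 8 different heights and converted into one-dimensional AoA images, and the 8 one-dimensional AoA images are combined into a complete two-dimensional AoA image:
2D AoA image = (img_1 img_2 … img_8)^T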
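A minimal sketch of how the 8 × 181 image could be assembled is given here. The per-pixel formula exists only as an equation image in the source, so the reduction over ToF used below (keeping the maximum over all ToF bins for each angle) is an assumption, and the names `spectrum_to_aoa_row` and `stack_heights` are hypothetical.

```python
N_ANGLES = 181   # one pixel per degree, 0 to 180 inclusive
N_HEIGHTS = 8    # number of antenna heights


def spectrum_to_aoa_row(p_music):
    """Collapse a ToF x AoA spectrum (one inner list per ToF bin) to 1 x 181.

    Assumption: each pixel keeps the largest value over all ToF bins.
    """
    n_angles = len(p_music[0])
    return [max(row[k] for row in p_music) for k in range(n_angles)]


def stack_heights(rows):
    """Stack the per-height 1D AoA images into an 8 x 181 2D AoA image."""
    if len(rows) != N_HEIGHTS or any(len(r) != N_ANGLES for r in rows):
        raise ValueError("expected 8 rows of 181 pixels")
    return [list(r) for r in rows]


# Toy usage: a spectrum with 4 ToF bins and a single peak at 90 degrees.
toy_spectrum = [[0.0] * N_ANGLES for _ in range(4)]
toy_spectrum[2][90] = 5.0
toy_row = spectrum_to_aoa_row(toy_spectrum)
toy_image = stack_heights([toy_row] * N_HEIGHTS)
```

Row i of the result corresponds to height i, matching the transposed row vector (img_1 … img_8)^T in the text.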
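Step 3: Use the environmental noise reduction algorithm to remove environmental interference from the two-dimensional AoA image:
Step 3-1: Collect and analyze the P_MUSIC of the 1500 data packets captured over 15 s at sampling point A in a static environment, and use the distribution function to analyze the time range and distribution of ToF_static:
F_{ToF_static}(x) = P(ToF_static ≤ x)
According to the actual ToF distribution, the ToF is divided into several segments. The ToF distribution interval is [X_min, X_max]; it is split by ToF value into L segments of unequal length, and for the k-th segment [x_kl, x_kr] the distribution is P(x_kl ≤ ToF ≤ x_kr).
Step 3-2: Based on the ToF segments and their distribution, use an exponential weighting function to compute the weighting matrix weight(ToF_static); for the interval inter = [x_kl, x_kr]:
Figure PCTCN2022125127-appb-000012
where β is a parameter determined by the length of the ToF, and α is the attenuation factor, set according to the strength of the environmental factors to be suppressed.
Step 3-3: After collecting the P_MUSIC of a single data packet at sampling point A in a dynamic environment, analyze its time of flight ToF_dynamic and use the weight function weight() obtained in step 3-2 to compute the denoised P′_MUSIC:
P_MUSIC(ToF_dynamic, AoA)′ = P_MUSIC(ToF_dynamic, AoA) × weight(ToF_dynamic)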
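Steps 3-1 to 3-3 can be illustrated with a small sketch. The exact exponential weighting function is available only as an equation image, so the form used here, weight = exp(−α · P_k) for the segment k containing the ToF, is an assumption chosen to show the idea (segments where static paths concentrate are attenuated); the parameter β from the text is not modelled. The names `segment_probabilities`, `make_weight` and `denoise` are hypothetical.

```python
import bisect
import math


def _segment_index(edges, tof):
    """Index of the ToF segment containing `tof`, clamped to valid segments."""
    k = bisect.bisect_right(edges, tof) - 1
    return min(max(k, 0), len(edges) - 2)


def segment_probabilities(static_tofs, edges):
    """Empirical P(x_kl <= ToF <= x_kr) for each segment given by `edges`."""
    counts = [0] * (len(edges) - 1)
    for tof in static_tofs:
        counts[_segment_index(edges, tof)] += 1
    return [c / len(static_tofs) for c in counts]


def make_weight(edges, probs, alpha):
    """Return weight(tof); assumed form exp(-alpha * P_k), not the patent's."""
    def weight(tof):
        return math.exp(-alpha * probs[_segment_index(edges, tof)])
    return weight


def denoise(p_music, tof_axis, weight):
    """P'(ToF, AoA) = P(ToF, AoA) * weight(ToF), applied row by row."""
    return [[v * weight(tof) for v in row] for row, tof in zip(p_music, tof_axis)]


# Toy usage: three unequal ToF segments, static energy mostly in the first.
toy_edges = [0, 10, 20, 40]
toy_probs = segment_probabilities([1, 2, 3, 15], toy_edges)
toy_weight = make_weight(toy_edges, toy_probs, alpha=2.0)
toy_out = denoise([[1.0, 2.0], [1.0, 2.0]], [5, 30], toy_weight)
```

With this choice a larger α suppresses static-dominated ToF segments more strongly, which matches the text's description of α as an attenuation factor set by how much environmental influence should be removed.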
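Step 4: Input the image data into the teacher network to obtain supervision data in the form of human skeleton point coordinates, and input the supervision data and the denoised AoA images into the student network for training.
The designed neural network contains a teacher network and a student network. The student network contains an input layer, residual blocks and an output layer. The input of the teacher network is an image, and its output is the human skeleton point coordinates. As shown in Figure 3, the input of the student network is a two-dimensional AoA image of size 32×181. The input layer uses a 7×7 convolution kernel with a stride of 2 and increases the number of channels to 64. Four residual blocks are used, each containing 2 residual layers: the first residual layer uses a 3×3 convolution kernel with a stride of 2, and the second is the same as the first but with a stride of 1. The output layer uses a flatten layer to reduce the data to one dimension and feeds it into a fully connected layer, which outputs the predicted human skeleton point coordinates.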
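To make the layer description concrete, the sketch below traces the spatial size of the feature map through the student network using the standard convolution output-size formula. The kernel sizes and strides come from the text; the padding values (3 for the 7×7 kernel, 1 for the 3×3 kernels) are assumptions, since the source does not state them, and `conv_out` and `student_feature_shapes` are hypothetical names.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def student_feature_shapes(height=32, width=181, n_blocks=4):
    """Spatial size after the input layer and after each residual block."""
    shapes = []
    # Input layer: 7x7 kernel, stride 2 (padding 3 is an assumption).
    height, width = conv_out(height, 7, 2, 3), conv_out(width, 7, 2, 3)
    shapes.append((height, width))
    for _ in range(n_blocks):
        # First residual layer: 3x3 kernel, stride 2 (padding 1 assumed).
        height, width = conv_out(height, 3, 2, 1), conv_out(width, 3, 2, 1)
        # Second residual layer: 3x3 kernel, stride 1; size is unchanged.
        height, width = conv_out(height, 3, 1, 1), conv_out(width, 3, 1, 1)
        shapes.append((height, width))
    return shapes


toy_shapes = student_feature_shapes()
# toy_shapes == [(16, 91), (8, 46), (4, 23), (2, 12), (1, 6)]
```

Under these padding assumptions the 32 rows produced by interpolation are exactly enough for five stride-2 stages, leaving a 1 × 6 map per channel for the flatten and fully connected layers.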
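Step 5: To recognize a human posture, place a receiving antenna column with a moving track in the sensing area, use a fixed transmitting antenna to send Wi-Fi data packets to the receiving antenna installed on the moving track, move the receiving antenna to multiple designated heights, and collect one CSI sample at each height. After features are extracted from the CSI data collected at the different heights as in steps 2 and 3, the features are interpolated:
The obtained 2D AoA image ∈ R^{8×181} is interpolated with the Fourier transform interpolation method into 2D AoA image′ ∈ R^{32×181}.
The interpolated 2D AoA image′ is input into the student network model trained in step 4, which outputs the predicted human skeleton point coordinates of the target in the sensing area.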
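The interpolation from 8 to 32 rows can be sketched with a plain discrete Fourier transform and zero-padding of the spectrum. This is one common way to implement Fourier interpolation and is offered as an assumption about what the text intends; it treats each column as periodic along the height axis. The function names `dft`, `fourier_interp` and `interpolate_heights` are hypothetical, and the code uses only the standard library.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a short sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]


def fourier_interp(x, m):
    """Resample a real sequence of length n to length m > n via zero-padding."""
    n = len(x)
    if m <= n:
        raise ValueError("target length must exceed the input length")
    spec = dft(x)
    padded = [0j] * m
    half = n // 2
    for f in range(half):              # frequencies 0 .. half-1
        padded[f] = spec[f]
    for f in range(1, half):           # frequencies -1 .. -(half-1)
        padded[m - f] = spec[n - f]
    if n % 2 == 0:                     # split the Nyquist bin to stay real
        padded[half] = spec[half] / 2
        padded[m - half] = spec[half] / 2
    else:
        padded[half] = spec[half]
        padded[m - half] = spec[n - half]
    # Inverse DFT of length m, scaled by 1/n to preserve amplitudes.
    return [sum(padded[f] * cmath.exp(2j * cmath.pi * f * t / m)
                for f in range(m)).real / n
            for t in range(m)]


def interpolate_heights(image, target_rows=32):
    """Interpolate an 8 x 181 image to 32 x 181 along the height axis."""
    columns = [fourier_interp(list(col), target_rows) for col in zip(*image)]
    return [list(row) for row in zip(*columns)]


# Toy usage: a synthetic 8 x 181 image.
toy_image = [[float(r + c) for c in range(181)] for r in range(8)]
toy_interp = interpolate_heights(toy_image)
```

Because 32 is an integer multiple of 8, every fourth output row reproduces an original row, and the rows in between are band-limited interpolants.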
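Steps 1 to 4 correspond to the training stage in machine learning. During training, a large amount of data must be collected at each height to train the student network; collection takes 30 s per height, and collecting at all 8 heights takes 5 minutes. Step 5 recognizes human posture and corresponds to the test stage in machine learning. During testing, only one CSI sample needs to be collected at each height, which takes only 10 milliseconds; including the slide rail movement time, data collection at the 8 heights can be completed within 10 s. In actual use, posture recognition with the trained student network model therefore requires little data collection time; if a radio frequency switch is used, the process can be reduced to under 1 s.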
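To evaluate the reliability of this method at different times and in different scenes, experiments were carried out in the following three scenes: 1. Laboratory A: an irregularly shaped laboratory with some experimental equipment stacked near the walls; 2. Laboratory B: a rectangular laboratory with several desks and iron filing cabinets against the walls; 3. Corridor: long and narrow, with windows on one side.
To evaluate the effect of different heights and body shapes on this method, 6 volunteers of different genders, heights and body shapes were selected. The data set contains 5 postures performed by the 6 volunteers in each of the 3 environments: standing, crossing hands, raising both arms horizontally, raising both hands overhead, and sitting. Each posture has 8 sampling points, and 1500 CSI samples are collected at each sampling point, for a total of 1,080,000 CSI samples and 135,000 generated 2D AoA images. In parallel, a camera captures pictures of every posture of every volunteer, and OpenPose is used to obtain the human skeleton point labels. 75% of the data is used to train the network and the remaining 25% to test it.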
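The data set sizes quoted above follow from simple multiplication, which the snippet below reproduces. The interpretation that one 2D AoA image combines one CSI sample from each of the 8 heights is an inference from the totals rather than something the text states outright, and the train/test counts assume the 75/25 split is applied to the images.

```python
VOLUNTEERS, SCENES, POSES = 6, 3, 5
HEIGHTS, SAMPLES_PER_HEIGHT = 8, 1500

csi_samples = VOLUNTEERS * SCENES * POSES * HEIGHTS * SAMPLES_PER_HEIGHT
# Inferred: each 2D AoA image uses one CSI sample from each height.
aoa_images = csi_samples // HEIGHTS

train_images = aoa_images * 75 // 100
test_images = aoa_images - train_images
```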
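Figure 4 shows the skeleton point results of this method for five different static postures in Lab B: the first row shows the captured pictures of the subject in the five postures with the human keypoints predicted by OpenPose; the second row shows the same five postures with the keypoints predicted by this method. The predictions show that this method can accurately estimate different human postures, and its results differ little from those of the vision-based method OpenPose.
To better measure the gap between the skeleton points predicted by this method and the annotations output by OpenPose, the Percentage of Correct Keypoints (PCK) is used:
Figure PCTCN2022125127-appb-000013
where L is a logical function that outputs 1 when its expression is true and 0 when it is false; N is the number of test samples; i denotes the i-th body keypoint, i ∈ COCO18 or Body10; pd_i is the predicted keypoint, gt_i is the ground truth,
Figure PCTCN2022125127-appb-000014
is the Euclidean distance between the predicted and true positions of keypoint i; length_std is the parameter used to normalize the error, and a is the allowed range around the true value within which a predicted point counts as correct.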
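The definitions above correspond to the usual PCK computation, sketched below for a single keypoint. The patent's own formula is available only as an image, so this is the standard form implied by the listed symbols rather than a transcription; the function name `pck` and its argument layout are hypothetical.

```python
import math


def pck(predictions, ground_truths, length_std, a):
    """Fraction of samples whose normalised keypoint error is at most `a`.

    predictions, ground_truths: lists of (x, y) for one keypoint, N samples.
    length_std: normalisation length, a scalar or one value per sample
                (for example the subject's head length).
    """
    n = len(predictions)
    if isinstance(length_std, (int, float)):
        length_std = [length_std] * n
    hits = 0
    for pd, gt, norm in zip(predictions, ground_truths, length_std):
        # The logical function L: count 1 when the normalised error is small.
        if math.dist(pd, gt) / norm <= a:
            hits += 1
    return hits / n


# Toy usage: errors of 0, 5 and 10 pixels with a head length of 10 pixels.
toy_pck = pck([(0, 0), (3, 4), (10, 0)], [(0, 0)] * 3, length_std=10, a=0.5)
```

In this toy case two of the three predictions fall within half a head length, so the score is 2/3; PCK@0.5hl in the text uses the same threshold with each volunteer's head length as `length_std`.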
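1. Overall recognition rate:
Because the volunteers differ in height and body shape, the volunteer's head length hl is used as the normalization parameter. With COCO18, computed at PCK@0.5hl, the average keypoint recognition rate of this method over the three scenes is 85.5% (A: 88.4%, B: 91.8%, C: 76.3%); with Body10, computed at PCK@0.5hl, the average keypoint recognition rate over the three scenes is 83.5% (A: 85.4%, B: 91.8%, C: 73.4%). Compared with Wi-Pose and WiSPPN, the accuracy of this method is significantly higher: with a looser PCK setting (PCK@40, length_std = 1) it achieves an 18% improvement in recognition rate; with a stricter PCK setting (PCK@25, length_std = 1), neither Wi-Pose nor WiSPPN can recognize the keypoints, whereas this method still achieves a recognition rate of 85%.
Table 1: Skeleton point recognition rate
Figure PCTCN2022125127-appb-000015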
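2. Recognition accuracy of different skeleton points:
The keypoints of the human skeleton are distributed over different parts of the body, and wireless signals reflect differently at different keypoints. The prediction accuracy therefore differs from keypoint to keypoint. Table 2 lists the two skeleton point representation standards, COCO18 and Body10; see Figures 5 and 6. Tables 3 and 4 give the average prediction accuracy of each keypoint under COCO18 PCK@25 and Body10 PCK@25. The results show that the average prediction accuracy over all skeleton keypoints is 88% for both COCO18 and Body10. The prediction accuracy of the left and right wrists in the same environment is lower than that of the other keypoints: 79% in COCO18 and 80% in Body10. The likely reason is that the wrists are sometimes far from the torso, which makes prediction harder.
Table 2: COCO18 and Body10 skeleton point lists
Figure PCTCN2022125127-appb-000016
Table 3: Average skeleton point recognition accuracy under COCO18 PCK@25 (percent)
Figure PCTCN2022125127-appb-000017
Table 4: Average skeleton point recognition accuracy under Body10 PCK@25 (percent)
Figure PCTCN2022125127-appb-000018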
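3. Effect of the environmental noise reduction algorithm on recognition accuracy:
This method uses the environmental noise reduction algorithm to reduce the influence of environmental factors. Figure 7 compares the prediction accuracy of this method on data processed with the environmental noise reduction algorithm against its accuracy on the raw data. The results show that the environmental noise reduction algorithm improves prediction accuracy by 5% in all environments. When the PCK metric is tightened from PCK@50 to PCK@10, the improvement from the environmental noise reduction algorithm grows from 2.6% to 9.5%. In addition, for the cluttered corridor environment, data processed with the environmental noise reduction algorithm show a 7% improvement at PCK@25.
The above are only preferred embodiments of the present invention, and the scope of protection of the present invention is not limited to these embodiments; any equivalent modification or variation made by a person of ordinary skill in the art on the basis of the disclosure shall fall within the scope of protection set out in the claims.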

Claims (6)

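  1. A static human posture estimation method based on CSI signal angle-of-arrival estimation, characterized in that the human posture estimation method comprises the following steps:
    Step 1: placing a receiving antenna column with a moving track in the sensing area, using a fixed transmitting antenna to send Wi-Fi data packets to the receiving antenna installed on the moving track, moving the receiving antenna to multiple designated heights, collecting multiple CSI data at each height, and collecting image data synchronously;
    Step 2: extracting the phase information from the CSI data, constructing one-dimensional AoA images from it, and combining the one-dimensional AoA images from the different heights into a two-dimensional AoA image;
    Step 3: using an environmental noise reduction algorithm to remove environmental interference from the two-dimensional AoA image;
    Step 4: inputting the image data into a teacher network to obtain supervision data in the form of human skeleton point coordinates, and inputting the supervision data and the denoised two-dimensional AoA images into a student network for training;
    Step 5: when recognizing a human posture, placing a receiving antenna column with a moving track in the sensing area, using a fixed transmitting antenna to send Wi-Fi data packets to the receiving antenna installed on the moving track, moving the receiving antenna to multiple designated heights, collecting one CSI data at each height, extracting features from the CSI data collected at the different heights as in steps 2 and 3, interpolating the features, inputting them into the student network model trained in step 4, and outputting the predicted human skeleton point coordinates of the target in the sensing area.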
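  2. The static human posture estimation method based on CSI signal angle-of-arrival estimation according to claim 1, characterized in that: in step 2, for the phase data φ_{i,k} of the k-th subcarrier of the i-th receiving antenna in a single CSI data packet collected at one height, the phase error model is expressed as:
    Figure PCTCN2022125127-appb-100001
    where θ_k is the original phase,
    Figure PCTCN2022125127-appb-100002
    is the nonlinear error, f_s is the frequency spacing between subcarriers, δ is the propagation delay caused by multipath propagation, Ω is the linear phase error, and Z is Gaussian white noise;
    the transmitting and receiving antennas are connected directly by cable to obtain the phase φ_k′ free of environmental interference, and linear fitting is used to solve for the nonlinear phase error
    Figure PCTCN2022125127-appb-100003
    the linear phase error Ω, and the synchronization phase error between the receiving antennas
    Figure PCTCN2022125127-appb-100004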
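  3. The static human posture estimation method based on CSI signal angle-of-arrival estimation according to claim 1, characterized in that: in step 2, the phase data of the 56 subcarriers contained in the CSI data are extracted, and the phase data of the three receiving antennas are synchronized and error-corrected; when the receiving antenna moves to height i, the MUSIC algorithm is applied to the collected CSI signals to compute the MUSIC spectrum P_MUSIC(τ, θ) jointly estimated over ToF and AoA, where τ is the time of flight and θ is the angle of arrival; the MUSIC spectrum P_MUSIC(τ, θ) is converted into a one-dimensional AoA image of 1×181 pixels, and the k-th pixel of the one-dimensional AoA image at height i is calculated as:
    Figure PCTCN2022125127-appb-100005
    CSI data are collected at 8 different heights and converted into one-dimensional AoA images, and the 8 one-dimensional AoA images are combined into a complete two-dimensional AoA image:
    2D AoA image = (img_1 img_2 … img_8)^T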
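  4. The static human posture estimation method based on CSI signal angle-of-arrival estimation according to claim 1, characterized in that: in step 3, the environmental noise reduction algorithm proceeds as follows:
    Step 3-1: collecting and analyzing the P_MUSIC of the 1500 data packets captured over 15 s at sampling point A in a static environment, and using the distribution function to analyze the time range and distribution of the static-environment time of flight ToF_static;
    Figure PCTCN2022125127-appb-100006
    according to the actual ToF distribution, the ToF is divided into several segments; the ToF distribution interval is [X_min, X_max], which is split by ToF value into L segments of unequal length, and for the k-th segment [x_kl, x_kr] the distribution is P(x_kl ≤ ToF ≤ x_kr);
    Step 3-2: based on the ToF segments and their distribution, using an exponential weighting function to compute the weighting matrix weight(ToF_static); for the interval inter = [x_kl, x_kr]:
    Figure PCTCN2022125127-appb-100007
    where β is a parameter determined by the length of the ToF, and α is the attenuation factor, set according to the strength of the environmental factors to be suppressed;
    Step 3-3: after collecting the P_MUSIC of a single data packet at sampling point A in a dynamic environment, analyzing its time of flight ToF_dynamic and using the weight function weight() obtained in step 3-2 to compute the denoised P′_MUSIC:
    P_MUSIC(ToF_dynamic, AoA)′ = P_MUSIC(ToF_dynamic, AoA) × weight(ToF_dynamic)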
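  5. The static human posture estimation method based on CSI signal angle-of-arrival estimation according to claim 1, characterized in that: in step 4, the designed neural network contains a teacher network and a student network; the student network contains an input layer, residual blocks and an output layer; the input of the teacher network is an image and its output is the human skeleton point coordinates; the input of the student network is a two-dimensional AoA image of size 32×181; the input layer uses a 7×7 convolution kernel with a stride of 2 and increases the number of channels to 64; each residual block contains 2 residual layers, the first of which uses a 3×3 convolution kernel with a stride of 2 and the second of which has the same structure as the first with a stride of 1; the output layer uses a flatten layer to reduce the data to one dimension and feeds it into a fully connected layer, which outputs the predicted human skeleton point coordinates.
  6. The static human posture estimation method based on CSI signal angle-of-arrival estimation according to claim 1, characterized in that: in step 4, the obtained 2D AoA image ∈ R^{8×181} is interpolated with the Fourier interpolation method into 2D AoA image′ ∈ R^{32×181}.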
PCT/CN2022/125127 2022-05-06 2022-10-13 Static human posture estimation method based on CSI signal angle-of-arrival estimation WO2023213051A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210484261.XA CN114581958B (zh) 2022-05-06 2022-05-06 Static human posture estimation method based on CSI signal angle-of-arrival estimation
CN202210484261.X 2022-05-06

Publications (1)

Publication Number Publication Date
WO2023213051A1 true WO2023213051A1 (zh) 2023-11-09

Family

ID=81769069

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/125127 WO2023213051A1 (zh) 2022-05-06 2022-10-13 Static human posture estimation method based on CSI signal angle-of-arrival estimation

Country Status (2)

Country Link
CN (1) CN114581958B (zh)
WO (1) WO2023213051A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
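CN114581958B (zh) * 2022-05-06 2022-08-16 南京邮电大学 Static human posture estimation method based on CSI signal angle-of-arrival estimation
CN115412188A (zh) * 2022-08-26 2022-11-29 福州大学 Method for recognizing operating behavior in a power distribution station room based on wireless sensing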

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210390723A1 (en) * 2020-06-15 2021-12-16 Dalian University Of Technology Monocular unsupervised depth estimation method based on contextual attention mechanism
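CN114219853A (zh) * 2021-11-12 2022-03-22 杭州昌泽信息技术有限公司 Multi-person three-dimensional posture estimation method based on wireless signals
CN114581958A (zh) * 2022-05-06 2022-06-03 南京邮电大学 Static human posture estimation method based on CSI signal angle-of-arrival estimation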

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
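CN106658590B (zh) * 2016-12-28 2023-08-01 南京航空航天大学 Design and implementation of a multi-person indoor environment state monitoring system based on WiFi channel state information
CN110740004B (zh) * 2019-10-28 2021-03-16 北京邮电大学 Target state determination method and apparatus, electronic device, and readable storage medium
CN111182459B (zh) * 2019-12-31 2021-05-04 西安电子科技大学 Indoor wireless positioning method based on channel state information, and wireless communication system
CN111225354B (zh) * 2020-02-14 2022-02-22 重庆邮电大学 CSI-based human fall recognition method in a WiFi interference environment
CN113033318B (zh) * 2021-03-01 2023-09-26 深圳大学 Human action detection method and apparatus, and computer-readable storage medium
CN114386321A (zh) * 2021-12-24 2022-04-22 南京邮电大学 Joint AoA and ToF estimation method and apparatus for indoor positioning, and storage medium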

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Master's Thesis", 11 April 2019, NANJING UNIVERSITY OF POSTS AND TELECOMMUNICATIONS, CN, article CHEN, HAOXIANG: "Research on Indoor Localization Algorithm Using Wi-Fi Channel State Information", pages: 1 - 82, XP009550146, DOI: 10.27151/d.cnki.ghnlu.2019.003783 *
YILI REN; JIE YANG: "3D Human Pose Estimation for Free-form Activity Using WiFi Signals", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 15 October 2021 (2021-10-15), 201 Olin Library Cornell University Ithaca, NY 14853, XP091076949 *
ZHAO MINGMIN; LI TIANHONG; ALSHEIKH MOHAMMAD ABU; TIAN YONGLONG; ZHAO HANG; TORRALBA ANTONIO; KATABI DINA: "Through-Wall Human Pose Estimation Using Radio Signals", 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, IEEE, 18 June 2018 (2018-06-18), pages 7356 - 7365, XP033473655, DOI: 10.1109/CVPR.2018.00768 *

Also Published As

Publication number Publication date
CN114581958B (zh) 2022-08-16
CN114581958A (zh) 2022-06-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22940745

Country of ref document: EP

Kind code of ref document: A1