CN113963229A - Video-based wireless signal enhancement and cross-target gesture recognition method - Google Patents

Video-based wireless signal enhancement and cross-target gesture recognition method

Info

Publication number
CN113963229A
Authority
CN
China
Prior art keywords
video
human body
gesture
point cloud
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111110503.0A
Other languages
Chinese (zh)
Other versions
CN113963229B (en)
Inventor
陈晓江
宋凤仪
王楠
张扬帆
李欣怡
房鼎益
李珂
王夫蔚
任宇辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwest University
Original Assignee
Northwest University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwest University filed Critical Northwest University
Priority to CN202111110503.0A
Publication of CN113963229A
Application granted
Publication of CN113963229B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B17/00 Monitoring; Testing
    • H04B17/30 Monitoring; Testing of propagation channels
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Electromagnetism (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video-based wireless signal enhancement and human body gesture recognition method. A MoCoGAN video generation model generates virtual data to expand the video data set. Contour detection and target extraction algorithms remove background noise from the frame set, an HMR algorithm converts the 2D images into 3D point cloud data, and adjusting the height and shape parameters of the human body expands the data set again. HPR removes the points in the point cloud invisible to the transmitting end, and simulation yields the Wi-Fi signals under the corresponding deployment conditions. Time-frequency domain features are extracted from the collected Wi-Fi channel state information data and from the simulated Wi-Fi signals, and a gesture feature analysis model linking the two is established to realize high-precision cross-target gesture recognition.

Description

Video-based wireless signal enhancement and cross-target gesture recognition method
Technical Field
The invention relates to the field of passive sensing, in particular to a low-cost, video-based wireless signal enhancement and high-precision passive cross-target gesture recognition method.
Background Art
With the rapid development of the Internet of Things and wireless communication technology, many common devices in daily life, such as sensors, cameras and routers, now have computing and sensing capabilities, and human-computer interaction research has therefore become increasingly important. Gesture recognition is a typical application in this field, and many researchers detect users' gesture activities in different ways to support specific application functions. Over the past decades, researchers have proposed various gesture recognition methods, which can be classified as active or passive according to the recognition mode:
the first type: provided is an active identification method. The active human body gesture recognition method mainly utilizes various sensing devices such as an accelerometer, a gyroscope, a pressure gauge and the like to collect data on different characteristic dimensions to complete recognition of gesture activities of a user, however, the method requires the user to carry additional sensing devices for a long time, is poor in friendliness, and is not widely applied.
The second type: passive recognition methods. Compared with active recognition, passive recognition is more convenient, since no extra equipment needs to be carried; it mainly takes two forms, visual images and wireless signals. Gesture recognition based on visual image data captures user activity with video acquisition equipment and then applies image processing techniques, but it is limited by the lighting conditions in the environment, requires an unobstructed view between user and device, and poses a potential threat to user privacy. Sensing and recognition with wireless signals offers low cost and good privacy, and since Wi-Fi devices are now widely deployed, gesture recognition built on Wi-Fi is especially general. However, Wi-Fi-based gesture recognition still faces many limiting factors: first, Wi-Fi-based methods need a large amount of data to train a robust recognition model with strong generalization capability; second, CSI data are sensitive, and because user characteristics are diverse, their influence on the signal varies from person to person, so such methods lack robustness and their accuracy degrades sharply across users.
In summary, existing passive gesture recognition techniques fall short in cost, robustness and generalization capability. A cross-target, robust and high-precision passive gesture recognition technique is therefore desirable.
Disclosure of Invention
In order to solve the problems in the prior art, an object of the present invention is to provide a video-based wireless signal enhancement and cross-target gesture recognition method that achieves high-precision cross-target recognition while greatly reducing the cost the method requires.
In order to accomplish this task, the invention adopts the following technical solution:
a video-based wireless signal enhancement and cross-target gesture recognition method comprises the following steps:
Step one, collecting Wi-Fi channel state information of gesture actions in a monitoring area;
Step two, preprocessing the Wi-Fi channel state information data to eliminate environmental noise and the influence of outliers;
Step three, collecting original human body gesture action video data in the monitoring area;
Step four, randomly generating new videos from the original human body gesture action video data with a video generation model to obtain an expanded video data set;
Step five, preprocessing the expanded video data set to obtain a frame set;
Step six, removing background noise and extracting the human body contour with a contour detection and target extraction algorithm;
Step seven, converting the frame set into corresponding standard human body surface 3D point cloud data;
Step eight, restoring multiple human body surface 3D point cloud data sets with different heights and body types from the standard 3D point cloud data through parameter adjustment;
Step nine, performing gesture signal simulation on the human body surface 3D point cloud set to obtain Wi-Fi signals under the corresponding deployment conditions;
Step ten, extracting time-frequency domain features from the collected Wi-Fi channel state information data and from the Wi-Fi signals under the corresponding deployment conditions, respectively;
Step eleven, establishing a gesture feature analysis model of the simulated wireless signals and the collected Wi-Fi channel state information data to complete cross-target gesture recognition.
Further, in step two, the channel state information data are preprocessed: a Hampel filter removes outliers, and a Butterworth low-pass filter retains the data of the low-frequency band to eliminate high-frequency noise.
Further, the video generation model in step four is a MoCoGAN video generation model, which generates new videos from the original human body gesture action video data and uses two discriminators to discriminate images and video frame sequences respectively.
Further, in the seventh step, the frame set is converted into corresponding standard human body surface 3D point cloud data through an HMR algorithm.
Further, in step eight, each point in a human body surface 3D point cloud data set describes the human body surface information of a target user with the corresponding characteristics, including the target user's body type, posture or orientation.
Further, in step eight, a hidden point removal algorithm eliminates the disturbance signals in the human body surface 3D point cloud data set, yielding the set of points in the data set visible to the transmitting end.
Further, in step nine, gesture signal simulation is performed on the human body surface 3D point cloud set to obtain the Wi-Fi signals under the corresponding deployment conditions:

$$H(t) = g(X_T, X_R) + \sum_{X_m \in M'(t)} A_m\, G_m\, g(X_T, X_m)\, g(X_m, X_R)$$

where M'(t) is the set of points in the human body surface 3D point cloud data set visible to the transmitting end; X_T, X_R and X_m respectively denote the transmitting end, the receiving end and a reflection point on the human body surface; A_m and G_m respectively denote the reflectivity and the angular factor; and g(X_T, X_R) denotes the line-of-sight signal strength propagating from the transmitting end X_T to the receiving end X_R.
Further, the time-frequency domain features extracted in step ten at least include the minimum, variance, mean, skewness, standard deviation, kurtosis, energy and FFT peak.
Further, the gesture feature analysis model in step eleven at least comprises a classification layer, a feature mapping layer and a reconstruction layer. The classification layer extracts gesture features; the feature mapping layer learns the gesture feature mapping between the Wi-Fi channel state information data and the simulated Wi-Fi signals under the corresponding deployment conditions; the reconstruction layer assists the feature mapping layer during training, emphasizing the extraction of gesture-related feature mappings.
The video-based wireless signal enhancement and cross-target gesture recognition method has the following beneficial effects:
Gesture actions of cross-target users are recognized by combining video and Wi-Fi, which reduces the cost of collecting and labeling training data; simulating wireless signals from 3D point clouds improves data diversity and strengthens the robustness and generalization capability of the system. Meanwhile, the constructed model learns the mapping between simulated and real signals, achieving robust cross-target recognition accuracy.
Drawings
FIG. 1 is a flow chart of a video-based wireless signal enhancement and cross-target gesture recognition method of the present invention.
FIG. 2 is a diagram of a general framework of a gesture feature system analysis model.
FIG. 3 is a schematic diagram of a feature mapping layer structure.
FIG. 4 is a deployment diagram of the video-based wireless signal enhancement and cross-target gesture recognition method of the present invention.
FIG. 5 is a graph of the impact of different numbers of training users on recognition accuracy.
FIG. 6 is a graph of the impact of different classification gesture numbers on recognition accuracy.
Fig. 7 is a multi-model comparative evaluation diagram.
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Detailed Description
In order to illustrate the technical solution of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Referring to fig. 1, the present embodiment provides a video-based wireless signal enhancement and cross-target gesture recognition method, including:
Step one, collecting Wi-Fi channel state information of gesture actions in a monitoring area;
Step two, preprocessing the Wi-Fi channel state information data to eliminate environmental noise and the influence of outliers;
Step three, collecting original human body gesture action video data in the monitoring area;
Step four, randomly generating new videos from the original human body gesture action video data with a video generation model to obtain an expanded video data set;
Step five, preprocessing the expanded video data set to obtain a frame set;
Step six, removing background noise and extracting the human body contour with a contour detection and target extraction algorithm;
Step seven, converting the frame set into corresponding standard human body surface 3D point cloud data;
Step eight, restoring multiple human body surface 3D point cloud data sets with different heights and body types from the standard 3D point cloud data through parameter adjustment;
Step nine, performing gesture signal simulation on the human body surface 3D point cloud set to obtain Wi-Fi signals under the corresponding deployment conditions;
Step ten, extracting time-frequency domain features from the collected Wi-Fi channel state information data and from the Wi-Fi signals under the corresponding deployment conditions, respectively;
Step eleven, establishing a gesture feature analysis model of the simulated wireless signals and the collected Wi-Fi channel state information data to complete cross-target gesture recognition.
In the implementation of the invention, the channel state information of gesture actions on the Wi-Fi link and the corresponding gesture video information are collected first, and both are preprocessed. A MoCoGAN video generation model generates virtual data to expand the video data set. Contour detection and target extraction algorithms remove background noise from the frame set, an HMR algorithm converts the 2D images into 3D point cloud data, and adjusting the height and shape parameters of the human body expands the data set again. HPR removes the points in the point cloud invisible to the transmitting end, and simulation yields the Wi-Fi signals under the corresponding deployment conditions. Time-frequency domain features are then extracted from the collected Wi-Fi channel state information data and from the simulated Wi-Fi signals, and a gesture feature analysis model linking the two is established to realize high-precision cross-target gesture recognition.
Optionally, step one, collecting Wi-Fi channel state information of gesture actions in the monitoring area, specifically includes:
An Intel 5300 Wi-Fi NIC serves as the receiving end and a TP-Link AC1750 gigabit wireless router as the transmitting end; the CSI Tool extracts the data of 30 subcarriers per antenna, and the Wi-Fi packet sending rate is set to 1000 pkts/s.
optionally, in the second step, by preprocessing the Wi-Fi channel state information data, environmental noise and outlier influence are eliminated, which specifically includes:
and aiming at special outliers existing in the Wi-Fi channel state information sequence of the collected gesture actions in the monitoring area, a Hampel filter is adopted to remove the outliers. The influence of the gesture action on the signal is mainly concentrated on a low-frequency part, the environmental noise is mainly existed in a high frequency, and the influence caused by the high-frequency noise can be eliminated by reserving data information of a low frequency band through a Butterworth low-pass filter. In order to remove the influence of different gesture making time lengths of different people and ensure the uniformity of data in time dimension, interpolation processing needs to be carried out on the data.
Optionally, step three, collecting original human body gesture action video data in the monitoring area, specifically includes:
Original video data of the target gestures are acquired with a mobile phone, with the capture frame rate set to 30 fps.
optionally, a video generation model is used to randomly generate new video from the original human body gesture video data to obtain an extended video data set, which specifically includes:
the extended video data set adopts a MoCoGAN video generation model, utilizes the countermeasure thought, generates a new virtual video according to the original human body gesture action video data distribution, and uses two discriminators to discriminate the image and the video frame sequence respectively.
MoCoGAN divides the human posture video into two parts, content and motion: Z_I = Z_C + Z_M, where Z_I represents the space of video frames, each point z ∈ Z_I represents an image, and a video of K frames is represented by a path of length K, [z^(1), ..., z^(K)]; Z_C represents the content vector space and Z_M the motion vector space.
Random vectors of content and motion are then mapped to a video frame sequence for video generation. The model mainly adopts four components: a GRU, a recurrent neural network used to predict the next motion; G_I, which generates the successive frame images; D_I, which judges whether a generated image is real, i.e., whether the content is real; and D_V, which judges whether the video composed of the generated images is real, i.e., whether the motion is real. During training, the generator G_I maps the content and motion representations {Z_C, Z_M} to a video frame sequence [x̃^(1), ..., x̃^(K)]. From this frame sequence, one frame (S_1) is randomly sampled and fed to the image discriminator D_I, and T consecutive frames (S_T) are randomly sampled and fed to the video discriminator D_V, so that virtual videos with plausible content and motion are generated and the diversity of the video data is improved.
In addition, to reduce training complexity, the three-channel color videos can be converted into single-channel grayscale videos for training.
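A minimal PyTorch sketch may make the {Z_C, Z_M} split concrete: a content code is sampled once per video, a GRU rolls out one motion code per frame, and each frame latent z^(k) = [z_C, z_M^(k)] would be fed to the image generator G_I. All dimensions are illustrative, not taken from the patent or the original MoCoGAN model.

```python
import torch
import torch.nn as nn

class MotionSampler(nn.Module):
    """Sketch of MoCoGAN's latent decomposition: a fixed content code z_C shared
    by all K frames, plus a GRU that rolls out a per-frame motion code z_M^(k)."""
    def __init__(self, dim_c=50, dim_m=10, dim_eps=10):
        super().__init__()
        self.gru = nn.GRUCell(dim_eps, dim_m)
        self.dim_c, self.dim_m, self.dim_eps = dim_c, dim_m, dim_eps

    def forward(self, batch, k_frames):
        z_c = torch.randn(batch, self.dim_c)       # content: sampled once per video
        h = torch.zeros(batch, self.dim_m)         # motion state
        frames = []
        for _ in range(k_frames):
            h = self.gru(torch.randn(batch, self.dim_eps), h)  # next motion code
            frames.append(torch.cat([z_c, h], dim=1))  # z^(k) = [z_C, z_M^(k)]
        return torch.stack(frames, dim=1)          # (batch, K, dim_c + dim_m) -> G_I
```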
Step five, preprocessing the expanded video data set to obtain a frame set, specifically includes:
The original human body gesture action video data are split frame by frame with a MATLAB video processing tool and saved as frame images, yielding the frame image stream of each action video, i.e., the frame set.
Step six, removing background noise and extracting the human body contour with a contour detection and target extraction algorithm, specifically includes:
A DeepLabv3plus model performs contour detection and target extraction, finding the boundary between the human body contour and the other parts of the image, and the human body target region is then extracted along that boundary.
Step seven: converting the frame set into corresponding standard human body surface 3D point cloud data, specifically includes:
The HMR algorithm reconstructs a complete human 3D point cloud from a single person-centered image in a feed-forward manner. Training assumes that all images are annotated with 2D joints and that part of the data also carry 3D labels. The original 2D image is convolutionally encoded into convolutional features, which are sent to an iterative 3D regression module based on SMPL; this module infers the 3D human and camera parameters under which the projected 3D joints match the annotated 2D joints. The inferred parameters are also sent to an adversarial discriminator network that judges, against unpaired data, whether the 3D parameters describe real body shapes and motions, making the resulting 3D point cloud information more realistic.
When 3D annotation information is available it is used as an intermediate loss, and the overall loss function is

$$L = \lambda\,(L_{reproj} + t \times L_{3D}) + L_{adv}$$

where L_reproj is the loss function of the 3D regression module, whose optimization goal is to minimize L_reproj; the λ parameter describes the relative importance of each objective; t is an indicator that takes the value 1 when the training data set contains real 3D labels and 0 otherwise; L_3D is the corresponding training loss when real 3D annotations exist; and L_adv is the loss function of the discriminator D.
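The overall objective translates directly into code; a one-line sketch (the weight value is illustrative, not from the patent or the HMR paper):

```python
def hmr_loss(l_reproj, l_3d, l_adv, lam=60.0, has_3d=True):
    """Overall HMR objective L = lambda * (L_reproj + t * L_3D) + L_adv from the
    formula above; t = 1 only when real 3D labels exist. lam is illustrative."""
    t = 1.0 if has_3d else 0.0
    return lam * (l_reproj + t * l_3d) + l_adv
```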
Step eight: restoring multiple human body surface 3D point cloud data sets with different heights and body types from the standard 3D point cloud data through parameter adjustment, specifically includes:
Each point in a human body surface 3D point cloud data set describes the human body surface information of a target user with the corresponding characteristics, at least including the target user's body type, posture or orientation. The HMR algorithm yields a standard (normalized) human body surface 3D point cloud data set; by adjusting parameters, multiple human body surface 3D point cloud data sets with different heights and body types can be recovered from a single user's image, further improving data diversity.
Further, in step eight, a hidden point removal algorithm eliminates the disturbance signals in the human body surface 3D point cloud data set, yielding the set of points visible to the transmitting end. Specifically, the hidden point removal algorithm (HPR) extracts the set of points in the human body surface 3D point cloud data set visible to the transmitting end as follows:
Because of the angles between the transceiver deployment positions and the target position, the signal does not propagate to the receiving end via reflections from all 3D surface points (including the back and sides of the target); i.e., not all points in the 3D point cloud reflect the signal. To estimate the signal correctly, the hidden points in the 3D point cloud are eliminated with the HPR algorithm, which uses spherical flipping and a convex hull to determine which points of the 3D point cloud are visible from the preset simulated transceiver positions, thereby removing the hidden points and obtaining the set of points in the human body surface 3D point cloud data set visible to the transmitting end.
Step nine: performing gesture signal simulation on the human body surface 3D point cloud set to obtain Wi-Fi signals under the corresponding deployment conditions, specifically includes:
Without considering environmental multipath, the signal received at the receiving end Rx consists of two parts: the signal propagating from the transmitting end X_T to the receiving end X_R along the line-of-sight path, and the signal reflected from the transmitting end X_T to the receiving end X_R via the visible reflection points M'(t) on the human body surface along non-line-of-sight paths. Gesture signal simulation is performed on the human body surface 3D point cloud set, and the Wi-Fi signal under the corresponding deployment conditions is obtained as:
$$H(t) = g(X_T, X_R) + \sum_{X_m \in M'(t)} A_m\, G_m\, g(X_T, X_m)\, g(X_m, X_R)$$

where M'(t) is the set of points in the human body surface 3D point cloud data set visible to the transmitting end; X_T, X_R and X_m respectively denote the transmitting end, the receiving end and a reflection point on the human body surface; A_m and G_m respectively denote the reflectivity and the angular factor; and g(X_T, X_R) denotes the line-of-sight signal strength propagating from the transmitting end X_T to the receiving end X_R;
the Green's function g(x, y) describes the strength of a signal propagating from x to a receiving end at y:

$$g(x, y) = \frac{e^{-j 2\pi \lVert x - y \rVert / \lambda}}{4\pi \lVert x - y \rVert}$$

where ‖·‖ denotes the Euclidean norm and λ the wavelength of the signal.
The strength of the signal reflected to the receiving end via a reflection point X_m on the user's surface is determined by two factors: the surface area and the reflection direction at X_m. The larger the reflecting area, the higher the reflectivity A_m. When the signal reaches the human body surface, the direction r_m of the strongest reflected signal follows from the reflection model and can be determined from the surface normal n_m, with i_m the unit incident direction from the transmitter:

$$r_m = i_m - 2\,(i_m \cdot n_m)\, n_m, \qquad i_m = \frac{X_m - X_T}{\lVert X_m - X_T \rVert}$$

The signal strength propagating from the reflection point X_m toward the receiver Rx along X_R − X_m is inversely related to the included angle θ_m between that direction and r_m, and is modeled by the Gaussian function G_m:

$$G_m = \exp\!\left(-\frac{\theta_m^2}{2\sigma^2}\right)$$

where σ controls how quickly the reflected strength falls off away from the strongest reflection direction.
by the method, gesture signal simulation is carried out on the human body surface 3D point cloud image set, and Wi-Fi signals under corresponding deployment conditions are obtained.
Step ten, extracting time-frequency domain features from the collected Wi-Fi channel state information data and from the Wi-Fi signals under the corresponding deployment conditions, respectively, specifically includes:
The hand-crafted features used in the invention comprise time-domain and frequency-domain features. The time-domain features include the maximum, the minimum/maximum ratio, the variance, mean, skewness, standard deviation, kurtosis, and the 0.25/0.5/0.75 quantiles; the frequency-domain features include the energy and the FFT peak.
Step eleven, establishing a gesture feature analysis model of the simulated wireless signals and the collected Wi-Fi channel state information data to complete cross-target gesture recognition, specifically includes:
Because of multipath effects in the real environment, the video-simulated wireless signals cannot be applied directly to recognizing the collected Wi-Fi channel state information of gesture actions. A gesture feature analysis model of the simulated wireless signals and the collected Wi-Fi channel state information data is therefore established to achieve a robust, high-precision cross-target gesture recognition effect, and to ensure that the extracted features are as free of environmental multipath influence as possible.
The gesture feature analysis model at least comprises a classification layer, a feature mapping layer and a reconstruction layer. The classification layer extracts gesture features; the feature mapping layer learns the gesture feature mapping between the Wi-Fi channel state information data and the simulated Wi-Fi signals under the corresponding deployment conditions; the reconstruction layer assists the feature mapping layer during training, emphasizing the extraction of gesture-related feature mappings. FIG. 2 shows the general framework of the gesture feature analysis model.
The classification layer is trained on simulated, high-quality, interference-free data, giving it strong generalization capability. To address the multipath interference in the collected Wi-Fi channel state information data, a reconstruction layer is added; this branch assists the feature mapping layer in mapping the collected Wi-Fi channel state information data to the simulated wireless signals of the corresponding gestures while ignoring the influence of the external environment.
The classification layer uses a CNN to extract features and learn the broad gesture characteristics present in the highly diverse simulated signals, yielding a classifier with strong generalization capability. The layer mainly performs convolution, pooling and normalization, introduces a nonlinear mapping function in the final fully connected operation, and outputs the model's predicted label distribution for the data. This layer guides the feature mapping layer to learn, as far as possible, the gesture feature mapping between the real data and the simulated data.
The feature mapping layer learns the gesture feature mapping from Wi-Fi signals in the real environment to the simulated signals, so that robust gesture recognition is finally realized on top of the classification layer model.
The reconstruction layer assists the feature mapping layer during training, emphasizing the extraction of gesture-related feature mappings. It is used only during the training stage; once training is finished and the network is put into practical use, the reconstruction layer does not participate in the gesture classification task. To measure the reconstruction layer's learning effect, its organization introduces the Euclidean distance as the loss function:

$$L_{rec} = d(\hat{x}, x_{sim})$$

where d(·,·) is the Euclidean distance, here taken between the reconstruction layer's output x̂ and the corresponding simulated-signal features x_sim. Through continuous iteration and the auxiliary optimization of this layer, the feature mapping layer's ability to map gesture-related features is strengthened, finally achieving a better effect. FIG. 3 shows a schematic diagram of the feature mapping layer structure.
And (3) comparing experimental results:
the inventor tries to evaluate the video-based wireless signal enhancement and cross-target gesture recognition method provided by the embodiment from the following three aspects:
the recognition performance of different training user numbers; different classification gesture number recognition performance; and (5) multi-model comparative evaluation.
Recognition performance with different numbers of training users:
FIG. 5 shows the verification of the system's effectiveness with 5 gestures. Training sets A, B, C and D contain the 5 gesture classes from 4, 8, 12 and 16 users respectively, and the test set contains data from 4 users. With 5 gestures, the cross-target recognition accuracy gradually improves as the amount of training data grows.
Recognition performance with different numbers of classified gestures:
FIG. 6 shows training sets from 4 different numbers of users with 5 and 10 gestures, evaluated on the same test set. As the number of classified gestures increases, the recognition accuracy decreases but remains high.
Multi-model comparative evaluation:
FIG. 7 compares conventional classification models (SVM, KNN, RF, CNN) with our classification model on cross-target classification accuracy with 5 and 10 gestures. The figure shows that our method is a significant improvement over the baseline methods.
Cross-target gesture recognition performance:
Overall, the invention greatly reduces cost while achieving satisfactory, high-precision cross-target gesture recognition accuracy.

Claims (9)

1. A video-based wireless signal enhancement and cross-target gesture recognition method, characterized by comprising the following steps:
Step one, collecting Wi-Fi channel state information of gesture actions in a monitoring area;
Step two, preprocessing the Wi-Fi channel state information data to eliminate environmental noise and the influence of outliers;
Step three, collecting original human body gesture action video data in the monitoring area;
Step four, randomly generating new videos from the original human body gesture action video data with a video generation model to obtain an expanded video data set;
Step five, preprocessing the expanded video data set to obtain a frame set;
Step six, removing background noise and extracting the human body contour with a contour detection and target extraction algorithm;
Step seven, converting the frame set into corresponding standard human body surface 3D point cloud data;
Step eight, restoring multiple human body surface 3D point cloud data sets with different heights and body types from the standard 3D point cloud data through parameter adjustment;
Step nine, performing gesture signal simulation on the human body surface 3D point cloud set to obtain Wi-Fi signals under the corresponding deployment conditions;
Step ten, extracting time-frequency domain features from the collected Wi-Fi channel state information data and from the Wi-Fi signals under the corresponding deployment conditions, respectively;
Step eleven, establishing a gesture feature analysis model of the simulated wireless signals and the collected Wi-Fi channel state information data to complete cross-target gesture recognition.
2. The video-based wireless signal enhancement and cross-target gesture recognition method of claim 1, wherein: in step two, the channel state information data are preprocessed, with a Hampel filter removing outliers and a Butterworth low-pass filter retaining the data of the low-frequency band to eliminate high-frequency noise.
3. The video-based wireless signal enhancement and cross-target gesture recognition method of claim 1, wherein: the video generation model in step four is a MoCoGAN video generation model, which generates new videos from the original human body gesture action video data and uses two discriminators to discriminate images and video frame sequences respectively.
4. The video-based wireless signal enhancement and cross-target gesture recognition method of claim 1, wherein: in step seven, the frame set is converted into corresponding standard human body surface 3D point cloud data by an HMR algorithm.
5. The video-based wireless signal enhancement and cross-target gesture recognition method of claim 1, wherein: in step eight, each point in a human body surface 3D point cloud data set describes the human body surface information of a target user with the corresponding characteristics, at least including the target user's body type, posture or orientation.
6. The video-based wireless signal enhancement and cross-target gesture recognition method of claim 5, wherein: in step eight, a hidden point removal algorithm eliminates the disturbance signals in the human body surface 3D point cloud data set, yielding the set of points in the data set visible to the transmitting end.
7. The video-based wireless signal enhancement and cross-target gesture recognition method of claim 6, wherein: in step nine, gesture signal simulation is performed on the human body surface 3D point cloud set to obtain the Wi-Fi signals under the corresponding deployment conditions:

$$H(t) = g(X_T, X_R) + \sum_{X_m \in M'(t)} A_m\, G_m\, g(X_T, X_m)\, g(X_m, X_R)$$

where M'(t) is the set of points in the human body surface 3D point cloud data set visible to the transmitting end; X_T, X_R and X_m respectively denote the transmitting end, the receiving end and a reflection point on the human body surface; A_m and G_m respectively denote the reflectivity and the angular factor; and g(X_T, X_R) denotes the line-of-sight signal strength propagating from the transmitting end X_T to the receiving end X_R.
8. The video-based wireless signal enhancement and cross-target gesture recognition method of claim 1, wherein: the time-frequency domain features extracted in step ten at least include the minimum, variance, mean, skewness, standard deviation, kurtosis, energy and FFT peak.
9. The video-based wireless signal enhancement and cross-target gesture recognition method of claim 1, wherein: the gesture feature analysis model in step eleven at least comprises a classification layer, a feature mapping layer and a reconstruction layer; the classification layer extracts gesture features; the feature mapping layer learns the gesture feature mapping between the Wi-Fi channel state information data and the simulated Wi-Fi signals under the corresponding deployment conditions; and the reconstruction layer assists the feature mapping layer during training, emphasizing the extraction of gesture-related feature mappings.
CN202111110503.0A 2021-09-23 2021-09-23 Video-based wireless signal enhancement and cross-target gesture recognition method Active CN113963229B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111110503.0A CN113963229B (en) 2021-09-23 2021-09-23 Video-based wireless signal enhancement and cross-target gesture recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111110503.0A CN113963229B (en) 2021-09-23 2021-09-23 Video-based wireless signal enhancement and cross-target gesture recognition method

Publications (2)

Publication Number Publication Date
CN113963229A true CN113963229A (en) 2022-01-21
CN113963229B CN113963229B (en) 2023-08-18

Family

ID=79461919

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111110503.0A Active CN113963229B (en) 2021-09-23 2021-09-23 Video-based wireless signal enhancement and cross-target gesture recognition method

Country Status (1)

Country Link
CN (1) CN113963229B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170123487A1 (en) * 2015-10-30 2017-05-04 Ostendo Technologies, Inc. System and methods for on-body gestural interfaces and projection displays
CN108073277A (en) * 2016-11-08 2018-05-25 罗克韦尔自动化技术公司 For the virtual reality and augmented reality of industrial automation
US20190087654A1 (en) * 2017-09-15 2019-03-21 Huazhong University Of Science And Technology Method and system for csi-based fine-grained gesture recognition
CN111709295A (en) * 2020-05-18 2020-09-25 武汉工程大学 SSD-MobileNet-based real-time gesture detection and recognition method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wen Junqin; Wang Xiuhui: "Gesture recognition based on linear discriminant analysis and adaptive K-nearest neighbors", Journal of Data Acquisition and Processing, no. 03

Also Published As

Publication number Publication date
CN113963229B (en) 2023-08-18

Similar Documents

Publication Publication Date Title
Li et al. Capturing human pose using mmWave radar
CN110555412B (en) End-to-end human body gesture recognition method based on combination of RGB and point cloud
CN107992792A (en) A kind of aerial handwritten Chinese character recognition system and method based on acceleration transducer
CN111505632B (en) Ultra-wideband radar action attitude identification method based on power spectrum and Doppler characteristics
US9117138B2 (en) Method and apparatus for object positioning by using depth images
US9081999B2 (en) Head recognition from depth image
Kogler et al. Event-based stereo matching approaches for frameless address event stereo data
KR101612605B1 (en) Method for extracting face feature and apparatus for perforimg the method
CN110287918B (en) Living body identification method and related product
CN111368635B (en) Millimeter wave-based multi-person gait recognition method and device
CN112232109A (en) Living body face detection method and system
CN110728213A (en) Fine-grained human body posture estimation method based on wireless radio frequency signals
CN103377366A (en) Gait recognition method and system
CN111476089B (en) Pedestrian detection method, system and terminal for multi-mode information fusion in image
CN113609976A (en) Direction-sensitive multi-gesture recognition system and method based on WiFi (Wireless Fidelity) equipment
CN111444488A (en) Identity authentication method based on dynamic gesture
Kekre et al. Gabor filter based feature vector for dynamic signature recognition
CN110866468A (en) Gesture recognition system and method based on passive RFID
CN103324921B (en) A kind of mobile identification method based on interior finger band and mobile identification equipment thereof
CN115343704A (en) Gesture recognition method of FMCW millimeter wave radar based on multi-task learning
CN113051972A (en) Gesture recognition system based on WiFi
KR102309453B1 (en) Social network service system based on unidentified video and method therefor
Zhang et al. 3D human pose estimation from range images with depth difference and geodesic distance
CN111142668A (en) Interaction method for positioning and activity gesture joint identification based on Wi-Fi fingerprint
CN116466827A (en) Intelligent man-machine interaction system and method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant