CN115457732A

CN115457732A - Fall detection method based on sample generation and feature separation

Info

Publication number: CN115457732A
Application number: CN202211018371.3A
Authority: CN
Inventors: 罗悦; 周瑞; 张子若; 程宇; 张宏; 王佳昊
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2022-08-24
Filing date: 2022-08-24
Publication date: 2022-12-09
Anticipated expiration: 2042-08-24
Also published as: CN115457732B

Abstract

The invention provides a fall detection method based on sample generation and feature separation, which extracts the amplitude of each subcarrier from the channel state information (CSI) of WiFi signals to perform fall detection. To address the shortage of fall samples and the dependence of the fall detection model on the environment, fall action samples in the source domain data are reconstructed with added noise to obtain virtual fall action samples, and a feature extractor extracts feature vectors from the source domain data and the virtual fall data. The neurons of the feature vector layer produced by the feature extractor are divided into two parts: the upper part is fed into the training of a fall/non-fall binary classifier, and the lower part is fed into the training of a domain classifier. During training the upper part gradually retains information related to fall and non-fall actions, while the lower part gradually retains environment-related information, so that the binary classifier distinguishes fall from non-fall actions more reliably and fall detection generalizes better across environments.

Description

Fall detection method based on sample generation and feature separation

Technical Field

The invention relates to a WiFi signal-based behavior detection technology, in particular to a falling action detection technology.

Background

With the development of the internet of things and artificial intelligence, action recognition based on wireless signals becomes possible. Since falls are a major cause of injury to the elderly population, indoor fall detection has become an urgent need. Existing fall detection technologies are mostly based on wearable devices or video surveillance. However, wearable device-based fall detection requires wearing of the wearable device, while video surveillance-based fall detection is highly dependent on indoor light and poses privacy invasion risk. These limitations make it difficult to widely deploy fall detection systems in a residential environment.

To improve on current fall detection methods and overcome the shortcomings of existing detection technology, the invention provides a method for fall detection based on WiFi signals. Compared with traditional video-based detection, WiFi can penetrate obstacles, works normally in weak light, in darkness and under occlusion, and poses almost no privacy intrusion. Compared with wearable devices, WiFi fall detection is contactless: the user does not need to carry any equipment, which greatly improves convenience. Because WiFi is widespread and has broad coverage, WiFi-based fall detection is also very low in cost. WiFi therefore has considerable potential value and promising prospects for indoor fall detection.

Currently, a WiFi-based fall detection method generally obtains Channel State Information (CSI) through a wireless network card and related tools to perform fall action identification. The propagation of WiFi indoors consists of the superposition of multiple paths, including direct paths, obstacle reflected paths, etc. The CSI has good stability and can well reflect the channel state. When the human body performs different actions, different influences are caused on the propagation of WiFi signals, and further CSI changes are caused, so that the falling action is detected. The method comprises the steps of collecting CSI information of daily actions (including falling actions), training a classification model capable of identifying the falling actions by using a machine learning algorithm, and inputting real-time CSI data into the model for classification so as to detect whether the falling occurs or not.

Fall detection is a binary classification problem, usually solved with supervised learning. To ensure detection accuracy, a large amount of labeled data for different actions (including fall actions) must be collected to train the model, and collecting and labeling these data is costly and labor-intensive. Because environmental changes have large and hard-to-predict effects on WiFi signals, the prediction accuracy of a model trained by supervised learning drops sharply when the environment changes, which makes such a model unsuitable for practical use.

Disclosure of Invention

To address the defect that existing WiFi-based fall detection methods adapt poorly to environmental change, which reduces detection accuracy, the technical problem solved by the invention is to provide a fall detection method that makes the features of the generated samples as independent of the environment as possible, so as to reduce the influence of environmental change on detection.

The invention adopts the technical scheme that a fall detection method based on sample generation and feature separation comprises the following steps:

1) Fall detection environment deployment: deploying a WiFi transmitter and a WiFi receiver in a detection environment, and covering the whole detection area with WiFi signals;

2) A CSI data acquisition step: setting action types, wherein the action types comprise c actions, namely falling actions and non-falling actions; setting environment types, wherein each user repeatedly executes c actions in each environment, and each action is executed for multiple times; collecting Channel State Information (CSI) data in a detection area when actions are executed each time, and extracting a subcarrier amplitude value from the CSI data corresponding to one execution action as an action sample;

3) Data division: taking an action sample and corresponding action labels and environment labels as source domain data, and forming a source domain data set by using the source domain data corresponding to all actions executed in all environments;

4) Using the action labels of the source domain data set, adding random Gaussian noise to the action samples of the source domain data labeled as fall and reconstructing them with an autoencoder to generate fall action virtual samples, t × N fall action virtual samples in total, and using the fall action virtual samples to generate virtual fall data in the same format as the source domain data;

5) Inputting the source domain data and the virtual fall data into a feature extractor for feature separation training, and then feeding the separated features output by the feature extractor into a fall/non-fall binary classifier and a multi-domain classifier, respectively, for detection training, specifically:

5-1) using the feature extractor to extract action and environment features separately from the source domain data and the virtual fall data to obtain a feature vector; the feature extractor is trained with the goal of separating environment-related domain information into the lower half of the feature vector and storing action information in the upper half of the feature vector;

5-2) dividing the feature vector into an upper half and a lower half; the upper half of the feature vector layer is fed into the training of the fall/non-fall binary classifier, and the lower half is fed into the training of the multi-domain classifier; the fall/non-fall binary classifier receives the upper half of the feature vector layer and is trained with the goal of retaining information related to fall and non-fall actions and making the binary fall/non-fall decision; the multi-domain classifier receives the lower half of the feature vector layer and is trained with the goal of retaining environment-related domain information;

6) A detection step: inputting the subcarrier amplitudes of the CSI data to be classified into the trained feature extractor, and inputting the upper half of the feature vector output by the feature extractor into the trained fall/non-fall binary classifier to complete fall detection.

The invention exploits the influence that different actions of a person in the detection area have on WiFi signals, and extracts the amplitude of each subcarrier from the CSI data of the WiFi signals for fall detection. To address the shortage of fall samples and the dependence of the fall detection model on the environment, Gaussian noise is added to the fall action samples in the source domain data and an autoencoder reconstructs them to obtain virtual fall action samples. A feature extractor then extracts feature vectors from the source domain data and the virtual fall data. The neurons of the feature vector layer produced by the feature extractor are divided into two parts: the upper part is fed into the training of a fall/non-fall binary classifier, and the lower part is fed into the training of a domain classifier. During training the upper part gradually retains information related to fall and non-fall actions and the lower part gradually retains information related to the environment, so that the fall/non-fall binary classifier distinguishes fall from non-fall actions more reliably.

The beneficial effects of the method are as follows. Reconstructing fall-category samples solves the problem that the model overfits non-fall-category data when fall-category samples are insufficient. To further improve environment-independent fall detection accuracy, the invention performs feature separation on the feature vectors obtained by the feature extractor and feeds the two separated parts into the fall/non-fall binary classifier and the domain classifier, respectively, for training, which enhances the generalization of fall detection across environments.

Drawings

Fig. 1 is a schematic diagram of an experimental scenario.

Fig. 2 is an overall frame diagram.

Detailed Description

The method comprises the following specific steps:

1) Fall detection environment deployment: the invention must be carried out in an environment with WiFi coverage; a WiFi transmitter and a WiFi receiver are placed in the detection environment, as shown in FIG. 1;

2) A CSI data acquisition step: setting the action types, where the c actions must include fall actions and non-fall actions; specifically, the c actions comprise 1 fall action c_1 and c-1 non-fall actions c_i, i = 2, …, c, or the numbers of fall and non-fall actions may be configured otherwise as required; setting the environment types, t environments in total; in each environment, each user repeatedly performs the c actions, each action N times, and Channel State Information (CSI) data in the detection area are collected each time an action is performed; subcarrier amplitudes are extracted from the collected CSI data as action samples;

3) Data division: taking the CSI data collected in the t different environments as the source domain data set X^S. Each entry of the data set consists of three parts: the source domain CSI data whose action class is c_i and whose environment class is t_l, where t_l denotes the l-th environment class; the category label of action class c_i, where the fall action c_1 carries one category label and all non-fall actions c_i (i ≠ 1) share another; and the domain label of environment class t_l;

4) Using the action labels in the source domain data set, generating fall action virtual samples from the action samples of the source domain data labeled as fall, t × N fall action virtual samples in total, so that the virtual samples together with the real fall action samples match the number of non-fall action samples and the classes are balanced in the subsequent binary classification training; using the fall action virtual samples to generate virtual fall data in the same format as the source domain data; generating one fall action virtual sample comprises the following steps:

4-1) From the subset of the source domain data set X^S whose action label is fall, select one source domain datum. Each action sample in the source domain data is a combination of the data collected by multiple transmit-receive antenna pairs, where a_i denotes antenna pair i. The action sample is divided into n groups of data collected by the different antenna pairs, where n is the number of antenna pairs;

4-2) using a combined autoencoder to reconstruct the n groups of data collected by the different antenna pairs, which first produces n corresponding groups of intermediate-layer vectors. The combination contains n autoencoders, matching the number of antenna pairs;

each generating encoder corresponds to the group of data collected by one antenna pair, and the n generating encoders perform feature extraction on the n groups of data, respectively, to obtain n groups of intermediate-layer vectors; here b is random Gaussian noise and the function f denotes the generating encoder; each generating encoder in this embodiment is a neural network with 4 fully connected layers, and may also be implemented with a different number of fully connected layers or with other network structures;

4-3) each generating decoder corresponds to one group of intermediate-layer vectors, and the n generating decoders reconstruct the n groups of intermediate-layer vectors, respectively, to obtain n groups of reconstructed virtual data, one for each group of data collected by a different antenna pair. The generating decoder has the reverse structure of the generating encoder and is a 4-layer fully connected neural network; the function g denotes the generating decoder;

4-4) splicing the n groups of reconstructed virtual data, which correspond to the n groups of data collected by the different antenna pairs, into one complete fall action virtual sample that is similar to the original action sample;

5) Feature separation, comprising the steps of:

5-1) using the feature extractor to extract action features and environment features from the source domain data x^S and the virtual fall data to obtain a feature vector layer. The feature extractor consists of 5 convolution blocks and one fully connected layer; each convolution block comprises 1 convolution layer and 1 pooling layer, and the 5 convolution blocks use convolution kernels of 200, 150, 100, 20 and 10, respectively. The feature vector layer has dimension (m × 2, 1), where the value of m must be adjusted dynamically according to the number of antenna pairs. The feature vector layer is divided into two parts, an upper half of dimension (m, 1) and a lower half of dimension (m, 1);

5-2) feeding the upper half of the feature vector layer into the training of the fall/non-fall binary classifier and the lower half into the training of the multi-domain classifier, where the fall/non-fall binary classifier is a neural network with 4 fully connected layers and the multi-domain classifier is a neural network with 4 fully connected layers; during training, the upper half of the feature vector layer gradually retains information related to fall and non-fall actions, and the lower half gradually retains information related to the environment;

The feature extractor, the fall/non-fall binary classifier and the multi-domain classifier are not limited to the structures described above, as long as the structure allows them to meet their training targets:

the feature extractor receives the source domain data and the virtual fall data and outputs a feature vector; it is trained with the goal of separating environment-related domain information into the lower half of the feature vector and storing action information in the upper half of the feature vector;

the fall/non-fall binary classifier receives the upper half of the feature vector layer and is trained with the goal of retaining information related to fall and non-fall actions and making the binary fall/non-fall decision;

the multi-domain classifier receives the lower half of the feature vector layer and is trained with the goal of retaining environment-related domain information;

6) Inputting the amplitudes of the CSI data to be classified into the trained feature extractor, which extracts the feature vector layer; the feature vector layer is separated into an upper half and a lower half, and the upper half is fed into the trained fall/non-fall binary classifier to complete fall detection.
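The split used in steps 5) and 6) can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_feature_vector` is hypothetical, the feature vector is a random stand-in for the extractor output, and m = 1025 is the value the embodiment reports for the experimental setup.

```python
import numpy as np

def split_feature_vector(feature_vector):
    """Split a (2m, 1) feature vector into its upper and lower halves."""
    if feature_vector.shape[0] % 2 != 0:
        raise ValueError("feature vector length must be even (2m)")
    m = feature_vector.shape[0] // 2
    upper = feature_vector[:m]  # action-related half, fed to the fall/non-fall classifier
    lower = feature_vector[m:]  # environment-related half, used only by the domain classifier
    return upper, lower

m = 1025  # value reported in the embodiment
rng = np.random.default_rng(0)
feature_vector = rng.standard_normal((2 * m, 1))  # stand-in for the extractor output
upper, lower = split_feature_vector(feature_vector)
print(upper.shape, lower.shape)
```

At detection time only `upper` would be passed on to the trained binary classifier; `lower` is discarded.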

Experimental verification

A WiFi transmitter and a WiFi receiver are placed in the detection environment. The transmitter is an ordinary commercial router and the receiver is equipped with an Intel WiFi Link 5300 wireless network card; the transmitter and the receiver each have 3 antennas, forming 9 links. CSI is obtained from the Intel WiFi Link 5300 wireless network card with the CSI tools package; each antenna pair yields 30 groups of subcarrier information, so each data packet contains 270 groups of subcarrier information in total.
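The link and subcarrier counts follow directly from the antenna configuration. A small sketch of the arithmetic (the constant names are illustrative, not taken from the patent):

```python
TX_ANTENNAS = 3            # transmitter antennas (from the text)
RX_ANTENNAS = 3            # receiver antennas (from the text)
SUBCARRIERS_PER_LINK = 30  # subcarrier groups reported per antenna pair

# Every transmit antenna pairs with every receive antenna to form one link.
links = [(tx, rx) for tx in range(TX_ANTENNAS) for rx in range(RX_ANTENNAS)]
subcarriers_per_packet = len(links) * SUBCARRIERS_PER_LINK
print(len(links), subcarriers_per_packet)  # 9 270
```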

The specific implementation steps are as follows:

Step 1: A WiFi transmitter and WiFi receiver pair is deployed in the detection area; the WiFi signal must cover the whole detection area. The experimental environment is shown schematically in FIG. 1.

Step 2: CSI action data are collected in the detection area. Several time periods are selected as different environments, and each user performs the relevant actions in each environment, specifically falling from a standing posture, falling from a sitting posture, squatting, standing, walking, jumping and so on, repeating each action several times. The sampling frequency is set to 100 Hz.

Step 3: The action data from the different environments are taken as source domains to form a source domain data set. Each user performs each action N times in each environment.

Each CSI subcarrier can be written in complex form as

$$H_i = |H_i| e^{j\angle H_i}$$

Taking the absolute value of $H_i$ gives the amplitude data set $X^a = \{|H_i|\}$. Variance-threshold clipping is applied to the extracted amplitudes so that each action sample has size 300 × 270, and the sampled data can be represented as:

$$x = [x_1, x_2, \ldots, x_{300}]$$

where 300 is the number of packets per action and $x_i = [h_1, h_2, \ldots, h_{270}]$ is a data packet containing 270 subcarrier amplitudes.
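A minimal sketch of the amplitude extraction and cropping to a 300 × 270 sample. The text does not specify how the variance-threshold clipping works, so the rule below (keep the 300-packet window with the highest total variance) is an assumption made purely for illustration, as are the function names and the synthetic input.

```python
import numpy as np

PACKETS_PER_SAMPLE = 300      # packets kept per action sample (from the text)
SUBCARRIERS_PER_PACKET = 270  # 9 links x 30 subcarriers (from the text)

def csi_to_amplitude(csi_complex):
    """Return the amplitude |H| of every complex CSI entry."""
    return np.abs(csi_complex)

def crop_highest_variance_window(amplitude, window=PACKETS_PER_SAMPLE):
    """Hypothetical cropping rule: keep the window of packets whose
    per-subcarrier variance, summed over subcarriers, is largest."""
    n_packets = amplitude.shape[0]
    if n_packets < window:
        raise ValueError("recording is shorter than the requested window")
    scores = [amplitude[s:s + window].var(axis=0).sum()
              for s in range(n_packets - window + 1)]
    start = int(np.argmax(scores))
    return amplitude[start:start + window]

# Synthetic complex CSI: 500 packets at 100 Hz is 5 s of recording.
rng = np.random.default_rng(0)
shape = (500, SUBCARRIERS_PER_PACKET)
raw = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
sample = crop_highest_variance_window(csi_to_amplitude(raw))
print(sample.shape)  # (300, 270)
```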

Step 4: A combined autoencoder is used to generate virtual samples of the fall action, so that the number of fall-category samples in the training set matches the number of non-fall-category samples, as shown in FIG. 2:

Step 4-1: Select from the source domain data set x^S the data whose action category is the fall category. Each sample in this source domain data is a combination of data collected by multiple transmit-receive antenna pairs, where a_i denotes antenna pair i. Each sample is divided into n groups of data collected by the different antenna pairs, where n is the number of antenna pairs; in this test n is 9, that is, there are 9 different antenna pairs.

step 4-2: using a combined auto-encoder to n sets of data acquired by different antenna pairs

Reconstructing to generate n sets of corresponding intermediate layer vectors

The number of the combined self-encoders is n, and the number of the combined self-encoders corresponds to the number of different antenna pairs;

each generating coder corresponds to a group of antenna pair data, and n generating coders are adopted to respectively pair n groups of data acquired by different antenna pairs

Feature extraction is performed, thereby obtaining n sets of intermediate layer vectors

Each generating coder is a neural network comprising 4 fully-connected layers, the input dimension is 300 multiplied by 30, 30 is the number of subcarriers contained in each signal link, b is random noise with the dimension of 300 multiplied by 30, and the output dimension is 64 multiplied by 30; in an embodiment, the number n of antenna pairs is 9, then a total of 9 sets of intermediate layer vectors are generated; the function f represents the generating encoder;

step 4-3: each generating decoder corresponds to a group of intermediate layer vectors, n generating decoders are adopted to respectively reconstruct n groups of different intermediate layer vectors to obtain n groups of data acquired by different antenna pairs

N groups of virtual data after reconstruction:

the generation decoder and the generation encoder are of opposite structures and are respectively a 4-layer fully-connected neural network, the input dimension is 64 multiplied by 30, and the output dimension is 300 multiplied by 30; in the embodiment of the present invention, if the number n of antenna pairs is 9, then 9 sets of virtual data are generated in total, each set corresponding to a different antenna pair; function g represents the generation decoder;

wherein, each group of the generation encoder and the generation decoder adopts Mean Square Error (MSE) as a loss function during training, and each group adopts different antenna pairs to collect falling motion samples

Corresponding to virtual samples of falling actions

As an MSE input, the loss function may be expressed as

Training the weights of the encoder and decoder such that reconstructed data is obtained for the data by different antennas

Source domain fall category data approaching corresponding antenna pairs

Step 4-4: The n groups of reconstructed virtual data, which correspond to the n groups of data collected by the different antenna pairs, are spliced into one complete virtual fall action sample that is similar to the original fall action sample.

The transmitter and receiver used in the invention each have 3 antennas, so 9 signal links are formed in total, that is, n = 9.
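Steps 4-1 to 4-4 can be sketched in NumPy to make the shapes concrete. This is a shape-level illustration, not the patented implementation: the weights are random and untrained, the hidden widths (256, 128, 96), the tanh activation and the noise scale are assumptions, and the 270 columns are assumed to be ordered as 9 contiguous blocks of 30 subcarriers, one block per antenna pair.

```python
import numpy as np

N_PAIRS, PACKETS, SUBCARRIERS, LATENT = 9, 300, 30, 64
ENC_SIZES = [PACKETS, 256, 128, 96, LATENT]  # 4 fully connected layers; hidden widths assumed
DEC_SIZES = ENC_SIZES[::-1]                  # decoder reverses the encoder

rng = np.random.default_rng(0)

def init_layers(sizes):
    """Random (untrained) weights and zero biases for a stack of dense layers."""
    return [(rng.standard_normal((o, i)) * np.sqrt(1.0 / i), np.zeros((o, 1)))
            for i, o in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    """Apply the dense layers along the packet axis; x has shape (in_dim, 30)."""
    for k, (w, b) in enumerate(layers):
        x = w @ x + b
        if k < len(layers) - 1:
            x = np.tanh(x)  # assumed activation
    return x

def split_by_antenna_pair(sample):
    """Step 4-1: (300, 270) -> list of 9 arrays of shape (300, 30)."""
    return np.split(sample, N_PAIRS, axis=1)

encoders = [init_layers(ENC_SIZES) for _ in range(N_PAIRS)]  # f, one per antenna pair
decoders = [init_layers(DEC_SIZES) for _ in range(N_PAIRS)]  # g, one per antenna pair

def generate_virtual_sample(sample, noise_std=0.1):
    rebuilt, losses = [], []
    for group, enc, dec in zip(split_by_antenna_pair(sample), encoders, decoders):
        noise = rng.normal(0.0, noise_std, size=group.shape)  # b, 300 x 30
        latent = forward(enc, group + noise)                  # step 4-2: 64 x 30
        recon = forward(dec, latent)                          # step 4-3: 300 x 30
        rebuilt.append(recon)
        losses.append(float(np.mean((group - recon) ** 2)))   # per-pair MSE
    return np.concatenate(rebuilt, axis=1), losses            # step 4-4: 300 x 270

fall_sample = rng.random((PACKETS, SUBCARRIERS * N_PAIRS))  # stand-in fall sample
virtual, mse = generate_virtual_sample(fall_sample)
print(virtual.shape, len(mse))
```

In an actual system each encoder/decoder pair would be trained to minimize its MSE before any virtual samples are drawn; with the random weights here the reconstruction is not yet meaningful.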

Step 5: The feature extractor performs feature separation on the source domain data x^S and the virtual fall data, separating action information from environment information; the action features and the environment features are then fed into their respective classifiers for training.

Step 5-1: The feature extractor consists of 5 Conv1d convolution blocks and one fully connected layer, each convolution block comprising 1 convolution layer and 1 pooling layer. The input dimension of the feature extractor is 300 × 270 × 1 and the output dimension is (m × 2) × 1. The input x is either the source domain data x^S or the virtual fall data, and the output is the feature vector layer, of dimension (m × 2, 1), where the value of m must be adjusted dynamically according to the number of antenna pairs; in the embodiment of the invention m is 1025. The feature vector layer is divided into two parts, an upper half of dimension (m, 1) and a lower half of dimension (m, 1).

Step 5-2: The upper half of the feature vector layer is fed into a fall/non-fall binary classifier formed by a 2-layer fully connected neural network for training. The classifier outputs a probability distribution, which is combined with the true action labels to compute the cross-entropy loss $L_1$:

$$L_1 = -\frac{1}{M}\sum_{i=1}^{M}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$$

where M is the number of samples in each training batch, $y_i$ is the label of sample i (1 for a fall action, 0 for a non-fall action), and $p_i$ is the predicted probability that sample i is a fall.

Step 5-3: The lower half of the feature vector layer is fed into a multi-domain classifier formed by a 2-layer fully connected neural network for training. The classifier outputs a probability distribution, which is combined with the true domain labels to compute the cross-entropy loss $L_2$:

$$L_2 = -\frac{1}{M}\sum_{i=1}^{M}\sum_{d=1}^{K} z_{id}\log p_{id}$$

where M is the number of samples in each training batch, K is the number of domain classes, $z_{id}$ is the true-domain indicator (0 or 1), equal to 1 if the true domain of sample i is d and 0 otherwise, and $p_{id}$ is the predicted probability that sample i belongs to domain d. The per-class accuracy of the multi-domain classifier is not of interest, and the multi-domain classifier is not used for detection once training is finished; it serves to assist the training of the binary classifier so that an accurate binary detection result is obtained.

Step 5-4: The final objective function L is

$$L = L_1 + L_2$$
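The objective of steps 5-2 to 5-4 can be illustrated numerically. The sketch below assumes the standard batch-averaged forms of binary and multi-class cross-entropy, which match the symbol definitions in the text (M, y_i, p_i, K, z_id); the function names and the toy probabilities are made up for the example.

```python
import numpy as np

def binary_cross_entropy(y_true, p_fall, eps=1e-12):
    """L1: mean binary cross-entropy over a batch of M samples."""
    p = np.clip(p_fall, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

def multiclass_cross_entropy(z_onehot, p_domain, eps=1e-12):
    """L2: mean cross-entropy over M samples and K domain classes."""
    p = np.clip(p_domain, eps, 1.0)
    return float(-np.mean(np.sum(z_onehot * np.log(p), axis=1)))

# Toy batch with M = 4 samples and K = 3 domains (illustrative values only).
y = np.array([1.0, 0.0, 1.0, 0.0])       # 1 = fall, 0 = non-fall
p_fall = np.array([0.9, 0.2, 0.6, 0.1])  # binary classifier outputs
z = np.eye(3)[[0, 1, 2, 1]]              # one-hot true domain labels
p_domain = np.array([[0.7, 0.2, 0.1],
                     [0.1, 0.8, 0.1],
                     [0.2, 0.2, 0.6],
                     [0.3, 0.5, 0.2]])   # domain classifier outputs

l1 = binary_cross_entropy(y, p_fall)
l2 = multiclass_cross_entropy(z, p_domain)
total = l1 + l2  # L = L1 + L2
print(l1, l2, total)
```

Because both losses are summed with equal weight, the gradient reaching the feature extractor pushes action information toward the half read by the binary classifier and domain information toward the half read by the domain classifier.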

Step 6: After the networks have been trained, the amplitudes of the CSI data to be classified from a new environment are input directly into the trained feature extractor. The feature extractor encodes the fall action information and the environment information of the new-environment data into a feature vector, with environment-related information separated into the lower half of the feature vector layer and the required fall and non-fall action information stored in the upper half. The upper half, which retains the fall-related information, is input into the trained fall/non-fall binary classifier to complete fall detection.

In a verification experiment, data from 12 different environments were used as source domain data, and data from 2 new environments were input into the trained model. In the new environments the average fall detection accuracy exceeded 87%, the precision exceeded 86%, and the recall exceeded 88%.
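The three reported figures are standard binary-classification metrics. Assuming the second figure denotes precision, they could be computed from predictions as in the sketch below; the function name and the labels are hypothetical toy data, not the experimental results.

```python
def fall_detection_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall), treating label 1 (fall) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Toy labels: 1 = fall, 0 = non-fall.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
accuracy, precision, recall = fall_detection_metrics(y_true, y_pred)
print(accuracy, precision, recall)  # 0.75 0.75 0.75
```

For fall detection, recall is the fraction of real falls that are caught, while precision is the fraction of raised alarms that correspond to real falls.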

Claims

1. A fall detection method based on sample generation and feature separation, comprising the steps of:

1) Fall detection environment deployment: deploying a WiFi transmitter and a WiFi receiver in a detection environment so that the WiFi signal covers the whole detection area;

2) A CSI data acquisition step: setting the action types, wherein the action types comprise c actions, namely falling actions and non-falling actions; setting environment types, wherein each user repeatedly executes c actions in each environment, and each action is executed for multiple times; collecting Channel State Information (CSI) data in a detection area when actions are executed each time, and extracting a subcarrier amplitude value from the CSI data corresponding to one execution action as an action sample;

4) Reconstructing, with added noise, the action samples of the source domain data whose action label is fall in the source domain data set to generate fall action virtual samples, t × N fall action virtual samples in total, and using the fall action virtual samples to generate virtual fall data in the same format as the source domain data;

5) Inputting source domain data and virtual fall data into a feature extractor for feature separation training, and then feeding the separated features output by the feature extractor into a fall/non-fall binary classifier and a multi-domain classifier, respectively, for detection training, specifically:

the feature extractor is trained with the goal of separating environment-related domain information into the lower half of the feature vector and storing action information in the upper half of the feature vector;

5-2) the feature vector is divided into an upper half and a lower half; the upper half of the feature vector layer is fed into the training of the fall/non-fall binary classifier, and the lower half is fed into the training of the multi-domain classifier; the fall/non-fall binary classifier receives the upper half of the feature vector layer and is trained with the goal of retaining information related to fall and non-fall actions and making the binary fall/non-fall decision; the multi-domain classifier receives the lower half of the feature vector layer and is trained with the goal of retaining environment-related domain information;

2. The method as claimed in claim 1, wherein the virtual sample of the fall action in step 4) is generated by:

4-1) grouping and dividing the action sample of the source domain data whose action label is fall in the source domain data set according to the number of WiFi transmit-receive antenna pairs;

4-2) adding noise to the motion samples subjected to grouping and segmentation by using a combined self-encoder to reconstruct, and generating falling virtual samples; the combined self-encoder consists of generating encoders and generating decoders, wherein the number of the generating encoders is the same as that of the antenna pairs, and the number of the generating decoders is the same as that of the antenna pairs; in the reconstruction process, each generation coder performs feature extraction on one group in the corresponding action sample to obtain a group of intermediate layer vectors;

4-3) reconstructing an intermediate layer vector by using a generating decoder, and obtaining the reconstructed virtual data with the same number as the antenna pairs after reconstructing all the intermediate layer vectors;

4-4) splicing the reconstructed virtual data with the same number as the antenna pairs into a virtual sample of the falling action.

3. The method of claim 2, wherein the generating decoder is a neural network with 4 fully connected layers whose structure is the reverse of that of the generating encoder.

4. The method as claimed in claim 1, wherein in step 4-1) the grouping and division, according to the number of WiFi transmit-receive antenna pairs, of the action sample of one source domain datum whose action label is fall in the source domain data set is specifically represented as:

wherein the divided data is the source domain data whose action label is fall, c_1 is the action label for a fall, a_i denotes the i-th antenna pair, and the total number of WiFi transmit-receive antenna pairs is n.

5. The method as claimed in claim 1, wherein the c actions set in step 2) specifically include 1 fall action and c-1 non-fall actions; the environment types are set to t environments; within each environment, each user repeatedly performs the c actions, each action being performed N times;

and in step 4), t × N fall action virtual samples are generated.

6. The method of claim 1, wherein the feature extractor consists of 5 convolution blocks and a fully connected network, each convolution block comprising 1 convolution layer and 1 pooling layer; the feature vector has dimension (m × 2, 1), where the value of the dimension parameter m is adjusted dynamically according to the number of antenna pairs, the upper half has dimension (m, 1), and the lower half has dimension (m, 1).

7. The method of claim 1, wherein the fall/non-fall binary classifier is formed by a 2-layer fully connected neural network, and the domain classifier is formed by a 2-layer fully connected neural network.