CN108520199B

CN108520199B - Human body action open-set recognition method based on radar images and a generative adversarial model

Info

Publication number: CN108520199B
Application number: CN201810177104.8A
Authority: CN
Inventors: 汪清; 郎玥; 侯春萍; 杨阳; 管岱; 黄丹阳
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2018-03-04
Filing date: 2018-03-04
Publication date: 2022-04-08
Anticipated expiration: 2038-03-04
Also published as: CN108520199A

Abstract

The invention relates to the fields of radar technology and human action recognition. It provides an open-set human action recognition method based on radar images and a generative adversarial model. The method directly determines whether an input image belongs to a known or an unknown action class and outputs the corresponding class information, achieving end-to-end open-set recognition of human actions. It exploits the fact that radar micro-Doppler images reflect the micro-motions of the human body, and it uses the discriminator of a generative adversarial model as the open-set recognizer. The invention is mainly applicable to radar technology and human action recognition scenarios.

Description

Human body action open-set recognition method based on radar images and a generative adversarial model

Technical Field

The invention relates to the fields of radar technology and human action recognition, and in particular to an open-set action recognition method based on a generative adversarial model.

Background

Over recent decades, human motion recognition has attracted wide attention in many fields. It is considered a topic with broad application prospects because of growing demand in entertainment, medical monitoring, security, emergency rescue and other areas. In the past, human motion recognition relied mainly on visual sensor data and achieved a number of results with the aid of computer vision. Depth sensors then advanced the field further: sensors such as Kinect give researchers a simple way to obtain depth information. However, these sensors are highly susceptible to environmental factors such as lighting, occlusion and weather, which are difficult to avoid in practice, so their robustness across application scenarios is limited.

Radar is largely unaffected by environmental factors such as weather and can operate around the clock, so it has become a new and common sensor for human action recognition. The micro-Doppler effect in radar echoes refers to the additional Doppler shifts caused by micro-motions of parts of the target (e.g., hands, feet, limbs) during its overall movement. These features appear in the spectrogram once the radar signal is visualized. Existing research on micro-Doppler images includes, for example, manually extracting features (such as torso frequency, total signal bandwidth and period) from radar images and then classifying the images with a Support Vector Machine (SVM), k-Nearest Neighbor (kNN) or similar classifiers. Compared with methods that require manual feature extraction, a Convolutional Neural Network (CNN) has strong nonlinear mapping capability and can learn implicit features from images autonomously, and it is therefore widely used.

Current human motion recognition research addresses the closed-set problem: the test data come from the same source and contain the same classes as the training data. In a real environment, however, human actions are highly varied, and it is clearly impractical to build a data set that labels every action one by one. Even when an action class can be defined in a fixed scene, different people perform the same action very differently. Motion recognition in a real-world environment should therefore be treated as an open-set recognition problem, in which the data contain both known and unknown classes. Solving it requires a model that can automatically distinguish unknown classes from known ones.

Several open-set recognition methods have been proposed. W. Scheirer et al. proposed the "1-vs-Set" machine, one of the pioneering works in open-set recognition. The Compact Abating Probability (CAP) model was later proposed and combined with statistical Extreme Value Theory (EVT) to form the Weibull-calibrated Support Vector Machine (W-SVM), and experiments showed that it recognizes better than the 1-vs-Set machine. Abhijit Bendale and Terrance Boult extended Nearest Class Mean (NCM) algorithms into the Nearest Non-Outlier (NNO) algorithm, which can balance recognition accuracy against the degree of diversity. Open-set recognition with deep learning has also become an emerging direction. Abhijit Bendale and Terrance Boult proposed a new layer, the OpenMax layer, but it depends on a pre-trained model, which limits its usability for other recognition tasks.

In summary, existing open-set methods depend on a well-chosen probability threshold and therefore lack robustness across tasks. In addition, given the shortcomings of optical and depth sensors and the advantages of radar, open-set recognition of human actions from radar micro-Doppler images remains to be solved.

Disclosure of Invention

To overcome the shortcomings of the prior art, the invention provides an open-set human action recognition method based on radar images and a generative adversarial model. The method directly determines whether an input image belongs to a known or an unknown action class and outputs the corresponding class information, achieving end-to-end open-set recognition of human actions. To this end, it exploits the fact that radar micro-Doppler images reflect the micro-motions of the human body, and it uses the discriminator of a generative adversarial model as the open-set recognizer.

The method comprises the following specific steps:

Step one: transmit and receive human body echo signals with an ultra-wideband radar module. After acquisition, preprocess the data by applying the short-time Fourier transform, noise cancellation and other operations to the echo signals, and determine the useful signal interval;

Step two: further suppress noise by thresholding, displaying only the points in the radar micro-Doppler image whose echo intensity exceeds the threshold;

Step three: label the collected data and divide it into a training set, a validation set and a test set;

Step four: build a generative adversarial network (GAN) using a densely connected network (DenseNet) structure. At the output of the model's discriminator, apply a softmax function that maps the original GAN output to a probability for each class;

Step five: train the generative adversarial model from step four on the training set determined in step three. After training, use the discriminator weights to classify the test set and verify the model's open-set recognition performance.

Further, the ultra-wideband radar used in step one is a PulsON 440 radar module with an operating frequency of 3.1 GHz to 4.8 GHz. During data acquisition, two directional antennas receive the human body echo signals. Data are collected indoors for seven typical human actions: walking, boxing, crawling on the ground, sneaking, standing, standing jump forward, and running. Each subject repeats each action three times, and each acquisition lasts 7 seconds.

The short-time Fourier transform treats a non-stationary process as a series of short-time stationary segments obtained by windowing. It Fourier-transforms the signal inside each window to obtain the time-varying spectrum:

$$G_f(\tau,\omega)=\int_{-\infty}^{\infty} f(t)\,g(t-\tau)\,e^{-j\omega t}\,dt$$

where τ is the position (time shift) of the time window, ω is the angular frequency, t is time, j is the imaginary unit, e is the natural constant, g(t) is the window function, f(t) is the collected human echo signal, and G_f(·,·) is the resulting time-varying spectrum.

The noise cancellation adopts a mean background cancellation method, namely, the column vector of the echo intensity average value is subtracted from the whole echo signal;

The useful signal interval is found in two steps. First, locate the interval containing human motion in the signal's time-distance image. Then set appropriate start and end times for the time-frequency transform.

Specifically, the display intensity threshold in step two is selected manually. Points below this threshold are not displayed. Noise is filtered with a piecewise threshold that varies with distance.

Specifically, in step three the radar micro-Doppler images are labeled. Walking, boxing, crawling on the ground, sneaking, standing, standing jump forward and running are labeled 0 to 6 in that order. The images generated in steps one and two are then divided into a training set, a validation set and a test set in the ratio 4:2:1.

Specifically, the densely connected network in step four consists of two parts, "connection blocks" and "transition layers":

Each connection block consists of two convolutional layers and a concatenation layer. The block concatenates the features of all preceding layers and feeds them to each layer as input. Each convolutional layer is followed by batch normalization (BN) and either a rectified linear unit (ReLU) or a leaky rectified linear unit (Leaky ReLU), defined respectively as:

$$\mathrm{ReLU}(p)=\max(0,p),\qquad \mathrm{LeakyReLU}(p)=\begin{cases}p, & p>0\\ a\,p, & p\le 0\end{cases}$$

where p is the input to the unit and a is a small positive slope;

The transition layer is the part between two connection blocks. In the generator of the generative adversarial model, it consists of a convolutional layer and a deconvolution (transposed convolution) layer. In the discriminator, it consists of a convolutional layer and a mean pooling layer.

The generative adversarial model in step four consists of a generator and a discriminator. The generator takes random samples from a latent space as input, and its output should imitate the real samples in the training set as closely as possible. The discriminator takes either a real sample or the generator's output as input and tries to tell the two apart as accurately as possible, while the generator tries to fool the discriminator. The two networks compete and continuously adjust the weights of every layer. The ultimate goal is a discriminator that can no longer tell whether the generator's output is real. The objective function V(D, G) of the generative adversarial network is:

$$\min_G\max_D V(D,G)=\mathbb{E}_{x\sim p_{data}(x)}\big[\log D(x)\big]+\mathbb{E}_{z\sim p_z(z)}\big[\log\big(1-D(G(z))\big)\big]$$
where G denotes the generator, D the discriminator, x an input sample, z the random input variable, min(·) minimization, max(·) maximization, log(·) the base-10 logarithm, E(·) the expectation, p_data(x) the distribution of real samples, and p_z(z) the random (prior) distribution. The output of the discriminator uses a softmax function. Softmax maps an arbitrary K-dimensional real vector to another K-dimensional real vector whose elements each lie in (0, 1):

$$\sigma(z)_j=\frac{e^{z_j}}{\sum_{k=1}^{K}e^{z_k}},\qquad j=1,\dots,K$$

where z_j and z_k denote the j-th and k-th elements of the input vector, e is the natural constant, and σ(z)_j is the softmax value of the j-th element;

In this way, the discriminator's output can be read as the probability that the input image belongs to each action class. The class with the highest probability is the class the discriminator assigns to the input image.
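The softmax mapping and the highest-probability decision described above can be sketched with NumPy. This is a minimal illustration, not the patent's implementation. The logit values are hypothetical, and the function names `softmax` and `predict_label` are introduced here for illustration only.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Map a K-dimensional real vector to K values in (0, 1) that sum to 1."""
    shifted = z - np.max(z)  # subtracting the maximum avoids overflow and leaves the result unchanged
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum()

def predict_label(logits: np.ndarray) -> int:
    """Return the label (0-6 for the seven actions) with the highest softmax probability."""
    return int(np.argmax(softmax(logits)))

# Hypothetical discriminator outputs for one image; the values are invented for illustration.
logits = np.array([0.2, 2.5, -1.0, 0.0, 0.3, 1.1, -0.4])
probs = softmax(logits)
label = predict_label(logits)
print(probs.round(3), label)
```

Subtracting the maximum logit before exponentiating is a standard numerical safeguard. It does not change the probabilities, because the common factor cancels in the ratio.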

During network training, Adaptive Moment Estimation (Adam) is used to optimize the network weights. A gradient penalty strategy is also used, which adds the following penalty term to the objective function:

$$\lambda\,\mathbb{E}_{\hat{x}}\Big[\big(\lVert\nabla_{\hat{x}}D(\hat{x})\rVert_2-1\big)^2\Big],\qquad \hat{x}=\alpha x+(1-\alpha)\tilde{x}$$

where λ = 10, α is a random variable between 0 and 1, x̃ denotes a fake sample produced by the generator, x denotes a real sample, ∇ denotes the gradient, and E(·) denotes the expectation. The objective function V(D, G) then becomes:

$$\min_G\max_D V(D,G)=\mathbb{E}_{x\sim p_{data}(x)}\big[\log D(x)\big]+\mathbb{E}_{z\sim p_z(z)}\big[\log\big(1-D(G(z))\big)\big]-\lambda\,\mathbb{E}_{\hat{x}}\Big[\big(\lVert\nabla_{\hat{x}}D(\hat{x})\rVert_2-1\big)^2\Big]$$

Model performance is evaluated with the F-measure-Openness curve, which is commonly used in open-set recognition. The F-measure is defined as:

$$F=\frac{2\times P\times R}{P+R}$$

where

$$P=\frac{TP}{TP+FP},\qquad R=\frac{TP}{TP+FN}$$

are the precision and recall. TP is the number of positive samples correctly predicted as positive, TN the number of negative samples correctly predicted as negative, FP the number of negative samples incorrectly predicted as positive, and FN the number of positive samples incorrectly predicted as negative.

Characteristics and beneficial effects of the invention:

The invention recognizes human actions from radar micro-Doppler images. It therefore avoids the susceptibility of other sensors to environmental conditions and captures micro-motions well. By exploiting the properties of the generative adversarial model, it achieves end-to-end recognition of unknown actions, i.e., open-set recognition of human actions. The algorithm has low complexity and practical application value.

Description of the drawings:

FIG. 1 is a block diagram of a human body motion open set identification method based on radar micro Doppler images;

FIG. 2 is a time-distance image of a radar echo signal;

FIG. 3 is an example of micro-Doppler images for various motions;

FIG. 4 is a schematic diagram of the generator structure in the generative adversarial model;

FIG. 5 is a schematic diagram of the discriminator structure in the generative adversarial model;

FIG. 6 is a graph comparing the results of the experiment of the present invention with those of other methods.

Detailed Description

To solve the above problems, the invention provides an open-set human action recognition method based on radar images and a generative adversarial model.

To this end, the open-set human action recognition method based on radar images and a generative adversarial model comprises the following steps:

Step one: transmit and receive human body echo signals with an ultra-wideband radar module. After acquisition, preprocess the data by applying the short-time Fourier transform, noise cancellation and other operations to the echo signals, and determine the useful signal interval.

Step two: further suppress noise by thresholding, displaying only the points in the radar micro-Doppler image whose echo intensity exceeds the threshold.

Step three: label the acquired data and divide it into a training set, a validation set and a test set.

Step four: build a generative adversarial network (GAN) using a densely connected convolutional network (DenseNet) structure. At the output of the model's discriminator, apply a softmax function that maps the original GAN output to a probability for each class.

Step five: train the generative adversarial model on the training set. After training, use the discriminator weights to classify the test set and verify the model's open-set recognition performance.

Specifically, the ultra-wideband radar used in step one is a PulsON 440 radar module with an operating frequency of 3.1 GHz to 4.8 GHz. During data acquisition, two directional antennas receive the human body echo signals. Data were collected indoors. Seven typical human actions were collected: walking, boxing, crawling on the ground, sneaking, standing, standing jump forward, and running. Each subject repeated each action three times, and each acquisition lasted about 7 seconds.

The short-time Fourier transform windows the signal in the time dimension. It then Fourier-transforms the signal inside each window to obtain the time-varying spectrum, which can be written as:

$$G_f(\tau,\omega)=\int_{-\infty}^{\infty} f(t)\,g(t-\tau)\,e^{-j\omega t}\,dt$$

where τ is the position (time shift) of the time window, ω is the angular frequency, t is time, j is the imaginary unit, e is the natural constant, g(t) is the window function, f(t) is the acquired human echo signal, and G_f(·,·) is the resulting time-varying spectrum.

The noise cancellation adopts a mean background cancellation method, namely, the column vector of the echo intensity average value is subtracted from the whole echo signal.

Specifically, the display intensity threshold in step two is selected manually. During data acquisition some actions move from far to near, so the echo intensity tends to increase gradually. A uniform threshold would therefore either filter noise insufficiently at short range or filter it excessively at long range, and excessive filtering loses detail in the human echo signal. The invention therefore accounts for the effect of distance and filters noise with a piecewise, distance-dependent threshold.

Specifically, the densely connected network in step four consists of two parts, "connection blocks" and "transition layers", which are described below.

Each connection block consists of two convolutional layers and a concatenation layer. The block concatenates the features of all preceding layers and feeds them to each layer as input. Each convolutional layer is followed by batch normalization (BN) and either a rectified linear unit (ReLU) or a leaky rectified linear unit (Leaky ReLU). These are defined as:

$$\mathrm{ReLU}(p)=\max(0,p),\qquad \mathrm{LeakyReLU}(p)=\begin{cases}p, & p>0\\ a\,p, & p\le 0\end{cases}$$

where p is the input to the unit and a is a small positive slope.
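The two activation functions can be written element-wise in NumPy as below. This is a minimal sketch. The slope `a = 0.2` is an assumed value, because the description does not give one.

```python
import numpy as np

def relu(p: np.ndarray) -> np.ndarray:
    """ReLU(p) = max(0, p), applied element-wise."""
    return np.maximum(0.0, p)

def leaky_relu(p: np.ndarray, a: float = 0.2) -> np.ndarray:
    """Leaky ReLU: p for p > 0, a * p otherwise.

    The slope a = 0.2 is an assumption; the description does not state it.
    """
    return np.where(p > 0, p, a * p)

p = np.array([-2.0, 0.0, 3.0])
print(relu(p), leaky_relu(p))
```

Unlike ReLU, Leaky ReLU passes a small gradient for negative inputs. This is a common reason to prefer it in GAN discriminators.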

The transition layer is the part between two connection blocks. In the generator of the generative adversarial model, it consists of a convolutional layer and a deconvolution (transposed convolution) layer. In the discriminator, it consists of a convolutional layer and a mean pooling layer.
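The dense connectivity and the discriminator-side transition layer can be illustrated with a NumPy sketch. Several simplifications are assumptions not taken from the description. The 1x1 convolutions use random weights in place of learned kernels and the real kernel sizes (listed in Table 2). Batch normalization has no learned scale or shift. The growth of 8 channels per layer is hypothetical. The generator's convolution plus deconvolution transition is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x: np.ndarray, out_ch: int) -> np.ndarray:
    """1x1 convolution with random weights (stand-in for learned kernels).

    x has shape (N, C, H, W); the result has shape (N, out_ch, H, W).
    """
    w = rng.standard_normal((out_ch, x.shape[1])) / np.sqrt(x.shape[1])
    return np.einsum("oc,nchw->nohw", w, x)

def batch_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each channel over the batch and spatial axes (no learned scale/shift)."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def leaky_relu(x: np.ndarray, a: float = 0.2) -> np.ndarray:
    return np.where(x > 0, x, a * x)

def dense_block(x: np.ndarray, growth: int = 8, n_layers: int = 2) -> np.ndarray:
    """Each conv layer sees the concatenation of the block input and all earlier outputs."""
    features = [x]
    for _ in range(n_layers):
        inp = np.concatenate(features, axis=1)               # concatenation along channels
        out = leaky_relu(batch_norm(conv1x1(inp, growth)))   # conv -> BN -> Leaky ReLU
        features.append(out)
    return np.concatenate(features, axis=1)

def transition_down(x: np.ndarray, out_ch: int) -> np.ndarray:
    """Discriminator-style transition: convolution followed by 2x2 mean pooling."""
    y = conv1x1(x, out_ch)
    n, c, h, w = y.shape
    y = y[:, :, : h // 2 * 2, : w // 2 * 2]
    return y.reshape(n, c, h // 2, 2, w // 2, 2).mean(axis=(3, 5))

x = rng.standard_normal((4, 16, 8, 8))
y = dense_block(x)          # 16 + 2 * 8 = 32 channels
z = transition_down(y, 16)  # halves the spatial size
print(y.shape, z.shape)
```

The block output keeps its input unchanged in the first channels. Every later layer therefore has direct access to all earlier features, which is the defining property of dense connectivity.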

The generative adversarial model in step four consists of a generator and a discriminator. The generator takes random samples from a latent space as input, and its output should imitate the real samples in the training set as closely as possible. The discriminator takes either a real sample or the generator's output as input and tries to tell the two apart as accurately as possible. The generator, in turn, tries to fool the discriminator. The two networks compete and continuously adjust the weights of every layer. The ultimate goal is a discriminator that can no longer tell whether the generator's output is real. The objective function V(D, G) of the generative adversarial network can be expressed as:

$$\min_G\max_D V(D,G)=\mathbb{E}_{x\sim p_{data}(x)}\big[\log D(x)\big]+\mathbb{E}_{z\sim p_z(z)}\big[\log\big(1-D(G(z))\big)\big]$$
where G denotes the generator, D the discriminator, x an input sample, z the random input variable, min(·) minimization, max(·) maximization, log(·) the base-10 logarithm, E(·) the expectation, p_data(x) the distribution of real samples, and p_z(z) the random (prior) distribution. In addition, the output of the discriminator uses a softmax function. Softmax maps an arbitrary K-dimensional real vector to another K-dimensional real vector whose elements each lie in (0, 1). It has the form:

$$\sigma(z)_j=\frac{e^{z_j}}{\sum_{k=1}^{K}e^{z_k}},\qquad j=1,\dots,K$$

where z_j and z_k denote the j-th and k-th elements of the input vector, e is the natural constant, and σ(z)_j is the softmax value of the j-th element.

In the network training of step five, the network weights are optimized with Adaptive Moment Estimation (Adam), a first-order optimization algorithm. Adam adapts the step size during updates by dynamically adjusting the learning rate of each weight. In addition, to prevent vanishing gradients, the invention adopts a gradient penalty strategy, which adds the following penalty term to the objective function:

$$\lambda\,\mathbb{E}_{\hat{x}}\Big[\big(\lVert\nabla_{\hat{x}}D(\hat{x})\rVert_2-1\big)^2\Big],\qquad \hat{x}=\alpha x+(1-\alpha)\tilde{x}$$

where λ = 10, α is a random variable between 0 and 1, x̃ denotes a fake sample produced by the generator, x denotes a real sample, ∇ denotes the gradient, and E(·) denotes the expectation. The objective function V(D, G) of the invention is then:

$$\min_G\max_D V(D,G)=\mathbb{E}_{x\sim p_{data}(x)}\big[\log D(x)\big]+\mathbb{E}_{z\sim p_z(z)}\big[\log\big(1-D(G(z))\big)\big]-\lambda\,\mathbb{E}_{\hat{x}}\Big[\big(\lVert\nabla_{\hat{x}}D(\hat{x})\rVert_2-1\big)^2\Big]$$

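The invention evaluates model performance with the F-measure-Openness curve, which is commonly used in open-set recognition. The F-measure is defined as:

$$F=\frac{2\times P\times R}{P+R}$$

where

$$P=\frac{TP}{TP+FP},\qquad R=\frac{TP}{TP+FN}$$

are the precision and recall. TP is the number of positive samples correctly predicted as positive, FP the number of negative samples incorrectly predicted as positive, and FN the number of positive samples incorrectly predicted as negative.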

The invention provides a human body action open set identification method based on radar images and a generated confrontation model, which comprises five steps as shown in figure 1. The invention is further explained below with reference to the figures and examples.

Firstly, human motion data is collected.

The invention uses an ultra-wideband PulsON 440 radar module to transmit and receive human body echo signals. The operating frequency is 3.1 GHz to 4.8 GHz, the sampling frequency is 16 GHz, the pulse repetition frequency (PRF) is 368 Hz, and the coherent processing interval (CPI) is 0.2 s. During data acquisition, two directional antennas mounted at a height of about 1.2 m receive the human body echo signals.

In the experiment, four subjects performed seven typical human actions along the radar's line of sight: walking, boxing, crawling on the ground, sneaking, standing, standing jump forward, and running. Each subject repeated each action three times, and each acquisition lasted about 7 seconds. After the raw echo data are obtained, the corresponding time-distance image is produced (FIG. 2). The interval containing human motion is determined from this image, and appropriate start and end times are set for the time-frequency transform.

Mean background cancellation is then applied. The echo signal is treated as a two-dimensional matrix, and the mean of each row is computed and denoted m_i, where i indexes the row. This gives the column vector of mean echo intensities M = [m_1, m_2, …, m_n]^T, which is extended to an n × n mean matrix whose every column equals M:

$$\bar{M}=\begin{bmatrix}m_1 & m_1 & \cdots & m_1\\ m_2 & m_2 & \cdots & m_2\\ \vdots & \vdots & \ddots & \vdots\\ m_n & m_n & \cdots & m_n\end{bmatrix}$$

Subtracting the mean matrix element-wise from the original data matrix yields the cancelled signal matrix.
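A minimal NumPy sketch of the mean background cancellation follows. It assumes the echo is stored as a 2-D array whose rows are the "rows" of the description. Which physical axis (range bins or slow time) a row corresponds to is not stated, so that mapping is an assumption. Here the mean matrix is built with as many columns as the data matrix so the subtraction is always defined. The function name `mean_background_cancel` is hypothetical.

```python
import numpy as np

def mean_background_cancel(echo: np.ndarray) -> np.ndarray:
    """Subtract the per-row mean echo intensity from every element of that row."""
    m = echo.mean(axis=1, keepdims=True)               # column vector M = [m_1, ..., m_n]^T
    mean_matrix = np.repeat(m, echo.shape[1], axis=1)  # every column equals M
    return echo - mean_matrix

# Toy echo matrix: a static background per row plus one moving-target sample.
echo = np.array([[5.0, 5.0, 5.0, 5.0],
                 [2.0, 2.0, 9.0, 2.0]])
cancelled = mean_background_cancel(echo)
print(cancelled)
```

A constant (static) background in a row cancels to zero, while the moving-target sample survives as a positive residual. NumPy broadcasting (`echo - m`) would give the same result without building the mean matrix explicitly. The explicit `np.repeat` simply mirrors the description.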

Next, the short-time Fourier transform (STFT) is applied to the signal matrix. The STFT regards a non-stationary process as a superposition of short-time stationary segments and windows the signal in the time dimension. It then Fourier-transforms the signal inside each window to obtain the time-varying spectrum:

$$G_f(\tau,\omega)=\int_{-\infty}^{\infty} f(t)\,g(t-\tau)\,e^{-j\omega t}\,dt$$

where τ is the position (time shift) of the time window, ω is the angular frequency, t is time, j is the imaginary unit, e is the natural constant, g(t) is the window function, f(t) is the collected human echo signal, and G_f(·,·) is the resulting time-varying spectrum. The STFT used by the invention has a window length of 0.1 s and an overlap of 0.9 between adjacent windows, with 1024 Fourier transform points per window.
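The discrete STFT with the stated parameters (0.1 s window, 0.9 overlap, 1024 points) can be sketched in NumPy as below. The following are assumptions for illustration and are not stated in the description: the transform runs along slow time sampled at the 368 Hz PRF, so a 0.1 s window is about 37 samples with a hop of about 4 samples; the window is a Hann window; and the spectrum is centered on 0 Hz. Each frame uses a frame-local time origin, so the magnitude matches the formula above while the phase reference differs.

```python
import numpy as np

def stft(x: np.ndarray, fs: float, win_sec: float = 0.1,
         overlap: float = 0.9, nfft: int = 1024):
    """Short-time Fourier transform of a 1-D (possibly complex) signal.

    Returns (times, freqs, spec) with spec of shape (nfft, n_frames):
    rows are Doppler frequencies centered on 0 Hz, columns are window positions.
    """
    win_len = int(round(win_sec * fs))
    hop = max(1, int(round(win_len * (1.0 - overlap))))
    window = np.hanning(win_len)  # window type is an assumption
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop: i * hop + win_len] * window for i in range(n_frames)])
    spec = np.fft.fftshift(np.fft.fft(frames, n=nfft, axis=1), axes=1)
    freqs = np.fft.fftshift(np.fft.fftfreq(nfft, d=1.0 / fs))
    times = (np.arange(n_frames) * hop + win_len / 2) / fs
    return times, freqs, spec.T

# Synthetic slow-time signal: a 50 Hz Doppler tone sampled at the 368 Hz PRF for 7 s.
fs = 368.0
t = np.arange(int(7 * fs)) / fs
x = np.exp(2j * np.pi * 50.0 * t)
times, freqs, spec = stft(x, fs)
peak_hz = freqs[np.argmax(np.abs(spec).mean(axis=1))]
print(spec.shape, round(float(peak_hz), 2))
```

Zero-padding each 37-sample frame to 1024 points only interpolates the spectrum. The frequency resolution is still set by the 0.1 s window length.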

Second, set the display intensity threshold.

Noise is further suppressed by thresholding: only points in the radar micro-Doppler image whose echo intensity exceeds the threshold are displayed.

When the radar micro-Doppler image is generated, the intensity threshold below which points are not displayed must be selected manually. During data acquisition the subjects moved within 1.2 m to 5.4 m of the radar. Because some actions move from far to near, the echo intensity inevitably tends to increase. A uniform threshold would either filter noise insufficiently for short-range echoes or filter it excessively for long-range echoes, and excessive filtering loses detail in the human echo signal.

The invention therefore accounts for the effect of distance and filters noise with a piecewise threshold. If the maximum signal intensity of a given segment is Max, the minimum intensity displayed in each distance range is given in Table 1.

TABLE 1

Distance from radar	Minimum displayed intensity
1.2-2 m	Max-90
2-3.2 m	Max-80
3.2-4.5 m	Max-70
4.5-5.4 m	Max-60
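The piecewise threshold in Table 1 can be sketched as follows. This is a hypothetical helper, not the patent's code. It assumes that intensities are on a logarithmic (dB-like) scale, since the "Max-90" notation suggests this although no unit is stated. It also assumes each image segment is associated with a single distance, and that NaN marks a point that is not displayed. Treating the lower bound of each range as inclusive is an assumption as well.

```python
import numpy as np

# (lower bound in m, upper bound in m, offset below Max) taken from Table 1.
THRESHOLD_TABLE = [(1.2, 2.0, 90.0), (2.0, 3.2, 80.0), (3.2, 4.5, 70.0), (4.5, 5.4, 60.0)]

def offset_for_distance(d: float) -> float:
    """Return the offset below Max for a target at distance d (metres)."""
    for lo, hi, off in THRESHOLD_TABLE:
        if lo <= d < hi:
            return off
    if d == THRESHOLD_TABLE[-1][1]:  # include the far end of the last range
        return THRESHOLD_TABLE[-1][2]
    raise ValueError(f"distance {d} m is outside 1.2-5.4 m")

def apply_piecewise_threshold(segment_db: np.ndarray, distance_m: float) -> np.ndarray:
    """Hide (set to NaN) points weaker than Max - offset for this segment's distance."""
    floor = segment_db.max() - offset_for_distance(distance_m)
    return np.where(segment_db >= floor, segment_db, np.nan)

segment = np.array([[0.0, -50.0, -85.0]])
shown = apply_piecewise_threshold(segment, distance_m=4.0)
print(shown)
```

Farther segments get a larger offset below Max, so more of their weak returns stay visible. This matches the motivation given above for not using a single uniform threshold.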

Third, construct the data set.

After the first two steps, 700 images are obtained for each action. Example radar micro-Doppler images of each action are shown in FIG. 3. The generated images are labeled, with walking, boxing, crawling on the ground, sneaking, standing, standing jump forward and running represented by the numbers 0 to 6. The images of each action are then divided into training, validation and test sets in the ratio 4:2:1, giving 400 training images, 200 validation images and 100 test images per action.
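The labeling and the 4:2:1 split per action can be sketched in NumPy. The description does not say how images are assigned to the three subsets, so the random shuffle with a fixed seed is an assumption. The helper name `split_421` and the label dictionary are introduced here for illustration.

```python
import numpy as np

# Labels 0-6 in the order given in step three.
ACTIONS = ["walking", "boxing", "crawling", "sneaking",
           "standing", "standing jump forward", "running"]

def split_421(n_images: int, seed: int = 0):
    """Shuffle image indices for one action and split them 4:2:1 into train/val/test."""
    rng = np.random.default_rng(seed)  # random shuffling with a fixed seed is an assumption
    idx = rng.permutation(n_images)
    n_train = n_images * 4 // 7
    n_val = n_images * 2 // 7
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = split_421(700)
label_of = {name: i for i, name in enumerate(ACTIONS)}
print(len(train), len(val), len(test), label_of["running"])
```

With 700 images per action, the integer split gives exactly 400, 200 and 100 images, matching the counts above.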

For the open-set recognition problem, known and unknown classes are defined. Classes that appear in the test set but not in the training set are unknown classes, denoted U. Classes that appear in both the test set and the training set are known classes, denoted K. The invention evaluates the model on data sets with different degrees of openness.
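The known/unknown labeling rule can be sketched as a small relabeling function. The choice of which actions are treated as known is a hypothetical configuration. The description evaluates several degrees of openness without listing the class subsets here.

```python
from typing import Hashable, Iterable, List

def open_set_labels(test_labels: Iterable[int], known_classes: Iterable[int],
                    unknown: Hashable = "U") -> List[Hashable]:
    """Keep labels of known classes (K); replace labels absent from training with 'U'."""
    known = set(known_classes)
    return [lbl if lbl in known else unknown for lbl in test_labels]

# Hypothetical configuration: train on actions 0-4, test on all seven actions.
relabelled = open_set_labels([0, 3, 5, 6, 4], known_classes=range(5))
print(relabelled)
```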

Fourth, construct the generative adversarial model.

The generative adversarial model adopted by the invention consists of a generator and a discriminator. The generator randomly samples a variable z from a latent space as input, and its output should imitate the real samples in the training set as closely as possible. The discriminator takes either a real sample or the generator's output as input and tries to tell the two apart as accurately as possible. The generator, in turn, tries to fool the discriminator. The two networks compete and continuously adjust the weights of every layer. The ultimate goal is a discriminator that can no longer tell whether the generator's output is real. The objective function V(D, G) of the generative adversarial model can be expressed as:

$$\min_G\max_D V(D,G)=\mathbb{E}_{x\sim P_{data}(x)}\big[\log D(x)\big]+\mathbb{E}_{z\sim P_z(z)}\big[\log\big(1-D(G(z))\big)\big]$$

where G denotes the generator, D the discriminator, x an input sample, z the random input variable, min(·) minimization, max(·) maximization, log(·) the base-10 logarithm, E(·) the expectation, P_data(x) the distribution of real samples, and P_z(z) the random (prior) distribution.
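The objective V(D, G) can be estimated from a batch of discriminator outputs by replacing the expectations with sample means. This sketch assumes D returns a scalar "real" probability per sample. The description's discriminator also has a softmax class output, and how the two are combined is not spelled out here. Base-10 logarithms follow the description. The helper name `gan_value` is hypothetical.

```python
import numpy as np

def gan_value(d_real: np.ndarray, d_fake: np.ndarray, eps: float = 1e-12) -> float:
    """Monte Carlo estimate of V(D, G) from discriminator outputs in (0, 1).

    d_real holds D(x) for real samples; d_fake holds D(G(z)) for generated samples.
    Base-10 logarithms follow the description; the base only rescales V.
    """
    d_real = np.clip(d_real, eps, 1.0 - eps)  # avoid log(0)
    d_fake = np.clip(d_fake, eps, 1.0 - eps)
    return float(np.mean(np.log10(d_real)) + np.mean(np.log10(1.0 - d_fake)))

# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
v_confused = gan_value(np.full(8, 0.5), np.full(8, 0.5))
# A discriminator that separates real from fake well.
v_sharp = gan_value(np.full(8, 0.99), np.full(8, 0.01))
print(v_confused, v_sharp)
```

The discriminator maximizes V (values closer to 0 here) and the generator minimizes it. At the equilibrium described above, D outputs 0.5 and V equals 2 log(0.5).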

The discriminator is itself a binary classifier that judges whether an input image is "real" or "fake". It can therefore also perform the open-set task of judging whether an input image is "known" or "unknown". In addition, so that the discriminator also acts as a classifier, its output layer is replaced with a softmax function, and its output becomes the probability of the input image for each class. Softmax maps an arbitrary K-dimensional real vector to another K-dimensional real vector whose elements each lie in (0, 1). It has the form:

$$\sigma(z)_j=\frac{e^{z_j}}{\sum_{k=1}^{K}e^{z_k}},\qquad j=1,\dots,K$$

where z_j and z_k denote the j-th and k-th elements of the input vector, e is the natural constant, and σ(z)_j is the softmax value of the j-th element.
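Both the generator and the discriminator use a densely connected network, which consists of dense blocks (Dense Block) and transition layers (Transition Layer). The two components are described below.

Each dense block consists of two convolutional layers and a concatenation layer. The block concatenates the features of all preceding layers and feeds them to each layer as input. Each convolutional layer is followed by batch normalization (BN) and either a rectified linear unit (ReLU) or a leaky rectified linear unit (Leaky ReLU):

$$\mathrm{ReLU}(p)=\max(0,p),\qquad \mathrm{LeakyReLU}(p)=\begin{cases}p, & p>0\\ a\,p, & p\le 0\end{cases}$$

where p is the input to the unit and a is a small positive slope.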

The transition layer is the part between two dense blocks. In the generator of the generative adversarial model, it consists of a convolutional layer and a deconvolution layer. In the discriminator, it consists of a convolutional layer and a mean pooling layer. FIG. 4 and FIG. 5 show the structures of the generator and the discriminator, respectively. The specific model parameters are listed in Table 2, which uses the following notation: "n × n deconv" is a deconvolution layer with an n × n kernel, "n × n conv" is a convolutional layer with an n × n kernel, "Padding" is the number of pixels padded around the image, and "Pooling" is a mean pooling operation.

TABLE 2 Specific parameters of each layer of the generative adversarial model

Fifth, train and test the model.

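To mitigate the vanishing-gradient problem that commonly occurs when training a generative adversarial model, the invention adopts a gradient penalty strategy during training, i.e., a penalty term is added to the objective function:

$$\lambda\,\mathbb{E}_{\hat{x}}\left[\left(\left\lVert \nabla_{\hat{x}} D(\hat{x}) \right\rVert_2 - 1\right)^2\right], \qquad \hat{x} = \alpha x + (1 - \alpha)\,G(z)$$

where λ = 10, α is a random variable between 0 and 1, ∇ denotes the gradient, and E(·) denotes the expectation. The objective function V(D, G) of the invention is then:

$$\min_G \max_D V(D,G) = \mathbb{E}_{x\sim P_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z\sim P_z(z)}\left[\log\left(1 - D(G(z))\right)\right] - \lambda\,\mathbb{E}_{\hat{x}}\left[\left(\left\lVert \nabla_{\hat{x}} D(\hat{x}) \right\rVert_2 - 1\right)^2\right]$$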
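A minimal NumPy sketch of the gradient penalty term follows. It assumes the common interpolation form x̂ = αx + (1 - α)G(z), with one α drawn uniformly from [0, 1] per sample, and replaces the DenseNet discriminator with a hypothetical logistic unit D(x) = sigmoid(w·x + b) whose input gradient can be written analytically; in an actual implementation the gradient would be obtained by automatic differentiation.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def toy_discriminator(x, w, b):
    # Hypothetical stand-in for D: a single logistic unit D(x) = sigmoid(w.x + b).
    return sigmoid(x @ w + b)

def toy_discriminator_grad(x, w, b):
    # Analytic gradient of the logistic unit with respect to its input: D(1 - D) * w.
    d = toy_discriminator(x, w, b)
    return (d * (1.0 - d))[:, None] * w[None, :]

def gradient_penalty(real, fake, w, b, lam=10.0, rng=None):
    """lam * E[(||grad D(x_hat)||_2 - 1)^2] with x_hat = alpha*real + (1-alpha)*fake."""
    rng = np.random.default_rng(0) if rng is None else rng
    alpha = rng.uniform(0.0, 1.0, size=(real.shape[0], 1))  # one alpha per sample
    x_hat = alpha * real + (1.0 - alpha) * fake              # points between real and fake
    grad_norm = np.linalg.norm(toy_discriminator_grad(x_hat, w, b), axis=1)
    return lam * np.mean((grad_norm - 1.0) ** 2)

rng = np.random.default_rng(42)
real = rng.standard_normal((16, 5))  # stand-ins for flattened real images
fake = rng.standard_normal((16, 5))  # stand-ins for generator outputs G(z)
w, b = rng.standard_normal(5), 0.1
gp = gradient_penalty(real, fake, w, b, lam=10.0)
```

Since the discriminator maximizes V(D, G), the penalty enters with a minus sign and pushes the gradient norm of D toward 1 along the interpolated points.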

The network weights are adjusted during training with Adaptive Moment Estimation (Adam), a first-order optimization algorithm that adapts the step size and dynamically adjusts the learning rate of each weight during updates.
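The Adam update rule can be sketched as follows. The decay rates β1 = 0.9 and β2 = 0.999 and ε = 1e-8 are the defaults proposed with Adam, not values given in this document, and the toy quadratic objective and learning rate are hypothetical; the point is the per-weight step size m̂ / (√v̂ + ε).

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2     # second-moment (uncentered variance) estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias corrections for zero initialization
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-weight adaptive step
    return theta, m, v

# Toy problem: minimize f(theta) = ||theta - target||^2.
target = np.array([3.0, -2.0])
theta = np.zeros(2)
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 2001):
    grad = 2.0 * (theta - target)
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
```

On the first step the bias-corrected ratio m̂ / √v̂ is roughly the sign of the gradient, so each weight moves by about lr regardless of the gradient's scale.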

The model is evaluated with the "F-measure-Openness curve" commonly used in open-set recognition. The F-measure is defined as:

$$F = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$

TP (true positives) is the number of positive samples predicted as positive, TN (true negatives) the number of negative samples predicted as negative, FP (false positives) the number of negative samples predicted as positive, and FN (false negatives) the number of positive samples predicted as negative. The F-measure lies between 0 and 1, and a larger value indicates better open-set recognition performance.
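The F-measure can be computed directly from the confusion counts, as in the short sketch below; the counts in the usage note are hypothetical. TN does not enter the formula.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with TP = 8, FP = 2, and FN = 2, precision and recall are both 0.8, so F = 0.8.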

Openness measures how open an open-set recognition problem is, and is defined as:

$$\mathrm{Openness} = 1 - \sqrt{\frac{2 \times N_{TA}}{N_{TG} + N_{TE}}}$$

where $N_{TA}$ is the number of classes in the training set, $N_{TG}$ is the number of classes to be identified, and $N_{TE}$ is the number of classes in the test set.
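Openness can be evaluated the same way. The class counts in the usage note are hypothetical, chosen only to show that a closed-set setting gives an openness of 0 and that adding unseen test classes increases it.

```python
import math

def openness(n_train: int, n_target: int, n_test: int) -> float:
    """Openness = 1 - sqrt(2 * N_TA / (N_TG + N_TE))."""
    return 1.0 - math.sqrt(2.0 * n_train / (n_target + n_test))
```

With all seven actions seen in training, `openness(7, 7, 7)` is 0; with five known actions and seven actions in the test set, `openness(5, 5, 7)` is about 0.087.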

Experimental results show that the method improves performance by about ten percent over other common open-set recognition algorithms; the results are shown in Fig. 6.

The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims

1. A human body action open-set recognition method based on radar images and a generative adversarial model, characterized in that it exploits the property that radar micro-Doppler images reflect the micro-motion of the human body, and uses the discriminator of the generative adversarial model as the open-set recognizer, which directly distinguishes whether the action in an input image is of a known or unknown type and outputs the type information of the input image, thereby realizing end-to-end open-set recognition of human body actions; the method comprises the following specific steps:

Step 1: transmitting and receiving human body echo signals with an ultra-wideband radar module; after data acquisition, preprocessing the data by performing a short-time Fourier transform and noise cancellation on the echo signals and determining the useful signal interval;

Step 4: building a generative adversarial network (GAN) on a densely connected network (DenseNet, Densely Connected Convolutional Network) structure, and using a softmax function at the output of the model's discriminator to map the original output probability of the generative adversarial network to a probability for each class;

Step 5: training the generative adversarial model built in step four with the training set data determined in step three; after training, using the discriminator weights to test on the test set data and verify the open-set recognition performance of the model;

wherein the densely connected network in step four comprises two parts, "dense blocks" and "transition layers", specifically: each dense block consists of two convolution layers and a concatenation layer, and the dense block feeds each layer with the concatenation of the features of all preceding layers; each convolution layer is followed by batch normalization (BN) and a rectified linear unit (ReLU) or a leaky rectified linear unit (Leaky ReLU), whose expressions are respectively:

$$\mathrm{ReLU}(p) = \max(0, p), \qquad \mathrm{LeakyReLU}(p) = \begin{cases} p, & p > 0 \\ a\,p, & p \le 0 \end{cases}$$

wherein p is the input to the unit and a is a small positive slope;

2. The human body action open-set recognition method based on radar images and a generative adversarial model according to claim 1, wherein the ultra-wideband radar used in step one is a PulsON 440 radar module operating at 3.1 GHz to 4.8 GHz; two directional antennas are used to receive the human body echo signals during data acquisition; the data are acquired in an indoor environment; and seven typical human actions are acquired, namely walking, boxing, crawling on the ground, sneaking, standing, forward standing jump, and running, each action being performed three times by each subject with an acquisition time of 7 seconds per repetition.

3. The method according to claim 1, wherein the short-time Fourier transform treats a non-stationary signal as a superposition of a series of short-time stationary signals; the short-time property is achieved by windowing in the time dimension, and the Fourier transform of the signal within the window yields the time-varying spectrum of the signal; the short-time Fourier transform is given by:

$$\mathrm{STFT}(t, f) = \int_{-\infty}^{\infty} x(\tau)\, w(\tau - t)\, e^{-j 2\pi f \tau}\, d\tau$$

wherein x(τ) is the echo signal and w(·) is the window function.
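A discrete version of the short-time Fourier transform can be sketched with NumPy as below. The Hann window, window length, and hop size are assumptions, since the claims do not specify them. Each segment is transformed with a frame-local time index, which changes only the phase of each frame, not the magnitude displayed in a micro-Doppler spectrogram.

```python
import numpy as np

def stft(x, fs, win_len=100, hop=25):
    """Windowed FFT of overlapping segments; returns (spectra, freqs, frame_times)."""
    window = np.hanning(win_len)                  # assumed window choice
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] * window for i in range(n_frames)])
    spec = np.fft.fft(frames, axis=1)             # one spectrum per windowed segment
    freqs = np.fft.fftfreq(win_len, d=1.0 / fs)   # bin frequencies in Hz
    times = (np.arange(n_frames) * hop + win_len / 2) / fs  # frame centres in seconds
    return spec, freqs, times

# Toy signal whose frequency jumps from 100 Hz to 200 Hz halfway through,
# mimicking a time-varying (micro-)Doppler component.
fs = 1000
t = np.arange(1000) / fs
x = np.where(t < 0.5, np.sin(2 * np.pi * 100 * t), np.sin(2 * np.pi * 200 * t))
spec, freqs, times = stft(x, fs)
```

Plotting `np.abs(spec)` against `times` and `freqs` gives the time-frequency image; with a complex radar signal the negative-frequency half carries motion away from the radar and should be kept.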

4. The method according to claim 1, wherein the noise cancellation is mean background cancellation, that is, the column vector of mean echo intensity is subtracted from the whole echo signal; and the useful signal interval is determined by identifying, from the range-time image of the signal, the interval in which human motion is present, and then setting appropriate start and end times for the time-frequency transform.
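Mean background cancellation amounts to subtracting each range bin's average over slow time, which removes returns from static objects. The sketch below assumes an echo matrix with range bins along the rows and pulses along the columns; the helper `active_interval` is hypothetical and only automates, with an assumed energy threshold, the step that the claim describes as a choice made from the range-time image.

```python
import numpy as np

def mean_background_cancel(echo):
    """echo: 2-D array, range bins along axis 0, slow time (pulses) along axis 1.
    Subtract the column vector of per-range-bin mean intensity from every column."""
    background = echo.mean(axis=1, keepdims=True)  # shape (range_bins, 1)
    return echo - background

def active_interval(cancelled, frac=0.5):
    """Hypothetical helper: first and last pulse whose residual energy exceeds
    frac * (maximum pulse energy)."""
    energy = np.sum(np.abs(cancelled) ** 2, axis=0)  # energy per pulse
    active = np.flatnonzero(energy > frac * energy.max())
    return int(active[0]), int(active[-1])

# Toy data: static clutter (constant per range bin) plus a target in bin 5, pulses 40-59.
rng = np.random.default_rng(0)
clutter = rng.uniform(1.0, 5.0, size=10)
echo = np.tile(clutter[:, None], (1, 100))
echo[5, 40:60] += 10.0
cancelled = mean_background_cancel(echo)
start, end = active_interval(cancelled, frac=0.5)
```

After cancellation the constant clutter rows become zero, so only the moving target contributes energy, and the detected interval covers pulses 40 to 59.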

5. The method according to claim 1, wherein the intensity threshold set in step two is selected manually, and noise is filtered using piecewise (segmented) thresholds.

6. The human body action open-set recognition method based on radar images and a generative adversarial model according to claim 1, wherein, specifically, in step three the generated radar micro-Doppler images are labeled with the numbers "0" to "6" in order for the seven actions walking, boxing, ground crawling, sneaking, standing, forward standing jump, and running, and the images generated in steps one and two are then divided into a training set, a validation set, and a test set in a ratio of 4:2:1.
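The labeling and the 4:2:1 split can be sketched as follows. Shuffling at random before splitting is an assumption, since the claim does not say how images are assigned to the three sets; a per-class (stratified) split would be an equally plausible reading.

```python
import numpy as np

# Labels "0" to "6" in the order the claim lists the actions.
ACTIONS = ["walking", "boxing", "ground crawling", "sneaking", "standing",
           "forward standing jump", "running"]
LABELS = {name: i for i, name in enumerate(ACTIONS)}

def split_4_2_1(n_samples, seed=0):
    """Return shuffled index arrays for training, validation, and test sets (4:2:1)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)   # assumed random assignment
    n_train = n_samples * 4 // 7
    n_val = n_samples * 2 // 7
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_4_2_1(70)  # 40 / 20 / 10 images
```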

7. The method according to claim 1, wherein the generative adversarial model in step four consists of a generator and a discriminator; the generator takes random samples from the latent space as input, and its output should imitate the real samples in the training set as closely as possible; the discriminator takes either a real sample or the generator's output as input and aims to distinguish the generator's output from real samples as reliably as possible, while the generator tries to fool the discriminator; the two networks compete and continuously adjust the weights of each layer, with the ultimate goal that the discriminator can no longer tell whether the generator's output is real; the objective function V(D, G) of the generative adversarial network is expressed as:

$$\min_G \max_D V(D,G) = \mathbb{E}_{x\sim P_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z\sim P_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$

wherein G denotes the generator, D the discriminator, x an input sample, z the random input variable, min(·) a minimization, max(·) a maximization, log(·) the base-10 logarithm, and E(·) the expectation; $P_{data}(x)$ denotes the distribution of the real samples and $P_z(z)$ the random distribution from which z is drawn; the output of the discriminator uses a softmax function, which maps an arbitrary K-dimensional real vector to another K-dimensional real vector whose elements each lie in (0, 1) and sum to 1; the softmax function has the form:

$$\sigma(\mathbf{y})_j = \frac{e^{y_j}}{\sum_{k=1}^{K} e^{y_k}}, \qquad j = 1, \dots, K$$

in this way, the output of the discriminator can be interpreted as the probability of the input image belonging to each action category, and the category with the highest probability is the category the discriminator assigns to the input image;

wherein λ = 10 and α is a random variable between 0 and 1;

wherein,