CN117079416A - Multi-person 5D radar falling detection method and system based on artificial intelligence algorithm - Google Patents


Info

Publication number: CN117079416A
Application number: CN202311335592.8A
Authority: CN (China)
Prior art keywords: target, model, layer, determining, action
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN117079416B (granted publication)
Inventors: 杨绍分, 袁文忠
Current and original assignee: Dexin Intelligent Technology Changzhou Co ltd
Application filed by Dexin Intelligent Technology Changzhou Co ltd
Priority to CN202311335592.8A; published as CN117079416A; application granted and published as CN117079416B

Classifications

    • G08B21/043 — Alarms for ensuring the safety of persons, based on behaviour analysis detecting an emergency event, e.g. a fall
    • G08B21/0423 — Alarms based on behaviour analysis detecting deviation from an expected pattern of behaviour or schedule
    • G08B21/0492 — Sensor dual technology, i.e. two or more technologies collaborate to extract an unsafe condition
    • G08B31/00 — Predictive alarm systems characterised by extrapolation or other computation using updated historic data
    • G06F18/213 — Feature extraction, e.g. by transforming the feature space
    • G06F18/2415 — Classification techniques based on parametric or probabilistic models
    • G06N3/044 — Recurrent networks, e.g. Hopfield networks
    • G06N3/045 — Combinations of networks
    • G06N3/0464 — Convolutional networks [CNN, ConvNet]
    • G06N3/047 — Probabilistic or stochastic networks
    • G06N3/048 — Activation functions
    • G06V10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/766 — Recognition using pattern recognition or machine learning using regression
    • G06V20/46 — Extracting features or characteristics from video content
    • G06V40/20 — Movements or behaviour, e.g. gesture recognition
    • G06F2123/02 — Data types in the time domain, e.g. time-series data
    • G06F2218/08 — Feature extraction (signal processing)
    • G06F2218/12 — Classification; Matching (signal processing)
    • G06V2201/07 — Target detection


Abstract

The application provides a multi-person 5D radar fall detection method and system based on an artificial intelligence algorithm, relating to fall detection technology. The method comprises: acquiring a plurality of radar echo information of a plurality of target objects in a target area based on a millimeter wave radar, and extracting a plurality of motion features corresponding to the radar echo information through a preset echo feature extraction model; acquiring video information of the plurality of target objects in the target area based on an image sensor, determining the target regions corresponding to the plurality of target objects in each frame of the video information based on a target detector, matching the target region of each target object with the motion features, and determining the comprehensive features corresponding to each target object; and determining the action type of each target object through a preset action classification model according to its comprehensive features, and sending out alarm information when the action type matches a falling action.

Description

Multi-person 5D radar falling detection method and system based on artificial intelligence algorithm
Technical Field
The application relates to fall detection technology, and in particular to a multi-person 5D radar fall detection method and system based on an artificial intelligence algorithm.
Background
At present, research on fall risk assessment for the elderly focuses on gait analysis and posture detection, and a large number of risk assessment models have been produced. However, most risk assessment models are constructed without daily-life data from real scenes, so they cannot effectively assess the physical state of the elderly and are of limited practical value. Furthermore, most studies use a single device to perceive the physical state of the elderly, and noise and occlusion are difficulties that a single-sensor assessment model cannot overcome. Meanwhile, the invasiveness of the sensing equipment leads to subconscious resistance among the elderly and to problems of data credibility.
Disclosure of Invention
The embodiment of the application provides a multi-person 5D radar falling detection method and system based on an artificial intelligence algorithm, which at least can solve part of problems in the prior art.
In a first aspect of an embodiment of the present application,
provided is a multi-person 5D radar fall detection method based on an artificial intelligence algorithm, which comprises the following steps:
acquiring multiple radar echo information of multiple target objects in a target area based on a millimeter wave radar, and extracting multiple motion features corresponding to the multiple radar echo information through a preset echo feature extraction model, wherein the echo feature extraction model transmits the output of each layer to the next layer through the connection coefficient of the adjacent layer and combining a nonlinear compression function;
acquiring video information of a plurality of target objects in the target area based on an image sensor, determining target areas corresponding to the plurality of target objects in each frame of video information based on a target detector, matching the target areas of each target object with motion characteristics, and determining comprehensive characteristics corresponding to each target object;
and determining the action type of the target object through a preset action classification model according to the comprehensive characteristics corresponding to each target object, and sending out alarm information when the action type is matched with the falling action, wherein the action classification model is constructed based on a combined network model, and the action classification is carried out by distributing corresponding weights for different network models.
In an alternative embodiment of the present application,
acquiring a plurality of radar echo information of a plurality of target objects in a target area based on a millimeter wave radar, and extracting a plurality of motion features corresponding to the plurality of radar echo information through a preset echo feature extraction model comprises:
multiplying the radar echo information with a weight matrix corresponding to each layer in the echo feature extraction model to obtain low-level features;
according to the low-level characteristics, combining the connection coefficients of the current layer and the adjacent next layer to determine the high-level characteristics of the adjacent next layer;
and based on the high-level features, carrying out space dimension mapping on the high-level features by combining a nonlinear compression function of the echo feature extraction model, and determining a plurality of motion features corresponding to the radar echo information.
In an alternative embodiment of the present application,
determining a plurality of motion features corresponding to the plurality of radar echo information includes:
a plurality of motion characteristics are determined according to the following formulas:

$$I_l = W_l X + b_l, \qquad I_h = f\!\left(C_{lh}\, P_l\, I_l\right), \qquad S_h = f\!\left(Q_h\, I_h;\ e,\ r\right)$$

where $X$ denotes the radar echo information, $I_l$ and $I_h$ denote the low-level and high-level features respectively, $C_{lh}$ denotes the connection coefficient between the current layer and the adjacent next layer, $W_l$ denotes the weight matrix, $b_l$ denotes the bias parameter, $P_l$ and $Q_h$ denote the low-level and high-level learnable matrix parameters respectively, $S_h$ denotes the motion features, $f(\cdot)$ denotes the nonlinear compression function, and $e$ and $r$ denote the nonlinearity and sensitivity parameters respectively.
In an alternative embodiment of the present application,
before determining the target areas corresponding to the plurality of target objects in each frame of video information based on the target detector, the method further comprises training the target detector:
the target detector comprises a convolution layer, a candidate layer, a pooling layer and a full connection layer;
converting a training data set into a feature map through a convolution layer based on the pre-acquired training data set; generating a plurality of anchor frames with different sizes and aspect ratios in each sliding window by the candidate layer, and screening out the anchor frames with the front scores from the plurality of anchor frames as candidate frames by applying non-maximum value inhibition;
the pooling layer maps the candidate frames to the feature images with the same size as the candidate frames to carry out pooling operation and generate feature vectors;
and determining classification loss and boundary box regression loss through a full connection layer according to the feature vector, and iteratively optimizing a classification loss function and a boundary box regression loss function through a back propagation algorithm in combination with an adaptive learning rate until the classification loss and the boundary box regression loss are minimized.
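The candidate-box screening step described above can be sketched as score-ranked non-maximum suppression (NMS) over anchor boxes. This is a minimal illustration only: the box format (x1, y1, x2, y2), the IoU threshold, and the sample anchors are assumptions, not values taken from the patent.

```python
# Sketch of candidate selection: keep top-scoring anchor boxes and suppress
# overlapping lower-scoring ones via non-maximum suppression (NMS).

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5, top_k=100):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
        if len(keep) == top_k:
            break
    return keep

anchors = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(anchors, scores)  # the overlapping second anchor is suppressed
```

The kept indices are the candidate frames passed on to the pooling layer.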
In an alternative embodiment of the present application,
determining the classification loss and the boundary box regression loss through the full connection layer according to the feature vector, and iteratively optimizing the classification loss function and the boundary box regression loss function through a back propagation algorithm in combination with the self-adaptive learning rate comprises the following steps:
$$L = L_{cls} + r\, L_{reg}, \qquad \theta_{t+1} = \theta_t - S\, m_t$$

where $L$ denotes the sum of the classification loss value and the regression loss value, $S$ denotes the learning rate, $r$ denotes the loss weight coefficient, $g_t$ denotes the model gradient at time $t$, $\theta$ denotes the model parameters of the target detection model, and $m_t$ denotes the first-moment estimate at time $t$;

$$L_{cls} = -\frac{1}{N}\sum_{i=1}^{N} y_i \log p_i$$

where $L_{cls}$ denotes the classification loss corresponding to the classification loss function, $N$ denotes the number of samples in the training data set, and $y_i$ and $p_i$ respectively denote the actual label and the predicted label probability corresponding to the $i$-th sample of the training data set;

$$L_{reg} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{smooth}\!\left(\hat{T}_i - T_i\right)$$

where $L_{reg}$ denotes the regression loss corresponding to the bounding-box regression loss function, $T_i$ denotes the actual regression target corresponding to the $i$-th sample of the training data set ($\hat{T}_i$ the corresponding prediction), and $\mathrm{smooth}(\cdot)$ denotes the smoothing loss function.
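A minimal numeric sketch of the combined objective and first-moment update described above: cross-entropy classification loss, smooth L1 regression loss, and one gradient step using a first-moment estimate. The decay rate beta and all sample values are illustrative assumptions; only the overall form (classification loss plus weighted regression loss, first-moment update) follows the text.

```python
import math

def classification_loss(y, p, eps=1e-12):
    """Cross-entropy over N samples: -(1/N) * sum(y_i * log p_i)."""
    n = len(y)
    return -sum(yi * math.log(pi + eps) for yi, pi in zip(y, p)) / n

def smooth_l1(x):
    """Smoothing loss: quadratic near zero, linear elsewhere."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5

def regression_loss(targets, preds):
    n = len(targets)
    return sum(smooth_l1(p - t) for t, p in zip(targets, preds)) / n

def total_loss(y, p, targets, preds, r=1.0):
    """L = L_cls + r * L_reg, with loss weight coefficient r."""
    return classification_loss(y, p) + r * regression_loss(targets, preds)

def update(theta, m_prev, grad, lr=0.01, beta=0.9):
    """One step: m_t = beta*m_{t-1} + (1-beta)*g_t; theta <- theta - lr*m_t."""
    m = beta * m_prev + (1.0 - beta) * grad
    return theta - lr * m, m
```

Iterating `update` over mini-batch gradients of `total_loss` corresponds to the back-propagation optimization described above.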
In an alternative embodiment of the present application,
according to the comprehensive characteristics corresponding to each target object, determining the action type of the target object through a preset action classification model comprises the following steps:
the action classification model comprises a space sub-model and a time sub-model, wherein the space sub-model is constructed based on a convolutional neural network, the time sub-model is constructed based on a cyclic neural network,
taking the output of the space sub-model as the input of the time sub-model, applying global average pooling operation to the output of the time sub-model, distributing corresponding time weights to the feature vectors after the global average pooling operation through an adaptive weight distribution algorithm, and inputting the feature vectors after the global average pooling operation and the corresponding time weights into a full connection layer of the time sub-model;
and determining the probability distribution of the action category corresponding to the comprehensive characteristics through the classification activation function of the full connection layer, and taking the highest probability value in the probability distribution of each action category as the final action type.
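The temporal head described above can be sketched as: global average pooling over per-frame feature vectors, a sigmoid-gated weight per frame as a simple stand-in for the adaptive weight allocation algorithm, and a softmax over action classes. Dimensions, the toy two-class head, and all values are illustrative assumptions.

```python
import math

def global_average_pool(frames):
    """Average a list of per-frame feature vectors into one vector."""
    n, d = len(frames), len(frames[0])
    return [sum(f[i] for f in frames) / n for i in range(d)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def time_weights(frames):
    """One sigmoid-gated weight per frame from its mean activation
    (a minimal stand-in for the adaptive allocation algorithm)."""
    return [sigmoid(sum(f) / len(f)) for f in frames]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

frames = [[0.2, 0.4], [1.0, 0.6], [0.1, 0.3]]  # per-frame temporal outputs
pooled = global_average_pool(frames)
weights = time_weights(frames)
logits = [sum(pooled), -sum(pooled)]           # toy 2-class head
probs = softmax(logits)                        # action-class probability distribution
action = probs.index(max(probs))               # highest-probability class wins
```

The final line mirrors the rule in the text: the class with the highest probability in the distribution is taken as the action type.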
In an alternative embodiment of the present application,
the method for distributing the corresponding time weight to the feature vector after the global average pooling operation through the self-adaptive weight distribution algorithm comprises the following steps:
$$S_T^{(k)} = \sigma\!\left(\frac{1}{K}\,\delta\!\left(S_T^{(k-1)} \cdot U\right)\right)$$

where $S_T$ denotes the time weight, $\sigma(\cdot)$ and $\delta(\cdot)$ denote the sigmoid and ReLU functions respectively, $K$ denotes the number of feature vectors, $S_T^{(k-1)}$ denotes the time weight of the feature vector after the $(k-1)$-th global average pooling operation, and $U$ denotes the feature vector after the global average pooling operation.
In a second aspect of an embodiment of the present application,
provided is an artificial intelligence algorithm-based multi-person 5D radar fall detection system, comprising:
the device comprises a first unit, a second unit and a third unit, wherein the first unit is used for acquiring a plurality of radar echo information of a plurality of target objects in a target area based on a millimeter wave radar, extracting a plurality of motion characteristics corresponding to the plurality of radar echo information through a preset echo characteristic extraction model, and the echo characteristic extraction model transmits the output of each layer to the next layer through the connection coefficient of the adjacent layer and combining a nonlinear compression function;
the second unit is used for acquiring video information of a plurality of target objects in the target area based on the image sensor, determining the target area corresponding to the plurality of target objects in each frame of video information based on the target detector, matching the target area of each target object with the motion characteristics, and determining the comprehensive characteristics corresponding to each target object;
and the third unit is used for determining the action type of the target object through a preset action classification model according to the comprehensive characteristics corresponding to each target object, and sending out alarm information when the action type is matched with the falling action, wherein the action classification model is constructed based on a combined network model, and the action classification is carried out by distributing corresponding weights for different network models.
In a third aspect of an embodiment of the present application,
there is provided an electronic device including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method described previously.
In a fourth aspect of an embodiment of the present application,
there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method as described above.
For the beneficial effects of the embodiments of the present application, reference may be made to the effects corresponding to the technical features in the specific embodiments, which are not repeated here.
Drawings
Fig. 1 is a schematic flow chart of a multi-person 5D radar fall detection method based on an artificial intelligence algorithm according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a multi-person 5D radar fall detection system based on an artificial intelligence algorithm according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The technical scheme of the application is described in detail below by specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
Fig. 1 is a flow chart of a multi-person 5D radar fall detection method based on an artificial intelligence algorithm according to an embodiment of the present application, as shown in fig. 1, the method includes:
s101, acquiring multiple radar echo information of multiple target objects in a target area based on millimeter wave radar, and extracting multiple motion features corresponding to the multiple radar echo information through a preset echo feature extraction model, wherein the echo feature extraction model transmits the output of each layer to the next layer through the connection coefficient of the adjacent layer and combining a nonlinear compression function;
For example, when the millimeter wave radar emits a signal and the signal is reflected by a target object, the received signal is the radar echo information. This information contains the position, speed and other characteristics of the target object and is the basic data for analyzing its movement.
Optionally, the signals received by the millimeter wave radar may be preprocessed, including denoising, signal amplification, and time/frequency-domain conversion. A deep neural network structure is then designed with multiple layers, each layer receiving the output of the previous layer and passing features to the next layer through connection coefficients (which can be understood as weights in the network) and a nonlinear compression function (such as the ReLU activation function). A nonlinear compression function is applied after the output of each layer; this may be the ReLU (Rectified Linear Unit) function or another activation function, introducing the nonlinear properties that enable the model to learn nonlinear relationships.
The connection coefficients are weight parameters learned during the network training process for adjusting the weights of each layer output, and are optimized during the network training process to enable the model to automatically learn and extract meaningful features. At the last layer of the network, an output layer may be designed for outputting final motion characteristics, which may include information on the speed, acceleration, etc. of the target object.
The millimeter wave radar can provide high-precision target position and speed information, and can extract fine motion characteristics of a plurality of target objects by combining an echo characteristic extraction model, and the characteristics can describe the motion state of the target objects more accurately. By extracting the motion characteristics of multiple target objects, the system can achieve multi-target tracking and analysis, which is important for monitoring the motion behavior of multiple people or multiple objects. The extracted motion features can be used in decision making by an intelligent system, for example in fall detection, where the system can automatically trigger an alarm when abnormal motion features are detected (e.g. rapid descent speed), providing emergency assistance or supervision.
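The fall-detection trigger mentioned above can be sketched in a few lines. The feature name `descent_speed`, the threshold value, and the sample targets are hypothetical illustrations, not values from the patent, which derives its motion features from a trained model.

```python
# Minimal sketch of the alarm-trigger decision: raise an alarm for any
# tracked target whose motion features indicate a rapid descent.

FALL_DESCENT_SPEED = 1.5  # m/s; hypothetical threshold for a rapid descent

def should_alarm(motion_features: dict) -> bool:
    """True when the extracted motion features suggest a fall,
    here approximated as descent speed exceeding the threshold."""
    return motion_features.get("descent_speed", 0.0) > FALL_DESCENT_SPEED

# One motion-feature record per tracked target object
targets = [
    {"id": 1, "descent_speed": 0.2},  # walking
    {"id": 2, "descent_speed": 2.1},  # rapid descent -> fall candidate
]
alarms = [t["id"] for t in targets if should_alarm(t)]
```

In the full system this decision would be made by the action classification model rather than a fixed threshold; the sketch only shows the alarm pathway.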
In an alternative embodiment of the present application,
acquiring a plurality of radar echo information of a plurality of target objects in a target area based on a millimeter wave radar, and extracting a plurality of motion features corresponding to the plurality of radar echo information through a preset echo feature extraction model comprises:
multiplying the radar echo information with a weight matrix corresponding to each layer in the echo feature extraction model to obtain low-level features;
according to the low-level characteristics, combining the connection coefficients of the current layer and the adjacent next layer to determine the high-level characteristics of the adjacent next layer;
and based on the high-level features, carrying out space dimension mapping on the high-level features by combining a nonlinear compression function of the echo feature extraction model, and determining a plurality of motion features corresponding to the radar echo information.
Illustratively, in a neural network, there are connections between neurons of each layer and neurons of the previous layer, which connections correspond to weights, i.e., a matrix representation of the weights. The low-level features are used to indicate that at an early level of the echo feature extraction model, the original features learned by the feature extraction model are typically directly related to the input data (radar echo information). The high-level features are used for indicating abstract features learned by the feature extraction model at the later level of the echo feature extraction model, and the features are combined and extracted from the low-level features and can better represent the motion characteristics of a target object.
Optionally, radar echo information is input to the input layer of the network, and the output of each layer is computed in turn through that layer's weight matrix and nonlinear compression function. At the early levels of the neural network, low-level features are obtained by multiplying the input data with the corresponding weight matrix and applying a nonlinear compression function. Each radar echo information item serves as one dimension of the input vector; a nonlinear compression function (such as the ReLU activation function) is applied element-wise to each layer's output to obtain that layer's activation values, which become the input of the next layer, and forward propagation continues until the final layer of the model is reached.
And in the later level of the neural network, combining the connection coefficients of the current layer and the adjacent next layer, and obtaining high-level characteristics through corresponding nonlinear compression functions. The method comprises the steps of calculating low-level characteristics to obtain an output vector of a current layer, and multiplying the output of the current layer by a connection coefficient matrix of an adjacent next layer to obtain an input vector of the adjacent next layer. The input vector is multiplied by the weight matrix of the next adjacent layer, and then a nonlinear compression function (e.g., reLU) is applied to obtain the output vector of the next adjacent layer. If more layers exist in the network, repeating the steps until the final layer of the network is reached.
The network is trained using appropriate loss functions (e.g., mean square error, cross entropy, etc.) and optimization algorithms (e.g., gradient descent, adam, etc.), optimizing the weight matrix and connection coefficients so that the network can learn the motion characteristics of the target object.
In the forward propagation of the neural network, the output of the last hidden layer or the penultimate hidden layer is obtained through multiple layers of computation, and the output is a high-level characteristic and represents the abstract characteristic learned by the network. And inputting the high-level features into a nonlinear compression function to obtain mapped features, wherein the mapped features are final motion features.
In an alternative embodiment of the present application,
determining a plurality of motion features corresponding to the plurality of radar echo information includes:
a plurality of motion characteristics are determined according to the following formula:
wherein I_l and I_h represent the low-level features and high-level features respectively, C_lh represents the connection coefficient between the current layer and the adjacent next layer, W_l represents the weight matrix, b_l represents the bias parameter, P_l and Q_h represent the low-level and high-level learnable matrix parameters respectively, S_h represents the motion feature, f(·) represents the nonlinear compression function, and e and r represent the nonlinearity and sensitivity respectively.
By multiplying the radar echo information with a weight matrix, low-level features can be extracted, which typically include basic motion information of the target, such as position and velocity, which enables the system to accurately describe the basic motion state of the target. In combination with the low-level features, the connection coefficients and the weight matrix, the system can determine the high-level features of the next layer adjacent to the low-level features, so that the system can extract high-level motion features of a plurality of targets from a plurality of radar echo information, including complex motion patterns and behaviors. By applying a nonlinear compression function, the system can introduce nonlinear factors on high-level features, which enables the system to learn more complex motion features, and nonlinear mapping increases the rich expressive power of the model on target motion behaviors.
S102, acquiring video information of a plurality of target objects in the target area based on an image sensor, determining target areas corresponding to the plurality of target objects in each frame of video information based on a target detector, matching the target areas of each target object with motion characteristics, and determining comprehensive characteristics corresponding to each target object;
illustratively, the target region information and motion characteristics of the target are combined to form a comprehensive feature vector for more fully describing the target object. Capturing a video frame sequence of a target area by using an image sensor, processing each frame of video by using a target detector, determining the position of each target object in an image to obtain a boundary box of the target area (ROI), and extracting the motion characteristics of each target object in the video sequence by using a motion analysis technology (such as an optical flow method or a frame difference-based method) including speed, direction and the like; the target area information (coordinates of the bounding box) and the motion features (speed, direction, etc.) are combined into a comprehensive feature vector, and the coordinates of the bounding box and the values of the motion features can be spliced to form a feature vector containing spatial and temporal information.
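The splicing of bounding-box coordinates and motion features into a comprehensive feature vector can be sketched as follows; the `comprehensive_feature` helper and its input values are illustrative, not from the patent:

```python
import numpy as np

def comprehensive_feature(bbox, motion):
    """Splice bounding-box coordinates and motion features into one vector.

    `bbox` is (x1, y1, x2, y2) from the target detector; `motion` holds
    motion-analysis outputs such as (speed, direction)."""
    return np.concatenate([np.asarray(bbox, float), np.asarray(motion, float)])

# Spatial information (box coordinates) + temporal information (speed, direction).
vec = comprehensive_feature((10, 20, 50, 80), (1.5, 0.3))
print(vec.shape)  # (6,)
```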
In an alternative embodiment of the present application,
before determining the target areas corresponding to the plurality of target objects in each frame of video information based on the target detector, the method further comprises training the target detector:
the target detector comprises a convolution layer, a candidate layer, a pooling layer and a full connection layer;
Based on the pre-acquired training data set, converting the training data set into feature maps through the convolution layer; generating a plurality of anchor frames with different sizes and aspect ratios in each sliding window through the candidate layer, and screening out the top-scoring anchor frames from the plurality of anchor frames as candidate frames by applying non-maximum suppression;
the pooling layer maps the candidate frames to the feature images with the same size as the candidate frames to carry out pooling operation and generate feature vectors;
and determining classification loss and boundary box regression loss through a full connection layer according to the feature vector, and iteratively optimizing a classification loss function and a boundary box regression loss function through a back propagation algorithm in combination with an adaptive learning rate until the classification loss and the boundary box regression loss are minimized.
Illustratively, the image is converted into feature maps using a convolution layer, the feature maps containing semantic information of the image; a plurality of anchor boxes of different sizes and aspect ratios are generated at each location on the feature map as candidate boxes for target detection.
Mapping the candidate frames to the feature map and carrying out a pooling operation to generate feature vectors; performing classification and bounding-box regression through the full connection layer according to the feature vectors, and calculating the classification loss and bounding-box regression loss; using a back propagation algorithm in combination with an adaptive learning rate (e.g., the Adam optimizer), the classification loss function and the bounding-box regression loss function are iteratively optimized until the losses are minimized. Non-maximum suppression (NMS) is applied to the prediction results to retain the top-scoring candidate frames and remove heavily overlapping ones, yielding the final target detection result.
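The non-maximum suppression step can be sketched as follows. This is a minimal pure-Python version; the overlap threshold and box values are illustrative:

```python
def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, thresh=0.5):
    """Keep top-scoring boxes, dropping any box that overlaps a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2] -- the near-duplicate box 1 is suppressed
```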
Optionally, multiple anchor frames of different sizes and aspect ratios are generated for each pixel point, sliding windows are used on the feature map, the generated anchor frames are applied to each window position, each candidate frame is mapped onto the feature map, and the candidate frames of different sizes are mapped into feature vectors of the same size through an ROI pooling operation.
The input is a Feature Map (Feature Map) and coordinate information (typically, upper left and lower right coordinates) from candidate boxes, and each candidate box is mapped to the Feature Map. The corresponding region is defined on the feature map according to the size of the candidate frame, and a method is generally adopted in which the region is divided into grids with fixed sizes, and then the feature value in each grid is subjected to a pooling operation (such as maximum pooling) to obtain a value. For different size candidate boxes, mapping them onto feature maps of the same size, a fixed output grid size is typically used to maintain consistency of the output features, and after pooling, each candidate box generates a feature vector of fixed length that will be used for subsequent classification and regression tasks.
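The fixed-grid pooling described above can be sketched as follows. This is a simplified ROI max-pooling; the 2×2 output grid and the feature-map values are illustrative:

```python
import numpy as np

def roi_max_pool(feature_map, box, out_size=2):
    """Map one candidate box onto the feature map and max-pool it into a
    fixed out_size x out_size grid, so candidate boxes of different sizes
    all yield feature vectors of the same length."""
    x1, y1, x2, y2 = box  # upper-left and lower-right coordinates
    region = feature_map[y1:y2, x1:x2]
    h, w = region.shape
    # Divide the region into a fixed grid and max-pool each cell.
    ys = np.linspace(0, h, out_size + 1, dtype=int)
    xs = np.linspace(0, w, out_size + 1, dtype=int)
    pooled = np.empty((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            pooled[i, j] = region[ys[i]:ys[i+1], xs[j]:xs[j+1]].max()
    return pooled.ravel()  # fixed-length vector for classification/regression

fmap = np.arange(36, dtype=float).reshape(6, 6)
vec = roi_max_pool(fmap, (0, 0, 4, 4))
print(vec)  # 4 values regardless of the candidate box size
```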
In an alternative embodiment of the present application,
determining the classification loss and the boundary box regression loss through the full connection layer according to the feature vector, and iteratively optimizing the classification loss function and the boundary box regression loss function through a back propagation algorithm in combination with the self-adaptive learning rate comprises the following steps:
wherein L represents the sum of the classification loss value and the regression loss value, S represents the learning rate, r represents the loss weight coefficient, g_t represents the model gradient at time t, θ represents the model parameters of the target detection model, and m_t represents the first-moment estimate at time t;
L_cls represents the classification loss corresponding to the classification loss function, N represents the number of samples in the training data set, and y_i and p_i represent the actual label and the predicted label probability corresponding to the i-th sample of the training data set, respectively;
L_reg represents the regression loss corresponding to the bounding-box regression loss function, T_i represents the actual regression target corresponding to the i-th sample of the training data set, and smooth(·) represents the smoothing loss function.
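Since the patent's exact update formula is not reproduced here, the following is a conventional Adam-style first-moment update using the same quantities (learning rate S, gradient g_t, first-moment estimate m_t, parameters θ); hyperparameter values are the usual illustrative defaults, not from the patent:

```python
import numpy as np

def adam_step(theta, g_t, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One adaptive-learning-rate update. m_t is the first-moment (mean)
    estimate of the gradient g_t, v_t the second-moment estimate; both are
    bias-corrected before the parameter update."""
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * g_t        # first moment m_t
    state["v"] = b2 * state["v"] + (1 - b2) * g_t ** 2   # second moment v_t
    m_hat = state["m"] / (1 - b1 ** state["t"])
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)

theta = np.zeros(2)
state = {"t": 0, "m": np.zeros(2), "v": np.zeros(2)}
theta = adam_step(theta, np.array([0.1, -0.2]), state)
print(theta)  # parameters nudged opposite the gradient sign
```

In training, g_t would be the gradient of the combined loss (classification loss plus r times the regression loss) with respect to θ.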
The target detector obtained through training can accurately locate multiple target objects in each frame of video information and identify their categories. The anchor frames of multiple sizes and aspect ratios generated by the candidate layer enable effective detection of targets of different sizes and shapes, improving the robustness of target detection. Applying the non-maximum suppression (NMS) algorithm screens out the top-scoring anchor frames and removes redundant detection frames, ensuring that each target is detected only once and improving detection efficiency. The pooling layer maps the candidate boxes to feature maps of a fixed size and generates feature vectors through a pooling operation; this retains the feature information of the targets and maps targets of different sizes into feature vectors of the same size, facilitating subsequent processing. The classification loss and bounding-box regression loss are calculated through the full connection layer, and the loss functions are iteratively optimized by combining a back propagation algorithm with an adaptive learning rate, making classification and bounding-box prediction more accurate. The whole pipeline achieves efficient target detection, can rapidly and accurately locate multiple target objects in a video frame, and provides reliable target regions for subsequent fall detection.
S103, determining the action type of the target object through a preset action classification model according to the comprehensive characteristics corresponding to each target object, and sending out alarm information when the action type is matched with the falling action.
The action classification model is constructed based on a combined network model, and action classification is performed by distributing corresponding weights for different network models. The action classification model comprises a space sub-model and a time sub-model, wherein the space sub-model is constructed based on a convolutional neural network, and the time sub-model is constructed based on a cyclic neural network, wherein the space sub-model of the embodiment of the application can comprise the convolutional neural network model, and the time sub-model can comprise the cyclic neural network model.
In an alternative embodiment of the present application,
according to the comprehensive characteristics corresponding to each target object, determining the action type of the target object through a preset action classification model comprises the following steps:
the action classification model comprises a space sub-model and a time sub-model, wherein the space sub-model is constructed based on a convolutional neural network, the time sub-model is constructed based on a cyclic neural network,
taking the output of the space sub-model as the input of the time sub-model, applying global average pooling operation to the output of the time sub-model, distributing corresponding time weights to the feature vectors after the global average pooling operation through an adaptive weight distribution algorithm, and inputting the feature vectors after the global average pooling operation and the corresponding time weights into a full connection layer of the time sub-model;
and determining the probability distribution of the action category corresponding to the comprehensive characteristics through the classification activation function of the full connection layer, and taking the highest probability value in the probability distribution of each action category as the final action type.
Illustratively, the output of the spatial sub-model is taken as the input of the temporal sub-model, which may be achieved by directly connecting the output of the spatial sub-model to the input layer of the temporal sub-model; the output of the time submodel (usually a time sequence feature sequence) is subjected to global average pooling operation, and the time sequence features are compressed into a feature vector with fixed length; and applying an adaptive weight distribution algorithm to the feature vectors subjected to the global averaging pooling operation, wherein the algorithm can distribute a weight for each time step according to the importance of the features, and the weight can be learned according to an optimization algorithm such as gradient descent and the like so as to ensure that the weight is adaptively adjusted in the training process.
The feature vector and the time weight after the global averaging pooling operation are connected into a larger feature vector, and then the feature vector is input into the full connection layer of the time submodel. Linear transformation and nonlinear activation are performed at the fully connected layer to capture complex relationships between features, and the probability distribution of the corresponding action class is obtained through a classification activation function (e.g., softmax function) of the fully connected layer.
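The global average pooling and adaptive time-weighting can be sketched as follows. This is a minimal stand-in: `w_att` plays the role of the patent's learnable parameters, which in practice would be trained by gradient descent; shapes and values are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def weighted_temporal_feature(rnn_out, w_att):
    """rnn_out: (T, D) sequence of temporal-sub-model outputs.
    Global average pooling compresses the sequence over time; a learned
    vector w_att scores each time step, and sigmoid(ReLU(score)) gives a
    per-step weight that re-weights the sequence before the FC layer."""
    u = rnn_out.mean(axis=0)                       # global average pooling
    scores = np.maximum(0.0, rnn_out @ w_att)      # ReLU on per-step scores
    weights = sigmoid(scores)                      # adaptive time weights
    weighted = (weights[:, None] * rnn_out).mean(axis=0)
    return np.concatenate([u, weighted])           # input to the FC layer

rng = np.random.default_rng(1)
out = weighted_temporal_feature(rng.normal(size=(5, 4)), rng.normal(size=4))
print(out.shape)  # (8,)
```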
The integrated feature vector is input into a fully connected layer, the number of neurons of which is equal to the number of action classes, and then the output of the fully connected layer is converted into a probability distribution of action classes using a class activation function (e.g., softmax function). From the probability distribution of each action category, the final action type with the highest probability value is selected.
Through the fully connected layer, the network can learn the complex relationships within the comprehensive features; the number of neurons in this layer matches the number of action categories, ensuring that each neuron corresponds to one action category. The Softmax function converts the output of the fully connected layer into a probability distribution, ensuring that the probability values of all action categories lie between 0 and 1 and sum to 1. From this distribution, the action category with the highest probability value is selected as the final prediction result.
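The Softmax classification and highest-probability selection can be sketched as follows; the logit values and the category ordering are illustrative:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax: shift by the max before exponentiating.
    e = np.exp(z - z.max())
    return e / e.sum()

# Illustrative fully connected layer output: one neuron per action category,
# e.g. [walk, sit, fall, stand].
logits = np.array([1.2, 0.3, 3.1, -0.5])
probs = softmax(logits)
print(probs.sum())           # 1.0 -- a valid probability distribution
action = int(np.argmax(probs))
print(action)                # 2 -> the highest-probability action category
```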
In an alternative embodiment of the present application,
the method for distributing the corresponding time weight to the feature vector after the global average pooling operation through the self-adaptive weight distribution algorithm comprises the following steps:
wherein S_T represents the time weight, σ(·) and ReLU(·) represent the sigmoid function and the ReLU function respectively, k represents the number of feature vectors, S_{k-1} represents the time weight of the feature vector after the (k−1)-th global average pooling operation, and U represents the feature vector after the global average pooling operation.
Through the space sub-model and the time sub-model, the system can extract rich space and time sequence characteristics from a target area of a target object; through the self-adaptive weight distribution algorithm, the system can distribute different weights for different parts of the time sequence characteristics, so that key moments of actions can be focused more, and the accuracy of action recognition is improved. Through the processing of the full connection layer, the comprehensive features can be mapped to different action categories, and the features are mapped to probability distributions of the action categories through a classification activation function (usually a Softmax function). Because convolutional neural networks and recurrent neural networks are used, a balance can be achieved between real-time performance and stability, convolutional Neural Networks (CNNs) can generally process spatial features with high efficiency, while Recurrent Neural Networks (RNNs) are suitable for processing temporal features. Through reasonable structural design, good performance can be obtained in real-time and precision.
Fig. 2 is a schematic structural diagram of a multi-person 5D radar fall detection system based on an artificial intelligence algorithm according to an embodiment of the present application, as shown in fig. 2, the system includes:
the device comprises a first unit, a second unit and a third unit, wherein the first unit is used for acquiring a plurality of radar echo information of a plurality of target objects in a target area based on a millimeter wave radar, extracting a plurality of motion characteristics corresponding to the plurality of radar echo information through a preset echo characteristic extraction model, and the echo characteristic extraction model transmits the output of each layer to the next layer through the connection coefficient of the adjacent layer and combining a nonlinear compression function;
the second unit is used for acquiring video information of a plurality of target objects in the target area based on the image sensor, determining the target area corresponding to the plurality of target objects in each frame of video information based on the target detector, matching the target area of each target object with the motion characteristics, and determining the comprehensive characteristics corresponding to each target object;
and the third unit is used for determining the action type of the target object through a preset action classification model according to the comprehensive characteristics corresponding to each target object, and sending out alarm information when the action type is matched with the falling action, wherein the action classification model is constructed based on a combined network model, and the action classification is carried out by distributing corresponding weights for different network models.
In a third aspect of an embodiment of the present application,
there is provided an electronic device including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method described previously.
In a fourth aspect of an embodiment of the present application,
there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method as described above.
The present application may be a method, apparatus, system, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for performing various aspects of the present application.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application.

Claims (10)

1. The method for detecting the falling of the multi-person 5D radar based on the artificial intelligence algorithm is characterized by comprising the following steps of:
acquiring multiple radar echo information of multiple target objects in a target area based on a millimeter wave radar, and extracting multiple motion features corresponding to the multiple radar echo information through a preset echo feature extraction model, wherein the echo feature extraction model transmits the output of each layer to the next layer through the connection coefficient of the adjacent layer and combining a nonlinear compression function;
acquiring video information of a plurality of target objects in the target area based on an image sensor, determining target areas corresponding to the plurality of target objects in each frame of video information based on a target detector, matching the target areas of each target object with motion characteristics, and determining comprehensive characteristics corresponding to each target object;
and determining the action type of the target object through a preset action classification model according to the comprehensive characteristics corresponding to each target object, and sending out alarm information when the action type is matched with the falling action, wherein the action classification model is constructed based on a combined network model, and the action classification is carried out by distributing corresponding weights for different network models.
2. The method according to claim 1, wherein acquiring a plurality of radar echo information of a plurality of target objects in a target area based on a millimeter wave radar, and extracting a plurality of motion features corresponding to the plurality of radar echo information by a preset echo feature extraction model comprises:
multiplying the radar echo information with a weight matrix corresponding to each layer in the echo feature extraction model to obtain low-level features;
according to the low-level characteristics, combining the connection coefficients of the current layer and the adjacent next layer to determine the high-level characteristics of the adjacent next layer;
and based on the high-level features, carrying out space dimension mapping on the high-level features by combining a nonlinear compression function of the echo feature extraction model, and determining a plurality of motion features corresponding to the radar echo information.
3. The method of claim 2, wherein determining a plurality of motion features corresponding to the plurality of radar echo information comprises:
a plurality of motion characteristics are determined according to the following formula:
wherein I_l and I_h represent the low-level features and high-level features respectively, C_lh represents the connection coefficient between the current layer and the adjacent next layer, W_l represents the weight matrix, b_l represents the bias parameter, P_l and Q_h represent the low-level and high-level learnable matrix parameters respectively, S_h represents the motion feature, f(·) represents the nonlinear compression function, and e and r represent the nonlinearity and sensitivity respectively.
4. The method of claim 1, wherein prior to determining the target regions corresponding to the plurality of target objects in each frame of video information based on the target detector, the method further comprises training the target detector:
the target detector comprises a convolution layer, a candidate layer, a pooling layer and a full connection layer;
converting a pre-acquired training data set into a feature map through a convolution layer; generating a plurality of anchor frames with different sizes and aspect ratios in each sliding window by the candidate layer, and screening out the top-scoring anchor frames from the plurality of anchor frames as candidate frames by applying non-maximum suppression;
the pooling layer maps the candidate frames to the feature images with the same size as the candidate frames to carry out pooling operation and generate feature vectors;
and determining classification loss and boundary box regression loss through a full connection layer according to the feature vector, and iteratively optimizing a classification loss function and a boundary box regression loss function through a back propagation algorithm in combination with an adaptive learning rate until the classification loss and the boundary box regression loss are minimized.
5. The method of claim 4, wherein determining classification loss and bounding box regression loss from the feature vectors through the full connection layer and iteratively optimizing the classification loss function and bounding box regression loss function through a back propagation algorithm in combination with an adaptive learning rate comprises:
wherein L represents the sum of the classification loss value and the regression loss value, S represents the learning rate, r represents the loss weight coefficient, g_t represents the model gradient at time t, θ represents the model parameters of the target detection model, and m_t represents the first-moment estimate at time t;
L_cls represents the classification loss corresponding to the classification loss function, N represents the number of samples in the training data set, and y_i and p_i represent the actual label and the predicted label probability corresponding to the i-th sample of the training data set, respectively;
L_reg represents the regression loss corresponding to the bounding-box regression loss function, T_i represents the actual regression target corresponding to the i-th sample of the training data set, and smooth(·) represents the smoothing loss function.
6. The method of claim 1, wherein determining the action type of the target object by the preset action classification model according to the comprehensive characteristics corresponding to each target object comprises:
the action classification model comprises a space sub-model and a time sub-model, wherein the space sub-model is constructed based on a convolutional neural network, the time sub-model is constructed based on a cyclic neural network,
taking the output of the space sub-model as the input of the time sub-model, applying global average pooling operation to the output of the time sub-model, distributing corresponding time weights to the feature vectors after the global average pooling operation through an adaptive weight distribution algorithm, and inputting the feature vectors after the global average pooling operation and the corresponding time weights into a full connection layer of the time sub-model;
and determining the probability distribution of the action category corresponding to the comprehensive characteristics through the classification activation function of the full connection layer, and taking the highest probability value in the probability distribution of each action category as the final action type.
7. The method of claim 6, wherein assigning corresponding temporal weights to the feature vectors after the global averaging pooling operation by an adaptive weight assignment algorithm comprises:
wherein S_T represents the time weight, σ(·) and ReLU(·) represent the sigmoid function and the ReLU function respectively, k represents the number of feature vectors, S_{k-1} represents the time weight of the feature vector after the (k−1)-th global average pooling operation, and U represents the feature vector after the global average pooling operation.
8. An artificial intelligence algorithm-based multi-person 5D radar fall detection system, comprising:
the device comprises a first unit, a second unit and a third unit, wherein the first unit is used for acquiring a plurality of radar echo information of a plurality of target objects in a target area based on a millimeter wave radar, extracting a plurality of motion characteristics corresponding to the plurality of radar echo information through a preset echo characteristic extraction model, and the echo characteristic extraction model transmits the output of each layer to the next layer through the connection coefficient of the adjacent layer and combining a nonlinear compression function;
the second unit is used for acquiring video information of a plurality of target objects in the target area based on the image sensor, determining the target area corresponding to the plurality of target objects in each frame of video information based on the target detector, matching the target area of each target object with the motion characteristics, and determining the comprehensive characteristics corresponding to each target object;
and the third unit is used for determining the action type of the target object through a preset action classification model according to the comprehensive characteristics corresponding to each target object, and sending out alarm information when the action type is matched with the falling action, wherein the action classification model is constructed based on a combined network model, and the action classification is carried out by distributing corresponding weights for different network models.
9. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any of claims 1 to 7.
10. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 1 to 7.
CN202311335592.8A 2023-10-16 2023-10-16 Multi-person 5D radar falling detection method and system based on artificial intelligence algorithm Active CN117079416B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311335592.8A CN117079416B (en) 2023-10-16 2023-10-16 Multi-person 5D radar falling detection method and system based on artificial intelligence algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311335592.8A CN117079416B (en) 2023-10-16 2023-10-16 Multi-person 5D radar falling detection method and system based on artificial intelligence algorithm

Publications (2)

Publication Number Publication Date
CN117079416A true CN117079416A (en) 2023-11-17
CN117079416B CN117079416B (en) 2023-12-26

Family

ID=88713737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311335592.8A Active CN117079416B (en) 2023-10-16 2023-10-16 Multi-person 5D radar falling detection method and system based on artificial intelligence algorithm

Country Status (1)

Country Link
CN (1) CN117079416B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447048A (en) * 2018-12-25 2019-03-08 苏州闪驰数控系统集成有限公司 A kind of artificial intelligence early warning system
CN110532966A (en) * 2019-08-30 2019-12-03 深兰科技(上海)有限公司 A kind of method and apparatus carrying out tumble identification based on disaggregated model
US20200166611A1 (en) * 2018-11-22 2020-05-28 Jomoo Kitchen & Bath Co., Ltd Detection method, detection device, terminal and detection system
CN112633101A (en) * 2020-12-14 2021-04-09 深兰人工智能(深圳)有限公司 Obstacle speed detection method and device
CN113807314A (en) * 2021-10-08 2021-12-17 江苏云禾峰智能科技有限公司 Millimeter wave radar video fusion method based on micro-Doppler effect
CN113903147A (en) * 2021-09-30 2022-01-07 湖南时变通讯科技有限公司 Radar-based human body posture distinguishing method, device, equipment and medium
CN115209798A (en) * 2019-12-23 2022-10-18 沃伊亚影像有限公司 Fall detection system and method
CN115291184A (en) * 2022-10-08 2022-11-04 四川启睿克科技有限公司 Attitude monitoring method combining millimeter wave radar and deep learning
WO2023274254A1 (en) * 2021-06-29 2023-01-05 上海高德威智能交通系统有限公司 Object detection method, apparatus and system, electronic device, and storage medium
CN116027324A (en) * 2023-03-24 2023-04-28 德心智能科技(常州)有限公司 Fall detection method and device based on millimeter wave radar and millimeter wave radar equipment
WO2023133007A1 (en) * 2022-01-04 2023-07-13 Qualcomm Incorporated Machine learning based object detection using radar information
CN116682105A (en) * 2023-05-24 2023-09-01 桂林电子科技大学 Millimeter wave radar and visual feature attention fusion target detection method
KR102580688B1 (en) * 2022-10-26 2023-09-20 주식회사 에너자이 A training method of a fall detection model based on a radar data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Dai Pengcheng: "Research on Human Activity Recognition Algorithms Based on Millimeter-Wave Radar", China Master's Theses Full-text Database, Information Science and Technology Series, pages 39 - 53 *
Chen Sixian: "Research on Deep-Learning-Based Classification of Typical Human Actions with Low-Frequency Radar", China Master's Theses Full-text Database, Information Science and Technology Series, pages 37 - 55 *

Also Published As

Publication number Publication date
CN117079416B (en) 2023-12-26

Similar Documents

Publication Publication Date Title
CN110135319B (en) Abnormal behavior detection method and system
CN112052886B (en) Intelligent human body action posture estimation method and device based on convolutional neural network
CN108399362B (en) Rapid pedestrian detection method and device
JP6517681B2 (en) Image pattern learning apparatus, method and program
CN110782483B (en) Multi-view multi-target tracking method and system based on distributed camera network
CN114155478B (en) Emotion recognition method, device and system and computer readable storage medium
CN111582141B (en) Face recognition model training method, face recognition method and device
US20230048405A1 (en) Neural network optimization method and apparatus
CN112562255B (en) Intelligent image detection method for cable channel smoke and fire conditions in low-light-level environment
JP6892606B2 (en) Positioning device, position identification method and computer program
CN111807183A (en) Elevator door state intelligent detection method based on deep learning
CN113792930B (en) Blind person walking track prediction method, electronic equipment and storage medium
CN117114913A (en) Intelligent agricultural data acquisition system based on big data
CN115620141A (en) Target detection method and device based on weighted deformable convolution
CN117671396B (en) Intelligent monitoring and early warning system and method for construction progress
CN116879910B (en) Laser scanning distance measuring device and method thereof
CN112966815A (en) Target detection method, system and equipment based on impulse neural network
CN113065379B (en) Image detection method and device integrating image quality and electronic equipment
CN117079416B (en) Multi-person 5D radar falling detection method and system based on artificial intelligence algorithm
CN117154256A (en) Electrochemical repair method for lithium battery
CN108596068B (en) Method and device for recognizing actions
CN113344168B (en) Short-term berth prediction method and system
CN114898202A (en) Underwater video target scale space discriminant tracking system and method based on multi-model fusion
CN114049684A (en) Human body sitting posture identification method and device, electronic equipment and storage medium
CN117037287B (en) Behavior recognition method, system and device based on 3D impulse neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant