CN113837005A - Human body falling detection method and device, storage medium and terminal equipment - Google Patents

Human body falling detection method and device, storage medium and terminal equipment

Info

Publication number
CN113837005A
CN113837005A CN202110960372.9A
Authority
CN
China
Prior art keywords
image
frame
data
human body
fall
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110960372.9A
Other languages
Chinese (zh)
Inventor
林凡 (Lin Fan)
高欣 (Gao Xin)
宋进 (Song Jin)
Current Assignee
GCI Science and Technology Co Ltd
Original Assignee
GCI Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by GCI Science and Technology Co Ltd filed Critical GCI Science and Technology Co Ltd
Priority to CN202110960372.9A priority Critical patent/CN113837005A/en
Publication of CN113837005A publication Critical patent/CN113837005A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/047 Probabilistic or stochastic networks
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a human body fall detection method and device, a storage medium and a terminal device. The method receives video data; extracts the skeletal joint points of the human body to be detected from the video data to obtain joint data; inputs the joint data into a pre-trained first fall recognition model to obtain a first fall probability matrix; inputs the video data into a pre-trained second fall recognition model to obtain a second fall probability matrix; and averages the first fall probability matrix and the second fall probability matrix to obtain the fall recognition result for the human body to be detected. The method fuses the contour information, color information and skeleton data information of the video image data, so that the neural networks learn rich action features and the accuracy of fall recognition is improved.

Description

Human body falling detection method and device, storage medium and terminal equipment
Technical Field
The invention relates to the technical field of health monitoring, and in particular to a human body fall detection method and device, a storage medium and a terminal device.
Background
Falls seriously threaten the health and lives of the elderly, so unobtrusive, real-time safety monitoring has great application value and research significance for ensuring their quality of life. The literature "Research on a skeleton-sequence-based fall action recognition method for the elderly" proposes a fall action recognition method in which the skeleton data set extracted from video is divided by the hold-out method into two mutually exclusive sets serving as training set and test set, with a division ratio of 4:1. Multiple random divisions are then adopted, and after repeated tests the average value is taken as the evaluation result. Next, the data are preprocessed by data cleaning and the valid joint point data are stored; the fall process is analyzed, and the spatial features and time-series features of the skeleton are extracted respectively. Finally, the training data and test data are loaded respectively; after a model is trained on the training set, the test error evaluated on the test set is used as an approximation of the generalization error. A model trained by this method can effectively recognize fall actions. However, an action recognition method based only on skeleton data cannot learn the contour and color information contained in the image data, and cannot effectively handle action classification problems such as interaction between a person and the scene.
Disclosure of Invention
The embodiment of the invention aims to provide a method, a device, a storage medium and a terminal device for detecting human body falling, which can fuse the outline information, the color information and the skeleton data information of video image data, enable a neural network to learn abundant action characteristics and improve the accuracy of falling recognition.
In order to achieve the above object, an embodiment of the present invention provides a method for detecting a human body fall, including:
receiving video data; the video data consists of a plurality of frames of images containing human bodies to be detected;
extracting skeleton joint points of the human body to be detected in the video data to obtain joint data; the joint data comprise joint point three-dimensional coordinate data of the human body to be detected in each frame of the image;
inputting the joint data into a first fall recognition model trained in advance to obtain a first fall probability matrix; wherein the first fall identification model is a space-time graph convolutional network for fall identification; the first falling probability matrix comprises a first probability belonging to falling actions corresponding to each frame of the image;
inputting the video data into a second fall recognition model trained in advance to obtain a second fall probability matrix; wherein the second fall identification model is a graph attention network for fall identification; the second falling probability matrix comprises a second probability belonging to falling actions corresponding to each frame of the image;
and carrying out mean value processing on the first falling probability matrix and the second falling probability matrix to obtain a falling identification result of the human body to be detected.
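The final fusion step of the claimed method can be sketched as follows. The one-dimensional per-frame probability shape and the per-video decision threshold are illustrative assumptions; the text only specifies element-wise mean processing of the two probability matrices.

```python
import numpy as np

def fuse_fall_probabilities(p1, p2, threshold=0.5):
    """Average the first and second fall probability matrices.

    p1, p2: per-frame fall probabilities from the two recognition models
    (illustrative shape: one probability per frame). The threshold-based
    per-video decision is an assumption, not stated in the text.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    fused = (p1 + p2) / 2.0                  # element-wise mean processing
    fell = bool(np.any(fused > threshold))   # assumed decision rule
    return fused, fell

fused, fell = fuse_fall_probabilities([0.2, 0.9, 0.8], [0.4, 0.7, 0.6])
```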
As an improvement of the above scheme, the extracting of the skeleton joint points of the human body to be detected in the video data to obtain joint data specifically includes:
inputting the video data into a pre-trained convolutional neural network to detect the position of the human body to be detected in each frame of image, and obtaining a bounding box containing the human body to be detected in each frame of image;
inputting the human body image in the bounding box of each frame of image into a single-person posture prediction network, and extracting the skeleton joint points of the human body to be detected in each frame of image to obtain the joint point two-dimensional coordinate data of each frame of image;
inputting the two-dimensional coordinate data of the joint points of each frame of image into a pre-trained network model consisting of a convolutional neural network and a time convolutional network to obtain the three-dimensional coordinate data of the joint points of each frame of image;
and fitting the three-dimensional coordinate data of the joint points of each frame of image to obtain joint data.
As an improvement of the above scheme, inputting the human body image in the bounding box of each frame of image into a single-person posture prediction network, extracting the skeleton joint points of the human body to be detected in each frame of image, and obtaining the joint point two-dimensional coordinate data of each frame of image specifically comprises:
inputting the human body image in the bounding box of each frame of image into a spatial transformation network to obtain a preprocessed human body image of each frame of image;
inputting the preprocessed human body image of each frame of image into a single-person posture prediction network, and extracting the skeleton joint points of the human body to be detected in each frame of image;
performing confidence comparison on all skeleton joint points at each human body joint point position in each frame of image to obtain a skeleton joint point with the maximum confidence of each human body joint point position in each frame of image;
and fitting all skeleton joint points with the maximum confidence coefficient in each frame of image to obtain joint point two-dimensional coordinate data of each frame of image.
As an improvement of the above solution, the first fall identification model comprises n space-time graph convolutional layers and a first classifier, where n is an integer greater than 1;
when i = 1, the input data of the i-th space-time graph convolutional layer are the joint data; when i > 1, the input data of the i-th space-time graph convolutional layer are the output data of the (i-1)-th space-time graph convolutional layer, and the input data of the first classifier are the output data of the n-th space-time graph convolutional layer;
each space-time graph convolutional layer comprises a graph convolution network for processing the spatial structure information of the skeleton joint points of the human body to be detected and a temporal convolution network for processing time-dimension features;
the first classifier is used for performing action classification on the output data of the n-th space-time graph convolutional layer to obtain the first fall probability matrix.
As an improvement of the above scheme, the feature output $\hat{x}_{ti}$ of the i-th skeletal joint point of the t-th frame image processed by the graph convolution network is specifically:

$$\hat{x}_{ti} = \sum_{j} \tilde{D}_{ii}^{-\frac{1}{2}} \, \tilde{A}_{ij} \, \tilde{D}_{jj}^{-\frac{1}{2}} \, x_{tj} \, \omega$$

where $x_{tj}$ is the feature of the j-th skeletal joint point in the t-th frame image, $\tilde{A}$ is the adjacency matrix of the skeletal joint points of the t-th frame image with added closed loops (self-connections), $\tilde{D}$ is the degree matrix of $\tilde{A}$, and $\omega$ is a parameter of the graph convolution network.
As an improvement of the above, the second fall identification model comprises: a deep residual network, an attention module, a convolutional long short-term memory (ConvLSTM) network and a second classifier;
the deep residual network is used for extracting the features of each frame of image in the video data and outputting the image features of each frame of image;
the attention module is used for processing the output data of the deep residual network and outputting an attention heat map of each frame of image;
the convolutional long short-term memory network is used for extracting the time-series features of the attention heat map of each frame of image;
and the second classifier is used for performing action classification on the output data of the convolutional long short-term memory network to obtain the second fall probability matrix.
In order to achieve the above object, an embodiment of the present invention further provides a human body fall detection apparatus, including:
the data receiving module is used for receiving video data; the video data consists of a plurality of frames of images containing human bodies to be detected;
the data preprocessing module is used for extracting skeleton joint points of the human body to be detected in the video data to obtain joint data; the joint data comprise joint point three-dimensional coordinate data of the human body to be detected in each frame of the image;
the first falling identification module is used for inputting the joint data into a first falling identification model trained in advance to carry out falling identification so as to obtain a first falling probability matrix; wherein the first fall identification model is a space-time graph convolutional network for fall identification; the first falling probability matrix comprises a first probability belonging to falling actions corresponding to each frame of the image;
the second falling identification module is used for inputting the video data into a second falling identification model for falling identification to obtain a second falling probability matrix; wherein the second fall identification model is a graph attention network for fall identification; the second falling probability matrix comprises a second probability belonging to falling actions corresponding to each frame of the image;
and the data operation module is used for carrying out mean value processing on the first falling probability matrix and the second falling probability matrix to obtain a falling identification result of the human body to be detected.
As an improvement of the above scheme, the data preprocessing module specifically includes:
the bounding box detection unit is used for inputting the video data into a pre-trained convolutional neural network to detect the position of the human body to be detected in each frame of image, and obtaining a bounding box containing the human body to be detected in each frame of image;
the first coordinate unit is used for inputting the human body image in the bounding box of each frame of image into a single-person posture prediction network, extracting the skeleton joint points of the human body to be detected in each frame of image, and obtaining the joint point two-dimensional coordinate data of each frame of image;
the second coordinate unit is used for inputting the two-dimensional coordinate data of the joint points of each frame of image into a pre-trained network model formed by a convolutional neural network and a time convolutional network to obtain the three-dimensional coordinate data of the joint points of each frame of image;
and the fitting unit is used for fitting the three-dimensional coordinate data of the joint points of each frame of image to obtain joint data.
In order to achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, and when the computer program runs, the apparatus where the computer-readable storage medium is located is controlled to execute the method for detecting a human fall according to any one of the above embodiments.
To achieve the above object, an embodiment of the present invention further provides a terminal device, which includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, and the processor, when executing the computer program, implements the human fall detection method described in any one of the above.
Compared with the prior art, the human body fall detection method and device, storage medium and terminal device provided by the embodiments of the invention first receive video data and extract the skeletal joint points of the human body to be detected from the video data to obtain joint data; second, the joint data are input into a pre-trained first fall recognition model to obtain a first fall probability matrix; then, the video data are input into a pre-trained second fall recognition model to obtain a second fall probability matrix; and finally, the first fall probability matrix and the second fall probability matrix are averaged to obtain the fall recognition result of the human body to be detected. The method fuses the contour information, color information and skeleton data information of the video image data, so that the neural networks learn rich action features and the accuracy of fall recognition is improved.
Drawings
Fig. 1 is a flowchart of a method for detecting a human fall according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a first fall identification model according to a preferred embodiment of the invention;
fig. 3 is a schematic structural diagram of a human fall detection apparatus provided in an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart of a method for detecting a human fall according to an embodiment of the present invention.
The human body falling detection method comprises the following steps:
s1, receiving video data; the video data consists of a plurality of frames of images containing human bodies to be detected;
s2, extracting skeleton joint points of the human body to be detected in the video data to obtain joint data; the joint data comprise joint point three-dimensional coordinate data of the human body to be detected in each frame of the image;
s3, inputting the joint data into a first fall recognition model trained in advance to obtain a first fall probability matrix; wherein the first fall identification model is a space-time graph convolutional network for fall identification; the first falling probability matrix comprises a first probability belonging to falling actions corresponding to each frame of the image;
s4, inputting the video data into a pre-trained second fall recognition model to obtain a second fall probability matrix; wherein the second fall identification model is a graph attention network for fall identification; the second falling probability matrix comprises a second probability belonging to falling actions corresponding to each frame of the image;
and S5, carrying out mean value processing on the first falling probability matrix and the second falling probability matrix to obtain a falling identification result of the human body to be detected.
The dimension of the joint data is (17, 3, 300), that is, the number of human joint points is 17, the information dimension of the input joint points is 3, and the total frame number of the video data is 300.
In an optional embodiment, in step S2, the extracting skeleton joint points of the human body to be detected in the video data is performed to obtain joint data, which specifically includes:
s21, inputting the video data into a pre-trained convolutional neural network to detect the position of the human body to be detected in each frame of image, and obtaining a bounding box containing the human body to be detected in each frame of image;
s22, inputting the human body image in the boundary frame of each frame of image into a single posture prediction network, extracting skeleton joint points of the human body to be detected in each frame of image, and obtaining two-dimensional coordinate data of the joint points of each frame of image;
s23, inputting the two-dimensional coordinate data of the joint points of each frame of image into a pre-trained network model consisting of a convolutional neural network and a time convolutional network to obtain the three-dimensional coordinate data of the joint points of each frame of image;
and S24, fitting the three-dimensional coordinate data of the joint points of each frame of image to obtain joint data.
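The data flow of steps S21 to S24 can be sketched with stubbed networks. The stub functions below (`detect_bounding_box`, `predict_2d_joints`, `lift_to_3d`) are placeholders for the pre-trained detector, single-person posture prediction network and 2D-to-3D lifting model, which the text names only by role; only the tensor shapes follow the description.

```python
import numpy as np

def detect_bounding_box(frame):
    """S21 stand-in: human detector -> one (x, y, w, h) box per frame."""
    return (0, 0, frame.shape[1], frame.shape[0])

def predict_2d_joints(crop, num_joints=17):
    """S22 stand-in: single-person posture network -> (num_joints, 2) coords."""
    return np.zeros((num_joints, 2))

def lift_to_3d(joints_2d):
    """S23 stand-in: CNN + temporal-convolution lifting -> (num_joints, 3)."""
    return np.concatenate([joints_2d, np.zeros((joints_2d.shape[0], 1))], axis=1)

def extract_joint_data(video):
    """S21-S24: per-frame extraction, then fitting into one joint tensor."""
    per_frame = []
    for frame in video:
        x, y, w, h = detect_bounding_box(frame)
        crop = frame[y:y + h, x:x + w]
        per_frame.append(lift_to_3d(predict_2d_joints(crop)))
    # S24: fit the per-frame 3D joints into (joints, coords, frames).
    return np.stack(per_frame, axis=-1)

video = np.zeros((300, 64, 64))          # 300 dummy frames, as in the text
joint_data = extract_joint_data(video)   # dimension (17, 3, 300)
```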
It should be noted that, in step S21, before the video data are input into the pre-trained convolutional neural network to detect the position of the human body to be detected in each frame of image, the convolutional neural network needs to be trained with a labeled image data set, so that the trained convolutional neural network produces higher activation values for the image pixels in the region where the human body is located. The labeled image data set is an image data set in which the human bodies to be detected have been annotated with bounding boxes.
Preferably, the convolutional neural network adopted by the network model is a convolutional neural network based on residual modules.
In step S23, before inputting the two-dimensional coordinate data of the joint point of each frame of image into the pre-trained network model composed of the convolutional neural network and the time convolutional network, the network model needs to be trained using the coordinates of the skeleton joint point of the human body to be measured in the three-dimensional space as the label, so that the trained network model can realize the conversion from the two-dimensional coordinates to the three-dimensional coordinates.
In an optional embodiment, in step S22, inputting the human body image in the bounding box of each frame of image into a single-person posture prediction network, extracting the skeleton joint points of the human body to be detected in each frame of image, and obtaining the joint point two-dimensional coordinate data of each frame of image includes:
S221, inputting the human body image in the bounding box of each frame of image into a spatial transformation network to obtain a preprocessed human body image of each frame of image;
S222, inputting the preprocessed human body image of each frame of image into a single-person posture prediction network, and extracting the skeleton joint points of the human body to be detected in each frame of image;
s223, performing confidence comparison on all skeleton joint points of each human joint point position in each frame of image to obtain a skeleton joint point with the maximum confidence of each human joint point position in each frame of image;
and S224, fitting all skeleton joint points with the maximum confidence coefficient in each frame of image to obtain joint point two-dimensional coordinate data of each frame of image.
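Steps S223 and S224 amount to a per-location argmax over candidate confidences, keeping the highest-confidence skeleton joint point for each body location and fitting the survivors into one 2D pose. A minimal sketch, with the candidate count and values invented for illustration:

```python
import numpy as np

NUM_JOINTS = 17
candidates = np.zeros((3, 17, 2))          # 3 candidate skeletons, (x, y) each
candidates[1] += 1.0                       # make candidate 1 distinguishable

# Per-candidate, per-location confidences (fixed so candidate 1 always wins).
confidences = np.tile([[0.2], [0.9], [0.5]], (1, NUM_JOINTS))  # shape (3, 17)

best = np.argmax(confidences, axis=0)      # S223: max-confidence candidate
pose_2d = candidates[best, np.arange(NUM_JOINTS)]  # S224: fitted (17, 2) pose
```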
Preferably, the spatial transformation network is a single network layer.
In an alternative embodiment, the first fall identification model comprises n space-time graph convolutional layers and a first classifier, where n is an integer greater than 1;
when i = 1, the input data of the i-th space-time graph convolutional layer are the joint data; when i > 1, the input data of the i-th space-time graph convolutional layer are the output data of the (i-1)-th space-time graph convolutional layer, and the input data of the first classifier are the output data of the n-th space-time graph convolutional layer;
each space-time graph convolutional layer comprises a graph convolution network for processing the spatial structure information of the skeletal joint points of the human body to be detected and a temporal convolution network for processing time-dimension features;
the first classifier is used for performing action classification on the output data of the n-th space-time graph convolutional layer to obtain the first fall probability matrix.
Preferably, n is 9, and the first classifier is a SoftMax classifier.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a first fall identification model according to a preferred embodiment of the invention;
the first fall identification model comprises 9 space-time diagram convolutional layers and a SoftMax classifier;
the space-time graph convolution layer is composed of a graph convolution network layer and a time convolution network layer;
taking the output data of the graph convolution network as the input data of the time convolution network, and processing the characteristic information of the space-time graph convolution layer through dropout;
the number of channels of a single node of the space-time diagram convolutional layer of the 1 st layer, the 2 nd layer and the 3 rd layer is 64, the number of channels of a single node of the space-time diagram convolutional layer of the 4 th layer, the 5 th layer and the 6 th layer is 128, and the number of channels of a single node of the space-time diagram convolutional layer of the 7 th layer, the 8 th layer and the 9 th layer is 256;
performing global average pooling on output data of the space-time diagram convolutional layer of the layer 9 to obtain feature vectors with 256 channels;
and inputting the 256 feature vectors into a SoftMax classifier for action classification to obtain a first fall probability matrix.
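The nine-layer channel progression, global average pooling and SoftMax classification described above can be sketched as follows. The per-layer transform is a plain linear map plus ReLU standing in for the real graph-convolution and temporal-convolution pair, and the two-class output (fall / no-fall) is an assumption:

```python
import numpy as np

CHANNELS = [64, 64, 64, 128, 128, 128, 256, 256, 256]  # per-node channels

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.standard_normal((17, 300, 3))    # (joints, frames, input channels)

c_in = 3
for c_out in CHANNELS:
    # Placeholder layer: linear map + ReLU instead of graph + temporal conv.
    w = rng.standard_normal((c_in, c_out)) * 0.1
    x = np.maximum(x @ w, 0.0)
    c_in = c_out

feature = x.mean(axis=(0, 1))            # global average pooling -> (256,)
w_cls = rng.standard_normal((256, 2)) * 0.1   # assumed fall / no-fall classes
probs = softmax(feature @ w_cls)         # SoftMax action classification
```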
Before the joint data are input into the pre-trained first fall recognition model in step S3, a human skeleton graph structure is constructed. It can be regarded as a graph in which the skeletal joint points are the nodes and the bones are the edges, represented as G = (V, E), where V is the set of graph nodes, containing all skeletal joint points, and E is the set of edges, comprising a first subset (the bone connections within each frame of image) and a second subset (the edges connecting the same skeletal joint point across successive frames of images). Preferably, the joint data are input into the pre-trained first fall recognition model as the node features of the human skeleton graph structure.
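A minimal sketch of constructing the graph G = (V, E) as a space-time adjacency matrix, with intra-frame bone edges (first subset) and inter-frame edges linking each joint to itself in the next frame (second subset). The 17-joint bone list is hypothetical, since the text does not enumerate the skeleton's bones:

```python
import numpy as np

NUM_JOINTS, NUM_FRAMES = 17, 3           # few frames, for illustration only
BONES = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6),
         (0, 7), (7, 8), (8, 9), (9, 10), (8, 11), (11, 12),
         (12, 13), (8, 14), (14, 15), (15, 16)]   # hypothetical skeleton

N = NUM_JOINTS * NUM_FRAMES
A = np.zeros((N, N))
for t in range(NUM_FRAMES):
    base = t * NUM_JOINTS
    for i, j in BONES:                   # first subset: per-frame bone edges
        A[base + i, base + j] = A[base + j, base + i] = 1.0
    if t + 1 < NUM_FRAMES:               # second subset: temporal self-edges
        for v in range(NUM_JOINTS):
            A[base + v, base + NUM_JOINTS + v] = 1.0
            A[base + NUM_JOINTS + v, base + v] = 1.0
```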
Specifically, the feature output $\hat{x}_{ti}$ of the i-th skeletal joint point of the t-th frame image processed by the graph convolution network is:

$$\hat{x}_{ti} = \sum_{j} \tilde{D}_{ii}^{-\frac{1}{2}} \, \tilde{A}_{ij} \, \tilde{D}_{jj}^{-\frac{1}{2}} \, x_{tj} \, \omega$$

where $x_{tj}$ is the feature of the j-th skeletal joint point in the t-th frame image, $\tilde{A}$ is the adjacency matrix of the skeletal joint points of the t-th frame image with added closed loops (self-connections), $\tilde{D}$ is the degree matrix of $\tilde{A}$, and $\omega$ is a parameter of the graph convolution network.
Specifically, the adjacency matrix with added closed loops $\tilde{A}$ of the t-th frame image is:

$$\tilde{A} = A + I$$

where $A$ is the adjacency matrix of the skeletal joint points of the t-th frame image and $I$ is an identity matrix of the same dimension as $A$.
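A small numeric sketch of the symmetrically normalized graph convolution step defined by the two formulas above; the chain skeleton and random features stand in for real data:

```python
import numpy as np

rng = np.random.default_rng(1)
J = 17                                   # skeletal joint points in one frame

A = np.zeros((J, J))                     # bone adjacency; a simple chain here,
for i in range(J - 1):                   # purely for illustration
    A[i, i + 1] = A[i + 1, i] = 1.0

A_tilde = A + np.eye(J)                  # closed loops (self-connections) added
d = A_tilde.sum(axis=1)                  # node degrees of A_tilde
D_inv_sqrt = np.diag(d ** -0.5)          # D^{-1/2}
A_norm = D_inv_sqrt @ A_tilde @ D_inv_sqrt

X = rng.standard_normal((J, 3))          # input joint features x_tj (3 channels)
W = rng.standard_normal((3, 64))         # graph-convolution parameters omega

# x'_ti = sum_j D_ii^{-1/2} A~_ij D_jj^{-1/2} x_tj w
X_out = A_norm @ X @ W
```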
In an alternative embodiment, the second fall identification model comprises: a deep residual network, an attention module, a convolutional long short-term memory (ConvLSTM) network and a second classifier;
the deep residual network is used for extracting the features of each frame of image in the video data and outputting the image features of each frame of image;
the attention module is used for processing the output data of the deep residual network and outputting an attention heat map of each frame of image;
the convolutional long short-term memory network is used for extracting the time-series features of the attention heat map of each frame of image;
and the second classifier is used for performing action classification on the output data of the convolutional long short-term memory network to obtain the second fall probability matrix.
Preferably, the deep residual network is ResNet-34, and the second classifier is a SoftMax classifier.
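The data flow through the second model can be sketched with shape-preserving placeholders. Every component below (backbone, attention, temporal summary) is a stub that only mimics tensor shapes, not the real ResNet-34, attention module or ConvLSTM:

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def backbone(frame):
    """ResNet-34 stand-in -> (channels, positions) feature map."""
    return rng.standard_normal((512, 49))

def attend(feat):
    """Attention stand-in: softmax-weight the positions of a feature map."""
    alpha = softmax(feat.mean(axis=0))
    return feat * alpha

def second_model(video, w_cls):
    maps = [attend(backbone(f)) for f in video]       # per-frame heat maps
    seq = np.stack([m.mean(axis=1) for m in maps])    # (T, 512) ConvLSTM stand-in
    return np.stack([softmax(h @ w_cls) for h in seq])  # (T, 2) probabilities

video = np.zeros((5, 224, 224, 3))       # 5 dummy RGB frames
w_cls = rng.standard_normal((512, 2)) * 0.05
probs2 = second_model(video, w_cls)      # second fall probability matrix
```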
Specifically, processing the output data of the deep residual network and outputting the attention heat map of each frame of image comprises calculating the attention heat map of each frame of image by the following formulas:

$$M(P_i) = \sum_{c} \theta_c \, F_c(P_i)$$

$$\alpha_i = \frac{\exp\big(M(P_i)\big)}{\sum_{j} \exp\big(M(P_j)\big)}$$

$$\hat{F}(P_i) = \alpha_i \, F(P_i)$$

where $F_c(P_i)$ is the activation value at the i-th position of the c-th channel of the feature map output by the last convolutional layer of the ResNet-34 network, $\theta_c$ is the parameter corresponding to the c-th channel, $M(P_i)$ is the attention score, and $\alpha_i$ is the attention weight.
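A numeric sketch of the attention computation described above; the feature-map size (512 channels, 7×7 = 49 positions, as in ResNet-34's last convolutional stage for a 224×224 input) is an assumption:

```python
import numpy as np

rng = np.random.default_rng(2)
C, P = 512, 49                           # channels / spatial positions (assumed)
F = rng.standard_normal((C, P))          # F_c(P_i): activations per position
theta = rng.standard_normal(C)           # theta_c: per-channel parameters

M = theta @ F                            # attention score M(P_i) per position
alpha = np.exp(M - M.max())
alpha /= alpha.sum()                     # softmax -> attention weights alpha_i
heat_map = alpha * F                     # re-weighted features (heat map)
```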
It should be noted that the attention heat map enables the second fall recognition model to learn the pixel and contour information of each single frame of image, while the convolutional long short-term memory network enables it to learn the feature information of the whole video along the time sequence, thereby improving the accuracy of fall recognition.
The embodiment of the present invention further provides a human body fall detection device, which can implement all the processes of the human body fall detection method provided in any of the above embodiments. The functions and technical effects of the modules and units of the device are the same as those of the method provided in the above embodiments, and are not described herein again.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a device for detecting a human fall according to an embodiment of the present invention.
The human body falling detection device comprises:
a data receiving module 11, configured to receive video data; the video data consists of a plurality of frames of images containing human bodies to be detected;
the data preprocessing module 12 is configured to perform skeleton joint point extraction on the human body to be detected in the video data to obtain joint data; the joint data comprise joint point three-dimensional coordinate data of the human body to be detected in each frame of the image;
the first fall recognition module 13 is used for inputting the joint data into a first fall recognition model trained in advance to perform fall recognition, so as to obtain a first fall probability matrix; wherein the first fall identification model is a space-time graph convolutional network for fall identification; the first falling probability matrix comprises a first probability belonging to falling actions corresponding to each frame of the image;
the second fall identification module 14 is configured to input the video data into a second fall identification model for fall identification, so as to obtain a second fall probability matrix; wherein the second fall identification model is a graph attention network for fall identification; the second falling probability matrix comprises a second probability belonging to falling actions corresponding to each frame of the image;
and the data operation module 15 is configured to perform mean processing on the first fall probability matrix and the second fall probability matrix to obtain a fall identification result of the human body to be detected.
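The mean processing performed by the data operation module 15 can be sketched as follows; the per-frame layout of the probability matrices, the 0.5 decision threshold, and the "any frame" voting rule are illustrative assumptions, not fixed by the embodiment:

```python
import numpy as np

def fuse_fall_probabilities(p1, p2, threshold=0.5):
    """Average two per-frame fall probability matrices and decide.

    p1, p2: arrays of shape (T,) holding, for each of T frames, the
            probability that the frame shows a fall action.
    Returns the fused probabilities and a boolean fall decision.
    """
    fused = (np.asarray(p1) + np.asarray(p2)) / 2.0  # element-wise mean
    # Illustrative rule: report a fall if any fused frame probability
    # exceeds the threshold.
    return fused, bool((fused > threshold).any())

fused, fell = fuse_fall_probabilities([0.25, 0.875], [0.75, 0.625])
print(fused.tolist(), fell)  # → [0.5, 0.75] True
```

Averaging the two matrices lets the skeleton-based and appearance-based models compensate for each other's per-frame errors.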
The data preprocessing module 12 specifically includes:
the bounding box detection unit is used for inputting the video data into a pre-trained convolutional neural network to detect the position of the human body to be detected in each frame of image, so as to obtain a bounding box containing the human body to be detected in each frame of image;
the first coordinate unit is used for inputting the human body image within the bounding box of each frame of image into a single-person pose estimation network and extracting the skeleton joint points of the human body to be detected in each frame of image, so as to obtain the two-dimensional joint point coordinate data of each frame of image;
the second coordinate unit is used for inputting the two-dimensional joint point coordinate data of each frame of image into a pre-trained network model composed of a convolutional neural network and a temporal convolutional network, so as to obtain the three-dimensional joint point coordinate data of each frame of image;
and the fitting unit is used for fitting the three-dimensional joint point coordinate data of each frame of image to obtain the joint data.
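The order in which the four units above cooperate can be sketched as follows. The stub functions merely stand in for the trained networks of the embodiment, and the joint count, shapes, and function names are assumptions:

```python
import numpy as np

J = 17  # assumed number of skeleton joint points

def detect_bbox(frame):                 # bounding box detection unit (stub)
    h, w = frame.shape[:2]
    return (0, 0, w, h)

def estimate_2d_joints(frame, bbox):    # first coordinate unit (stub)
    return np.zeros((J, 2))             # (x, y) per joint

def lift_to_3d(joints_2d_seq):          # second coordinate unit (stub)
    t = len(joints_2d_seq)
    return np.zeros((t, J, 3))          # (x, y, z) per joint per frame

def preprocess(video):
    """Run the four units in order and return the fitted joint data."""
    joints_2d = [estimate_2d_joints(f, detect_bbox(f)) for f in video]
    joints_3d = lift_to_3d(joints_2d)
    return joints_3d                    # fitting unit: stack into (T, J, 3)

video = [np.zeros((480, 640, 3)) for _ in range(8)]
print(preprocess(video).shape)  # → (8, 17, 3)
```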
The first coordinate unit is specifically configured to:
inputting the human body image within the bounding box of each frame of image into a spatial transformer network to obtain a preprocessed human body image of each frame of image;
inputting the preprocessed human body image of each frame of image into a single-person pose estimation network, and extracting the skeleton joint points of the human body to be detected in each frame of image;
performing confidence comparison on all candidate skeleton joint points at each human body joint position in each frame of image to obtain the skeleton joint point with the maximum confidence at each human body joint position in each frame of image;
and fitting all the skeleton joint points with the maximum confidence in each frame of image to obtain the two-dimensional joint point coordinate data of each frame of image.
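The confidence comparison described above — keeping, for each joint position, the candidate skeleton joint point with the highest confidence — can be sketched as follows; the data layout (candidate poses stacked along the first axis) is an assumption:

```python
import numpy as np

def select_best_joints(candidates, confidences):
    """Pick the highest-confidence candidate for every joint position.

    candidates:  (K, J, 2) — K candidate poses, J joints, (x, y) coords.
    confidences: (K, J)    — confidence of each candidate joint.
    Returns (J, 2) coordinates assembled from the winning candidates.
    """
    best = np.argmax(confidences, axis=0)  # (J,) winning pose per joint
    return candidates[best, np.arange(candidates.shape[1])]

cands = np.array([[[0, 0], [1, 1]],
                  [[9, 9], [2, 2]]], dtype=float)  # K=2 poses, J=2 joints
conf = np.array([[0.9, 0.1],
                 [0.4, 0.8]])
print(select_best_joints(cands, conf).tolist())  # → [[0.0, 0.0], [2.0, 2.0]]
```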
Preferably, the first fall identification model comprises n space-time graph convolutional layers and a first classifier, n being an integer greater than 1;
when i is equal to 1, the input data of the i-th space-time graph convolutional layer is the joint data; when i is greater than 1, the input data of the i-th space-time graph convolutional layer is the output data of the (i-1)-th space-time graph convolutional layer, and the input data of the first classifier is the output data of the n-th space-time graph convolutional layer;
each space-time graph convolutional layer comprises a graph convolution network used for processing the spatial structure information of the skeleton joint points of the human body to be detected and a temporal convolution network used for processing time-dimension features;
the first classifier is used for performing action classification on the output data of the n-th space-time graph convolutional layer to obtain the first fall probability matrix.
Preferably, the second fall identification model comprises a deep residual network, an attention module, a convolutional long short-term memory (ConvLSTM) network, and a second classifier;
the deep residual network is used for extracting features from each frame of image in the video data and outputting the image features of each frame of image;
the attention module is used for processing the output data of the deep residual network and outputting an attention heat map of each frame of image;
the ConvLSTM network is used for extracting time-sequence features from the attention heat map of each frame of image;
and the second classifier is used for performing action classification on the output data of the ConvLSTM network to obtain the second fall probability matrix.
An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program; wherein the computer program, when running, controls an apparatus on which the computer-readable storage medium is located to execute the method for detecting a human fall according to any of the above embodiments.
An embodiment of the present invention further provides a terminal device. Fig. 4 is a schematic structural diagram of a terminal device provided in an embodiment of the present invention. The terminal device includes a processor 10, a memory 20, and a computer program stored in the memory 20 and configured to be executed by the processor 10; when executing the computer program, the processor 10 implements the human body fall detection method according to any of the above embodiments.
Preferably, the computer program may be divided into one or more modules/units (e.g., computer program 1, computer program 2, … …) that are stored in the memory 20 and executed by the processor 10 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used for describing the execution process of the computer program in the terminal device.
The processor 10 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. The general-purpose processor may be a microprocessor, or the processor 10 may be any conventional processor. The processor 10 is the control center of the terminal device and connects the various parts of the terminal device through various interfaces and lines.
The memory 20 mainly includes a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function, and the like, and the data storage area may store related data and the like. In addition, the memory 20 may be a high-speed random access memory or a non-volatile memory, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, and the like, or the memory 20 may be another volatile solid-state memory device.
It should be noted that the terminal device may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that the structural diagram of fig. 4 is only an example of the terminal device and does not constitute a limitation of the terminal device, which may include more or fewer components than those shown, combine some components, or have different components.
To sum up, the embodiments of the present invention provide a human body fall detection method and device, a storage medium, and a terminal device. The method includes: receiving video data, and performing skeleton joint point extraction on the human body to be detected in the video data to obtain joint data; inputting the joint data into a first fall recognition model trained in advance to obtain a first fall probability matrix; inputting the video data into a second fall recognition model trained in advance to obtain a second fall probability matrix; and finally, performing mean processing on the first fall probability matrix and the second fall probability matrix to obtain a fall recognition result of the human body to be detected. The method fuses the contour information and color information of the video image data with the skeleton data information, so that the neural networks learn rich action features, thereby improving the accuracy of fall recognition.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and substitutions can be made without departing from the technical principle of the present invention, and these modifications and substitutions should also be regarded as the protection scope of the present invention.

Claims (10)

1. A method for detecting a human fall, comprising:
receiving video data; the video data consists of a plurality of frames of images containing human bodies to be detected;
extracting skeleton joint points of the human body to be detected in the video data to obtain joint data; the joint data comprise joint point three-dimensional coordinate data of the human body to be detected in each frame of the image;
inputting the joint data into a first fall recognition model trained in advance to obtain a first fall probability matrix; wherein the first fall identification model is a space-time graph convolutional network for fall identification; the first falling probability matrix comprises a first probability belonging to falling actions corresponding to each frame of the image;
inputting the video data into a second fall recognition model trained in advance to obtain a second fall probability matrix; wherein the second fall identification model is a graph attention network for fall identification; the second falling probability matrix comprises a second probability belonging to falling actions corresponding to each frame of the image;
and carrying out mean value processing on the first falling probability matrix and the second falling probability matrix to obtain a falling identification result of the human body to be detected.
2. A method for detecting a human fall as claimed in claim 1, wherein the extracting of skeleton joint points from the human body to be detected in the video data to obtain joint data specifically comprises:
inputting the video data into a pre-trained convolutional neural network to detect the position of the human body to be detected in each frame of image, and obtaining a bounding box containing the human body to be detected in each frame of image;
inputting the human body image within the bounding box of each frame of image into a single-person pose estimation network, and extracting the skeleton joint points of the human body to be detected in each frame of image to obtain the two-dimensional joint point coordinate data of each frame of image;
inputting the two-dimensional joint point coordinate data of each frame of image into a pre-trained network model composed of a convolutional neural network and a temporal convolutional network to obtain the three-dimensional joint point coordinate data of each frame of image;
and fitting the three-dimensional coordinate data of the joint points of each frame of image to obtain joint data.
3. A method for detecting a human body fall as claimed in claim 2, wherein the inputting of the human body image within the bounding box of each frame of image into a single-person pose estimation network and the extracting of the skeleton joint points of the human body to be detected in each frame of image, so as to obtain the two-dimensional joint point coordinate data of each frame of image, specifically comprises:
inputting the human body image within the bounding box of each frame of image into a spatial transformer network to obtain a preprocessed human body image of each frame of image;
inputting the preprocessed human body image of each frame of image into a single-person pose estimation network, and extracting the skeleton joint points of the human body to be detected in each frame of image;
performing confidence comparison on all candidate skeleton joint points at each human body joint position in each frame of image to obtain the skeleton joint point with the maximum confidence at each human body joint position in each frame of image;
and fitting all the skeleton joint points with the maximum confidence in each frame of image to obtain the two-dimensional joint point coordinate data of each frame of image.
4. A method of fall detection as claimed in claim 1, wherein the first fall identification model comprises n space-time graph convolutional layers and a first classifier, n being an integer greater than 1;
when i is equal to 1, the input data of the i-th space-time graph convolutional layer is the joint data; when i is greater than 1, the input data of the i-th space-time graph convolutional layer is the output data of the (i-1)-th space-time graph convolutional layer, and the input data of the first classifier is the output data of the n-th space-time graph convolutional layer;
each space-time graph convolutional layer comprises a graph convolution network used for processing the spatial structure information of the skeleton joint points of the human body to be detected and a temporal convolution network used for processing time-dimension features;
the first classifier is used for performing action classification on the output data of the n-th space-time graph convolutional layer to obtain the first fall probability matrix.
5. A method for detecting a human fall as claimed in claim 4, wherein the feature output f_out(v_ti) of the i-th skeletal joint point of the t-th frame of image processed by the graph convolution network is specifically:

f_out(v_ti) = Σ_j [ Ā_ij / √(Λ_ii Λ_jj) ] f_in(v_tj) ω

wherein f_in(v_tj) is the feature of the j-th skeletal joint point in the t-th frame of image, Ā is the adjacency matrix of the skeleton joint points of the t-th frame of image with self-loops added, Ā_ij is its element corresponding to the i-th and j-th skeletal joint points, Λ is the degree matrix of Ā, with Λ_ii and Λ_jj the degrees of the i-th and j-th skeletal joint points respectively, and ω is a parameter of the graph convolution network.
6. A method of detecting a personal fall as claimed in claim 1, wherein the second fall identification model comprises a deep residual network, an attention module, a convolutional long short-term memory (ConvLSTM) network, and a second classifier;
the deep residual network is used for extracting features from each frame of image in the video data and outputting the image features of each frame of image;
the attention module is used for processing the output data of the deep residual network and outputting an attention heat map of each frame of image;
the ConvLSTM network is used for extracting time-sequence features from the attention heat map of each frame of image;
and the second classifier is used for performing action classification on the output data of the ConvLSTM network to obtain the second fall probability matrix.
7. A device for detecting a fall of a human body, comprising:
the data receiving module is used for receiving video data; the video data consists of a plurality of frames of images containing human bodies to be detected;
the data preprocessing module is used for extracting skeleton joint points of the human body to be detected in the video data to obtain joint data; the joint data comprise joint point three-dimensional coordinate data of the human body to be detected in each frame of the image;
the first falling identification module is used for inputting the joint data into a first falling identification model trained in advance to carry out falling identification so as to obtain a first falling probability matrix; wherein the first fall identification model is a space-time graph convolutional network for fall identification; the first falling probability matrix comprises a first probability belonging to falling actions corresponding to each frame of the image;
the second falling identification module is used for inputting the video data into a second falling identification model for falling identification to obtain a second falling probability matrix; wherein the second fall identification model is a graph attention network for fall identification; the second falling probability matrix comprises a second probability belonging to falling actions corresponding to each frame of the image;
and the data operation module is used for carrying out mean value processing on the first falling probability matrix and the second falling probability matrix to obtain a falling identification result of the human body to be detected.
8. A human fall detection apparatus as claimed in claim 7, wherein the data preprocessing module specifically comprises:
the bounding box detection unit is used for inputting the video data into a pre-trained convolutional neural network to detect the position of the human body to be detected in each frame of image, so as to obtain a bounding box containing the human body to be detected in each frame of image;
the first coordinate unit is used for inputting the human body image within the bounding box of each frame of image into a single-person pose estimation network and extracting the skeleton joint points of the human body to be detected in each frame of image, so as to obtain the two-dimensional joint point coordinate data of each frame of image;
the second coordinate unit is used for inputting the two-dimensional joint point coordinate data of each frame of image into a pre-trained network model composed of a convolutional neural network and a temporal convolutional network, so as to obtain the three-dimensional joint point coordinate data of each frame of image;
and the fitting unit is used for fitting the three-dimensional coordinate data of the joint points of each frame of image to obtain joint data.
9. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the method for detecting a human fall according to any one of claims 1 to 6.
10. A terminal device comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the method of detecting a human fall according to any one of claims 1 to 6 when executing the computer program.
CN202110960372.9A 2021-08-20 2021-08-20 Human body falling detection method and device, storage medium and terminal equipment Withdrawn CN113837005A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110960372.9A CN113837005A (en) 2021-08-20 2021-08-20 Human body falling detection method and device, storage medium and terminal equipment


Publications (1)

Publication Number Publication Date
CN113837005A true CN113837005A (en) 2021-12-24

Family

ID=78961021

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110960372.9A Withdrawn CN113837005A (en) 2021-08-20 2021-08-20 Human body falling detection method and device, storage medium and terminal equipment

Country Status (1)

Country Link
CN (1) CN113837005A (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200074227A1 (en) * 2016-11-09 2020-03-05 Microsoft Technology Licensing, Llc Neural network-based action detection
CN111310707A (en) * 2020-02-28 2020-06-19 山东大学 Skeleton-based method and system for recognizing attention network actions
CN111539941A (en) * 2020-04-27 2020-08-14 上海交通大学 Parkinson's disease leg flexibility task evaluation method and system, storage medium and terminal
CN112052816A (en) * 2020-09-15 2020-12-08 山东大学 Human behavior prediction method and system based on adaptive graph convolution countermeasure network
CN112131908A (en) * 2019-06-24 2020-12-25 北京眼神智能科技有限公司 Action identification method and device based on double-flow network, storage medium and equipment
CN112233222A (en) * 2020-09-29 2021-01-15 深圳市易尚展示股份有限公司 Human body parametric three-dimensional model deformation method based on neural network joint point estimation
WO2021114892A1 (en) * 2020-05-29 2021-06-17 平安科技(深圳)有限公司 Environmental semantic understanding-based body movement recognition method, apparatus, device, and storage medium
CN113111865A (en) * 2021-05-13 2021-07-13 广东工业大学 Fall behavior detection method and system based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YI CAO ET AL.: "Skeleton-based action recognition with temporal action graph and temporal adaptive graph convolution structure", Multimedia Tools and Applications, no. 80, p. 29139.
孙于成 (Sun Yucheng): "Table tennis basic technique action recognition based on spatio-temporal graph convolution", China Master's Theses Full-text Database, Social Sciences II, no. 12, pp. 12-15.
李炫烨 et al. (Li Xuanye et al.): "Human action recognition method combining multi-attention mechanisms and spatio-temporal graph convolutional networks", Journal of Computer-Aided Design & Computer Graphics, vol. 33, no. 7, pp. 1055-1063.

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114229646A (en) * 2021-12-28 2022-03-25 苏州汇川控制技术有限公司 Elevator control method, elevator and elevator detection system
CN114229646B (en) * 2021-12-28 2024-03-22 苏州汇川控制技术有限公司 Elevator control method, elevator and elevator detection system
CN116386087A (en) * 2023-03-31 2023-07-04 阿里巴巴(中国)有限公司 Target object processing method and device
CN116386087B (en) * 2023-03-31 2024-01-09 阿里巴巴(中国)有限公司 Target object processing method and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20211224