CN109886102B - Fall-down behavior time-space domain detection method based on depth image - Google Patents


Publication number
CN109886102B
CN109886102B (application CN201910032206.5A)
Authority
CN
China
Prior art keywords
depth image
frame
image sequence
tensor
sequence
Prior art date
Legal status
Active
Application number
CN201910032206.5A
Other languages
Chinese (zh)
Other versions
CN109886102A (en)
Inventor
肖阳
姜文祥
曹治国
王焱乘
朱子豪
李帅
张明阳
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN201910032206.5A
Publication of CN109886102A
Application granted
Publication of CN109886102B
Legal status: Active
Anticipated expiration

Abstract

The invention discloses a time-space domain detection method for fall behavior based on depth images, which comprises the following steps: acquiring depth images; selecting video sequences over multiple time windows; extracting normal vector features from each segment of depth video; fusing the features; encoding dynamic images; detecting with the behavior detection network; processing the detection results; recording the detection results; and training the behavior detection network. The method fully mines the characteristics of the depth image: normal vector features are fused with the depth image, the feature sequence is encoded into a dynamic image, the result is detected frame by frame with a target detection network trained on a large amount of labeled data, and non-maximum suppression is applied in the time domain and the space domain, which ensures that the method has high real-time performance, accuracy, robustness, privacy protection and practicability.

Description

Fall-down behavior time-space domain detection method based on depth image
Technical Field
The invention belongs to the field of digital image recognition, and particularly relates to a time-space domain detection method for falling behavior based on a depth image.
Background
Falls are the main cause of accidental injuries among the elderly (aged 65 and above). Statistics show that 60% of head injuries and 90% of hip and wrist injuries in the elderly are caused by falls, and that 30% of elderly people living alone and 50% of the elderly in long-term care facilities (such as nursing homes) fall at least once a year. Timely detection of falls is therefore very important in the long-term care of the elderly. On the other hand, as the aging of the world population becomes more and more serious, the cost of long-term care for the elderly keeps rising, especially in care institutions such as hospitals and nursing homes, so the demand for real-time fall detection systems for the elderly is very large.
Currently, there are three main types of methods for fall detection: wearable device based methods, environmental sensor based methods, and computer vision based methods.
Wearable-device-based methods detect falls by measuring acceleration with a body-worn sensor; they require little computation and are simple to use, but the device must be worn at all times, which interferes with normal life. Environmental-sensor-based methods, for example using pressure or sound sensors, also require little computation, but they are strongly affected by changes in environmental pressure and sound and therefore give many false alarms. Computer-vision-based methods mainly use monitored video image information and need no wearable equipment, but they are easily affected by illumination and cannot be used at night; in addition, their privacy protection is poor and their accuracy has not reached a high standard.
Disclosure of Invention
Aiming at the defects or improvement needs of the prior art, the invention provides a time-space domain detection method for fall behavior based on depth images, so as to solve the technical problems that existing fall detection methods are affected by illumination, have low accuracy and require wearing a sensor.
In order to achieve the above object, the present invention provides a time-space domain detection method for fall behavior based on depth images, comprising:
(1) acquiring depth image data of an indoor scene, and intercepting M depth image sequences of different lengths from the depth image data, wherein M is an integer;
(2) extracting the normal vector features of each depth image sequence, wherein normal vector features of size W × H × 3 × N are extracted from each depth image sequence, W being the width of the depth image, H its height, and N the number of frames of the corresponding depth image sequence;
(3) converting each obtained depth image sequence into a W × H × N gray-scale image sequence, and fusing the gray-scale images of each depth image sequence with the normal vector features of that sequence to obtain the W × H × 3 × N tensor corresponding to the sequence;
(4) performing dynamic image encoding on the W × H × 3 × N tensor features corresponding to each depth image sequence, so that each depth image sequence is encoded into a W × H × 3 tensor;
(5) taking the W × H × 3 tensor corresponding to each depth image sequence as the input of a target detection network to obtain the occurrence probability of fall behavior and the spatial position information of the fall behavior for each depth image sequence;
(6) performing spatial non-maximum suppression on the spatial position information of the fall behavior of each depth image sequence according to the fall probabilities obtained by the target detection network, performing temporal non-maximum suppression on the detection results of the M depth image sequences in the time domain according to the same fall probabilities, and merging the spatial positions and time windows corresponding to the M depth image sequences; if the fall probability of the target is greater than a first preset value and the length of the merged time window is greater than a second preset value, determining that a fall has occurred, recording the fall probability of the fall, the position in the image, the moment of the fall and the duration of the fall, and issuing an early warning; otherwise, determining that no fall has occurred.
Preferably, step (1) comprises:
(1.1) acquiring N frames of depth images, starting feature extraction, encoding and detection from the N-th frame of depth image, and, before processing the next frame of depth image, discarding the earliest depth image in the current depth image sequence so that the length of the depth image sequence used for fall detection remains N, wherein N denotes the length of the depth image sequence;
and (1.2) keeping M depth image sequences of different lengths, wherein the lengths N of the different depth image sequences differ from one another, and the N value of each depth image sequence is fixed while that sequence is processed.
Preferably, step (2) comprises:
(2.1) for each depth image sequence, extracting the normal vector S_n = S_xn × S_yn of each frame of depth image in the sequence, wherein S_n denotes the normal vector of the n-th frame depth image, S_xn = (1, 0, ∂d_n/∂x_n) and S_yn = (0, 1, ∂d_n/∂y_n) are the tangent vectors along the x and y directions respectively, the n-th frame depth image is p_n = (x_n, y_n, d_n(x_n, y_n)), (x_n, y_n) denotes the pixel coordinates, d_n(x_n, y_n) is the pixel value of the depth image at (x_n, y_n), and n = 1, 2, 3, ..., N;
and (2.2) fusing the normal vectors of each frame of depth image in the depth image sequence to obtain the normal vector characteristics of W x H x 3 x N of the depth image sequence.
Preferably, step (3) comprises:
(3.1) for each depth image sequence, converting each frame of depth image in the depth image sequence into a gray-scale map of W x H;
(3.2) calculating, pixel by pixel, the first dimension -∂d(x,y)/∂x and the second dimension -∂d(x,y)/∂y of the normal vector of each frame of depth image in the depth image sequence, obtaining two matrices of size W × H, wherein (x, y) denotes the pixel coordinates and d(x, y) is the pixel value of the depth image;
(3.3) merging the W × H gray-scale matrix and the W × H × 2 matrix formed by the first two dimensions of the normal vector into a W × H × 3 tensor, encoding each W × H frame of depth image into a W × H × 3 feature tensor, and thus obtaining the W × H × 3 × N tensor corresponding to the depth image sequence, wherein W and H are the width and height of the depth image.
Preferably, step (4) comprises:
(4.1) for each depth image sequence, recording the feature tensor sequence of its N frames of depth images as X = [x_1, x_2, ..., x_t, ..., x_N], wherein x_t, t = 1, 2, 3, ..., N, is the W × H × 3 tensor encoded from the t-th frame depth image;
(4.2) designing a mapping function ψ(·) such that, for the t-th frame depth image x_t, the output ψ(x_t) is the mapped feature vector of the t-th frame, wherein the mapping function converts the original depth image data to the range [0, 255] and vectorizes the matrix;
(4.3) obtaining the score S(v_t; u) = u^T · v_t of the t-th frame depth image from its average feature and the ranking function, wherein v_t = (1/t) · Σ_{τ=1}^{t} ψ(x_τ) denotes the average feature of the depth images up to the t-th frame and u^T is the transpose of the parameter vector obtained by optimizing the ranking function, the ranking function being designed so that frames later in the time series receive larger scores;
(4.4) optimizing the parameter u of the ranking function with a RankSVM, so that, between different frames of the depth image sequence, frames later in the time series receive larger scores, and reshaping the obtained optimal value of u into a W × H × 3 tensor, which serves as the W × H × 3 tensor encoded from the W × H × 3 × N tensor corresponding to the depth image sequence;
(4.5) using û = Σ_{i=1}^{N} α_i · ψ(x_i) as an approximation of the parameter u, wherein ψ(x_i) is the vectorized W × H × 3 feature tensor of the i-th frame image obtained through the mapping function of step (4.2), α_i = 2(N − i + 1), and N denotes the length of the corresponding depth image sequence.
Preferably, the M feature tensors of size W × H × 3 encoded from the M depth video sequences in step (5) are detected and the detection results are output, wherein the detection network comprises YOLOv1, YOLOv2, YOLOv3, Fast R-CNN, Faster R-CNN, MobileNet V1, MobileNet V2 or ShuffleNet; when the target detection network uses YOLOv2, the input of the target detection network is 413 × 413 × 3, the W × H × 3 tensor is resized to 413 × 413 × 3 by image size transformation, and the target detection network outputs the fall probability of the target to be detected, the horizontal and vertical coordinates of the target in the image, and the width and height of the target.
Preferably, the method further comprises:
after time-space domain labeling is performed on a preset fall detection data set, converting the depth images into gray-scale images and normal vector features of the depth images for feature fusion, and performing dynamic image encoding on the tensor features obtained by the feature fusion to produce dynamic image training samples;
pre-training a convolutional neural network with millions of ImageNet images, and then performing end-to-end multi-batch training on the dynamic image training samples for behavior detection to obtain the target detection network, wherein the output of the target detection network comprises: the fall probability of the target to be detected, the position of the target to be detected in the image, and the width and height of the target to be detected.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
the time-space domain detection method for the falling behavior based on the depth image fully excavates the characteristics of the depth image, encodes a characteristic sequence through the characteristic fusion of a normal vector and the depth image, carries out frame-by-frame detection on the falling behavior through a target detection network trained by a large amount of labeled data and the non-maximum inhibition of a time domain and a space domain in a detection result, and ensures that the method has higher real-time performance, accuracy, robustness, privacy protection and practicability.
Drawings
Fig. 1 is a schematic flow chart of a fall behavior detection method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a physical meaning corresponding to a normal vector feature extracted from a depth image according to an embodiment of the present invention;
FIG. 3 is a flow chart of a dynamic graph encoding algorithm provided by an embodiment of the present invention;
FIG. 4 is a visualization of the first dimension of the feature vector obtained by the simplified dynamic image encoding algorithm according to an embodiment of the present invention;
FIG. 5 is a network architecture of a behavior detection network used in accordance with an embodiment of the present invention;
fig. 6 is a flowchart of a complete fall detection method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention provides a time-space domain detection method for fall behavior based on a depth image. The method fully mines the characteristics of the depth image: it fuses normal vector features with the depth image, encodes the feature sequence with dynamic image encoding, detects fall behavior frame by frame with a target detection network trained on a large amount of labeled data, and applies non-maximum suppression in the time domain and space domain to the detection results. It therefore offers high real-time performance, accuracy, robustness, privacy protection and practicability, and solves the technical problems that existing fall detection methods are affected by illumination, have low accuracy and require wearing a sensor.
The invention provides a time-space domain detection method for fall behavior based on a depth image, which comprises: acquiring depth images, updating the depth image sequence, selecting video sequences over multiple time windows, performing normal vector feature extraction, feature fusion and dynamic image encoding on the depth image sequences, extracting features with the behavior detection network, outputting the detection result, and training the behavior detection network. The time-space domain detection method for fall behavior provided by the invention is described in detail below with a specific example.
The method for detecting the time-space domain of the falling behavior based on the depth image, provided by the embodiment of the invention, comprises the following specific steps, and the whole process is shown in fig. 1 and 6:
(1) Acquiring depth images: depth image data of the indoor scene is acquired with a depth sensor. The depth image is not affected by illumination, so detection can run around the clock; moreover, the characteristics of a depth image cannot be used to identify a person, so privacy protection is strong.
In addition, the invention performs real-time detection: after detection of fall behavior has been completed for the current frame, the next frame of depth image is read in.
(2) Selecting video sequences over multiple time windows: since fall behavior cannot be confirmed from a single depth image, a depth image sequence of length N is needed for judgment.
N frames of images are read first, and the subsequent feature extraction, encoding and detection start from the N-th frame. Thereafter, before each new frame of depth image is read in, the earliest image in the current image sequence is discarded, so that the length of the depth image sequence used for fall detection remains N.
Since the duration of a fall varies from person to person, depth image sequences of several lengths are kept: in step (2) there are M depth image sequences at the same time, and the length N of each sequence differs from the others, but the N value of a given sequence does not change while that sequence is processed; the processing of every sequence in the subsequent steps is identical and is not repeated. In addition, the video time window lengths in this embodiment of the invention are obtained by clustering the training samples.
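As an illustration of this sliding multi-window scheme, the following minimal sketch keeps one FIFO buffer per window length; the particular window lengths are placeholders, since the patent derives its window lengths by clustering training samples.

```python
from collections import deque

import numpy as np

# Illustrative window lengths (in frames); the patent obtains its window lengths
# by clustering the training samples, so these particular values are placeholders.
WINDOW_LENGTHS = [30, 45, 60]

# One FIFO buffer per time window; deque(maxlen=N) keeps only the latest N frames.
buffers = {n: deque(maxlen=n) for n in WINDOW_LENGTHS}

def update_buffers(depth_frame: np.ndarray) -> None:
    """Append the newest depth frame; the oldest frame is discarded automatically."""
    for buf in buffers.values():
        buf.append(depth_frame)

def ready_sequences() -> dict:
    """Return the sequences that already hold their full N frames and can be encoded."""
    return {n: list(buf) for n, buf in buffers.items() if len(buf) == n}
```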
(3) Extracting normal vector features from each depth image sequence. A normal vector is extracted for each frame of depth image. Let the n-th frame depth image be p_n = (x_n, y_n, d_n(x_n, y_n)), n = 1, 2, 3, ..., N, where (x_n, y_n) denotes the pixel coordinates and d_n(x_n, y_n) is the pixel value of the n-th frame depth image at (x_n, y_n); its physical meaning is the distance of that point from the camera (in millimeters), which differs from the physical meaning of the data collected by an RGB camera. The normal vector of the depth image is calculated according to the following formula:

S_n = S_xn × S_yn,

wherein

S_xn = (1, 0, ∂d_n/∂x_n),  S_yn = (0, 1, ∂d_n/∂y_n)

are the tangent vectors along the x and y directions, respectively. Taking the cross product, the normal vector of each point p_n = (x_n, y_n, d_n(x_n, y_n)) on the n-th frame depth map is:

S_n = (-∂d_n/∂x_n, -∂d_n/∂y_n, 1).

In general, ∂d_n/∂x_n and ∂d_n/∂y_n are calculated with the following approximation:

∂d_n/∂x_n ≈ d_n(x_n + 1, y_n) - d_n(x_n, y_n),
∂d_n/∂y_n ≈ d_n(x_n, y_n + 1) - d_n(x_n, y_n).
the physical meaning is shown in figure 2.
(4) Feature fusion. Each frame of depth image obtained in step (2) is converted into a W × H gray-scale map, and the first dimension -∂d/∂x and the second dimension -∂d/∂y of the normal vector of each frame of depth image from step (3) are calculated pixel by pixel, giving two further matrices of size W × H. The W × H gray-scale matrix from step (2) and the W × H × 2 matrix formed by the first two dimensions of the normal vector from step (3) are merged into a W × H × 3 tensor, so that each W × H frame of depth image is encoded into a W × H × 3 feature tensor, and the encoding result of the depth image sequence is a W × H × 3 × N tensor, where W and H are the width and height of the depth image.
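A minimal sketch of this per-frame fusion is given below; the linear gray-scale conversion and the assumed maximum depth range are illustrative choices, since the patent does not specify the exact conversion.

```python
import numpy as np

def fuse_frame(depth: np.ndarray, max_depth_mm: float = 8000.0) -> np.ndarray:
    """Encode one H x W depth frame into an H x W x 3 feature tensor.

    Channel 0: gray-scale version of the depth map (the scaling is an assumption).
    Channels 1-2: first two dimensions of the normal vector, i.e. -dd/dx and -dd/dy
    computed with forward differences.
    """
    d = depth.astype(np.float32)
    gray = np.clip(d / max_depth_mm, 0.0, 1.0) * 255.0
    ddx = np.zeros_like(d)
    ddy = np.zeros_like(d)
    ddx[:, :-1] = d[:, 1:] - d[:, :-1]
    ddy[:-1, :] = d[1:, :] - d[:-1, :]
    return np.stack([gray, -ddx, -ddy], axis=-1)

def fuse_sequence(frames) -> np.ndarray:
    """Stack the per-frame H x W x 3 tensors into the H x W x 3 x N sequence tensor."""
    return np.stack([fuse_frame(f) for f in frames], axis=-1)
```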
(5) Dynamic image encoding. The W × H × 3 × N feature tensor obtained in step (4) is encoded with dynamic image encoding to obtain a W × H × 3 tensor. As shown in fig. 3, the feature tensor sequence of the N frames is written as X = [x_1, x_2, ..., x_N], where the t-th frame x_t is the W × H × 3 tensor encoded from that frame in step (4). A mapping function ψ(·) is designed such that, for the t-th frame depth image x_t, the output ψ(x_t) is the mapped feature vector of the t-th frame; the mapping function converts the original depth image data to the range [0, 255] and vectorizes the matrix. Then the average feature up to the t-th frame is defined as

v_t = (1/t) · Σ_{τ=1}^{t} ψ(x_τ).

Since there is a clear temporal order between the frames of a fall video, a ranking function is defined on this average feature, and the score of the t-th frame is

S(v_t; u) = u^T · v_t.

The role of the ranking function is to give frames later in the time series larger scores, i.e. for all frames

q > t  ⇒  S(v_q; u) > S(v_t; u).

The final objective is to optimize the parameter u of the ranking function S with a RankSVM so that, between different frames, the later a frame lies in the time series, the larger its score. Using the structural risk minimization and max-margin optimization framework, the objective optimization problem can be expressed as

E(u) = (λ/2) · ||u||² + (2 / (N(N − 1))) · Σ_{q>t} max{0, 1 − S(v_q; u) + S(v_t; u)},

where the first term is a regularization term and the second term is a hinge-loss penalty term. This is provably a convex optimization problem that can be solved with a RankSVM, and the optimized parameter u* can serve as a new representation of the whole feature tensor sequence. After resizing, u* becomes a W × H × 3 tensor feature.

This formulation can be simplified. Let d denote the optimal parameter u sought by the above method, i.e.

d* = argmin_d E(d).

Starting from d = 0, the first step of gradient descent yields the approximate solution

d* ∝ −∇E(d)|_{d=0} ∝ Σ_{q>t} (v_q − v_t) = Σ_{t=1}^{N} (2t − N − 1) · v_t.

Substituting v_t = (1/t) · Σ_{τ=1}^{t} ψ(x_τ) and summing the resulting series of coefficients gives

d* ∝ Σ_{t=1}^{N} α_t · ψ(x_t),  with  α_t = 2(N − t + 1) − (N + 1)(H_N − H_{t−1}),

where H_t = Σ_{i=1}^{t} 1/i is the t-th harmonic number and H_0 = 0. The finally desired W × H × 3 tensor feature therefore becomes, as shown in fig. 4,

d* = Σ_{t=1}^{N} α_t · ψ(x_t).

In the present embodiment, α_t = 2(N − t + 1) is used to process the feature tensor sequence; the second term of the formula α_t = 2(N − t + 1) − (N + 1)(H_N − H_{t−1}) does not influence the encoding effect, and dropping it saves much computation time.
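A compact sketch of this simplified dynamic image encoding (approximate rank pooling with α_t = 2(N − t + 1)) follows; the per-frame min-max rescaling stands in for the mapping function ψ, whose exact form is an assumption here.

```python
import numpy as np

def psi(frame: np.ndarray) -> np.ndarray:
    """Map one feature frame to the range [0, 255]; the exact form of the patent's
    mapping function is not given, so this min-max rescaling is an assumption."""
    f = frame.astype(np.float32)
    f -= f.min()
    if f.max() > 0:
        f = f / f.max() * 255.0
    return f

def dynamic_image(seq: np.ndarray) -> np.ndarray:
    """Encode a W x H x 3 x N feature sequence into one W x H x 3 dynamic image
    using the simplified rank-pooling weights alpha_t = 2 * (N - t + 1)."""
    n = seq.shape[-1]
    alpha = 2.0 * (n - np.arange(1, n + 1) + 1)              # alpha_1 .. alpha_N
    d = sum(alpha[t] * psi(seq[..., t]) for t in range(n))   # weighted sum over frames
    d -= d.min()
    d = d / max(d.max(), 1e-6) * 255.0                       # rescale for the detector
    return d.astype(np.uint8)
```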
(6) Detection by the behavior detection network. The M feature tensors of size W × H × 3 obtained by encoding the M depth video sequences in step (5) are fed to the subsequent detection network, which outputs the detection results.
The target detection network in the embodiment of the invention may be any existing target detection network such as YOLOv1, YOLOv2, YOLOv3, Fast R-CNN, Faster R-CNN, MobileNet V1, MobileNet V2 or ShuffleNet. In the embodiment of the present invention, YOLOv2 (You Only Look Once: Unified, Real-Time Object Detection) is preferably used; the network structure is shown in FIG. 5.
The input of the convolutional neural network is 413 × 413 × 3, and the W × H × 3 tensor is directly resized to 413 × 413 × 3. The network output is 13 × 13 × 130, that is, 13 × 13 × B × (5 + C) with B = 5 and C = 21, where B is the number of output bounding boxes per cell and C is the number of detected target categories; the first 20 categories are non-fall categories and the 21st category is fall. The 5 values at the front of each detection result are the probability p of the target, the position (x, y) of the target in the image, and the width w and height h of the target.
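For illustration, the sketch below decodes an output tensor laid out as 13 × 13 × B × (5 + C) in the way described above; the exact ordering of the five box values and the class scores in a real YOLOv2 implementation may differ, so this layout is an assumption used only to show how the fall probability and box are read out.

```python
import numpy as np

B, C = 5, 21          # boxes per cell and number of classes (21st class = fall)
FALL_CLASS = 20       # 0-based index of the 21st category

def decode_fall_detections(output: np.ndarray, conf_thresh: float = 0.5):
    """Read fall probability and box (x, y, w, h) from a 13 x 13 x (B*(5+C)) tensor.

    Assumes each of the B box slots stores [p, x, y, w, h, class scores...];
    a real YOLOv2 head may order these values differently.
    """
    grid = output.reshape(13, 13, B, 5 + C)
    detections = []
    for row in range(13):
        for col in range(13):
            for b in range(B):
                p, x, y, w, h = grid[row, col, b, :5]
                cls_scores = grid[row, col, b, 5:]
                fall_prob = p * cls_scores[FALL_CLASS]
                if fall_prob > conf_thresh:
                    detections.append((fall_prob, x, y, w, h))
    return detections
```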
(7) Processing the detection results: spatial non-maximum suppression is applied to the spatial fall positions of the M depth video sequences according to the fall probabilities output by the detection network, and temporal non-maximum suppression is applied to the detection results of the M videos in the time domain according to the same probabilities; the processed spatial positions and time windows are then merged, and if the fall probability of the target and the length of the merged fall time window are both greater than the experimentally set thresholds, it is determined that a fall has occurred, otherwise it is determined that no fall has occurred, which yields the fall detection result at the current moment.
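A simplified sketch of the spatial suppression and time-window merging follows; the IoU threshold and the window-overlap rule are assumptions chosen for illustration rather than values specified by the patent.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def spatial_nms(dets, iou_thresh=0.5):
    """Keep the highest-probability box and drop boxes that overlap it too much.

    dets: list of (prob, box) with box = (x1, y1, x2, y2).
    """
    dets = sorted(dets, key=lambda d: d[0], reverse=True)
    kept = []
    for prob, box in dets:
        if all(iou(box, kb) < iou_thresh for _, kb in kept):
            kept.append((prob, box))
    return kept

def merge_time_windows(windows):
    """Merge overlapping (start, end) windows from the M sequences into larger spans."""
    if not windows:
        return []
    windows = sorted(windows)
    merged = [list(windows[0])]
    for start, end in windows[1:]:
        if start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(w) for w in merged]
```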
(8) Recording the detection result: the fall probability of the target, its position in the image, the moment of the fall and the duration of the fall are recorded, and an early warning is issued. When the detected fall probability and the length of the time window exceed the thresholds, the judged probability, spatial position and occurrence time of the fall behavior are output, and the depth images within the occurrence time window are archived so that the cause of the fall can be analyzed later.
(9) Training the behavior detection network. The feature extraction network and the detection network are obtained by labeling an existing fall detection data set to produce dynamic image training samples, pre-training on millions of ImageNet images, and finally performing end-to-end multi-batch training for behavior detection.
In the embodiment of the invention, the SDUFall fall recognition data set of Shandong University and the NTU RGBD data set of Nanyang Technological University in Singapore are used; temporal and spatial fall labels are added to these recognition data sets, thereby extending the original data sets.
After the detection result at the current moment has been recorded, the next frame of image is read in.
In the training process, the VOC 2007 and VOC 2012 competition data sets are used, so the output has 21 categories, the first 20 being the 20 categories of the VOC data sets. Transfer learning is used during training: the initial parameters of the backbone network (the layers numbered above 24 in the figure) are obtained by pre-training on the ImageNet million-scale image classification data set, and the final convolution kernel parameters are then obtained by training on the NTU RGBD data set and the SDUFall data set labeled by the invention, which greatly improves the robustness and stability of the method.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (5)

1. A time-space domain detection method for falling behavior based on a depth image is characterized by comprising the following steps:
(1) acquiring depth image data of an indoor scene, and intercepting M depth image sequences of different lengths from the depth image data, wherein M is an integer;
(2) extracting the normal vector features of each depth image sequence, wherein normal vector features of size W × H × 3 × N are extracted from each depth image sequence, W being the width of the depth image, H its height, and N the number of frames of the corresponding depth image sequence;
(3) converting each obtained depth image sequence into a W × H × N gray-scale image sequence, and fusing the gray-scale images of each depth image sequence with the normal vector features of that sequence to obtain the W × H × 3 × N tensor corresponding to the sequence, which specifically comprises:
(3.1) for each depth image sequence, converting each frame of depth image in the depth image sequence into a gray-scale map of W x H;
(3.2) calculating, pixel by pixel, the first dimension -∂d(x,y)/∂x and the second dimension -∂d(x,y)/∂y of the normal vector of each frame of depth image in the depth image sequence, obtaining two matrices of size W × H, wherein (x, y) denotes the pixel coordinates and d(x, y) is the pixel value of the depth image;
(3.3) merging the W × H gray-scale matrix and the W × H × 2 matrix formed by the first two dimensions of the normal vector into a W × H × 3 tensor, so as to encode each W × H frame of depth image into a W × H × 3 feature tensor and thus obtain the W × H × 3 × N tensor corresponding to the depth image sequence;
(4) performing dynamic image encoding on the W × H × 3 × N tensor features corresponding to each depth image sequence, so that each depth image sequence is encoded into a W × H × 3 tensor, which specifically comprises:
(4.1) for each depth image sequence, recording the feature tensor sequence of its N frames of depth images as X = [x_1, x_2, ..., x_t, ..., x_N], wherein x_t, t = 1, 2, 3, ..., N, is the W × H × 3 tensor encoded from the t-th frame depth image;
(4.2) designing a mapping function ψ(·) such that, for the t-th frame depth image x_t, the output ψ(x_t) is the mapped feature vector of the t-th frame, wherein the mapping function converts the original depth image data to the range [0, 255] and vectorizes the matrix;
(4.3) obtaining the score S(v_t; u) = u^T · v_t of the t-th frame depth image from its average feature and the ranking function, wherein v_t = (1/t) · Σ_{τ=1}^{t} ψ(x_τ) denotes the average feature of the depth images up to the t-th frame and u^T is the transpose of the parameter vector obtained by optimizing the ranking function, the ranking function being designed so that frames later in the time series receive larger scores;
(4.4) optimizing the parameter u of the ranking function with a RankSVM, so that, between different frames of the depth image sequence, frames later in the time series receive larger scores, and reshaping the obtained optimal value of u into a W × H × 3 tensor, which serves as the W × H × 3 tensor encoded from the W × H × 3 × N tensor corresponding to the depth image sequence;
(4.5) using û = Σ_{i=1}^{N} α_i · ψ(x_i) as an approximation of the parameter u, wherein ψ(x_i) is the vectorized W × H × 3 feature tensor of the i-th frame image obtained through the mapping function of step (4.2), α_i = 2(N − i + 1), and N denotes the length of the corresponding depth image sequence;
(5) taking the W × H × 3 tensor corresponding to each depth image sequence as the input of a target detection network to obtain the occurrence probability of fall behavior and the spatial position information of the fall behavior for each depth image sequence;
(6) performing spatial non-maximum suppression on the spatial position information of the fall behavior of each depth image sequence according to the fall probabilities obtained by the target detection network, performing temporal non-maximum suppression on the detection results of the M depth image sequences in the time domain according to the same fall probabilities, and merging the spatial positions and time windows corresponding to the M depth image sequences; if the fall probability of the target is greater than a first preset value and the length of the merged time window is greater than a second preset value, determining that a fall has occurred, recording the fall probability of the fall, the position in the image, the moment of the fall and the duration of the fall, and issuing an early warning; otherwise, determining that no fall has occurred.
2. The method of claim 1, wherein step (1) comprises:
(1.1) acquiring N frames of depth images, starting feature extraction, encoding and detection from the N-th frame of depth image, and, before processing the next frame of depth image, discarding the earliest depth image in the current depth image sequence so that the length of the depth image sequence used for fall detection remains N, wherein N denotes the length of the depth image sequence;
and (1.2) keeping M depth image sequences of different lengths, wherein the lengths N of the different depth image sequences differ from one another, and the N value of each depth image sequence is fixed while that sequence is processed.
3. The method of claim 1 or 2, wherein step (2) comprises:
(2.1) for each depth image sequence, extracting the normal vector S_n = S_xn × S_yn of each frame of depth image in the sequence, wherein S_n denotes the normal vector of the n-th frame depth image, S_xn = (1, 0, ∂d_n/∂x_n) and S_yn = (0, 1, ∂d_n/∂y_n) are the tangent vectors along the x and y directions respectively, the n-th frame depth image is p_n = (x_n, y_n, d_n(x_n, y_n)), (x_n, y_n) denotes the pixel coordinates, d_n(x_n, y_n) is the pixel value of the depth image at (x_n, y_n), and n = 1, 2, 3, ..., N;
and (2.2) fusing the normal vectors of each frame of depth image in the depth image sequence to obtain the normal vector characteristics of W x H x 3 x N of the depth image sequence.
4. The method according to claim 1, wherein the M feature tensors of size W × H × 3 encoded from the M depth video sequences in step (5) are detected and the detection results are output, wherein the detection network comprises YOLOv1, YOLOv2, YOLOv3, Fast R-CNN, Faster R-CNN, MobileNet V1, MobileNet V2 or ShuffleNet; and when the target detection network uses YOLOv2, the input of the target detection network is 413 × 413 × 3, the W × H × 3 tensor is resized to 413 × 413 × 3 by image size transformation, and the target detection network outputs the fall probability of the target to be detected, the horizontal and vertical coordinates of the target in the image, and the width and height of the target.
5. The method of claim 4, further comprising:
after time-space domain labeling is performed on a preset fall detection data set, converting the depth images into gray-scale images and normal vector features of the depth images for feature fusion, and performing dynamic image encoding on the tensor features obtained by the feature fusion to produce dynamic image training samples;
pre-training a convolutional neural network with millions of ImageNet images, and then performing end-to-end multi-batch training on the dynamic image training samples for behavior detection to obtain the target detection network, wherein the output of the target detection network comprises: the fall probability of the target to be detected, the position of the target to be detected in the image, and the width and height of the target to be detected.
CN201910032206.5A 2019-01-14 2019-01-14 Fall-down behavior time-space domain detection method based on depth image Active CN109886102B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910032206.5A CN109886102B (en) 2019-01-14 2019-01-14 Fall-down behavior time-space domain detection method based on depth image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910032206.5A CN109886102B (en) 2019-01-14 2019-01-14 Fall-down behavior time-space domain detection method based on depth image

Publications (2)

Publication Number Publication Date
CN109886102A CN109886102A (en) 2019-06-14
CN109886102B true CN109886102B (en) 2020-11-17

Family

ID=66925930

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910032206.5A Active CN109886102B (en) 2019-01-14 2019-01-14 Fall-down behavior time-space domain detection method based on depth image

Country Status (1)

Country Link
CN (1) CN109886102B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598606B (en) * 2019-09-02 2022-05-27 南京邮电大学 Indoor falling behavior detection method with visual privacy protection advantage
CN110765860B (en) * 2019-09-16 2023-06-23 平安科技(深圳)有限公司 Tumble judging method, tumble judging device, computer equipment and storage medium
CN111310647A (en) * 2020-02-12 2020-06-19 北京云住养科技有限公司 Generation method and device for automatic identification falling model
CN113077426B (en) * 2021-03-23 2022-08-23 成都国铁电气设备有限公司 Method for detecting defects of clamp plate bolt on line in real time

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104318248A (en) * 2014-10-21 2015-01-28 北京智谷睿拓技术服务有限公司 Action recognition method and action recognition device
CN107480729A (en) * 2017-09-05 2017-12-15 江苏电力信息技术有限公司 A kind of transmission line forest fire detection method based on depth space-time characteristic of field
CN107506706A (en) * 2017-08-14 2017-12-22 南京邮电大学 A kind of tumble detection method for human body based on three-dimensional camera
CN108062753A (en) * 2017-12-29 2018-05-22 重庆理工大学 The adaptive brain tumor semantic segmentation method in unsupervised domain based on depth confrontation study

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9597016B2 (en) * 2012-04-27 2017-03-21 The Curators Of The University Of Missouri Activity analysis, fall detection and risk assessment systems and methods
CN104361321B (en) * 2014-11-13 2018-02-09 侯振杰 A kind of method for judging the elderly and falling down behavior and balance ability
US20160349918A1 (en) * 2015-05-29 2016-12-01 Intel Corporation Calibration for touch detection on projected display surfaces
CN105279483B (en) * 2015-09-28 2018-08-21 华中科技大学 A kind of tumble behavior real-time detection method based on depth image
CN105868707B (en) * 2016-03-28 2019-03-08 华中科技大学 A kind of falling from bed behavior real-time detection method based on deep image information
CN107016350A (en) * 2017-04-26 2017-08-04 中科唯实科技(北京)有限公司 A kind of Falls Among Old People detection method based on depth camera
CN107220604A (en) * 2017-05-18 2017-09-29 清华大学深圳研究生院 A kind of fall detection method based on video
CN107657244B (en) * 2017-10-13 2020-12-01 河海大学 Human body falling behavior detection system based on multiple cameras and detection method thereof
CN108038420B (en) * 2017-11-21 2020-10-30 华中科技大学 Human behavior recognition method based on depth video
CN107944459A (en) * 2017-12-09 2018-04-20 天津大学 A kind of RGB D object identification methods
CN108229421B (en) * 2018-01-24 2021-07-02 华中科技大学 Depth video information-based method for detecting falling-off from bed in real time
CN108737785B (en) * 2018-05-21 2020-07-03 北京奇伦天佑创业投资有限公司 Indoor automatic detection system that tumbles based on TOF 3D camera

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104318248A (en) * 2014-10-21 2015-01-28 北京智谷睿拓技术服务有限公司 Action recognition method and action recognition device
CN107506706A (en) * 2017-08-14 2017-12-22 南京邮电大学 A kind of tumble detection method for human body based on three-dimensional camera
CN107480729A (en) * 2017-09-05 2017-12-15 江苏电力信息技术有限公司 A kind of transmission line forest fire detection method based on depth space-time characteristic of field
CN108062753A (en) * 2017-12-29 2018-05-22 重庆理工大学 The adaptive brain tumor semantic segmentation method in unsupervised domain based on depth confrontation study

Also Published As

Publication number Publication date
CN109886102A (en) 2019-06-14

Similar Documents

Publication Publication Date Title
CN109886102B (en) Fall-down behavior time-space domain detection method based on depth image
CN107341452B (en) Human behavior identification method based on quaternion space-time convolution neural network
CN111931684B (en) Weak and small target detection method based on video satellite data identification features
CN111209810A (en) Bounding box segmentation supervision deep neural network architecture for accurately detecting pedestrians in real time in visible light and infrared images
CN111191667B (en) Crowd counting method based on multiscale generation countermeasure network
CN108960047B (en) Face duplication removing method in video monitoring based on depth secondary tree
CN110717389B (en) Driver fatigue detection method based on generation countermeasure and long-short term memory network
CN112580523A (en) Behavior recognition method, behavior recognition device, behavior recognition equipment and storage medium
CN110097028B (en) Crowd abnormal event detection method based on three-dimensional pyramid image generation network
CN108960076B (en) Ear recognition and tracking method based on convolutional neural network
CN107767416B (en) Method for identifying pedestrian orientation in low-resolution image
CN112991269A (en) Identification and classification method for lung CT image
CN110598606B (en) Indoor falling behavior detection method with visual privacy protection advantage
CN106709419B (en) Video human behavior recognition method based on significant trajectory spatial information
CN111666852A (en) Micro-expression double-flow network identification method based on convolutional neural network
CN111738054A (en) Behavior anomaly detection method based on space-time self-encoder network and space-time CNN
CN110084201A (en) A kind of human motion recognition method of convolutional neural networks based on specific objective tracking under monitoring scene
CN115527269B (en) Intelligent human body posture image recognition method and system
CN106056078A (en) Crowd density estimation method based on multi-feature regression ensemble learning
Huo et al. 3DVSD: An end-to-end 3D convolutional object detection network for video smoke detection
CN113688761A (en) Pedestrian behavior category detection method based on image sequence
CN113221812A (en) Training method of face key point detection model and face key point detection method
CN112633179A (en) Farmer market aisle object occupying channel detection method based on video analysis
CN112488213A (en) Fire picture classification method based on multi-scale feature learning network
CN112487926A (en) Scenic spot feeding behavior identification method based on space-time diagram convolutional network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant