CN112541403B - Indoor personnel falling detection method by utilizing infrared camera - Google Patents

Indoor personnel falling detection method by utilizing infrared camera

Info

Publication number
CN112541403B
CN112541403B (application CN202011313710.1A)
Authority
CN
China
Prior art keywords
image
optical flow
infrared
head
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011313710.1A
Other languages
Chinese (zh)
Other versions
CN112541403A (en)
Inventor
葛敏婕
刘志坚
张宇
赵子涵
郭皓捷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Key System and Integrated Circuit Co Ltd
Original Assignee
China Key System and Integrated Circuit Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Key System and Integrated Circuit Co Ltd filed Critical China Key System and Integrated Circuit Co Ltd
Priority to CN202011313710.1A priority Critical patent/CN112541403B/en
Publication of CN112541403A publication Critical patent/CN112541403A/en
Application granted granted Critical
Publication of CN112541403B publication Critical patent/CN112541403B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/10 Image acquisition
    • G06V10/12 Details of acquisition arrangements; Constructional details thereof
    • G06V10/14 Optical characteristics of the device performing the acquisition or on the illumination arrangements
    • G06V10/143 Sensing or illuminating at different wavelengths
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Closed-Circuit Television Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an indoor personnel fall detection method using an infrared camera, and belongs to the technical field of fall prevention. A test infrared video segment is read; infrared video frames are extracted with a video extraction algorithm; the optical flow video frame sequence is input into a time-domain behavior prediction model to obtain a time-domain prediction result; and the final behavior prediction result is obtained based on the LRCN model. The infrared cameras can be installed in places where attention must be paid to whether personnel fall, such as toilets, hospitals and nursing homes, so the application scenarios are broad; because infrared cameras image by the thermal-imaging principle, personnel privacy is effectively protected. Once a fall is detected, a real-time alarm can be raised, and the position of the fallen person is located from the device ID number of the camera.

Description

Indoor personnel falling detection method by utilizing infrared camera
Technical Field
The invention relates to the technical field of fall prevention, and in particular to an indoor personnel fall detection method using an infrared camera.
Background
Falls are among the most common injury events for human beings, so intelligently identifying safety anomalies of indoor personnel is particularly important. Given the variety of indoor scenes, cameras installed in sensitive areas in particular must fully guarantee personnel privacy.
There are two basic approaches to fall detection: the first, based on wearable devices, detects falls through sensors; the second, based on computer vision, acquires video with a camera and detects falls after processing the video with image-processing techniques. However, traditional detection methods suffer from long recognition times and low accuracy.
Disclosure of Invention
The invention aims to provide an indoor personnel fall detection method using an infrared camera, so as to solve the problems that traditional detection methods have long recognition times, low accuracy, and no protection of personal privacy.
In order to solve the technical problems, the invention provides an indoor personnel falling detection method by utilizing an infrared camera, which comprises the following steps:
step 1, collecting an infrared video data set of basic human body postures, dividing the training data into a positive sample set and a negative sample set according to whether a fall occurs, and marking the corresponding class labels;
step 2, extracting an infrared thermal imaging image frame sequence in the video from the sample set one by one, and carrying out image preprocessing;
step 3, constructing an optical flow frame generation network of the infrared video data, and training by adopting a marked data set to generate the optical flow frame generation network;
step 4, inputting the infrared thermal imaging image frame sequence obtained in the step 2 into a trained optical flow frame generation network to generate an optical flow frame sequence representing human body posture information in video data;
step 5, constructing a long-term recurrent convolutional neural network with the optical flow frame sequence as input, and training the network with the data set obtained in step 1;
step 6, cascading the optical flow frame generation network of step 3 with the long-term recurrent convolutional neural network of step 5 to obtain the human body posture prediction model;
and step 7, acquiring the human body posture video to be identified, processing it as in step 2 to obtain an infrared thermal imaging image frame sequence, inputting this sequence into the human body posture prediction model, and identifying whether a person falls in the infrared video.
Optionally, the image preprocessing in step 2 includes:
coarse head positioning, searching for the approximate position of the head, i.e., finding the pixels of the head region;
precise head positioning, searching for the center point of the head region; and positioning the human torso.
Optionally, the coarse positioning of the head includes the following steps:
according to common knowledge, the temperature of the human head is comparatively high, so the head region appears correspondingly bright in the infrared image; this is used to lock onto the rough position of the head;
firstly, the infrared image is uniformly scaled to ω×h and normalized, where ω and h are the width and the height of the normalized image respectively;
the image is then binarized, and the pixels whose binarized value is true form the head candidate region; since the head of a person lies in the middle-upper part of the image, the coarse head positioning can be modeled, using spatial position constraints, as the optimization problem
$$(x_h, y_h) = \arg\min_{(x,y)\in\Omega,\; f_B(x,y)=1} d\big((x_h^0, y_h^0), (x,y)\big)$$
where Ω denotes the set of pixels of the binarized image, $f_B(x,y)$ represents the value of the pixel with coordinates (x, y) in the binarized image, the initial position of the human head $(x_h^0, y_h^0)$ is set to (ω/2, h/6), and the distance from the initial position to (x, y) is defined as
$$d\big((x_h^0, y_h^0),(x,y)\big) = \sqrt{\lambda\,(x_h^0 - x)^2 + (y_h^0 - y)^2}$$
where λ is the weight of the lateral distance cost relative to the longitudinal one; when λ = 1, $d$ is the Euclidean distance between $(x_h^0, y_h^0)$ and (x, y); since the human head lies near the lateral center of the image, lateral deviation costs more than longitudinal deviation, and λ takes the value 1.5;
the optimization problem of coarse head positioning is solved quickly by traversing the binarized image in order of increasing distance and stopping at the first pixel whose value is true.
Optionally, the precise positioning of the head includes the following steps:
the result of coarse head positioning $(x_h, y_h)$ is taken as input, and the position of the brightest pixel in a local area around it is taken as the head center; precise head positioning can be modeled as the optimization problem
$$(x_h^*, y_h^*) = \arg\max_{(x,y)\in\Omega,\; \|(x,y)-(x_h,y_h)\|\le h_1} f_s(x,y)$$
where $f_s(x,y)$ represents the value of the pixel with coordinates (x, y) in the normalized infrared image, Ω represents the set of pixels of the normalized infrared image, and $h_1$ bounds the local search area;
locating the head center in the given local area centered on the coarse result is a single-peak search, and the precise positioning of the head is realized with a hill-climbing method.
Optionally, the human torso positioning includes the steps of:
firstly, the torso candidate region is delimited from the head positioning result, and every pixel in the candidate region is projected vertically to obtain the cumulative histogram of the region; the abscissa corresponding to the histogram maximum is taken as the horizontal offset of the torso and denoted $x_0$;
the center position of the human torso is then $(x_b, y_b) = (x_h + x_0,\; y_h + h_2/2)$;
where $h_2$ is the height of the human torso, set to 3h/8; from a priori study of body image models, the estimated torso center must satisfy the following constraint: the angle between the line joining the head center and the torso center and the vertical of the image does not exceed 12 degrees; if this constraint is not satisfied, the torso position is recalculated.
Optionally, in the optical flow frame generation network: optical flow is the instantaneous velocity of the pixel motion of a moving object projected onto the observation imaging plane; it is a method that uses the temporal change of pixels in an image sequence and the correlation between adjacent frames to find the correspondence between the previous frame and the current frame, and thereby compute the motion of objects between adjacent frames;
the optical flow image frame extraction network adopts the convolutional-neural-network-based model FlowNet2.0; its architecture stacks the base networks FlowNetC (FlowNetCorrelation) and FlowNetS (FlowNetSimple), and this redesigned architecture handles both large and small displacements in the video segment effectively;
FlowNetS (FlowNetSimple) directly stacks the two images along the channel dimension before input; FlowNetC (FlowNetCorr), to improve the matching performance of the network, adds a cross-correlation layer that imitates the standard matching process: features are extracted first, and then their correlation is computed; computing the correlation amounts to convolving the features of the two images in the spatial dimension.
Optionally, in the step 4, the process of generating the optical flow frame sequence is as follows:
(1) Two adjacent frames of the infrared thermal imaging image sequence, the first image and the second image, are input into the FlowNetC network of the optical flow frame generation network to generate optical flow sub-graph I;
(2) The first image, the second image, the bilinear-interpolation (warped) map of the second image under optical flow sub-graph I, optical flow sub-graph I itself, and the brightness error map are input together into a FlowNetS network of the optical flow frame generation network to generate optical flow sub-graph II;
(3) The first image, the second image, the bilinear-interpolation (warped) map of the second image under optical flow sub-graph II, optical flow sub-graph II itself, and the brightness error map are input together into a FlowNetS network of the optical flow frame generation network to generate optical flow sub-graph III;
(4) The same two frames, the first image and the second image, are input into the FlowNetSD network of the optical flow frame generation network to generate optical flow sub-graph IV;
(5) The third optical flow sub-image, the fourth optical flow sub-image and the brightness error image are input into a convolutional neural network together to generate an optical flow frame sequence.
Optionally, the step 5 includes:
(1) Acquiring a human behavior video segment by using an infrared detector;
(2) Writing an image frame extraction algorithm and extracting the image frame sequence from the infrared video;
(3) Taking the infrared image frame sequence as the input of a HW-FlowNet optical flow extraction network;
(4) Reading a first frame of video frame, setting the current read frame number as n, and setting an initial value as 1;
(5) Determining whether the current frame number n+30 is greater than the total frame number N of the input infrared video image frame;
(6) If yes, ending the optical flow frame prediction;
(7) If not, continuing to read the n+30th frame of video frame;
(8) Substituting the read nth frame and (n+30)th frame into HW-FlowNetCSS and HW-FlowNetSD to obtain two predicted optical flow frames; then generating the final predicted optical flow frame by an image fusion method;
(9) Performing an n=n+1 operation;
(10) Jump to (5) and continue the judgment.
Optionally, the process of constructing the long-term recurrent convolutional neural network is as follows:
(1) Inputting each frame of the optical flow frame sequence into a convolutional neural network taking a residual network as a basic network, and extracting a characteristic vector so as to obtain a characteristic vector sequence;
(2) Inputting the feature vector sequence into a long short-term memory (LSTM) network, and taking the LSTM output as the input of a subsequent fully connected layer;
(3) Based on the feature vector sequence, a support vector machine (SVM) classifier predicts whether the person in each frame of the image has fallen.
Optionally, in the step 7, an application process of the human body posture prediction model is as follows:
(1) Let the video segment to be identified be VD = [I_1, I_2, …, I_N]; each frame I_n (1 ≤ n ≤ N) of VD is passed through the human body posture prediction model, yielding the predicted probabilities $p_n^{fall}$ and $p_n^{not}$ that the detected person has fallen and has not fallen, from which the prediction probability matrix of VD under the time-domain model is obtained:
$$P = \begin{bmatrix} p_1^{fall} & p_2^{fall} & \cdots & p_N^{fall} \\ p_1^{not} & p_2^{not} & \cdots & p_N^{not} \end{bmatrix}$$
(2) For the two behavior types (fall or no fall), the average prediction probability is
$$\bar p^{\,c} = \frac{1}{N}\sum_{n=1}^{N} p_n^{c}, \qquad c \in \{fall, not\}$$
(3) giving the prediction probability vector of the video segment VD in the time domain, $p_a = (\bar p^{\,fall}, \bar p^{\,not})$;
(4) finally, the behavior type corresponding to the maximum component of the vector $p_a$ is taken as the behavior-type identification result of the video segment.
Optionally, the sources of the video data set collection include an imaging device of a monitoring system, a video website and a human infrared video public video library.
In the indoor personnel fall detection method using an infrared camera provided by the invention, a test infrared video segment is read; infrared video frames are extracted with a video extraction algorithm; the optical flow video frame sequence is input into the behavior prediction model to obtain the human behavior prediction result; and the final behavior prediction result is obtained based on the LRCN model.
The invention has the following beneficial effects:
(1) The infrared cameras can be installed in places where attention must be paid to whether personnel fall, such as toilets, hospitals and nursing homes, so the application scenarios are broad; because infrared cameras image by the thermal-imaging principle, personnel privacy is effectively protected;
(2) The infrared image frames are preprocessed with the trimap generation algorithm, which effectively filters out environmental interference and reduces the difficulty of the subsequent recognition algorithm;
(3) The images are processed with the optical flow method, which effectively identifies human postures and is particularly good at detecting large-amplitude posture changes such as falls;
(4) Once a fall is detected, a real-time alarm can be raised, and the position of the fallen person is located from the device ID number of the camera.
Drawings
Fig. 1 is a schematic flow chart of an indoor personnel fall detection method using an infrared camera;
FIG. 2 is a schematic diagram of an infrared target enhancement method based on full-automatic matting;
FIG. 3 is a schematic diagram of body differentiation;
fig. 4 is a schematic diagram of LRCN model.
Detailed Description
The invention provides an indoor personnel falling detection method by utilizing an infrared camera, which is further described in detail below with reference to the accompanying drawings and the specific embodiments. Advantages and features of the invention will become more apparent from the following description and from the claims. It should be noted that the drawings are in a very simplified form and are all to a non-precise scale, merely for convenience and clarity in aiding in the description of embodiments of the invention.
Example 1
The invention provides an indoor personnel falling detection method by utilizing an infrared camera, which is shown in a flow chart in fig. 1 and comprises the following steps:
step S11: and collecting an infrared video data set of the basic posture of the human body from an imaging device of the monitoring system, or a video website or a human body infrared video public video library. Dividing training data into a positive sample set and a negative sample set according to whether falling behaviors occur in the video clips, and marking corresponding class labels;
step S12: extracting an infrared thermal imaging image frame sequence in the video from the sample set one by one and carrying out image preprocessing;
infrared images have the unique advantage of being unaffected by illumination factors and protecting personnel privacy over visible light images, but suffer from low resolution and lack of color information, resulting in very limited information available. Considering that the natural image matting technology provides a tool for precisely separating the foreground from the cluttered background, the images are preprocessed by using an infrared target enhancement algorithm based on automatic matting enhancement as shown in fig. 2, so that the detection accuracy is improved.
The image preprocessing process realizes the enhancement of a foreground object and the suppression of irrelevant backgrounds, can 'focus on' the characteristics of a foreground object (human body) of interest, and the generated trimap image provides the head and trunk partial areas of the human body, partial background areas and the rest is an unknown area. The automatic three-dimensional map generation algorithm firstly positions the head of a human body, and then positions the trunk part by utilizing the constraint between the head and the trunk.
Automatic trimap generation algorithm for infrared targets:
(1) Coarse head positioning: the goal is to find the approximate location of the human head, i.e., the pixels of the head region. According to common knowledge, the temperature of the human head is comparatively high, so the head region appears correspondingly bright in the infrared image; this is used to lock onto the rough position of the head. The infrared image is uniformly scaled to ω×h and normalized, where ω and h are the width and the height of the normalized image; the image is binarized, and the pixels whose binarized value is true form the head candidate region. Since the head of a person usually lies in the middle-upper part of the image, the coarse head positioning can be modeled, using spatial position constraints, as the optimization problem
$$(x_h, y_h) = \arg\min_{(x,y)\in\Omega,\; f_B(x,y)=1} d\big((x_h^0, y_h^0), (x,y)\big)$$
where Ω denotes the set of pixels of the binarized image, $f_B(x,y)$ represents the value of the pixel with coordinates (x, y) in the binarized image, the initial position of the human head $(x_h^0, y_h^0)$ is set to (ω/2, h/6), and the distance from the initial position to (x, y) is defined as
$$d\big((x_h^0, y_h^0),(x,y)\big) = \sqrt{\lambda\,(x_h^0 - x)^2 + (y_h^0 - y)^2}$$
where λ is the weight of the lateral distance cost relative to the longitudinal one: when λ = 1, $d$ is the Euclidean distance between $(x_h^0, y_h^0)$ and (x, y); because the human head usually lies near the lateral center of the image, lateral deviation costs more than longitudinal deviation, and λ takes the value 1.5. The optimization problem is solved quickly by traversing the binarized image in order of increasing distance and stopping at the first pixel whose value is true.
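The patent's own solver traverses pixels in order of increasing distance; the following minimal Python/OpenCV sketch solves the same problem by taking the arg-min over all candidate pixels, which is equivalent. The scaled size, the binarization threshold, and the square-root form of the weighted distance are illustrative assumptions reconstructed from the surrounding text; the patent fixes only the initial point (ω/2, h/6) and λ = 1.5.

```python
import numpy as np
import cv2

def coarse_head_position(ir_image, w=64, h=128, lam=1.5, thresh=0.7):
    """Coarse head localization on a normalized, binarized infrared image.

    w, h and thresh are hypothetical illustration values; lam = 1.5 and the
    initial point (w/2, h/6) follow the patent text.
    """
    img = cv2.resize(ir_image, (w, h)).astype(np.float32)
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)  # normalize to [0, 1]
    binary = img > thresh                                     # bright pixels: head candidates

    ys, xs = np.nonzero(binary)
    if xs.size == 0:
        return None
    x0, y0 = w / 2.0, h / 6.0                                 # initial head position
    # weighted distance: lateral deviation costs more than longitudinal (lam > 1)
    d = np.sqrt(lam * (xs - x0) ** 2 + (ys - y0) ** 2)
    i = int(np.argmin(d))
    return int(xs[i]), int(ys[i])
```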
(2) Precise head positioning: find the center point of the head region. Since the body temperature of a human is in most cases higher than that of background objects, the head radiates more than the background, so the head region appears brighter than the background in the far-infrared image. In addition, during far-infrared imaging more energy reaches the sensor from the central region of the head, so the head center is brighter than the head edges and the center point is a local brightness maximum. Precise positioning takes the coarse result $(x_h, y_h)$ as input, searches the local area around it for the head center, and finally takes the position of the brightest pixel as the head center. It can be modeled as the optimization problem
$$(x_h^*, y_h^*) = \arg\max_{(x,y)\in\Omega,\; \|(x,y)-(x_h,y_h)\|\le h_1} f_s(x,y)$$
where $f_s(x,y)$ represents the value of the pixel with coordinates (x, y) in the normalized infrared image, Ω represents the set of pixels of the normalized infrared image, $\|(x,y)-(x_h,y_h)\|$ is the Euclidean distance from the coarse position, and $h_1$ is the height of the head region, set to h/16. Locating the head center in the given local area centered on the coarse result can be regarded as a single-peak search, and the precise positioning is realized with a hill-climbing method.
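A minimal hill-climbing sketch of this single-peak search, assuming an 8-neighborhood step and a square window of radius h1 around the coarse position (the patent does not spell out the neighborhood shape):

```python
def refine_head_center(norm_img, coarse, h1):
    """Climb to the brightest pixel near the coarse head position.

    norm_img: 2-D array, the normalized infrared image f_s.
    coarse:   (x, y) coarse head position; h1: search radius (h/16).
    """
    H, W = norm_img.shape
    cx, cy = coarse
    x, y = cx, cy
    while True:
        best_v, bx, by = norm_img[y, x], x, y
        for dx in (-1, 0, 1):                      # examine the 8-neighborhood
            for dy in (-1, 0, 1):
                nx, ny = x + dx, y + dy
                inside = (0 <= nx < W and 0 <= ny < H
                          and abs(nx - cx) <= h1 and abs(ny - cy) <= h1)
                if inside and norm_img[ny, nx] > best_v:
                    best_v, bx, by = norm_img[ny, nx], nx, ny
        if (bx, by) == (x, y):                     # local maximum: head center
            return x, y
        x, y = bx, by
```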
(3) Torso positioning: as shown in Fig. 3, the torso candidate region is delimited from the head positioning result, every pixel of the candidate region is projected vertically to obtain a cumulative histogram, and the abscissa of the histogram maximum, denoted $x_0$, is taken as the horizontal offset of the torso.
The center position of the human torso is then $(x_b, y_b) = (x_h + x_0,\; y_h + h_2/2)$,
where $h_2$ represents the height of the human torso, set to 3h/8, and $x_h$ and $y_h$ are the lateral and longitudinal coordinates of the head center. From a priori study of body image models, the estimated torso center must satisfy the following constraint: the angle between the line joining the head center and the torso center and the vertical of the image does not exceed 12 degrees. If the constraint is not satisfied, the torso position is recalculated.
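A sketch of the projection step follows; the candidate region's width and its vertical placement below the head are illustrative guesses, since the patent fixes only h2 = 3h/8 and the 12-degree check:

```python
import math
import numpy as np

def torso_center(norm_img, head_center, h):
    """Estimate the torso center by vertical projection below the head."""
    xh, yh = head_center
    h2 = 3 * h // 8                                   # torso height per the patent
    top = min(norm_img.shape[0], yh + h // 16)        # assumed region placement
    bottom = min(norm_img.shape[0], top + h2)
    left = max(0, xh - h // 4)                        # assumed region width
    right = min(norm_img.shape[1], xh + h // 4)

    hist = norm_img[top:bottom, left:right].sum(axis=0)  # cumulative histogram
    x0 = int(np.argmax(hist)) + left - xh                # horizontal torso offset
    xb, yb = xh + x0, yh + h2 // 2

    # prior constraint: head-torso line within 12 degrees of the image vertical
    angle = math.degrees(math.atan2(abs(xb - xh), max(1, yb - yh)))
    if angle > 12:
        xb = xh                                       # reject and fall back
    return xb, yb
```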
The trimap is then generated as follows: preset trimap templates of the human head and torso are placed on a single-channel, zero-valued image of the same size as the normalized image, with the anchor points of the head and torso templates aligned with the estimated head and torso center positions, respectively. This yields the trimap oriented to the infrared target.
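A minimal sketch of assembling such a trimap, assuming rectangular head and torso templates and the usual matting convention (255 = foreground, 128 = unknown, 0 = background); the patent's preset template shapes are not specified, so the half-sizes below are hypothetical:

```python
import numpy as np

FG, UNK, BG = 255, 128, 0

def build_trimap(shape, head, torso, head_half=(6, 8), torso_half=(12, 24), band=4):
    """Single-channel trimap with head/torso marked as foreground.

    head, torso: estimated (x, y) centers; *_half: hypothetical template
    half-sizes; band: width of the unknown border around each region.
    """
    trimap = np.full(shape, BG, dtype=np.uint8)

    def mark(center, half, value, grow=0):
        x, y = center
        hw, hh = half
        y0, y1 = max(0, y - hh - grow), min(shape[0], y + hh + grow)
        x0, x1 = max(0, x - hw - grow), min(shape[1], x + hw + grow)
        trimap[y0:y1, x0:x1] = np.maximum(trimap[y0:y1, x0:x1], value)

    for center, half in ((head, head_half), (torso, torso_half)):
        mark(center, half, UNK, grow=band)   # unknown band first
        mark(center, half, FG)               # certain foreground on top
    return trimap
```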
Step S13: constructing an optical flow frame generation network of infrared video data, and training by adopting a marked data set to generate the optical flow frame generation network;
the optical flow frame generation network includes: the optical flow is the instantaneous speed of the pixel motion of a space moving object on an observation imaging plane, and is a method for finding the corresponding relation between the previous frame and the current frame by utilizing the change of the pixels in an image sequence on a time domain and the correlation between the adjacent frames, so as to calculate the motion information of the object between the adjacent frames.
The optical flow image frame extraction network adopts a convolutional neural network-based network model FlowNet2.0, the network architecture of the model adopts a stacking mode, the model is composed of base networks FlowNetC (FlowNetCorrelation) and FlowNetS (FlowNetSample), and the effective processing of large displacement and small displacement in the video segment is realized through the re-design of the network architecture. FlowNetS (FlowNetSimple) is typically to directly superimpose two images in the channel dimension and then input the superimposed images. FlowNetC (FlowNetCorr) in order to improve the matching performance of the network, a cross-correlation layer is designed by artificially simulating the standard matching process, namely, features are extracted first, and then the correlation of the features is calculated. The calculation of the correlation can be regarded as actually a convolution operation of the features of the two images in the spatial dimension.
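A naive PyTorch sketch of such a correlation layer, comparing the two feature maps at every displacement inside a window; FlowNet2.0's released implementation uses an optimized kernel, and the max_disp value here is an illustrative assumption:

```python
import torch
import torch.nn.functional as F

def correlation(f1, f2, max_disp=4):
    """Cross-correlation of two feature maps, FlowNetC style.

    f1, f2: (B, C, H, W) feature tensors of the two images. For every
    displacement (u, v) with |u|, |v| <= max_disp, the channel-wise dot
    product is taken, i.e. a convolution in the spatial dimension.
    Returns a (B, (2*max_disp+1)**2, H, W) cost volume.
    """
    B, C, H, W = f1.shape
    f2_pad = F.pad(f2, [max_disp] * 4)          # pad left/right/top/bottom
    maps = []
    for v in range(2 * max_disp + 1):
        for u in range(2 * max_disp + 1):
            shifted = f2_pad[:, :, v:v + H, u:u + W]
            maps.append((f1 * shifted).mean(dim=1, keepdim=True))
    return torch.cat(maps, dim=1)
```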
Step S14: inputting the infrared thermal imaging image frame sequence obtained in the step 2 into a trained optical flow frame generation network, so as to generate an optical flow frame sequence representing human body posture information in video data;
the process of generating a sequence of optical flow frames is as follows:
(1) Two adjacent frames of the infrared thermal imaging image sequence, the first image and the second image, are input into the FlowNetC network of the optical flow frame generation network to generate optical flow sub-graph I;
(2) The first image, the second image, the bilinear-interpolation (warped) map of the second image under optical flow sub-graph I, optical flow sub-graph I itself, and the brightness error map are input together into a FlowNetS network of the optical flow frame generation network to generate optical flow sub-graph II;
(3) The first image, the second image, the bilinear-interpolation (warped) map of the second image under optical flow sub-graph II, optical flow sub-graph II itself, and the brightness error map are input together into a FlowNetS network of the optical flow frame generation network to generate optical flow sub-graph III;
(4) The same two frames, the first image and the second image, are input into the FlowNetSD network of the optical flow frame generation network to generate optical flow sub-graph IV;
(5) The third optical flow sub-image, the fourth optical flow sub-image and the brightness error image are input into a convolutional neural network together to generate an optical flow frame sequence.
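Read as a data flow, steps (1)-(5) chain the sub-networks as sketched below; the five network modules and the warp operator (bilinear interpolation of the second image under a flow field) are assumed to be given, since the patent specifies only how they are connected:

```python
def flownet2_pipeline(img1, img2, netC, netS1, netS2, netSD, fusion, warp):
    """Chain the sub-networks per steps (1)-(5) to produce one flow frame.

    warp(img, flow): bilinear interpolation of img under flow (assumed given).
    """
    flow1 = netC(img1, img2)                            # (1) FlowNetC: sub-graph I
    warped1 = warp(img2, flow1)
    err1 = (warped1 - img1).abs()                       # brightness error map
    flow2 = netS1(img1, img2, warped1, flow1, err1)     # (2) sub-graph II
    warped2 = warp(img2, flow2)
    err2 = (warped2 - img1).abs()
    flow3 = netS2(img1, img2, warped2, flow2, err2)     # (3) sub-graph III
    flow4 = netSD(img1, img2)                           # (4) small-displacement branch
    return fusion(flow3, flow4, err2)                   # (5) fused optical flow frame
```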
The steps of extracting optical flow image frames from the infrared video segment based on a convolutional neural network, and of training the network, are as follows (a loop sketch follows the list):
(1) Acquiring a human behavior video segment by using an infrared detector;
(2) Writing an image frame extraction algorithm and extracting the image frame sequence from the infrared video;
(3) Taking the infrared image frame sequence as the input of a HW-FlowNet optical flow extraction network;
(4) Reading a first frame of video frame, setting the current read frame number as n, and setting an initial value as 1;
(5) Determining whether the current frame number n+30 is greater than the total frame number N of the input infrared video image frame;
(6) If yes, ending the optical flow frame prediction;
(7) If not, continuing to read the n+30th frame of video frame;
(8) Substituting the read nth frame and (n+30)th frame into HW-FlowNetCSS and HW-FlowNetSD to obtain two predicted optical flow frames; then generating the final predicted optical flow frame by an image fusion method;
(9) Performing an n=n+1 operation;
(10) Jump to (5) and continue the judgment.
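Steps (4)-(10) amount to sliding a pair of frames 30 apart over the video; a minimal sketch, assuming predict_pair wraps the HW-FlowNetCSS/HW-FlowNetSD inference and the image fusion of step (8):

```python
def extract_flow_frames(frames, predict_pair, stride=30):
    """Predict one optical flow frame per (n, n+30) frame pair.

    frames: list of infrared image frames; predict_pair: assumed callable
    combining HW-FlowNetCSS / HW-FlowNetSD outputs by image fusion.
    """
    N = len(frames)
    flows = []
    n = 1                                   # the patent indexes frames from 1
    while n + stride <= N:                  # stop once n+30 exceeds N
        flows.append(predict_pair(frames[n - 1], frames[n + stride - 1]))
        n += 1                              # step (9): n = n + 1
    return flows
```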
Step S15: constructing a long-term recurrent convolutional neural network (LRCN) with the optical flow frame sequence obtained in step S14 as input, and training the network with the data set obtained in step S11;
The long-term recurrent convolutional network (LRCN) is constructed as follows:
(1) Inputting each frame of the optical flow frame sequence into a convolutional neural network taking a residual network as a basic network, and extracting a characteristic vector so as to obtain a characteristic vector sequence;
(2) Inputting the feature vector sequence into a long short-term memory (LSTM) network, and taking the LSTM output as the input of a subsequent fully connected layer;
(3) Based on the feature vector sequence, a support vector machine (SVM) classifier predicts whether the person in each frame of the image has fallen;
the LRCN (Long-term recurrent Convolutional Networks) model is shown in figure 4, training steps of the LRCN human behavior recognition network:
(1) Initializing the parameters of the I3D network by random initialization (the weights are initialized to normally distributed noise with a standard deviation of 0.1, and the biases are initialized to 0);
(2) Reading a subset of infrared human thermal imaging data;
(3) Pre-training the I3D based LRCN network on the subset;
(4) Reading the image frames of the training samples and setting the iteration count i = 1, the initial learning rate α = 0.001, and the learning-rate decay count k = 1, where N is the total number of training iterations and the learning rate decays once every n iterations;
(5) Judging whether the current iteration number i is smaller than or equal to the total iteration number N, if so, turning to (6), otherwise, ending the current training;
(6) Judging whether the current iteration number i is equal to the product of n iterations and the learning rate attenuation number k, if so, turning to (7), otherwise, turning to (8);
(7) After every n iterations, the learning rate α is reduced to 10% of its previous value, and the learning-rate decay count k is increased by 1;
(8) Calculating a loss value, and updating a weight and bias;
(9) The iteration number i is increased by 1 and the process goes to (5).
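A minimal PyTorch-style sketch of this schedule; the optimizer and loss function are illustrative assumptions, as the patent names neither:

```python
import torch

def train_lrcn(model, batches, total_iters, n_decay, alpha=1e-3):
    """Training loop mirroring steps (4)-(9): the learning rate drops to 10%
    every n_decay iterations. Optimizer and loss are assumptions."""
    opt = torch.optim.SGD(model.parameters(), lr=alpha)
    loss_fn = torch.nn.CrossEntropyLoss()
    k = 1                                        # learning-rate decay count
    for i, (x, y) in enumerate(batches, start=1):
        if i > total_iters:                      # step (5): stop after N iterations
            break
        if i == n_decay * k:                     # step (6): decay point reached
            for group in opt.param_groups:
                group["lr"] *= 0.1               # step (7): lr -> 10% of its value
            k += 1
        opt.zero_grad()
        loss = loss_fn(model(x), y)              # step (8): compute the loss value
        loss.backward()
        opt.step()                               # update weights and biases
```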
Step S16: cascading the optical flow frame generation network of step S13 with the long-term recurrent convolutional neural network of step S15 to obtain the human body posture prediction model;
step S17: and (3) acquiring a human body posture video to be identified, processing the human body posture video in the step (2) to obtain an infrared thermal imaging image frame sequence, inputting the infrared thermal imaging image frame sequence into a human body posture prediction model, and identifying whether falling behaviors exist in the infrared video.
In step 7, the application process of the human body posture prediction model is as follows (a sketch follows the list):
(1) Let the video segment to be identified be VD = [I_1, I_2, …, I_N]; each frame I_n (1 ≤ n ≤ N) of VD is passed through the time-domain model, yielding the predicted probabilities $p_n^{fall}$ and $p_n^{not}$ that the detected person has fallen and has not fallen, from which the prediction probability matrix of VD under the time-domain model is obtained:
$$P = \begin{bmatrix} p_1^{fall} & p_2^{fall} & \cdots & p_N^{fall} \\ p_1^{not} & p_2^{not} & \cdots & p_N^{not} \end{bmatrix}$$
(2) For the two behavior types (fall or no fall), the average prediction probability is
$$\bar p^{\,c} = \frac{1}{N}\sum_{n=1}^{N} p_n^{c}, \qquad c \in \{fall, not\}$$
(3) giving the prediction probability vector of the video segment VD under the time-domain module, $p_a = (\bar p^{\,fall}, \bar p^{\,not})$, where $\bar p^{\,fall}$ is the average predicted probability that the person has fallen and $\bar p^{\,not}$ the average predicted probability that the person has not fallen;
(4) finally, the behavior type corresponding to the maximum component of the vector $p_a$ is taken as the behavior-type identification result of the video segment.
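The decision therefore reduces to averaging the per-frame probabilities and taking the arg-max over the two behavior types; a short sketch:

```python
import numpy as np

def classify_segment(frame_probs):
    """Identify the behavior type of a video segment VD.

    frame_probs: array of shape (N, 2); row n holds (p_fall, p_not_fall)
    for frame I_n. Returns the winning label and the vector p_a.
    """
    p_a = np.asarray(frame_probs, dtype=float).mean(axis=0)  # average over frames
    label = ("fall", "no fall")[int(np.argmax(p_a))]
    return label, p_a
```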
By using an infrared camera, the invention can be deployed in sensitive areas such as toilets and bedrooms: the infrared images effectively guarantee the privacy of indoor personnel, and detection and recognition remain effective at night, safeguarding persons with limited mobility during night-time activity, which gives the method high practical value. Compared with the traditional pipeline of first extracting manual features (HOG, HOF, Dense Trajectories, etc.) and then classifying with a classifier, the deep-learning-based human posture recognition algorithm has the notable advantages of short recognition time and high recognition accuracy.
The above description is only illustrative of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention, and any alterations and modifications made by those skilled in the art based on the above disclosure shall fall within the scope of the appended claims.

Claims (11)

1. An indoor personnel fall detection method using an infrared camera is characterized by comprising the following steps:
step 1, collecting an infrared video data set of basic human body postures, dividing the training data into a positive sample set and a negative sample set according to whether a fall occurs, and marking the corresponding class labels;
step 2, extracting an infrared thermal imaging image frame sequence in the video from the sample set one by one, and carrying out image preprocessing;
step 3, constructing an optical flow frame generation network of the infrared video data, and training by adopting a marked data set to generate the optical flow frame generation network;
step 4, inputting the infrared thermal imaging image frame sequence obtained in the step 2 into a trained optical flow frame generation network to generate an optical flow frame sequence representing human body posture information in video data;
step 5, constructing a long-term recurrent convolutional neural network with the optical flow frame sequence as input, and training the network with the data set obtained in step 1;
step 6, cascading the optical flow frame generation network of step 3 with the long-term recurrent convolutional neural network of step 5 to obtain the human body posture prediction model;
and step 7, acquiring the human body posture video to be identified, processing it as in step 2 to obtain an infrared thermal imaging image frame sequence, inputting this sequence into the human body posture prediction model, and identifying whether a person falls in the infrared video.
2. An indoor personal fall detection method using an infrared camera as set forth in claim 1, wherein the image preprocessing in step 2 includes:
coarse head positioning, searching for the approximate position of the head, i.e., finding the pixels of the head region;
precise head positioning, searching for the center point of the head region; and positioning the human torso.
3. An indoor personal fall detection method using an infrared camera as set forth in claim 2, wherein the head rough positioning includes the steps of:
according to common knowledge, the temperature of the human head is comparatively high, so the head region appears correspondingly bright in the infrared image; this is used to lock onto the rough position of the head;
firstly, the infrared image is uniformly scaled to ω×h and normalized, where ω and h are the width and the height of the normalized image respectively;
the image is then binarized, and the pixels whose binarized value is true form the head candidate region; since the head of a person lies in the middle-upper part of the image, the coarse head positioning can be modeled, using spatial position constraints, as the optimization problem
$$(x_h, y_h) = \arg\min_{(x,y)\in\Omega,\; f_B(x,y)=1} d\big((x_h^0, y_h^0), (x,y)\big)$$
where Ω denotes the set of pixels of the binarized image, $f_B(x,y)$ represents the value of the pixel with coordinates (x, y) in the binarized image, the initial position of the human head $(x_h^0, y_h^0)$ is set to (ω/2, h/6), and the distance from the initial position to (x, y) is defined as
$$d\big((x_h^0, y_h^0),(x,y)\big) = \sqrt{\lambda\,(x_h^0 - x)^2 + (y_h^0 - y)^2}$$
where λ is the weight of the lateral distance cost relative to the longitudinal one; when λ = 1, $d$ is the Euclidean distance between $(x_h^0, y_h^0)$ and (x, y); since the human head lies near the lateral center of the image, lateral deviation costs more than longitudinal deviation, and λ takes the value 1.5;
the optimization problem of coarse head positioning is solved quickly by traversing the binarized image in order of increasing distance and stopping at the first pixel whose value is true.
4. An indoor personal fall detection method using an infrared camera as set forth in claim 2, wherein the head accurate positioning comprises the steps of:
the result of coarse head positioning $(x_h, y_h)$ is taken as input, and the position of the brightest pixel in a local area around it is taken as the head center; precise head positioning can be modeled as the optimization problem
$$(x_h^*, y_h^*) = \arg\max_{(x,y)\in\Omega,\; \|(x,y)-(x_h,y_h)\|\le h_1} f_s(x,y)$$
where $f_s(x,y)$ represents the value of the pixel with coordinates (x, y) in the normalized infrared image, Ω represents the set of pixels of the normalized infrared image, and $h_1$ bounds the local search area;
locating the head center in the given local area centered on the coarse result is a single-peak search, and the precise positioning of the head is realized with a hill-climbing method.
5. An indoor personal fall detection method using an infrared camera as set forth in claim 2, wherein the human torso positioning includes the steps of:
firstly, the torso candidate region is delimited from the head positioning result, and every pixel in the candidate region is projected vertically to obtain the cumulative histogram of the region; the abscissa corresponding to the histogram maximum is taken as the horizontal offset of the torso and denoted $x_0$;
the center position of the human torso is then $(x_b, y_b) = (x_h + x_0,\; y_h + h_2/2)$;
where $h_2$ is the height of the human torso, set to 3h/8; from a priori study of body image models, the estimated torso center must satisfy the following constraint: the angle between the line joining the head center and the torso center and the vertical of the image does not exceed 12 degrees; if this constraint is not satisfied, the torso position is recalculated.
6. An indoor personal fall detection method with an infrared camera as claimed in claim 1, wherein, in the optical flow frame generation network: optical flow is the instantaneous velocity of the pixel motion of a moving object projected onto the observation imaging plane; it is a method that uses the temporal change of pixels in an image sequence and the correlation between adjacent frames to find the correspondence between the previous frame and the current frame, and thereby compute the motion of objects between adjacent frames;
the optical flow image frame extraction network adopts the convolutional-neural-network-based model FlowNet2.0; its architecture stacks the base networks FlowNetC (FlowNetCorrelation) and FlowNetS (FlowNetSimple), and this redesigned architecture handles both large and small displacements in the video segment effectively;
FlowNetS (FlowNetSimple) directly stacks the two images along the channel dimension before input; FlowNetC (FlowNetCorr), to improve the matching performance of the network, adds a cross-correlation layer that imitates the standard matching process: features are extracted first, and then their correlation is computed; computing the correlation amounts to convolving the features of the two images in the spatial dimension.
7. An indoor personal fall detection method with infrared camera as set forth in claim 6, wherein in the step 4, the process of generating the optical flow frame sequence is as follows:
(1) Two adjacent frames of the infrared thermal imaging image sequence, the first image and the second image, are input into the FlowNetC network of the optical flow frame generation network to generate optical flow sub-graph I;
(2) The first image, the second image, the bilinear-interpolation (warped) map of the second image under optical flow sub-graph I, optical flow sub-graph I itself, and the brightness error map are input together into a FlowNetS network of the optical flow frame generation network to generate optical flow sub-graph II;
(3) The first image, the second image, the bilinear-interpolation (warped) map of the second image under optical flow sub-graph II, optical flow sub-graph II itself, and the brightness error map are input together into a FlowNetS network of the optical flow frame generation network to generate optical flow sub-graph III;
(4) The same two frames, the first image and the second image, are input into the FlowNetSD network of the optical flow frame generation network to generate optical flow sub-graph IV;
(5) The third optical flow sub-image, the fourth optical flow sub-image and the brightness error image are input into a convolutional neural network together to generate an optical flow frame sequence.
8. An indoor personal fall detection method using an infrared camera as set forth in claim 1, wherein the step 5 includes:
(1) Acquiring a human behavior video segment by using an infrared detector;
(2) An image frame extraction algorithm is compiled, and an image frame sequence in the infrared video is extracted;
(3) Taking the infrared image frame sequence as the input of a HW-FlowNet optical flow extraction network;
(4) Reading a first frame of video frame, setting the current read frame number as n, and setting an initial value as 1;
(5) Determining whether the current frame number n+30 is greater than the total frame number N of the input infrared video image frame;
(6) If yes, ending the optical flow frame prediction;
(7) If not, continuing to read the n+30th frame of video frame;
(8) Substituting the read nth frame and the read (n+30) th frame into HW-FlowNetCSS and HW-FlowNetSD to obtain two predicted optical flow frames; then, an image fusion method is adopted to generate a final predicted optical flow frame;
(9) Performing an n=n+1 operation;
(10) Jump to (5) and continue the judgment.
9. An indoor personal fall detection method using an infrared camera as set forth in claim 1, wherein the process of constructing the long-term recurrent convolutional neural network is as follows:
(1) Inputting each frame of the optical flow frame sequence into a convolutional neural network taking a residual network as a basic network, and extracting a characteristic vector so as to obtain a characteristic vector sequence;
(2) Inputting the feature vector sequence into a long short-term memory (LSTM) network, and taking the LSTM output as the input of a subsequent fully connected layer;
(3) Based on the feature vector sequence, a support vector machine (SVM) classifier predicts whether the person in each frame of the image has fallen.
10. An indoor personal fall detection method using an infrared camera as set forth in claim 1, wherein in the step 7, the application process of the human body posture prediction model is as follows:
(1) Let the video segment to be identified be VD = [I_1, I_2, …, I_N]; each frame I_n (1 ≤ n ≤ N) of VD is passed through the human body posture prediction model, yielding the predicted probabilities $p_n^{fall}$ and $p_n^{not}$ that the detected person has fallen and has not fallen, from which the prediction probability matrix of VD under the time-domain model is obtained:
$$P = \begin{bmatrix} p_1^{fall} & p_2^{fall} & \cdots & p_N^{fall} \\ p_1^{not} & p_2^{not} & \cdots & p_N^{not} \end{bmatrix}$$
(2) For the two behavior types (fall or no fall), the average prediction probability is
$$\bar p^{\,c} = \frac{1}{N}\sum_{n=1}^{N} p_n^{c}, \qquad c \in \{fall, not\}$$
(3) giving the prediction probability vector of the video segment VD in the time domain, $p_a = (\bar p^{\,fall}, \bar p^{\,not})$;
(4) finally, the behavior type corresponding to the maximum component of the vector $p_a$ is taken as the behavior-type identification result of the video segment.
11. An indoor personal fall detection method using an infrared camera as claimed in claim 1, wherein the sources of video dataset collection include imaging devices of a monitoring system, video websites and human infrared video public video libraries.
CN202011313710.1A 2020-11-20 2020-11-20 Indoor personnel falling detection method by utilizing infrared camera Active CN112541403B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011313710.1A CN112541403B (en) 2020-11-20 2020-11-20 Indoor personnel falling detection method by utilizing infrared camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011313710.1A CN112541403B (en) 2020-11-20 2020-11-20 Indoor personnel falling detection method by utilizing infrared camera

Publications (2)

Publication Number Publication Date
CN112541403A (en) 2021-03-23
CN112541403B (en) 2023-09-22

Family

ID=75015011

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011313710.1A Active CN112541403B (en) 2020-11-20 2020-11-20 Indoor personnel falling detection method by utilizing infrared camera

Country Status (1)

Country Link
CN (1) CN112541403B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114842397B (en) * 2022-05-19 2023-04-07 华南农业大学 Real-time old man falling detection method based on anomaly detection

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014206887A1 (en) * 2013-06-25 2014-12-31 Morpho Method for detecting a real face
CN107784291A (en) * 2017-11-03 2018-03-09 北京清瑞维航技术发展有限公司 target detection tracking method and device based on infrared video
CN110555368A (en) * 2019-06-28 2019-12-10 西安理工大学 Fall-down behavior identification method based on three-dimensional convolutional neural network
WO2019237567A1 (en) * 2018-06-14 2019-12-19 江南大学 Convolutional neural network based tumble detection method
CN111783540A (en) * 2020-06-01 2020-10-16 河海大学 Method and system for recognizing human body behaviors in video


Also Published As

Publication number Publication date
CN112541403A (en) 2021-03-23

Similar Documents

Publication Publication Date Title
US10706285B2 (en) Automatic ship tracking method and system based on deep learning network and mean shift
CN111783576B (en) Pedestrian re-identification method based on improved YOLOv3 network and feature fusion
EP2345999A1 (en) Method for automatic detection and tracking of multiple objects
Benedek 3D people surveillance on range data sequences of a rotating Lidar
CN112488073A (en) Target detection method, system, device and storage medium
US8355576B2 (en) Method and system for crowd segmentation
CN108197604A (en) Fast face positioning and tracing method based on embedded device
Nyaruhuma et al. Verification of 2D building outlines using oblique airborne images
CN104517125B (en) The image method for real time tracking and system of high-speed object
KR101645959B1 (en) The Apparatus and Method for Tracking Objects Based on Multiple Overhead Cameras and a Site Map
Almagbile Estimation of crowd density from UAVs images based on corner detection procedures and clustering analysis
CN112634368A (en) Method and device for generating space and OR graph model of scene target and electronic equipment
CN113743260B (en) Pedestrian tracking method under condition of dense pedestrian flow of subway platform
CN112634369A (en) Space and or graph model generation method and device, electronic equipment and storage medium
CN112613668A (en) Scenic spot dangerous area management and control method based on artificial intelligence
US20170053172A1 (en) Image processing apparatus, and image processing method
Lowphansirikul et al. 3D Semantic segmentation of large-scale point-clouds in urban areas using deep learning
Gündüz et al. A new YOLO-based method for social distancing from real-time videos
CN110636248B (en) Target tracking method and device
CN112541403B (en) Indoor personnel falling detection method by utilizing infrared camera
KR101032098B1 (en) Stand-alone environment traffic detecting system using thermal infra-red
Xie et al. A deep-learning-based fusion approach for global cyclone detection using multiple remote sensing data
CN112183287A (en) People counting method of mobile robot under complex background
CN116862832A (en) Three-dimensional live-action model-based operator positioning method
Lee et al. Vehicle counting based on a stereo vision depth maps for parking management

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant