CN117238035A - Fall detection method and system based on image recognition technology


Info

Publication number
CN117238035A
CN117238035A (application number CN202311335804.2A)
Authority
CN
China
Prior art keywords
keypoint
person
video stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311335804.2A
Other languages
Chinese (zh)
Inventor
肖德虎
袁振涛
宋梦
戴权
杨佳
刘冉韬
何腾鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Changjiang Computing Technology Co ltd
Original Assignee
Wuhan Changjiang Computing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Changjiang Computing Technology Co ltd filed Critical Wuhan Changjiang Computing Technology Co ltd
Priority to CN202311335804.2A
Publication of CN117238035A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image recognition, and in particular to a fall detection method and system based on image recognition technology. The method mainly comprises the following steps: defining key points on the person used for posture judgment, and acquiring at least one fall-behavior feature from the positional relationship of the key points, wherein the key points comprise head key points, torso key points and foot key points; enhancing an infrared video stream with a radar imaging image, acquiring the coordinates of the key points on each person in the enhanced video stream, matching those coordinates against the features of the corresponding fall behaviors, and judging from the matching result whether the corresponding person has fallen. The method avoids the problem that the image to be detected cannot be acquired at night or under occlusion; it distinguishes different fall behaviors using 25 key points, improving detection accuracy; and it performs fall detection on the key points of each person separately, so that fall detection is not limited to a single person.

Description

Fall detection method and system based on image recognition technology
Technical Field
The invention relates to the technical field of image recognition, and in particular to a fall detection method and system based on image recognition technology.
Background
As the population ages and the proportion of young people declines, elderly people who fall are often not discovered in time so that necessary measures can be taken, and the problem is becoming increasingly serious. Once an elderly person falls, severe consequences such as fractures and trauma can follow, and life may even be endangered if the person is not found and treated promptly. Timely fall detection and response are therefore very important for the elderly. Conventional fall detection techniques are typically based on sensors or wearable devices. However, these methods require dedicated hardware, the devices are easily forgotten, left uncharged or not worn, and they are difficult to popularize.
At present, some methods perform fall detection using image recognition, but they generally rely only on an ordinary camera for image acquisition, so falls cannot be recognized at night, and parts of the body cannot be recognized when the person is occluded. In addition, the prior art uses relatively few predicted key points, which limits detection accuracy. Furthermore, some existing detection techniques do not distinguish between the different persons appearing in an image, so when several persons appear at once, the individuals cannot be told apart.
In view of this, how to overcome the above defects of the prior art and solve the problems that images cannot be acquired in certain scenes and that detection accuracy is low is a problem to be solved in the technical field.
Disclosure of Invention
In view of the above defects of the prior art or needs for improvement, the invention solves the problems that images cannot be acquired in certain scenes and that detection accuracy is low.
The embodiment of the invention adopts the following technical scheme:
In a first aspect, the invention provides a fall detection method based on image recognition technology, specifically: defining key points on the person used for posture judgment, and acquiring at least one fall-behavior feature from the positional relationship of the key points, wherein the key points comprise: head key points, torso key points and foot key points; enhancing an infrared video stream with a radar imaging image, acquiring the coordinates of the key points on each person in the enhanced video stream, matching those coordinates against the features of the corresponding fall behaviors, and judging from the matching result whether the corresponding person has fallen.
Preferably, the head keypoints comprise a nose keypoint and a neck keypoint, the torso keypoint comprises a mid-hip keypoint, and the characteristic of the fall behavior comprises: the ratio of the horizontal to vertical lengths of the vector between the neck keypoint and the mid-hip keypoint is greater than a first parameter, and the body aspect ratio is greater than a second parameter, and the ratio of the horizontal to vertical lengths of the vector between the at least one foot keypoint and the mid-hip keypoint is greater than a third parameter; and/or, the ratio of the horizontal and vertical lengths of the vector between the neck keypoint and the mid-hip keypoint is greater than the first parameter, and the mid-hip keypoint is below the at least one foot keypoint; and/or the nose keypoints are below at least one foot keypoint; and/or the neck keypoint is below the at least one foot keypoint.
Preferably, the key points further include: the head keypoints further comprise one or more of a right eye keypoint, a left eye keypoint, a right ear keypoint, and a left ear keypoint; the torso keypoints further include one or more of a right shoulder keypoint, a right elbow keypoint, a right wrist keypoint, a left shoulder keypoint, a left elbow keypoint, a left wrist keypoint, a right hip keypoint, a right knee keypoint, a left hip keypoint, and a left knee keypoint; the foot keypoints include a right ankle keypoint, a left big toe keypoint, a left little toe keypoint, a left heel keypoint, a right big toe keypoint, a right little toe keypoint, and a right heel keypoint.
Preferably, the body aspect ratio specifically includes: and obtaining a first difference value of the maximum value of the abscissa and the minimum value of the abscissa of all the key points, obtaining a second difference value of the maximum value of the ordinate and the minimum value of the ordinate of all the key points, and taking the ratio of the first difference value to the second difference value as the aspect ratio of the body.
Preferably, the enhancing the infrared video stream using the radar imaging image specifically includes: and extracting pictures with a specified interval frame number from the infrared video stream, superposing the extracted pictures with the radar imaging image, and generating an enhanced infrared video stream based on the superposed pictures.
Preferably, the acquiring coordinates of key points on each person in the enhanced video stream specifically includes: and carrying out target tracking on each moving object appearing in the video stream, distinguishing each person in the video stream based on a target tracking result, and respectively acquiring coordinates of key points on each person in the video stream.
Preferably, the acquiring coordinates of key points on each person in the video stream respectively further includes: judging whether the same person is detected in a specified number of consecutive frame images, and, once the same person has been detected in those consecutive frames, acquiring the coordinates of each of that person's key points in every frame of the consecutive frame images.
Preferably, the step of matching the features of the corresponding fall behaviors according to the coordinates of the key points on each person, and judging whether the corresponding person has fallen according to the matching result, specifically includes: matching the coordinates of the key points on each person against the features of each fall behavior, and, when the positional relationship of the key-point coordinates of one or more persons matches the features of one or more fall behaviors, judging that the one or more persons have fallen.
In another aspect, the invention provides a fall detection system based on image recognition technology. The system comprises an image acquisition device and an image processing device, specifically: the image acquisition device is used to acquire the enhanced video stream to be detected; the image processing device is configured to perform fall detection on the enhanced video stream according to the fall detection method based on image recognition technology described in the first aspect.
Preferably, the image acquisition device comprises an infrared camera and an imaging radar, specifically: the infrared camera is used to acquire the infrared video stream; the imaging radar is used to acquire the radar imaging image.
Compared with the prior art, the embodiments of the invention have the following beneficial effects: the radar imaging image is used to enhance the infrared video stream, avoiding the problem that the image to be detected cannot be acquired at night or under occlusion; image detection is performed using 25 key points, different fall behaviors are distinguished by different key-point features, and fall detection is performed on the key points of each person separately, which improves detection accuracy and avoids the problem that fall detection cannot be performed on multiple persons.
Drawings
In order to more clearly illustrate the technical solution of the embodiments of the present invention, the drawings that are required to be used in the embodiments of the present invention will be briefly described below. It is evident that the drawings described below are only some embodiments of the present invention and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 is a schematic diagram of a gesture recognition key point in the prior art;
fig. 2 is a flowchart of a fall detection method based on an image recognition technology according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of gesture recognition keypoints used in an embodiment of the present invention;
fig. 4 is a schematic diagram showing characteristics of a fall behavior used in the embodiment of the present invention;
fig. 5 is a schematic diagram showing characteristics of another fall behavior used in the embodiment of the present invention;
fig. 6 is a schematic diagram of the characteristics of another fall behavior used in an embodiment of the invention;
fig. 7 is a schematic diagram of another fall behavior feature used in an embodiment of the invention;
fig. 8 is a flowchart of another fall detection method based on the image recognition technology according to the embodiment of the present invention;
fig. 9 is a flowchart of another fall detection method based on the image recognition technology according to the embodiment of the present invention;
fig. 10 is a flowchart of another fall detection method based on the image recognition technology according to the embodiment of the present invention;
fig. 11 is a flowchart of another fall detection method based on the image recognition technology according to the embodiment of the present invention;
fig. 12 is a flowchart of another fall detection method based on image recognition technology according to an embodiment of the present invention;
fig. 13 is a schematic structural diagram of a fall detection system based on an image recognition technology according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The present invention is an architecture of a specific functional system, so that in a specific embodiment, functional logic relationships of each structural module are mainly described, and specific software and hardware implementations are not limited.
In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other. The invention will be described in detail below with reference to the drawings and examples.
Through literature search of the prior art, there are some methods for fall detection using image recognition in the prior art:
(1) A fall detection method based on key points, patent application number 202011026089.0. An image is acquired by a camera, and a deep learning target-detection algorithm (Region-based Convolutional Neural Networks, RCNN) frames the human body in the image. If the aspect ratio of the human body is larger than a set threshold, a fall is judged; otherwise, the 18 human-body key points shown in Fig. 1 are extracted using an instance segmentation algorithm (Mask R-CNN), missing key points are completed using a generative adversarial network, and whether the person has fallen is then judged from the angles and distances of the skeleton key points. This method cannot detect the human body at night when there is no light; when multiple people are in the camera's field of view, it cannot distinguish between them; and the key points completed by the generative adversarial network can differ considerably from the actual key points, making detection inaccurate.
(2) Human body fall detection method, device, electronic equipment and storage medium, patent application number 202010660416.1. Images are acquired by a camera; key points of the hip, neck and knee are obtained; the relative positional relationships between these key points are taken as key-point features of the human body; body-shape features are also obtained; the features are input into classifiers to obtain fall detection results, and the results are analysed jointly to obtain a final fall detection result. This method does not consider the no-light situation at night or the case where part of the body is occluded, only three key points are used as features, and the extreme learning machine used as the classifier has low accuracy.
(3) Fall detection method, system, storage medium and device based on spatio-temporal information, patent application number 202210536743.5. A video of the object to be detected is obtained, object frames are detected using YOLOv3 (YOLO, You Only Look Once), the 18 key points shown in Fig. 1 are obtained for each human detection frame using a multi-stage pose estimation network (MSPN), the human skeleton key points of the current frame and of 29 consecutive frames are grouped with a sliding window, and the samples are input into a fall detection model to obtain the fall detection result. This method does not consider the no-light situation at night or the case where part of the body is occluded, only 18 key points are used as features, and it cannot distinguish between people when different persons appear in the video in sequence or several persons appear at the same time.
Example 1:
In order to solve the above problems of the prior art, this embodiment provides a fall detection method based on image recognition technology, which can extract images from widely deployed image capturing devices or installed home cameras and detect fall behavior with back-end algorithm analysis.
As shown in fig. 2, the fall detection method based on the image recognition technology provided by the embodiment of the invention specifically includes the following steps.
Step 101: defining key points on the person used for posture judgment, and acquiring at least one fall-behavior feature from the positional relationship of the key points.
In order to improve the accuracy of fall detection, and unlike the 18 key points common in the prior art, the method provided in this embodiment defines 25 key points on the standard standing posture of the human skeleton and numbers them in sequence, to be used later as the reference for fall judgment. As shown in Fig. 3, the marked key points include head key points, torso key points, and foot key points. The head key points include one or more of the following: the 0 nose key point, 1 neck key point, 15 right eye key point, 16 left eye key point, 17 right ear key point and 18 left ear key point. The torso key points include one or more of the following: the 2 right shoulder key point, 3 right elbow key point, 4 right wrist key point, 5 left shoulder key point, 6 left elbow key point, 7 left wrist key point, 8 mid-hip key point, 9 right hip key point, 10 right knee key point, 12 left hip key point and 13 left knee key point. The foot key points include one or more of the following: the 11 right ankle key point, 14 left ankle key point, 19 left big toe key point, 20 left little toe key point, 21 left heel key point, 22 right big toe key point, 23 right little toe key point and 24 right heel key point. In practical implementations, to improve detection accuracy, it is preferable to use all 25 key points for calculation, or at least a set of key points that includes all foot key points. For simplicity, where no ambiguity arises, the key points below are abbreviated by their location names; for example, the 0 nose key point is simply referred to as 0 nose or the nose.
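For reference, the 25 key point indices above (which follow the common BODY_25 numbering) can be expressed as a simple lookup table. The sketch below is only illustrative; the grouping into head, torso and foot sets mirrors the definition in this embodiment.

```python
# Illustrative sketch only: indices of the 25 posture key points numbered as in Fig. 3
# (the numbering matches the common BODY_25 convention).
KEYPOINT_NAMES = {
    0: "nose", 1: "neck",
    2: "right_shoulder", 3: "right_elbow", 4: "right_wrist",
    5: "left_shoulder", 6: "left_elbow", 7: "left_wrist",
    8: "mid_hip", 9: "right_hip", 10: "right_knee", 11: "right_ankle",
    12: "left_hip", 13: "left_knee", 14: "left_ankle",
    15: "right_eye", 16: "left_eye", 17: "right_ear", 18: "left_ear",
    19: "left_big_toe", 20: "left_little_toe", 21: "left_heel",
    22: "right_big_toe", 23: "right_little_toe", 24: "right_heel",
}

# Groupings used by this embodiment.
HEAD_KEYPOINTS = (0, 1, 15, 16, 17, 18)
TORSO_KEYPOINTS = (2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13)
FOOT_KEYPOINTS = (11, 14, 19, 20, 21, 22, 23, 24)
```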
Because people in the images to be detected are shot from different angles, and falls have different causes, the postures presented in the image after a fall also differ; meanwhile, when a person sits down, squats or lies down, posture features similar to a fall appear. To avoid wrong fall judgments caused by different angles and postures, this embodiment classifies the fall image features of different angles and postures, so that fall behaviors under different angles and postures are defined by corresponding features and the corresponding key points are checked during subsequent fall detection.
Step 102: enhancing the infrared video stream with the radar imaging image, acquiring the coordinates of the key points on each person in the enhanced video stream, matching those coordinates against the features of the corresponding fall behaviors, and judging from the matching result whether the corresponding person has fallen.
In the method provided by this embodiment, to avoid failed or inaccurate detection caused by insufficient light or occlusion, an infrared video stream is used as the main image data source for image recognition, so that images of the person to be recognized can be obtained under poor lighting. Meanwhile, the radar imaging image is used to enhance the infrared video stream, further strengthening image recognition in dim or dark environments; the occluded parts in the infrared images are recognized through the radar imaging image, avoiding inaccurate recognition caused by occlusion.
After the video stream to be detected is obtained, the posture of the person in each frame of the video stream can be analysed to obtain the actual coordinates of the 25 key points defined in step 101. The positional-relationship features between the key points are then calculated from these actual coordinates and matched against the fall-behavior features acquired in step 101; if the features match, the person has fallen.
After steps 101-102 provided in this embodiment, the falling situation of the person in various environments, angles and postures can be detected more accurately.
Compared with the 18-point scheme of the prior art, the method provided by this embodiment uses 25 key points in the human-posture detection model. A mid-hip key point is added at the junction of the torso and the legs; the spine posture is abstracted by the line connecting the neck key point and the mid-hip key point, and the pelvis posture is abstracted by the lines connecting the left hip, mid-hip and right hip key points, so the model better matches the human skeleton, the legs and torso are easier to distinguish, and the torso posture is easier to detect. In addition, 6 key points are added on the feet, so that each of the left and right feet is modelled by 4 key points; the foot posture can then be detected from the positions of these key points, avoiding the prior-art problem that foot posture cannot be detected because each foot has only a single key point. The advantages are:
(1) The 25 key point identification can more comprehensively capture the details of each part of the human body, so that the accuracy of human body posture estimation is improved. For example, for some specific poses, a model of 18 keypoints may not accurately capture the pose changes of certain locations, while a model of 25 keypoints may more accurately capture the pose changes of these locations;
(2) The 25 key point identification can better cope with the attitude estimation problems under complex environments such as shielding, noise and the like. Because the 25 key point models divide human body parts more carefully, even if certain parts are shielded or noise interference occurs, the models can estimate the postures of the parts through the posture information of other parts;
(3) In a falling scene, the 25 key point models can better cope with special challenges such as non-rigid deformation of a human body, posture change under different visual angles and the like, so that the problem of missed detection is reduced, and the detection precision of an algorithm is improved.
Further, when acquiring and matching the key points, to stay consistent with the coordinate system of common imaging devices, in practical implementation (as shown in Fig. 3) each key point uses a coordinate system whose origin is the upper-left corner of the image, with the X axis pointing horizontally to the right and the Y axis pointing vertically downward. In practice, occlusion and similar factors may make key-point coordinates erroneous or inaccurate, so a confidence value can be calculated for each key-point coordinate and key points whose confidence falls below a specified lower limit can be filtered out. Likewise, when the neural network misidentifies a point or a body part lies on the edge of the frame, the abscissa or ordinate of the corresponding key point may be 0; to avoid judgment errors, key points with an abscissa or ordinate of 0 can also be filtered out. After this filtering, or under heavy occlusion, the number of key points available for calculation is reduced, which lowers the accuracy of the fall judgment; therefore, it is also necessary to check whether the number of remaining key points is smaller than a specified lower limit, and if so, no fall judgment is made. According to actual calculation results, a lower limit of 10 key points can ensure accuracy.
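A minimal sketch of this filtering step is shown below, assuming each key point is given as an (x, y, confidence) triple in the image coordinate system of Fig. 3. The confidence lower limit of 0.3 is an assumed example value; the embodiment only requires some specified lower limit, while the minimum count of 10 follows the paragraph above.

```python
def filter_keypoints(keypoints, conf_lower=0.3, min_count=10):
    """Keep only key points usable for fall judgment.

    keypoints: dict {index: (x, y, confidence)} in the coordinate system of
    Fig. 3 (origin at the top-left corner, X to the right, Y downward).
    conf_lower = 0.3 is an assumed example; the embodiment only requires a
    specified lower limit. Returns the filtered dict, or None when fewer than
    min_count key points remain, in which case no fall judgment is made.
    """
    kept = {
        i: (x, y, c)
        for i, (x, y, c) in keypoints.items()
        if c >= conf_lower and x != 0 and y != 0
    }
    return kept if len(kept) >= min_count else None
```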
In practical implementation, the characteristics of the falling behavior required to be used in the embodiment can be obtained according to the standard data of the human body posture, the angle for obtaining the image, the big data analysis result and other modes. In the following, some corresponding features of common falling behavior types are provided, in practical implementation, the following type classification manner can be referred, and features of other falling behavior types can also be obtained based on 25 key points according to actual needs.
(1) Corresponding to the illustration of fig. 4, the ratio of the horizontal to vertical lengths of the vectors between the neck keypoint and the mid-hip keypoint is greater than the first parameter, and the body aspect ratio is greater than the second parameter, and the ratio of the horizontal to vertical lengths of the vectors between the at least one foot keypoint and the mid-hip keypoint is greater than the third parameter.
In the actual calculation, the vector between the 1 neck key point and the 8 mid-hip key point is obtained, and it is judged whether the ratio of the horizontal length to the vertical length of this vector is greater than the first parameter α. The preferred value of the first parameter α is 1.6 to 1.8, which corresponds to the angle between the central axis of the person's upper body and the horizontal plane being less than 30 degrees.
Namely: |x_1 − x_8| / |y_1 − y_8| > α.
Further, a first difference between the maximum and minimum abscissa of all the key points and a second difference between the maximum and minimum ordinate of all the key points are obtained; the ratio of the first difference to the second difference is taken as the body aspect ratio, and it is judged whether this aspect ratio is greater than the second parameter β.
The preferred value of the second parameter β is 1, which corresponds to the angle between the central axis of the whole person and the horizontal plane being less than 45 degrees. When calculating the body aspect ratio, all key points should be used to avoid errors caused by missing key points.
Namely: (max_i x_i − min_i x_i) / (max_i y_i − min_i y_i) > β, taken over all key points i.
In the actual calculation, the ratio of the horizontal length to the vertical length of the vector between each foot key point and the 8 mid-hip key point is calculated, and it is judged whether the ratio for any foot key point is greater than the third parameter γ. The preferred value of the third parameter γ is 0.5, which corresponds to the angle between the person's leg and the horizontal plane being less than 30 degrees.
Namely: |x_j − x_8| / |y_j − y_8| > γ, j ∈ {11, 14, 19, 20, 21, 22, 23, 24}.
(2) Corresponding to FIG. 5, the ratio of the horizontal to vertical lengths of the vector between the neck keypoint and the mid-hip keypoint is greater than the first parameter, and the mid-hip keypoint is below the at least one foot keypoint.
In actual calculation, the ratio of the horizontal length to the vertical length of the vector between the neck and the middle hip of the person can be obtained, and whether the ratio is larger than the first parameter alpha or not is judged, wherein the preferable value of the first parameter alpha is 1.6-1.8.
Namely: |x_1 − x_8| / |y_1 − y_8| > α.
In the actual calculation, the ordinate of each foot key point is compared with the ordinate of the 8 mid-hip key point, and it is judged whether the ordinate of the 8 mid-hip key point is greater than the ordinate of any foot key point (in the coordinate system of Fig. 3, a greater ordinate means lower in the image).
Namely: y_8 > y_j, j ∈ {11, 14, 19, 20, 21, 22, 23, 24}.
(3) Corresponding to the illustration of fig. 6, the nose keypoints are below the foot keypoints;
In the actual calculation, the ordinate of the 0 nose key point is compared with the ordinate of each foot key point, and it is judged whether the ordinate of the 0 nose key point is greater than the ordinate of any foot key point.
Namely: y_0 > y_j, j ∈ {11, 14, 19, 20, 21, 22, 23, 24}.
(4) Corresponding to the illustration of fig. 7, the neck keypoint is below the at least one foot keypoint.
In the actual calculation, the ordinate of the vector between the hip key point and the foot key point in the step 8 is calculated, and whether the ordinate of any foot key point vector is larger than the ordinate of the hip key point in the step 8 is judged.
Namely: y_1 > y_j, j ∈ {11, 14, 19, 20, 21, 22, 23, 24}.
With reference to the postures in the corresponding figures and the practical meaning of the parameter values, it can be seen that the features above can be used to detect and judge different types of fall behavior, as sketched in the example below.
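The four feature types above reduce to simple coordinate tests. The sketch below assumes a filtered dictionary of key-point coordinates indexed as in Fig. 3 and uses the preferred parameter values α = 1.6, β = 1 and γ = 0.5; since the Y axis points downward, "below" corresponds to a larger ordinate.

```python
FOOT_KEYPOINTS = (11, 14, 19, 20, 21, 22, 23, 24)  # indices as in Fig. 3

def ratio_h_v(p, q):
    """Ratio of the horizontal to the vertical length of the vector p -> q."""
    dx, dy = abs(q[0] - p[0]), abs(q[1] - p[1])
    return dx / dy if dy else float("inf")

def matches_fall_feature(kp, alpha=1.6, beta=1.0, gamma=0.5):
    """kp: dict {index: (x, y)} of filtered key points; True when any of the
    four feature types (1)-(4) above is matched. Y grows downward, so a
    larger ordinate means lower in the image."""
    nose, neck, hip = kp.get(0), kp.get(1), kp.get(8)
    feet = [kp[j] for j in FOOT_KEYPOINTS if j in kp]

    # Type (1): flat spine, flat overall silhouette, and at least one flat leg.
    if neck and hip and feet:
        xs = [p[0] for p in kp.values()]  # ideally all 25 key points are present here
        ys = [p[1] for p in kp.values()]
        body_aspect = (max(xs) - min(xs)) / max(max(ys) - min(ys), 1e-6)
        if (ratio_h_v(neck, hip) > alpha and body_aspect > beta
                and any(ratio_h_v(f, hip) > gamma for f in feet)):
            return True

    # Type (2): flat spine and the mid-hip below at least one foot key point.
    if neck and hip and feet and ratio_h_v(neck, hip) > alpha \
            and any(hip[1] > f[1] for f in feet):
        return True

    # Type (3): the nose below at least one foot key point.
    if nose and feet and any(nose[1] > f[1] for f in feet):
        return True

    # Type (4): the neck below at least one foot key point.
    if neck and feet and any(neck[1] > f[1] for f in feet):
        return True

    return False
```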
In the method provided by this embodiment, in order to obtain images to be detected accurately under various lighting and occlusion conditions, the video stream can be captured with an infrared camera, the occluded parts of the person are enhanced by combining the radar imaging image, and each frame of the enhanced video stream is used as an image to be detected. Specifically: pictures are extracted from the infrared video stream at a specified frame interval, the extracted pictures are superposed with the radar imaging image, and the enhanced infrared video stream is generated from the superposed pictures.
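A minimal sketch of this enhancement step is given below, assuming OpenCV-style image arrays and a radar imaging image that has already been registered and resized to the infrared frame geometry; the frame interval of 5 and the blending weight are illustrative values, not values specified by the embodiment.

```python
import cv2

def enhance_stream(infrared_frames, radar_image, interval=5, weight=0.4):
    """Overlay the radar imaging image onto frames sampled from the infrared
    video stream. Assumes radar_image has already been registered and resized
    to the same shape as the infrared frames; interval and weight are
    illustrative values."""
    enhanced = []
    for idx, frame in enumerate(infrared_frames):
        if idx % interval == 0:  # extract pictures at the specified frame interval
            fused = cv2.addWeighted(frame, 1.0 - weight, radar_image, weight, 0)
            enhanced.append(fused)
    return enhanced  # frames of the enhanced infrared video stream
```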
In order to perform fall detection on multiple persons in the image to be detected, target tracking can be performed on every moving object appearing in the video stream, each person in the video stream is distinguished based on the target-tracking result, and the coordinates of the key points on each person in the video stream are acquired separately. Specifically, a target-tracking algorithm may be used to frame the persons appearing in the video stream and to mark and distinguish them. As shown in Fig. 8, distinguishing and labelling the persons can be accomplished by the following steps.
Step 201: the video is processed using a target tracking algorithm to frame and track the people present within the video stream.
In practical implementation, persons can be distinguished as needed by an image-feature analysis method or a target-tracking algorithm, for example the DeepSORT algorithm or the ByteTrack algorithm. The simple and efficient ByteTrack algorithm is preferably used.
Step 202: judging whether the same person is detected in a specified number of consecutive frame images, and, once the same person has been detected in those consecutive frames, acquiring the coordinates of each of that person's key points in every frame of the consecutive frame images.
Whether the same person is detected in N consecutive frames is judged, and if so, the person's posture is estimated, where N is greater than or equal to 1. In practical implementation, the key-point coordinates of the person can be obtained as needed by an image-feature analysis method or a person pose-estimation algorithm, for example the OpenPose, DeepPose or AlphaPose algorithm, or Mask R-CNN, to estimate the key points needed in the calculation. The accurate and convenient OpenPose algorithm is preferably used.
After steps 201 to 202 provided in this embodiment, the key points of each person in the video stream can be obtained, so that fall detection can be completed, as sketched below.
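The sketch below outlines steps 201 to 202. The functions track_people() and estimate_pose() are hypothetical placeholders standing in for a multi-object tracker (such as ByteTrack) and a pose estimator (such as OpenPose); they are not real library calls.

```python
def keypoints_per_person(frames, n_consecutive=5):
    """Sketch of steps 201-202: track the persons in the video stream, then
    collect key points only for a person detected in N consecutive frames.
    track_people() and estimate_pose() are hypothetical placeholders for a
    tracker (e.g. ByteTrack) and a pose estimator (e.g. OpenPose)."""
    consecutive = {}  # person_id -> number of consecutive frames detected
    keypoints = {}    # person_id -> list of per-frame key point dicts
    for frame in frames:
        detections = track_people(frame)            # [(person_id, bbox), ...]
        current_ids = {pid for pid, _ in detections}
        for pid, bbox in detections:
            consecutive[pid] = consecutive.get(pid, 0) + 1
            if consecutive[pid] >= n_consecutive:   # same person seen in N frames
                keypoints.setdefault(pid, []).append(estimate_pose(frame, bbox))
        for pid in consecutive:
            if pid not in current_ids:
                consecutive[pid] = 0                # track lost, reset the count
    return keypoints
```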
After the key points of a person's posture are obtained, the judgment and detection can be performed based on the fall-behavior features of step 101. The coordinates of the key points on each person are matched against the features of each fall behavior, and when the positional relationship of the key-point coordinates of one or more persons matches the features of one or more fall behaviors, it is judged that the one or more persons have fallen.
In order to improve the accuracy of the judgment and to reduce cases where a normal lying-down or sitting-down action is misjudged as a fall, the speed and time characteristics of a fall can assist the judgment and avoid false alarms. Specifically: in the first stage, since a fall is an instantaneous change of state and its speed is far greater than that of normally sitting or lying down, the movement speed V_P of the centre-of-gravity point P of the human body can be calculated from the coordinate information of the 25 key points over K consecutive frames of the video to be detected; a speed threshold V can be set, and when V_P > V, the second stage of judgment is entered. In the second stage, the person usually remains still after falling, and in most cases the posture is lying or half-lying, so a lying or half-lying posture can be defined as fall behavior: N consecutive frames of the video to be detected are obtained, and based on the key-point reference coordinates and posture estimation it is detected whether the person's posture in these N frames is lying or half-lying, i.e. whether fall behavior exists in the N frames; if a fall is detected for the person in all W frames, the person is judged to have fallen. Here the frame number W is the time window for judging the state change, the frame number N is the time window for detecting fall behavior, and K is the continuous-judgment threshold for fall behavior, with W not less than N and not more than K. A sketch of this two-stage judgment follows the parameter list below.
In practical implementation, the number of frames and other parameters in the calculation can be obtained according to statistical experience, and can also be adjusted according to the accuracy to be detected. In a typical detection scenario, the settings may be made with reference to the following specific values:
(1) The number of image frames K of the video to be detected is greater than or equal to 1;
(2) The fall behavior time is generally between 300 and 500ms, and the gravity center displacement of the fall is about 0.5 to 1m, so the motion speed threshold V in the example can be set to be 2m/s, and the value is only used as a reference and can be adjusted according to actual projects;
(3) The normal human body reaction time after falling is more than 1s, and the 1s of the general monitoring video stream has 30 frames of images, so the image frame number W in the example can be set to be 30, and the value is only used as a reference and can be adjusted according to actual projects.
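Under the reference values above, the two-stage judgment can be sketched as follows, assuming the per-frame centre-of-gravity point P has already been converted from pixels to metres and that a per-frame boolean indicates whether the matched posture is lying or half-lying.

```python
def two_stage_fall_check(centroids, lying_flags, fps=30, v_threshold=2.0, w_frames=30):
    """Two-stage fall judgment sketch.

    centroids:   per-frame centre-of-gravity point P of the body, in metres.
    lying_flags: per-frame booleans, True when the matched posture is lying
                 or half-lying (the fall-like posture of stage two).
    Stage 1: the centre-point speed must exceed v_threshold (2 m/s above).
    Stage 2: the fall-like posture must then persist for w_frames frames
             (30 frames, i.e. about 1 s of a typical surveillance stream).
    """
    for i in range(1, len(centroids)):
        (x0, y0), (x1, y1) = centroids[i - 1], centroids[i]
        speed = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 * fps  # m/s between frames
        if speed > v_threshold:                                 # stage 1 triggered
            window = lying_flags[i:i + w_frames]
            if len(window) == w_frames and all(window):         # stage 2 sustained
                return True
    return False
```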
Further, the features obtained in step 101 include postures (3) and (4), which correspond to a completed fall, and postures (1) and (2), which correspond to an impending fall. The above features can therefore also be used to predict the fall state: over consecutive frames, if the person's posture changes rapidly from a normal standing posture to one of the fall postures, the fall can be predicted while it is happening.
Furthermore, key-point acquisition and fall detection can be completed by deep learning. In some practical scenarios, as shown in Fig. 9, the extraction of fall-behavior features can be done in the following way.
Step 301: a model library is imported, specifying the image path to be predicted and the model path to be used.
Step 302: and analyzing human skeleton coordinates in the sample data set through a training model, and defining key points of falling behaviors. Wherein the keypoints include foot keypoints, torso keypoints, and head keypoints.
Step 303: and respectively acquiring a key point coordinate list of each behavior key point based on the sample data set.
Step 304: and calculating and acquiring key point reference coordinates of the falling behaviors based on the key point coordinate list.
Step 305: and acquiring a video image to be detected, and predicting the falling behavior in the video image to be detected based on the reference coordinates of the key points and the training model.
After steps 301 to 305 provided in this embodiment, key points in the video to be detected can be obtained by a deep learning method, and the falling state can be detected and predicted.
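A rough sketch of steps 301 to 305 is given below. load_model(), extract_skeletons() and predict_fall() are hypothetical placeholders rather than real library APIs, and taking the mean of each key point's coordinate list as the reference coordinate in step 304 is an assumption.

```python
def train_and_predict(model_path, sample_dir, video_path):
    """Sketch of steps 301-305. load_model(), extract_skeletons() and
    predict_fall() are hypothetical placeholders, not real library APIs."""
    model = load_model(model_path)                    # steps 301-302: import the model,
    skeletons = extract_skeletons(model, sample_dir)  # analyse skeleton coordinates
    # Step 303: one coordinate list per key point across the sample data set.
    coord_lists = {k: [s[k] for s in skeletons] for k in skeletons[0]}
    # Step 304: reference coordinates (here taken as the mean, an assumption).
    reference = {
        k: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
        for k, pts in coord_lists.items()
    }
    # Step 305: predict fall behavior in the video to be detected.
    return predict_fall(model, video_path, reference)
```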
The fall detection method based on the image recognition technology provided by the embodiment has the following beneficial effects:
1. the method provided by this embodiment combines radar imaging with the infrared video stream, so the video of the person can still be acquired at night without light and when the person is occluded, solving the problem that a complete video of the person cannot be obtained in dark and occluded environments;
2. the method provided by the embodiment adopts the target tracking algorithm to distinguish the people in the video, so that the problem that confusion is easily caused when judging whether the people fall down or not when a plurality of people appear in the video is solved;
3. the method provided by the embodiment adopts the gesture estimation algorithm to identify 25 human body key points, and is more accurate than the traditional 18 key point estimation results.
4. The method provided by the embodiment adopts the falling characteristic detection to assist in judgment, so that the situation that the normal lying or sitting behaviors are misjudged to be falling can be effectively avoided.
Example 2:
based on the fall detection method based on the image recognition technology provided in embodiment 1, specific methods for matching the features of the corresponding fall behavior according to the coordinates of the key points of each person in some steps 102 are provided in this embodiment. It can be understood that the judging process provided in this embodiment is only a specific implementation method that can be referred to in an actual implementation scenario, and is not limited by the protection scope of the present invention. In practical implementation, the fall behavior determination may be performed with reference to the following manner, or may be adjusted as necessary.
In the judging manner provided in this embodiment, whether each key point of each person matches the feature of the falling behavior may be sequentially judged according to the following manner. After the traversal judgment of all key points of one person is completed, the falling judgment can be carried out according to the matching result.
To simplify the calculation, all calculations in this embodiment use the vectors between the key points acquired in the image and the corresponding reference key points as input data, the fall types are matched against types (1) to (4) of embodiment 1, and the preferred value is used for each parameter.
Further, to facilitate sequential matching, the foot key points can be arranged by sequence number: the 11 right ankle, 14 left ankle, 19 left big toe, 20 left little toe, 21 left heel, 22 right big toe, 23 right little toe and 24 right heel. Foot key points with low confidence or with an abscissa or ordinate of 0 are filtered out, and the foot key point with the smallest sequence number among the remaining foot key points is used as the comparison object. For ease of description, this lowest-numbered remaining foot key point is referred to below as the first foot key point.
Further, to facilitate judging the different types of fall-behavior features, the judgment flags condition_A, condition_B, condition_C, condition_D, condition_E and condition_F are defined in this embodiment. The initial value of each judgment flag is False.
Fig. 10 to 12 show a specific manner of performing fall judgment.
Step 401: it is determined whether the abscissa and the ordinate of the neck and the middle hip are both different from 0. If yes, go to step 402; if not, go to step 406.
Step 402: it is determined whether the ratio of the horizontal to vertical lengths of the vector between the neck and the mid-hip is greater than 1.6. If yes, condition_A = True and go to step 403; if not, go to step 403.
Step 403: it is determined whether the number of available foot key points is 0. If yes, go to step 406; if not, go to step 404.
Step 404: it is determined whether the ratio of the horizontal to vertical lengths of the vector between the first foot key point and the mid-hip is greater than 0.5. If yes, condition_E = True and go to step 405; if not, go to step 405.
Step 405: it is determined whether the mid-hip is below the foot and condition_A = True, or whether the neck is below the foot. If yes, condition_B = True and go to step 406; if not, go to step 406.
Step 406: it is determined whether the abscissa and ordinate of the nose and mid-hip are both different from 0. If yes, go to step 407; if not, go to step 411.
Step 407: it is determined whether the ratio of the horizontal to vertical lengths of the vector between the nose and the mid-hip is greater than 1.6. If yes, condition_A = True and go to step 408; if not, go to step 408.
Step 408: it is determined whether the number of available foot key points is 0. If yes, go to step 411; if not, go to step 409.
Step 409: it is determined whether the ratio of the horizontal to vertical lengths of the vector between the first non-zero foot key point and the mid-hip is greater than the third parameter (0.5). If yes, condition_E = True and go to step 410; if not, go to step 410.
Step 410: it is determined whether the mid-hip is below the foot and condition_A = True, or whether the nose is below the foot. If yes, condition_B = True and go to step 411; if not, go to step 411.
Step 411: the key points among the 25 whose abscissa is not 0 are defined as the non-zero nodes.
Step 412: the maximum and minimum abscissas among all non-zero nodes are obtained.
Step 413: it is determined whether the ratio of the horizontal to vertical lengths of the vector between the point with the largest abscissa and the point with the smallest abscissa is greater than 1. If yes, condition_C = True and go to step 414; if not, go to step 414.
Step 414: it is determined whether the abscissas and ordinates of the nose, neck, right shoulder, left shoulder and mid-hip are all non-zero. If yes, go to step 415; if not, go to step 419.
Step 415: the length of the vector between the neck and the mid-hip is defined as upper_body.
Step 416: the maximum length of the vectors between the foot key points and the mid-hip is defined as lower_body.
Step 417: the ratio of upper_body to lower_body is defined as up_low_ratio.
Step 418: it is determined whether up_low_ratio is less than or equal to 0.45, the neck is above the mid-hip, and the ratio of the horizontal to vertical lengths of the vector between the neck and the mid-hip is less than 1. If yes, condition_D = True and go to step 419; if not, go to step 419.
Step 419: it is determined whether the abscissas and ordinates of the neck and the mid-hip are both non-zero and the neck is below the mid-hip. If yes, condition_F = True and go to step 420; if not, go to step 420.
Step 420: it is determined whether condition_A, condition_C and condition_E are all True, or condition_B is True, or condition_F is True. If yes, it is judged that the person has fallen; if not, it is judged that the person has not fallen.
After steps 401 to 420 provided in this embodiment, the matching of the fall-behavior features provided in embodiment 1 can be completed, implementing fall detection for different types of postures; the final decision of step 420 is summarized in the sketch below.
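The final decision of step 420 can be summarized in a few lines; note that, as step 420 is written, condition_D is computed in steps 414 to 418 but does not enter the final decision.

```python
def is_fall(condition_A, condition_B, condition_C, condition_D, condition_E, condition_F):
    """Step 420: a fall is reported when A, C and E all hold, or B holds, or F holds.
    condition_D is evaluated in steps 414-418 but, per step 420 as written,
    does not enter the final decision."""
    return (condition_A and condition_C and condition_E) or condition_B or condition_F
```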
Example 3:
On the basis of the fall detection method based on image recognition technology provided in embodiments 1 and 2 above, the invention also provides a fall detection system based on image recognition technology that can be used to implement the above method.
Fig. 13 is a schematic diagram of the system architecture according to an embodiment of the invention. The system comprises an image acquisition device and an image processing device. The image acquisition device is used to acquire the enhanced video stream to be detected. The image processing device is used to perform fall detection on the enhanced video stream according to the fall detection method based on image recognition technology provided in embodiment 1 or 2, for example by performing the steps shown in Figs. 2, 8 and 9 above.
In order to acquire the infrared video stream and the radar imaging image required in step 102, in the system provided by this embodiment the image acquisition device comprises an infrared camera and an imaging radar, specifically: the infrared camera is used to acquire the infrared video stream, and the imaging radar is used to acquire the radar imaging image. In implementation, an infrared camera can be used to capture the video stream, UWB MIMO radar imaging can be combined to enhance the occluded parts of the human body, and the enhanced video stream is passed to the image processing device for detection.
On the other hand, to ease deployment and reduce power consumption, the image processing device may use various edge processing devices to mark each person in the image, acquire each person's posture key points, and perform fall detection from those key points. Furthermore, when a fall is judged to have occurred, an alarm can be raised directly by the edge processing device or an alarm signal can be pushed to a designated device.
In some practical implementation scenarios, an Atlas 500 intelligent edge station is used as the image processing device: the video stream is decoded by the decoding module of the Atlas 500; the fallen person is framed on the display interface of the Atlas 500; an alarm is raised by a buzzer through the alarm interface of the Atlas 500; and an alert is sent to the monitoring app through the wired and wireless network interfaces of the Atlas 500. The Atlas 500 can handle video input, prediction and alarm in a single station, which reduces deployment difficulty.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (10)

1. The fall detection method based on the image recognition technology is characterized by comprising the following steps of:
defining key points on the person used for judging posture, and acquiring at least one feature of fall behavior according to the positional relationship of the key points, wherein the key points comprise: head key points, torso key points, and foot key points;
and enhancing the infrared video stream by using the radar imaging image, acquiring coordinates of key points on each person in the enhanced video stream, matching the characteristics of corresponding falling behaviors according to the coordinates of the key points on each person, and judging whether the corresponding person falls according to a matching result.
2. The fall detection method based on image recognition technology according to claim 1, wherein the head keypoints comprise nose keypoints and neck keypoints, the torso keypoints comprise mid-hip keypoints, and the fall behavior is characterized in particular by:
the ratio of the horizontal to vertical lengths of the vector between the neck keypoint and the mid-hip keypoint is greater than a first parameter, and the body aspect ratio is greater than a second parameter, and the ratio of the horizontal to vertical lengths of the vector between the at least one foot keypoint and the mid-hip keypoint is greater than a third parameter;
and/or, the ratio of the horizontal and vertical lengths of the vector between the neck keypoint and the mid-hip keypoint is greater than the first parameter, and the mid-hip keypoint is below the at least one foot keypoint;
and/or the nose keypoints are below at least one foot keypoint;
and/or the neck keypoint is below the at least one foot keypoint.
3. A fall detection method based on image recognition technology as claimed in claim 2, wherein the key points further comprise:
the head keypoints further comprise: one or more of a right eye keypoint, a left eye keypoint, a right ear keypoint, and a left ear keypoint;
the torso key points further include: one or more of a right shoulder keypoint, a right elbow keypoint, a right wrist keypoint, a left shoulder keypoint, a left elbow keypoint, a left wrist keypoint, a right hip keypoint, a right knee keypoint, a left hip keypoint, and a left knee keypoint;
the foot keypoints include: a right ankle keypoint, a left big toe keypoint, a left little toe keypoint, a left heel keypoint, a right big toe keypoint, a right little toe keypoint, and a right heel keypoint.
4. A fall detection method based on image recognition technology as claimed in claim 3, characterized in that the body aspect ratio specifically comprises:
and obtaining a first difference value of the maximum value of the abscissa and the minimum value of the abscissa of all the key points, obtaining a second difference value of the maximum value of the ordinate and the minimum value of the ordinate of all the key points, and taking the ratio of the first difference value to the second difference value as the aspect ratio of the body.
5. A fall detection method based on image recognition technology as claimed in claim 1, wherein the enhancing of the infrared video stream using radar imaging images comprises:
and extracting pictures with a specified interval frame number from the infrared video stream, superposing the extracted pictures with the radar imaging image, and generating an enhanced infrared video stream based on the superposed pictures.
6. The fall detection method based on image recognition technology as claimed in claim 1, wherein the acquiring coordinates of key points on each person in the enhanced video stream specifically comprises:
and carrying out target tracking on each moving object appearing in the video stream, distinguishing each person in the video stream based on a target tracking result, and respectively acquiring coordinates of key points on each person in the video stream.
7. The fall detection method as claimed in claim 6, wherein the acquiring coordinates of key points on each person in the video stream, respectively, further comprises:
judging whether the continuous frame images with the designated frame number detect the same person or not, and acquiring the coordinates of each key point of the person in each frame in the continuous frame images after the continuous frame images detect the same person.
8. The fall detection method based on image recognition technology as claimed in claim 1, wherein the step of matching the characteristics of the corresponding fall behavior according to the coordinates of the key points on each person, and determining whether the corresponding person falls according to the matching result, specifically comprises:
and matching the coordinates of the key points on each person with the characteristics of each falling action, and judging that one or more persons fall when the position relation of the coordinates of the key points on the one or more persons are matched with the characteristics of one or more falling actions.
9. The fall detection system based on the image recognition technology is characterized by comprising an image acquisition device and an image processing device, specifically:
the image acquisition device is used to acquire an enhanced video stream to be detected;
the image processing apparatus is configured to perform fall detection on an enhanced video stream according to the fall detection method based on image recognition technology as claimed in any one of claims 1-8.
10. The fall detection system based on image recognition technology according to claim 9, wherein the image acquisition device comprises an infrared camera and an imaging radar, in particular:
the infrared camera is used to acquire an infrared video stream;
the imaging radar is used to acquire a radar imaging image.
CN202311335804.2A 2023-10-12 2023-10-12 Fall detection method and system based on image recognition technology Pending CN117238035A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311335804.2A CN117238035A (en) 2023-10-12 2023-10-12 Fall detection method and system based on image recognition technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311335804.2A CN117238035A (en) 2023-10-12 2023-10-12 Fall detection method and system based on image recognition technology

Publications (1)

Publication Number Publication Date
CN117238035A true CN117238035A (en) 2023-12-15

Family

ID=89098337

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311335804.2A Pending CN117238035A (en) 2023-10-12 2023-10-12 Fall detection method and system based on image recognition technology

Country Status (1)

Country Link
CN (1) CN117238035A (en)

Similar Documents

Publication Publication Date Title
Bian et al. Fall detection based on body part tracking using a depth camera
Ji et al. Interactive body part contrast mining for human interaction recognition
CN109598229B (en) Monitoring system and method based on action recognition
JP7311640B2 (en) Behavior prediction method and device, gait recognition method and device, electronic device, and computer-readable storage medium
EP4053791A1 (en) Image processing device, image processing method, and non-transitory computer-readable medium having image processing program stored thereon
US8706663B2 (en) Detection of people in real world videos and images
CN110490109B (en) Monocular vision-based online human body rehabilitation action recognition method
WO2001027875A1 (en) Modality fusion for object tracking with training system and method
CN110738154A (en) pedestrian falling detection method based on human body posture estimation
US20150092981A1 (en) Apparatus and method for providing activity recognition based application service
CN110796051A (en) Real-time access behavior detection method and system based on container scene
JP6362085B2 (en) Image recognition system, image recognition method and program
JP2020135551A (en) Object recognition device, object recognition method and object recognition program
Zhang et al. Human deep squat detection method based on MediaPipe combined with Yolov5 network
JP7501747B2 (en) Information processing device, control method, and program
CN114529979A (en) Human body posture identification system, human body posture identification method and non-transitory computer readable storage medium
CN112183287A (en) People counting method of mobile robot under complex background
CN114639168B (en) Method and system for recognizing running gesture
US11527090B2 (en) Information processing apparatus, control method, and non-transitory storage medium
CN117238035A (en) Fall detection method and system based on image recognition technology
CN115731563A (en) Method for identifying falling of remote monitoring personnel
CN115631155A (en) Bone disease screening method based on space-time self-attention
CN113408435B (en) Security monitoring method, device, equipment and storage medium
GB2467643A (en) Improved detection of people in real world videos and images.
CN115546825A (en) Automatic monitoring method for safety inspection normalization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination