CN110287923B - Human body posture acquisition method, device, computer equipment and storage medium - Google Patents

Human body posture acquisition method, device, computer equipment and storage medium

Info

Publication number
CN110287923B
CN110287923B (application CN201910581506.9A)
Authority
CN
China
Prior art keywords
human body
key points
posture
video frame
target human
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910581506.9A
Other languages
Chinese (zh)
Other versions
CN110287923A (en)
Inventor
陈泳君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910581506.9A priority Critical patent/CN110287923B/en
Publication of CN110287923A publication Critical patent/CN110287923A/en
Application granted granted Critical
Publication of CN110287923B publication Critical patent/CN110287923B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 20/42 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a human body posture acquisition method. The method comprises the following steps: acquiring an effective video frame from a video stream collected by an image acquisition device; extracting human body key point information of a target human body in the effective video frame, the key point information comprising position information of at least two key points of the target human body; and acquiring the human body posture of the target human body according to the key point information, the human body posture being either a falling posture or a non-falling posture. According to this scheme, neither consecutive video frames nor faces in the video frames need to be recognized; recognizing two or more human body key points in an individual video frame is sufficient to determine whether a human body has fallen.

Description

Human body posture acquisition method, device, computer equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of video monitoring, in particular to a human body posture acquisition method, a human body posture acquisition device, computer equipment and a storage medium.
Background
Falls have become a high-risk health problem for the elderly, and solitary elderly people in particular bear greater fall risk. Fall detection based on an ordinary monitoring camera can reduce the need for uninterrupted manual inspection for fall incidents.
In the related art, when detecting whether a target human body has fallen, a face detection module detects the face position in a monitoring video frame and treats it as the position of the target human body's head. The falling speed of the head is then calculated from the head positions in two consecutive video frames and compared with a preset falling speed threshold to judge whether the target human body has fallen.
However, the solutions in the related art need to detect the face position in consecutive video frames before they can detect whether the corresponding target human body has fallen; missed detections therefore easily occur, resulting in low accuracy of fall posture detection.
Disclosure of Invention
The embodiment of the application provides a human body posture acquisition method and apparatus, a computer device, and a storage medium, which can improve the accuracy of fall posture detection based on video images. The technical scheme is as follows:
In one aspect, a method for acquiring a human body posture is provided, the method comprising:
acquiring an effective video frame in a video stream acquired by image acquisition equipment;
extracting human body key point information of a target human body in the effective video frame, wherein the human body key point information comprises position information of at least two key points in the target human body;
and acquiring the human body posture of the target human body according to the human body key point information, wherein the human body posture comprises a falling posture and a non-falling posture.
In another aspect, there is provided a human body posture acquisition apparatus, the apparatus including:
the video frame acquisition module is used for acquiring effective video frames in the video stream acquired by the image acquisition equipment;
the key point extraction module is used for extracting the human body key point information of the target human body in the effective video frame, wherein the human body key point information comprises the position information of at least two key points in the target human body;
the gesture acquisition module is used for acquiring the human body gesture of the target human body according to the human body key point information, wherein the human body gesture comprises a falling gesture and a non-falling gesture.
Optionally, the gesture obtaining module is configured to
Acquiring the relative position relation between the at least two key points according to the position information of the at least two key points;
and acquiring the human body posture of the target human body according to the relative position relation between the at least two key points.
Optionally, the position information of the at least two key points includes position information of at least two head key points in the target human body and position information of at least two lower body key points in the target human body;
a gesture obtaining module, configured to, when obtaining a relative positional relationship between the at least two key points according to the positional information of the at least two key points,
acquiring a first ordinate according to the position information of the at least two head key points, wherein the first ordinate is an average value of the ordinates of the at least two head key points in a designated coordinate system; the direction of the ordinate axis of the specified coordinate system is the vertical direction in the effective video frame;
acquiring a second ordinate according to the position information of the at least two lower body key points, wherein the second ordinate is an average value of the ordinate of each of the at least two lower body key points in the appointed coordinate system;
acquiring the height relationship between the first ordinate and the second ordinate as the relative position relation between the at least two key points;
a gesture obtaining module for obtaining the gesture of the target human body according to the relative position relation between the at least two key points,
and when the first ordinate is lower than the second ordinate, determining the human body posture of the target human body as a falling posture.
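The ordinate comparison in the optional block above can be sketched as follows. This is a hypothetical illustration, not the patent's reference implementation; it assumes the common image-coordinate convention in which the y axis points downward, so a point that is visually lower has a larger ordinate, and "first ordinate lower than second ordinate" means the head keypoints' mean y exceeds the lower-body keypoints' mean y.

```python
def mean_y(points):
    """Average ordinate of a list of (x, y) keypoints in the specified coordinate system."""
    return sum(y for _, y in points) / len(points)

def is_fall_by_ordinates(head_points, lower_body_points):
    """Fall posture if the head keypoints' mean ordinate is visually lower than
    the lower-body keypoints' mean ordinate (larger y = lower on screen,
    assuming the usual downward-pointing image y axis)."""
    first_ordinate = mean_y(head_points)          # average over head keypoints
    second_ordinate = mean_y(lower_body_points)   # average over lower-body keypoints
    return first_ordinate > second_ordinate
```

For a standing person the head's mean ordinate is small and the knees' is large, so no fall is reported; for a person lying with the head below the knees, the comparison flips.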
Optionally, the position information of the at least two key points includes a left upper body key point of the target human body and a left crotch key point of the target human body; the left upper body key points comprise left eye key points or left shoulder key points;
a gesture obtaining module, configured to, when obtaining a relative positional relationship between the at least two key points according to the positional information of the at least two key points,
determining a connection line between the upper left body keypoint and the left crotch keypoint according to the position information of the upper left body keypoint and the position information of the left crotch keypoint;
acquiring an included angle between the connecting line and the vertical direction in the effective video frame as a relative position relation between the at least two key points;
a gesture obtaining module for obtaining the gesture of the target human body according to the relative position relation between the at least two key points,
and when the included angle is larger than a first angle threshold, determining that the human body posture of the target human body is a falling posture.
Optionally, the position information of the at least two key points includes a right upper body key point of the target human body and a right crotch key point of the target human body; the right upper half body key points comprise right eye key points or right shoulder key points;
a gesture obtaining module, configured to, when obtaining a relative positional relationship between the at least two key points according to the positional information of the at least two key points,
determining a connection line between the upper right body keypoint and the right crotch keypoint according to the position information of the upper right body keypoint and the position information of the right crotch keypoint;
acquiring an included angle between the connecting line and the vertical direction in the effective video frame as a relative position relation between the at least two key points;
a gesture obtaining module for obtaining the gesture of the target human body according to the relative position relation between the at least two key points,
and when the included angle is larger than a first angle threshold, determining that the human body posture of the target human body is a falling posture.
Optionally, the video frame acquisition module is configured to,
acquiring a current video frame acquired by the image acquisition equipment in real time;
detecting a moving target of the current video frame;
and when the current video frame contains a moving target, determining the current video frame as an effective video frame acquired at the time.
Optionally, the video frame acquisition module is further configured to,
when the current video frame does not contain a moving target, acquiring a current frame interval, wherein the current frame interval is a frame interval between the current video frame and an effective video frame acquired in the previous time;
and when the current frame interval is larger than a frame interval threshold, determining the current video frame as the effective video frame acquired at this time.
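The two optional behaviours of the video frame acquisition module above (accept a frame containing a moving target; otherwise accept it anyway once the interval since the last effective frame exceeds a threshold) can be combined into a small selector. This is an illustrative sketch; the 150-frame threshold is an assumed value.

```python
FRAME_INTERVAL_THRESHOLD = 150  # hypothetical: roughly 5 s at 30 fps

class EffectiveFrameSelector:
    """Decides whether the current video frame counts as an effective video frame."""

    def __init__(self, interval_threshold=FRAME_INTERVAL_THRESHOLD):
        self.interval_threshold = interval_threshold
        self.last_effective_index = None  # no effective frame acquired yet

    def is_effective(self, frame_index, has_moving_target):
        if has_moving_target:
            # A moving target suggests a human body may be present.
            self.last_effective_index = frame_index
            return True
        if (self.last_effective_index is None or
                frame_index - self.last_effective_index > self.interval_threshold):
            # Force a periodic check so a motionless fallen body is not missed.
            self.last_effective_index = frame_index
            return True
        return False
```

The fallback branch guarantees that posture detection runs at least once per interval even when no motion is detected, which is the missed-detection safeguard described later in the detailed description.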
Optionally, the apparatus further includes:
the optical flow speed acquisition module is used for acquiring the optical flow speed of the head and shoulder area of the target human body in the effective video frame when the human body posture of the target human body is a non-falling posture;
and the gesture modifying module is used for modifying the human gesture of the target human body into a falling gesture when the optical flow speed meets a preset condition.
Optionally, the apparatus further includes:
the head-shoulder speed acquisition module is used for dividing the optical flow speed by the area of the head-shoulder area before the posture modification module modifies the human posture of the target human body into a falling posture to obtain the head-shoulder speed of the target human body;
and the condition determining module is used for determining that the optical flow speed meets the preset condition when the head-shoulder speed is greater than a speed threshold value.
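A minimal sketch of the head-shoulder speed condition described by the two modules above. The speed threshold is a hypothetical value; real units depend on how the optical-flow magnitude over the head-shoulder area is aggregated.

```python
SPEED_THRESHOLD = 0.5  # hypothetical; units depend on the optical-flow aggregation

def head_shoulder_speed(optical_flow_speed, region_area):
    """Normalise the optical-flow speed by the head-shoulder region's area, so the
    measure stays comparable for people at different distances from the camera."""
    return optical_flow_speed / region_area

def meets_fall_condition(optical_flow_speed, region_area, threshold=SPEED_THRESHOLD):
    """Preset condition: the normalised head-shoulder speed exceeds the threshold."""
    return head_shoulder_speed(optical_flow_speed, region_area) > threshold
```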
Optionally, the apparatus further includes:
the key point acquisition module is used for acquiring at least three key points of the head and shoulder parts of the target human body before the optical flow speed acquisition module acquires the optical flow speed of the head and shoulder parts of the target human body in the effective video frame; the at least three key points comprise a left shoulder key point and a right shoulder key point of the target human body;
the region acquisition module is used for acquiring a region corresponding to a minimum circumscribed rectangle of the at least three key points as a head-shoulder region of the target human body in the effective video frame, wherein the minimum circumscribed rectangle is the minimum rectangle containing the at least three key points.
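The minimum circumscribed rectangle of the head-shoulder keypoints described above is simply the axis-aligned bounding box of the keypoints; a sketch, under the assumption that keypoints are (x, y) pairs:

```python
def head_shoulder_region(keypoints):
    """Minimum circumscribed rectangle of at least three head/shoulder keypoints,
    returned as (x_min, y_min, x_max, y_max): the smallest axis-aligned
    rectangle containing every keypoint."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    return (min(xs), min(ys), max(xs), max(ys))
```

The rectangle's area, (x_max - x_min) * (y_max - y_min), is what the head-shoulder speed is divided by in the preceding module.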
Optionally, the apparatus further includes:
and the alarm module is used for sending alarm information to the user terminal when the human body posture of the target human body is a falling posture.
Optionally, the alarm module is configured to send the alarm information to the user terminal if the human body posture of the target human body maintains a falling posture in at least two valid video frames acquired recently.
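The optional alarm condition above, sending alarm information only when the falling posture is maintained in at least two recently acquired effective video frames, can be sketched with a sliding window. Illustrative only; the required count of 2 follows the "at least two" wording.

```python
from collections import deque

REQUIRED_FALL_FRAMES = 2  # "at least two valid video frames acquired recently"

class FallAlarm:
    """Tracks the posture over the most recent effective video frames and fires
    only when every frame in the window shows a falling posture."""

    def __init__(self, required=REQUIRED_FALL_FRAMES):
        self.required = required
        self.recent = deque(maxlen=required)  # sliding window of recent postures

    def update(self, is_fall_posture):
        """Record one effective frame's posture; return True when alarm
        information should be sent to the user terminal."""
        self.recent.append(is_fall_posture)
        return len(self.recent) == self.required and all(self.recent)
```

Requiring consecutive agreement filters out single-frame false positives before alerting the user terminal.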
In yet another aspect, a computer device is provided, the computer device comprising a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the human body posture acquisition method described above.
In yet another aspect, a computer-readable storage medium is provided, the storage medium storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the human body posture acquisition method described above.
The technical scheme provided by the application can comprise the following beneficial effects:
according to the scheme, continuous video frames are not required to be identified, faces in the video frames are not required to be identified, and only two or more human body key points in the independent video frames are required to be identified, so that whether a human body falls down can be identified.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a system configuration diagram of a video surveillance system according to various embodiments of the present application;
FIG. 2 is a schematic diagram of the system functionality of the embodiment of FIG. 1 involving a video surveillance system;
FIG. 3 is a flowchart illustrating a method of human gesture acquisition according to an exemplary embodiment;
FIG. 4 is a flowchart illustrating a method of human gesture acquisition according to an exemplary embodiment;
fig. 5 is a schematic diagram of a fall gesture detection flow related to the embodiment shown in fig. 4;
fig. 6 is a block diagram showing a structure of a human body posture acquisition apparatus according to an exemplary embodiment;
fig. 7 is a schematic diagram of a computer device, according to an example embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
The application provides a scheme for determining whether a human body falls or not through the positions of key points of the human body in a video frame, and the scheme can accurately acquire the human body posture through a single video frame. In order to facilitate understanding, several terms related to embodiments of the present application are explained below.
Referring to fig. 1, a system configuration diagram of a video monitoring system according to various embodiments of the present application is shown. As shown in fig. 1, the system includes: a plurality of image capturing devices 110, a gesture recognition device 120, a server 130 and a user terminal 140.
Wherein the image capturing device 110 may be a dedicated monitoring camera; alternatively, the image capturing device 110 may also be another device that temporarily functions as a monitoring camera, such as a smart phone with a camera, a tablet computer, a smart television, or the like.
The image capturing device 110 may be disposed at a place where falls are likely to occur, or where rapid rescue is required after a fall, so as to monitor fall-prone people in real time. For example, the image capturing apparatus 110 may be disposed in any place where the falling posture of a human body needs to be monitored, such as a private home, a nursing home, a hospital, or a kindergarten.
The image capturing device 110 is connected to the gesture recognition device 120 through a communication network.
The gesture recognition apparatus 120 described above may be a computer apparatus that is independently deployed and dedicated to recognizing the human gesture of the target human body from the video frame.
For example, the gesture recognition apparatus 120 may be a dedicated apparatus installed in a user house or a machine room and running a gesture recognition algorithm.
Alternatively, the gesture recognition apparatus 120 may be an intelligent home apparatus in which gesture recognition software is installed or in which a gesture recognition algorithm is run.
For example, the gesture recognition apparatus 120 may be an intelligent gateway or an intelligent router, or the like.
The gesture recognition apparatus 120 is connected to the server 130 through a communication network. Alternatively, the gesture recognition apparatus 120 described above may also be implemented as part of the server 130.
The server 130 may be a single server, a cluster of several servers, a virtualization platform, or a cloud computing service center.
Server 130 may be comprised of one or more functional units. Optionally, the server 130 may also be connected to a database. The database may be a distributed database, or may be another type of database. The database is used for storing various data, such as video frames acquired by the respective image acquisition devices 110, and the like.
The server 130 may be connected to the user terminal 140 through a communication network.
The terminal 140 may be a terminal device having a network connection function; for example, the terminal 140 may be a mobile phone, a tablet computer, an e-book reader, smart glasses, a smart watch, a laptop computer, a desktop computer, or the like.
Optionally, an application program corresponding to the server 130 may be installed in the terminal 140, and based on the application program, related operations on the image capturing device 110 may be implemented, for example, viewing an instant video or a historical video captured by the image capturing device 110, or performing control such as opening, closing, dormancy, waking up, and adjusting a shooting angle on the image capturing device 110.
Optionally, the communication network is a wired network or a wireless network.
Alternatively, the wireless or wired network described above uses standard communication techniques and/or protocols. The network is typically the Internet, but may be any network, including but not limited to a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), a mobile, wired, or wireless network, a private network, a virtual private network, or any combination thereof. In some embodiments, data exchanged over the network is represented using techniques and/or formats including HyperText Markup Language (HTML), Extensible Markup Language (XML), and the like. All or some of the links may also be encrypted using conventional encryption techniques such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), virtual private network (VPN), or Internet Protocol Security (IPsec). In other embodiments, custom and/or dedicated data communication techniques may also be used in place of, or in addition to, the data communication techniques described above.
Referring to fig. 2, a schematic diagram of a system function of a video monitoring system according to an embodiment of the application is shown. As shown in fig. 2, the above system functions as follows:
1) The user terminal 140 or the server 130 issues a control command to the image capturing device 110.
In one possible implementation, the user terminal 140 may send a control command to the image capturing device 110 through an application program corresponding to the server 130, so as to control the image capturing device 110 to turn on, turn off, sleep, wake up, and adjust a shooting angle.
In another possible implementation, the control command may also be automatically sent by the server 130 to the image acquisition device 110 according to a pre-configured control strategy.
The control policy may be a control policy preset by a developer, or may be a control policy set and uploaded by a user corresponding to the user terminal 140 through an application program.
2) The image capturing device 110 reports its status to the server 130 or the user terminal 140.
In the embodiment of the present application, when the state of the image capturing device 110 changes, it may report its state to the server 130 or the user terminal 140, for example, whether it is in a capturing state, the current shooting angle, the current battery level, and so on.
3) After the image capturing device 110 captures a real-time video, the captured video frame is provided to the gesture recognition device 120, and gesture recognition is performed by the gesture recognition device 120.
The image capturing device 110 in the capturing state transmits the captured video frame to the gesture recognition device 120, which recognizes whether the human body gesture of the target human body contained in the video frame is a falling gesture or a non-falling gesture.
4) The gesture recognition apparatus 120 transmits alert information to the user terminal 140.
In the embodiment of the present application, when the gesture recognition apparatus 120 recognizes that the human gesture of the target human body is a falling gesture, an alarm may be issued to the user terminal 140.
Fig. 3 is a flowchart illustrating a human body posture acquisition method that may be used in a computer device to perform posture detection of a target human body in a video frame acquired by an image acquisition device, according to an exemplary embodiment. Taking the example that the computer device is the gesture recognition device 120 in the system shown in fig. 1, as shown in fig. 3, the human gesture obtaining method may include the following steps:
step 301, extracting a valid video frame from a video stream acquired by an image acquisition device.
In one possible implementation, the video data transmitted by the image capture device to the gesture recognition device may be video stream data in Real Time Streaming Protocol (RTSP) format, Real Time Messaging Protocol (RTMP) format, or another custom protocol format. The gesture recognition device decodes the acquired video stream data to obtain video frames and extracts effective video frames from them.
In another possible implementation, the video data transmitted by the image acquisition device to the gesture recognition device may be a sequence of video frames consisting of a series of video frames, from which the gesture recognition device extracts the valid video frames directly.
In an embodiment of the present application, the effective video frame may be a video frame that satisfies a filtering condition, where the filtering condition may include at least one of the following conditions: moving targets exist in the video frames, human bodies exist in the video frames, and the image quality of the video frames reaches the preset quality requirement.
Step 302, extracting human body key point information of a target human body in the effective video frame, wherein the human body key point information comprises position information of at least two key points in the target human body.
Wherein the keypoints in the target human body may be individual keypoints on the target human body having distinct visual features. For example, the at least two keypoints in the target human body may include at least two of the following 18 keypoints:
left eye keypoints, right eye keypoints, left ear keypoints, right ear keypoints, nose keypoints, neck keypoints, left shoulder keypoints, right shoulder keypoints, left elbow keypoints, right elbow keypoints, left wrist keypoints, right wrist keypoints, left crotch keypoints, right crotch keypoints, left knee keypoints, right knee keypoints, left ankle keypoints, and right ankle keypoints.
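For illustration, the 18 key points above can be represented as an enumeration. The index values here are hypothetical, a convention common in pose-estimation libraries; the patent names the parts but does not assign an ordering.

```python
from enum import IntEnum

class Keypoint(IntEnum):
    """The 18 human body key points named in the description.
    Index values are hypothetical; only the part names come from the text."""
    LEFT_EYE = 0
    RIGHT_EYE = 1
    LEFT_EAR = 2
    RIGHT_EAR = 3
    NOSE = 4
    NECK = 5
    LEFT_SHOULDER = 6
    RIGHT_SHOULDER = 7
    LEFT_ELBOW = 8
    RIGHT_ELBOW = 9
    LEFT_WRIST = 10
    RIGHT_WRIST = 11
    LEFT_CROTCH = 12
    RIGHT_CROTCH = 13
    LEFT_KNEE = 14
    RIGHT_KNEE = 15
    LEFT_ANKLE = 16
    RIGHT_ANKLE = 17
```

A pose extractor would then return, per detected person, a mapping from these indices to (x, y) positions in the effective video frame.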
In the embodiment of the application, the key point may be a center point of an area where the corresponding human body part is located in the effective video frame.
For example, the left eye key point of the target human body may be the center point of the area where the left eye or the left pupil of the target human body is located in the effective video frame, for another example, the left ear key point of the target human body may be the center point of the area where the left ear of the target human body is located in the effective video frame, for another example, the nose key point of the target human body may be the center point of the area where the nose of the target human body is located in the effective video frame, and so on.
In the embodiment of the present application, only the above 18 key points are taken as examples to describe the key points of the human body, and in other possible implementation manners, the key points in the target human body may also include the key points of other parts in the human body.
In the embodiment of the present application, the location information of the key point may include two-dimensional coordinates of the corresponding key point in the video frame.
Step 303, acquiring the human body posture of the target human body according to the human body key point information, wherein the human body posture comprises a falling posture and a non-falling posture.
In the embodiment of the application, the gesture recognition device can determine whether the target human body is in a falling gesture through the position information of at least two key points in the target human body. For example, when it is recognized that a certain key point (such as a nose key point) in the head-shoulder portion of the target human body is below a lower body key point (such as a knee key point), it is possible to determine that the human body posture is a falling posture.
In summary, with the scheme of the application, the gesture recognition device need not recognize consecutive video frames or faces in the video frames; it only needs to recognize two or more human body key points in an individual video frame to determine whether the human body has fallen.
Fig. 4 is a flowchart illustrating a human body posture acquisition method that may be used in a computer device to perform posture detection of a target human body in a video frame acquired by an image acquisition device, according to an exemplary embodiment. Taking the example that the computer device is the gesture recognition device 120 in the system shown in fig. 1, as shown in fig. 4, the human gesture obtaining method may include the following steps:
step 401, acquiring an effective video frame in a video stream acquired by an image acquisition device.
In one possible implementation manner, the gesture recognition device may acquire a current video frame acquired by the image acquisition device in real time; detecting a moving target of the current video frame; and when the current video frame contains a moving target, determining the current video frame as an effective video frame acquired at the time.
In the embodiment of the application, performing posture detection on every video frame in the video stream acquired by the image acquisition device would require high processing performance and constrain the deployment scenarios of gesture recognition. To avoid unnecessary posture detection and reduce the processing resources consumed by subsequent posture detection, moving target detection can be performed on the current video frame in the video stream. If a moving target exists, a human body may be present in the current video frame and human body posture detection is required; in that case, the current video frame can be determined as the effective video frame acquired this time.
In one possible implementation, the gesture recognition apparatus may perform moving object detection on the current video frame through a moving object detection algorithm. The motion target detection algorithm may be an inter-frame difference method, or the target motion detection algorithm may be another background modeling algorithm, such as a mixed gaussian modeling algorithm, etc.
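As an illustration of the inter-frame difference idea, the following is a minimal Python sketch; the `has_moving_target` helper and its threshold values are hypothetical, not taken from the patent. A frame is flagged as containing a moving target when enough pixels differ between two consecutive grayscale frames.

```python
import numpy as np

def has_moving_target(prev_gray, curr_gray, diff_thresh=25, min_ratio=0.005):
    """Inter-frame difference: a moving target is assumed when the fraction
    of pixels whose grayscale value changed by more than diff_thresh
    exceeds min_ratio. Both inputs are HxW uint8 grayscale frames."""
    diff = np.abs(prev_gray.astype(np.int16) - curr_gray.astype(np.int16))
    return bool((diff > diff_thresh).mean() >= min_ratio)
```

A Gaussian-mixture background model (as mentioned above) could replace the per-pixel difference with a learned background, at the cost of more state per pixel.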
Alternatively, when the moving object is not included in the current video frame, the gesture recognition apparatus may acquire a current frame interval, which is a frame interval between the current video frame and a valid video frame acquired last time; and when the current frame interval is larger than the frame interval threshold, determining the current video frame as the effective video frame acquired at the time.
When a human body has fallen in a video frame, if the fallen human body is still, the gesture recognition device may not recognize a moving target in the frame, and such a video frame would be missed. Therefore, in the embodiment of the application, if the gesture detection device does not detect a moving target in the current video frame, the current video frame is not directly discarded; instead, the frame interval between the current video frame and the effective video frame acquired the previous time is further acquired, and if the acquired frame interval is higher than a preset frame interval threshold, the current video frame is also determined as the effective video frame acquired this time. This ensures that the human body gesture is detected at least once within a certain period, so as to avoid missed detection.
For example, taking a monitoring camera as an example, assume that at a first moment a human body falls within the shooting range of the monitoring camera but is blocked by another object (such as a vehicle or another human body), so that the gesture recognition device fails to detect a moving target, or detects a moving target but subsequently fails to detect the falling gesture. After the first moment, the fallen human body remains stationary in the falling gesture. At a second moment after the first moment, when the interval between the video frame corresponding to the second moment and the previously acquired effective video frame is greater than a certain threshold, the gesture recognition device determines the video frame corresponding to the second moment as an effective video frame and performs the subsequent fall detection steps.
Alternatively, the frame interval and the frame interval threshold may be expressed as the number of frames between the two frames (e.g., 150 frames), or as the duration between the two frames (e.g., 3 s).
Alternatively, the frame interval threshold may be set in advance in the gesture recognition apparatus by the developer, or the frame interval threshold may be set by the user through an application program in the user terminal.
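The valid-frame selection logic described above can be sketched as follows; the `ValidFrameSelector` class and its default threshold are illustrative assumptions, not part of the patent.

```python
class ValidFrameSelector:
    """Select effective video frames: a frame is effective when it contains
    motion, or when too many frames have passed since the last effective
    frame (so a person who fell and then stays still is not missed)."""

    def __init__(self, interval_threshold=150):  # e.g. ~5 s at 30 fps
        self.interval_threshold = interval_threshold
        self.last_valid_index = None

    def is_valid(self, frame_index, has_motion):
        if has_motion:
            self.last_valid_index = frame_index
            return True
        # No motion: fall back to the frame-interval rule.
        if (self.last_valid_index is None
                or frame_index - self.last_valid_index > self.interval_threshold):
            self.last_valid_index = frame_index
            return True
        return False
```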
After the effective video frame is obtained, the gesture recognition device can recognize the human gesture of the effective video frame.
Step 402, extracting human body key point information of a target human body in the effective video frame, wherein the human body key point information comprises position information of at least two key points in the target human body.
In the embodiment of the present application, the step of extracting the human body key point information of the target human body from the effective video frame by the gesture recognition device may be divided into the following two steps:
1) Detecting whether a human body exists in the effective video frame.
In the embodiment of the application, the gesture recognition device can detect whether a human body exists in the effective video frame; if not, the effective video frame is discarded, otherwise the subsequent steps are executed.
The gesture recognition device can detect whether a human body exists in the effective video frame through a human body detection algorithm. For example, the human body detection algorithm may be any one of the single-shot multibox detector (Single Shot MultiBox Detector, SSD) algorithm, the YOLO (You Only Look Once) algorithm, and the faster region-based convolutional neural network (Faster Region-Convolutional Neural Networks, Faster RCNN) algorithm.
2) When a human body exists in the effective video frame, human body key point information of a target human body is extracted from the effective video frame through a human body gesture recognition algorithm.
In the embodiment of the application, the gesture recognition device can extract the human body key point information of the target human body through a human body gesture recognition algorithm such as a dense gesture (Densepose) algorithm, an open gesture (OpenPose) algorithm, a real-time Multi-person gesture estimation (real-time Multi-Person Pose Estimation) algorithm and the like.
The position information of the key points included in the human body key point information may be two-dimensional coordinates of the key points in a specified coordinate system. Wherein the direction of the ordinate axis of the specified coordinate system is the vertical direction in the effective video frame, and the direction of the abscissa axis is the horizontal direction in the effective video frame.
In the embodiment of the application, the human body posture recognition algorithm may be a machine learning algorithm based on a convolutional neural network. When the gesture recognition device extracts the human body key point information through the human body gesture recognition algorithm, the image data corresponding to the image containing the target human body can be input into a trained convolutional neural network, the convolutional neural network performs processing operations such as feature extraction, feature mapping and the like on the image data, and the human body key point information corresponding to each key point in the target human body is output.
In one possible implementation manner, the convolutional neural network may be obtained by training sample images of labeled key points of the human body in advance, and the sample images, the hierarchical structure of the convolutional neural network and the training process may be different for different human body gesture recognition algorithms.
The image including the target human body may be an entire effective video frame, or may be an image of the position of the target human body cut from the effective video frame according to the human body detection result.
Step 403, acquiring the human body posture of the target human body according to the human body key point information, wherein the human body posture comprises a falling posture and a non-falling posture.
Optionally, when acquiring the human body posture of the target human body according to the human body key point information, the posture identifying device may acquire the relative positional relationship between the at least two key points according to the positional information of the at least two key points; and acquiring the human body posture of the target human body according to the relative position relation between the at least two key points.
When a human body falls, the relative positions of the key points of the parts of the human body satisfy certain conditions. For example, certain key points of the upper half of the body may be lower than certain key points of the lower half of the body, or connecting lines between certain key points of the human body may form certain angles. Based on this principle, the scheme disclosed in the application determines whether the human body posture of the target human body is a falling posture according to the positional relationship among the key points of the target human body.
The manner in which the gesture recognition apparatus obtains the gesture of the human body may include the following:
1) The position information of the at least two key points includes position information of at least two head key points in the target human body and position information of at least two lower body key points in the target human body.
When the relative positional relationship between the at least two key points is acquired according to the positional information of the at least two key points, the gesture recognition apparatus may acquire a first ordinate according to the positional information of the at least two head key points, and acquire a second ordinate according to the positional information of the at least two lower body key points.
Wherein the first ordinate is an average value of the ordinate of each of the at least two head key points in a specified coordinate system, and the second ordinate is an average value of the ordinate of each of the at least two lower body key points in the specified coordinate system; the direction of the ordinate axis of the specified coordinate system is the vertical direction in the effective video frame.
When the human body posture of the target human body is acquired according to the relative position relation between the at least two key points, the posture identifying device can determine that the human body posture of the target human body is a falling posture when the first ordinate is lower than the second ordinate.
The at least two head key points may include any two or more of 8 key points, such as a left eye key point, a right eye key point, a left ear key point, a right ear key point, a nose key point, a neck key point, a left shoulder key point, and a right shoulder key point.
The at least two lower body keypoints may include any two or more of a left crotch keypoint, a right crotch keypoint, a left knee keypoint, a right knee keypoint, a left ankle keypoint, and a right ankle keypoint.
Since the vertical direction in the image acquired by the image acquisition component is generally the same as the vertical direction in the actual three-dimensional space, a coordinate system can be used whose ordinate axis direction is the vertical direction in the effective video frame. In that coordinate system, when the average ordinate of the head key points of the target human body is lower than the average ordinate of the lower body key points, it is indicated that the head of the target human body is generally flush with or lower than the lower body, and at this time the human body posture of the target human body can be regarded as a falling posture. Conversely, if the average ordinate of the head key points of the target human body is not lower than the average ordinate of the lower body key points, it is indicated that the head of the target human body is generally located above the lower body, and at this time the human body posture of the target human body can be regarded as a non-falling posture.
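The height comparison of criterion 1 can be sketched as follows. The `is_fall_by_height` helper is hypothetical, and the sketch assumes the patent's "specified coordinate system" with the y axis pointing up; with raw image pixel coordinates, where y grows downward, the comparison would be reversed.

```python
def is_fall_by_height(head_pts, lower_pts):
    """Criterion 1: fall if the average ordinate of the head keypoints is
    lower than the average ordinate of the lower-body keypoints.
    head_pts / lower_pts: lists of (x, y) with y pointing up."""
    head_y = sum(y for _, y in head_pts) / len(head_pts)
    lower_y = sum(y for _, y in lower_pts) / len(lower_pts)
    return head_y < lower_y
```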
2) The position information of the at least two key points comprises a left upper half body key point of the target human body and a left crotch key point of the target human body; the left upper body key point comprises a left eye key point or a left shoulder key point;
when acquiring the relative positional relationship between the at least two key points based on the positional information of the at least two key points, the posture identifying apparatus may determine a connection line between the upper left body key point and the left crotch key point based on the positional information of the upper left body key point and the positional information of the left crotch key point; and then acquiring an included angle between the connecting line and the vertical direction in the effective video frame as a relative position relation between the at least two key points.
When the human body posture of the target human body is acquired according to the relative position relation between the at least two key points, the posture identifying device can determine that the human body posture of the target human body is a falling posture when the included angle is larger than a first angle threshold value.
3) The position information of the at least two key points comprises a key point of the right upper half body of the target human body and a key point of the right crotch of the target human body; the right upper body keypoints comprise right eye keypoints or right shoulder keypoints;
When acquiring the relative positional relationship between the at least two key points based on the positional information of the at least two key points, the posture identifying apparatus may determine a connection line between the upper right body key point and the right crotch key point based on the positional information of the upper right body key point and the positional information of the right crotch key point; and acquiring an included angle between the connecting line and the vertical direction in the effective video frame as a relative position relation between the at least two key points.
When the human body posture of the target human body is acquired according to the relative position relation between the at least two key points, the posture identifying device can determine that the human body posture of the target human body is a falling posture when the included angle is larger than a second angle threshold value.
The first angle threshold and the second angle threshold may be the same angle threshold or different angle thresholds.
When a human body falls, the head and shoulder part and the crotch part of the human body are both close to the ground, so the heights of the head and shoulder part and the crotch part on the same side of the body are basically equivalent; correspondingly, the included angle between the line connecting the head and shoulder part and the crotch part on the same side and the vertical direction in the effective video frame is large. When the human body stands normally, this included angle is small. Therefore, in the embodiment of the application, the gesture recognition device can acquire the included angle between the line connecting the head and shoulder part and the crotch part on the same side of the target human body and the vertical direction in the effective video frame, and compare the acquired included angle with a preset angle threshold (such as 45°). If the acquired included angle is larger than the preset angle threshold, the human body posture of the target human body is determined to be a falling posture; otherwise, the human body posture of the target human body is determined to be a non-falling posture.
The angle threshold may be preset in the gesture recognition apparatus by a developer, or may be set by a user through an application program.
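The angle criterion of methods 2 and 3 can be sketched as follows; the helper names and the 45° default are illustrative assumptions.

```python
import math

def trunk_angle_deg(upper_pt, crotch_pt):
    """Angle, in degrees, between the line from an upper-body keypoint
    (eye or shoulder) to the same-side crotch keypoint and the vertical
    direction of the frame. 0 deg = upright trunk, 90 deg = horizontal."""
    dx = crotch_pt[0] - upper_pt[0]
    dy = crotch_pt[1] - upper_pt[1]
    return math.degrees(math.atan2(abs(dx), abs(dy)))

def is_fall_by_angle(upper_pt, crotch_pt, angle_threshold=45.0):
    """Criterion 2/3: fall if the trunk line deviates from vertical
    by more than the angle threshold."""
    return trunk_angle_deg(upper_pt, crotch_pt) > angle_threshold
```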
In the embodiment of the application, the three ways of acquiring the human body posture can be used independently or in combination.
For example, if any one of the three ways of acquiring the human body posture determines that the human body posture of the target human body is a falling posture, the posture identifying device acquires the human body posture of the target human body as the falling posture.
Or, in the three ways of acquiring the human body posture, when the human body posture of the target human body is determined to be the falling posture in the first way, and the human body posture of the target human body is determined to be the falling posture in any one of the second way and the third way, the posture identifying device acquires the human body posture of the target human body to be the falling posture.
The scheme shown in the embodiment of the application only takes the three modes as an example to describe how the gesture recognition device obtains the human gesture of the target human body. It will be appreciated by those skilled in the art that it is also possible to determine whether the target human body is in a falling posture according to the relative positional relationship between the respective key points in the target human body in other manners.
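The two combination strategies described above can be sketched as a small fusion helper; the function name and mode labels are hypothetical.

```python
def fuse_fall_decisions(by_height, by_left_angle, by_right_angle, mode="any"):
    """Combine the three fall criteria.
    'any': fall if any single criterion fires (first strategy above).
    'height_and_angle': fall only if the height criterion fires together
    with either the left- or right-side angle criterion (second strategy)."""
    if mode == "any":
        return by_height or by_left_angle or by_right_angle
    return by_height and (by_left_angle or by_right_angle)
```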
Step 404, when the human body posture of the target human body is a non-falling posture, acquiring the optical flow speed of the head and shoulder region of the target human body in the effective video frame.
Optionally, before acquiring the optical flow velocity of the head-shoulder area of the target human body in the effective video frame, the gesture recognition device may acquire at least three key points of the head-shoulder area of the target human body; the at least three key points comprise a left shoulder key point and a right shoulder key point of the target human body; and acquiring a region corresponding to the minimum circumscribed rectangle of the at least three key points as a head-shoulder region of the target human body in the effective video frame, wherein the minimum circumscribed rectangle is the minimum rectangle containing the at least three key points.
In the embodiment of the application, the head-shoulder region of the target human body can be determined from the key points of the head-shoulder part of the target human body. These key points may include at least 3 of the following 7 key points extracted from the effective video frame by the gesture recognition device: the left eye, right eye, left ear, right ear, nose, left shoulder, and right shoulder key points of the target human body; the at least 3 key points must simultaneously contain the left shoulder key point and the right shoulder key point. If the head-shoulder key points extracted from the effective video frame do not contain both the left shoulder key point and the right shoulder key point, the step of acquiring the optical flow speed is not executed.
The minimum circumscribed rectangle may be a rectangle whose adjacent sides lie along the vertical and horizontal directions of the effective video frame. For example, the upper and lower edges of the minimum circumscribed rectangle are the two edges passing through the uppermost and lowermost of the at least three key points, respectively, and the left and right edges are the two edges passing through the leftmost and rightmost of the at least three key points, respectively.
Or, the minimum circumscribed rectangle can be a rectangle with any rotation angle, and only the key points of the head and shoulder parts in the identified target human body are required to be ensured to be positioned in the rectangular range, and the area is minimum.
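For the axis-aligned case, the head-shoulder region computation can be sketched as follows; the `head_shoulder_region` helper and its keypoint naming are assumptions for illustration.

```python
def head_shoulder_region(kps):
    """kps: dict mapping keypoint name -> (x, y).
    Returns the axis-aligned minimal bounding rectangle
    (left, top, right, bottom) of the head-shoulder keypoints, or None
    when fewer than 3 head-shoulder keypoints are available or the
    left/right shoulder pair is missing (in which case the optical-flow
    step is skipped, as described above)."""
    head_names = {"left_eye", "right_eye", "left_ear", "right_ear",
                  "nose", "left_shoulder", "right_shoulder"}
    pts = [p for n, p in kps.items() if n in head_names]
    if len(pts) < 3 or "left_shoulder" not in kps or "right_shoulder" not in kps:
        return None
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))
```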
Step 405, when the optical flow speed satisfies a preset condition, modifying the human body posture of the target human body into a falling posture.
In the embodiment of the application, the gesture recognition device can detect not only a target human body that is already in the falling gesture, but also a target human body that is in the process of falling.
Wherein, when a human body is falling, the head and shoulder part of the human body moves rapidly within a short time. Based on this principle, in the embodiment of the present application, when it is determined through steps 402 and 403 above that the target human body is in a non-falling posture, the posture identifying apparatus further obtains the optical flow speed of the head-shoulder part of the target human body, and then determines the head-shoulder speed of the target human body according to the optical flow speed. When the head-shoulder speed satisfies a certain condition, for example, when it is greater than a certain speed threshold, the target human body may be considered to be falling, and at this time the human body posture of the target human body may be modified from the non-falling posture to the falling posture.
Optionally, before modifying the human body posture of the target human body into the falling posture, the posture identifying device may divide the optical flow speed by the area of the head-shoulder region to obtain the head-shoulder speed of the target human body; and when the head-shoulder speed is greater than a speed threshold, determining that the optical flow speed satisfies the preset condition.
The speed threshold may be a threshold set in advance by a developer in the gesture recognition apparatus, or may be a threshold set by the user through an application program.
In the embodiment of the application, the head-shoulder speed of the target human body can be calculated by the optical flow speed. For example, after the gesture recognition apparatus determines the head-shoulder region of the target human body, the optical flow velocity of the head-shoulder region is calculated by a dense optical flow algorithm, and then the calculated optical flow velocity is divided by the area of the head-shoulder region, thereby obtaining the head-shoulder velocity of the target human body.
Alternatively, the head-shoulder speed may be calculated by other optical flow algorithms, or by the difference in position of the head-shoulder region of the target human body between the effective video frame and the frame preceding it.
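The division of the optical flow speed by the region area can be sketched as follows; the `head_shoulder_speed` helper is an assumption, and the dense flow field would in practice come from an algorithm such as OpenCV's `cv2.calcOpticalFlowFarneback` (here a plain array stands in for it).

```python
import numpy as np

def head_shoulder_speed(flow, region):
    """flow: HxWx2 dense optical-flow field (per-pixel displacement),
    e.g. as returned by a Farneback-style dense optical flow algorithm.
    region: (left, top, right, bottom) head-shoulder rectangle.
    Sums the flow magnitudes over the region and divides by the region
    area, yielding an average per-pixel displacement."""
    l, t, r, b = region
    patch = flow[t:b, l:r]
    magnitude = np.linalg.norm(patch, axis=2).sum()
    area = max((r - l) * (b - t), 1)  # guard against a degenerate region
    return magnitude / area
```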
And step 406, when the human body posture of the target human body is a falling posture, sending alarm information to the user terminal.
In the embodiment of the application, when the human body posture of the target human body acquired by the posture identification equipment is a falling posture, an alarm can be sent out to remind a user of falling, so that the user can determine whether rescue is needed in time.
The gesture recognition device may send the alarm information to the user terminal directly or indirectly through a server, for example, the alarm information may be sent to the user terminal through a short message, a mail, an application notification message, an application popup window, or the like.
Optionally, the alert information may include a current valid video frame, so that the user can more accurately understand the falling situation of the human body.
Optionally, if the body posture of the target body maintains the falling posture in at least two valid video frames acquired recently, the alarm information is sent to the user terminal.
In the embodiment of the application, in order to avoid a false alarm caused by the gesture recognition device erroneously recognizing the target human body in a single effective video frame as being in a falling gesture, when the target human body in the current effective video frame is recognized as being in a falling gesture, the human body gesture of the target human body in the previous one or more effective video frames can be further acquired; the alarm information is sent to the user terminal only if the human body gesture in those frames is also the falling gesture.
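The multi-frame confirmation rule can be sketched as follows; the `FallAlarm` class and its default window size are illustrative assumptions.

```python
from collections import deque

class FallAlarm:
    """Raise an alarm only when the target human body stays in the falling
    posture for n consecutive effective video frames, so a single
    misrecognized frame does not trigger a false alarm."""

    def __init__(self, n=2):
        self.n = n
        self.history = deque(maxlen=n)

    def update(self, is_falling):
        """Record the posture of the latest effective frame; return True
        when the last n effective frames were all falling postures."""
        self.history.append(is_falling)
        return len(self.history) == self.n and all(self.history)
```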
The scheme of the application first extracts video frames containing moving targets using the inter-frame difference method, combined with a fixed time interval setting, and performs fall detection on the extracted frames; it then detects persons and their body postures in the video picture using human body detection and human body posture estimation algorithms, and finally judges whether a fall has occurred from the mutual positional relationship among the human body key points and the movement speed of the head and shoulders. The scheme can detect falling behavior in real time.
Referring to fig. 5, a schematic diagram of a fall gesture detection flow according to an embodiment of the present application is shown. As shown in fig. 5, the gesture recognition apparatus receives a decoded video frame (S51) and first judges, through a moving object detection algorithm, whether there is a moving target and whether the interval time exceeds a set time interval threshold (S52). If there is no moving target and the interval time does not exceed the set time interval threshold, the current frame is discarded and the next video frame is acquired (i.e., the flow returns to S51). If there is a moving target, or there is no moving target but the interval time exceeds the set time interval threshold, a human body detection algorithm is used to detect whether there is a person in the picture (S53). After a human body is detected, a human body posture estimation algorithm is used to locate the coordinates of the human body key points (currently 18 key points are used: left eye, right eye, left ear, right ear, nose, neck, left and right shoulders, left and right elbows, left and right wrists, left and right crotches, left and right knees, left and right ankles) (S54). It is then judged whether the human body posture belongs to a fall (S55); if so, an alarm is raised directly (S56). When the posture is judged to be normal, the head-shoulder speed is acquired (S57) and it is judged whether the head-shoulder speed is greater than the set speed upper limit (S58); if the head-shoulder movement speed is greater than the set speed upper limit, an alarm is raised (S56), otherwise the flow returns to S51.
The scheme uses both the human body gesture in a single static video frame and the motion speed calculated from continuous video frames, so that a fall can be detected at the moment of falling and monitoring can continue after the fall. This effectively increases the detection rate of falling behaviors and the flexibility of alarm settings.
In summary, in the implementation process of the scheme of the application, the gesture recognition device does not need to recognize continuous video frames or faces in the video frames; it only needs to recognize two or more human body key points in individual video frames to determine whether a human body has fallen.
In addition, in the scheme shown in the embodiment of the application, when the effective video frame is acquired, the gesture recognition equipment detects the moving target of the current video frame, and when the moving target is detected, the current video frame is acquired as the effective video frame, so that the problem of excessive consumption of calculation resources caused by detecting all the video frames is avoided, the resource consumption is reduced, and the requirement on hardware capacity is reduced.
In addition, when detecting that no moving object exists in the current video frame, the gesture recognition device further acquires a frame interval between the current video frame and an effective video frame acquired at the previous time, and when the frame interval reaches a preset threshold value, the current video frame is acquired as the effective video frame, so that the condition of missing detection is avoided, and the accuracy of falling gesture detection is improved.
In addition, in the scheme shown in the embodiment of the application, when the human body posture of the target human body is detected to be a non-falling posture through the key point information of the target human body, the posture identifying equipment further obtains the head-shoulder speed of the target human body and determines whether the target human body is in a falling state according to the head-shoulder speed, so that the accuracy of falling posture detection is further improved.
Fig. 6 is a block diagram showing a structure of a human body posture acquisition apparatus according to an exemplary embodiment. The body position acquisition device may be used in a computer apparatus to perform all or part of the steps of the embodiments shown in fig. 3 or fig. 4. The human body posture acquisition apparatus may include:
the video frame acquisition module 601 is configured to acquire an effective video frame in a video stream acquired by the image acquisition device;
A key point extraction module 602, configured to extract human body key point information of a target human body in the effective video frame, where the human body key point information includes position information of at least two key points in the target human body;
the gesture obtaining module 603 is configured to obtain a human gesture of the target human body according to the human body key point information, where the human gesture includes a falling gesture and a non-falling gesture.
Optionally, the gesture obtaining module 603 is configured to
Acquiring the relative position relation between the at least two key points according to the position information of the at least two key points;
and acquiring the human body posture of the target human body according to the relative position relation between the at least two key points.
Optionally, the position information of the at least two key points includes position information of at least two head key points in the target human body and position information of at least two lower body key points in the target human body;
upon acquiring the relative positional relationship between the at least two keypoints according to the positional information of the at least two keypoints, a posture acquisition module 603 for,
acquiring a first ordinate according to the position information of the at least two head key points, wherein the first ordinate is an average value of the ordinates of the at least two head key points in a designated coordinate system; the direction of the ordinate axis of the specified coordinate system is the vertical direction in the effective video frame;
Acquiring a second ordinate according to the position information of the at least two lower body key points, wherein the second ordinate is an average value of the ordinate of each of the at least two lower body key points in the appointed coordinate system;
acquiring the height relationship between the first ordinate and the second ordinate as the relative position relationship between the at least two key points;
upon acquiring the human body posture of the target human body according to the relative positional relationship between the at least two key points, a posture acquisition module 603 for,
and when the first ordinate is lower than the second ordinate, determining the human body posture of the target human body as a falling posture.
Optionally, the position information of the at least two key points includes a left upper body key point of the target human body and a left crotch key point of the target human body; the left upper body key points comprise left eye key points or left shoulder key points;
upon acquiring the relative positional relationship between the at least two keypoints according to the positional information of the at least two keypoints, a posture acquisition module 603 for,
determining a connection line between the upper left body keypoint and the left crotch keypoint according to the position information of the upper left body keypoint and the position information of the left crotch keypoint;
Acquiring an included angle between the connecting line and the vertical direction in the effective video frame as a relative position relation between the at least two key points;
upon acquiring the human body posture of the target human body according to the relative positional relationship between the at least two key points, a posture acquisition module 603 for,
and when the included angle is larger than a first angle threshold, determining that the human body posture of the target human body is a falling posture.
Optionally, the position information of the at least two key points includes a right upper body key point of the target human body and a right crotch key point of the target human body; the right upper body key points comprise right eye key points or right shoulder key points;
when acquiring the relative position relationship between the at least two key points according to the position information of the at least two key points, the posture acquisition module 603 is configured to,
determining a connection line between the right upper body key point and the right crotch key point according to the position information of the right upper body key point and the position information of the right crotch key point;
acquiring an included angle between the connecting line and the vertical direction in the effective video frame as the relative position relationship between the at least two key points;
when acquiring the human body posture of the target human body according to the relative position relationship between the at least two key points, the posture acquisition module 603 is configured to,
when the included angle is greater than a first angle threshold, determining that the human body posture of the target human body is a falling posture.
Optionally, the video frame acquisition module 601 is configured to,
acquiring a current video frame acquired by the image acquisition equipment in real time;
detecting a moving target of the current video frame;
and when the current video frame contains a moving target, determining the current video frame as the effective video frame acquired this time.
Optionally, the video frame acquisition module 601 is further configured to,
when the current video frame does not contain a moving target, acquiring a current frame interval, wherein the current frame interval is a frame interval between the current video frame and the effective video frame acquired the previous time;
and when the current frame interval is greater than a frame interval threshold, determining the current video frame as the effective video frame acquired this time.
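The valid-frame selection logic above can be sketched as follows. This is an illustrative sketch only: the moving-target detector itself (e.g. frame differencing) is outside the sketch and is represented by a boolean per frame, and the 25-frame threshold is a hypothetical value.

```python
def select_valid_frames(frames_have_motion, frame_interval_threshold=25):
    """Select valid frame indices from a stream.

    frames_have_motion is a sequence of booleans, one per captured
    frame, indicating whether a moving target was detected. A frame
    with motion is always valid; a frame without motion is still
    taken as valid when the interval since the previous valid frame
    exceeds the threshold, to avoid missed detections.
    """
    valid_indices = []
    last_valid = None
    for i, has_motion in enumerate(frames_have_motion):
        if has_motion:
            valid_indices.append(i)
            last_valid = i
        elif last_valid is not None and i - last_valid > frame_interval_threshold:
            # No motion, but too long since the last valid frame:
            # take this frame anyway.
            valid_indices.append(i)
            last_valid = i
    return valid_indices
```

This avoids running posture recognition on every frame while still sampling periodically during still periods.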
Optionally, the apparatus further includes:
the optical flow speed acquisition module is used for acquiring the optical flow speed of the head-shoulder area of the target human body in the effective video frame when the human body posture of the target human body is a non-falling posture;
and the posture modification module is used for modifying the human body posture of the target human body into a falling posture when the optical flow speed satisfies a preset condition.
Optionally, the apparatus further includes:
the head-shoulder speed acquisition module is used for dividing the optical flow speed by the area of the head-shoulder area before the posture modification module modifies the human posture of the target human body into a falling posture to obtain the head-shoulder speed of the target human body;
and the condition determining module is used for determining that the optical flow speed meets the preset condition when the head-shoulder speed is greater than a speed threshold value.
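The normalisation performed by the head-shoulder speed acquisition module and the threshold check of the condition determining module can be sketched as follows. This is an illustrative sketch only: the optical-flow estimator producing the aggregate flow magnitude is outside the sketch, and the 0.5 threshold is a hypothetical value.

```python
def fall_by_head_shoulder_speed(flow_speed, region_area, speed_threshold=0.5):
    """Normalise the aggregate optical-flow magnitude of the
    head-shoulder region by the region's area (in pixels) and
    compare the result against a speed threshold.

    Dividing by the area makes the measure independent of how
    large the person appears in the frame.
    """
    head_shoulder_speed = flow_speed / region_area
    # The preset condition is satisfied when the normalised
    # speed exceeds the threshold.
    return head_shoulder_speed > speed_threshold
```

A rapid downward movement of the head and shoulders yields a large normalised speed and triggers the posture modification described above.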
Optionally, the apparatus further includes:
the key point acquisition module is used for acquiring at least three key points of the head and shoulder parts of the target human body before the optical flow speed acquisition module acquires the optical flow speed of the head and shoulder parts of the target human body in the effective video frame; the at least three key points comprise a left shoulder key point and a right shoulder key point of the target human body;
the region acquisition module is used for acquiring a region corresponding to a minimum circumscribed rectangle of the at least three key points as a head-shoulder region of the target human body in the effective video frame, wherein the minimum circumscribed rectangle is the minimum rectangle containing the at least three key points.
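The minimum-circumscribed-rectangle step performed by the region acquisition module can be sketched as follows. This is an illustrative sketch only; the (x, y) tuple format for key points is an assumption.

```python
def head_shoulder_region(keypoints):
    """Compute the minimum axis-aligned rectangle containing all
    of the given head-and-shoulder key points (at least three,
    including the left and right shoulder key points).

    Returns (x_min, y_min, x_max, y_max).
    """
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    return (min(xs), min(ys), max(xs), max(ys))
```

The rectangle's area, (x_max - x_min) * (y_max - y_min), is the divisor used in the head-shoulder speed computation.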
Optionally, the apparatus further includes:
and the alarm module is used for sending alarm information to the user terminal when the human body posture of the target human body is a falling posture.
Optionally, the alarm module is configured to send the alarm information to the user terminal if the human body posture of the target human body maintains a falling posture in at least two valid video frames acquired recently.
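The multi-frame confirmation used by the alarm module can be sketched as follows. This is an illustrative sketch only: the "fall" label and the default of two frames are assumptions standing in for the posture labels and the "at least two valid video frames" condition described above.

```python
def should_alert(recent_postures, required_frames=2):
    """Decide whether to send alarm information to the user
    terminal.

    recent_postures lists the posture labels of the most recently
    acquired valid frames, newest last. An alert is sent only when
    the falling posture is maintained in at least the last
    required_frames valid frames, suppressing single-frame false
    positives.
    """
    if len(recent_postures) < required_frames:
        return False
    return all(p == "fall" for p in recent_postures[-required_frames:])
```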
In summary, with the scheme of the present application, the posture recognition device does not need to recognize continuous video frames, nor does it need to recognize faces in the video frames; it only needs to recognize two or more human body key points in individual video frames to determine whether the human body has fallen.
In addition, in the scheme shown in the embodiments of the present application, when acquiring an effective video frame, the posture recognition device performs moving target detection on the current video frame and acquires the current video frame as an effective video frame only when a moving target is detected. This avoids the excessive consumption of computing resources caused by detecting every video frame, reducing both resource consumption and the demands on hardware capability.
In addition, when no moving target is detected in the current video frame, the posture recognition device further acquires the frame interval between the current video frame and the effective video frame acquired the previous time, and acquires the current video frame as an effective video frame when that interval reaches a preset threshold, avoiding missed detections and improving the accuracy of falling posture detection.
In addition, in the scheme shown in the embodiments of the present application, when the key point information of the target human body indicates a non-falling posture, the posture recognition device further acquires the head-shoulder speed of the target human body and determines, according to that speed, whether the target human body is in a falling state, further improving the accuracy of falling posture detection.
Fig. 7 is a schematic diagram of a computer device, according to an example embodiment. The computer apparatus 700 includes a Central Processing Unit (CPU) 701, a system memory 704 including a Random Access Memory (RAM) 702 and a Read Only Memory (ROM) 703, and a system bus 705 connecting the system memory 704 and the central processing unit 701. The computer device 700 also includes a basic input/output system (I/O system) 706, which helps to transfer information between various devices within the computer, and a mass storage device 707 for storing an operating system 713, application programs 714, and other program modules 715.
The basic input/output system 706 includes a display 708 for displaying information and an input device 709, such as a mouse or a keyboard, through which a user inputs information. The display 708 and the input device 709 are both coupled to the central processing unit 701 through an input/output controller 710 that is coupled to the system bus 705. The input/output controller 710 may also receive and process input from a number of other devices, such as a keyboard, mouse, or electronic stylus, and similarly provides output to a display screen, a printer, or another type of output device.
The mass storage device 707 is connected to the central processing unit 701 through a mass storage controller (not shown) connected to the system bus 705. The mass storage device 707 and its associated computer-readable media provide non-volatile storage for the computer device 700. That is, the mass storage device 707 may include a computer readable medium (not shown) such as a hard disk or CD-ROM drive.
The computer readable medium may include computer storage media and communication media without loss of generality. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media include RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will recognize that computer storage media are not limited to those described above. The system memory 704 and the mass storage device 707 described above may be collectively referred to as memory.
The computer device 700 may be connected to the internet or other network device through a network interface unit 711 connected to the system bus 705.
The memory further stores one or more programs, and the central processing unit 701 implements all or part of the steps of the method shown in fig. 3 or fig. 4 by executing the one or more programs.
In exemplary embodiments, a non-transitory computer readable storage medium is also provided, such as a memory including a computer program (instructions) executable by a processor of a computer device to perform all or part of the steps of the methods shown in the various embodiments of the application. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (12)

1. A method of acquiring a human body posture, the method comprising:
acquiring an effective video frame in a video stream acquired by image acquisition equipment;
extracting human body key point information of a target human body in the effective video frame, wherein the human body key point information comprises position information of at least two key points in the target human body;
acquiring the human body posture of the target human body according to the human body key point information, wherein the human body posture comprises a falling posture and a non-falling posture;
when the human body posture of the target human body is a non-falling posture, acquiring at least three key points of the head and shoulder parts of the target human body; the at least three key points comprise a left shoulder key point and a right shoulder key point of the target human body;
acquiring a region corresponding to the minimum circumscribed rectangle of the at least three key points as a head-shoulder region of the target human body in the effective video frame, wherein the minimum circumscribed rectangle is the minimum rectangle containing the at least three key points;
acquiring the optical flow speed of a head-shoulder area of the target human body in the effective video frame;
dividing the optical flow speed by the area of the head-shoulder area to obtain the head-shoulder speed of the target human body;
when the head-shoulder speed is greater than a speed threshold, determining that the optical flow speed meets a preset condition;
when the optical flow speed satisfies the preset condition, the human body posture of the target human body is modified into a falling posture.
2. The method according to claim 1, wherein the acquiring the human body pose of the target human body according to the human body keypoint information includes:
acquiring the relative position relation between the at least two key points according to the position information of the at least two key points;
and acquiring the human body posture of the target human body according to the relative position relation between the at least two key points.
3. The method according to claim 2, wherein the position information of the at least two keypoints includes position information of at least two head keypoints in the target human body and position information of at least two lower body keypoints in the target human body;
the obtaining the relative positional relationship between the at least two key points according to the positional information of the at least two key points includes:
acquiring a first ordinate according to the position information of the at least two head key points, wherein the first ordinate is an average value of the ordinates of the at least two head key points in a designated coordinate system; the direction of the ordinate axis of the designated coordinate system is the vertical direction in the effective video frame;
acquiring a second ordinate according to the position information of the at least two lower body key points, wherein the second ordinate is an average value of the ordinates of the at least two lower body key points in the designated coordinate system;
acquiring the height relationship between the first ordinate and the second ordinate as the relative position relationship between the at least two key points;
the obtaining the human body posture of the target human body according to the relative position relationship between the at least two key points includes:
and when the first ordinate is lower than the second ordinate, determining the human body posture of the target human body as a falling posture.
4. The method according to claim 2, wherein the position information of the at least two keypoints includes a left upper body keypoint of the target human body and a left crotch keypoint of the target human body; the left upper body key points comprise left eye key points or left shoulder key points;
the obtaining the relative positional relationship between the at least two key points according to the positional information of the at least two key points includes:
determining a connection line between the upper left body keypoint and the left crotch keypoint according to the position information of the upper left body keypoint and the position information of the left crotch keypoint;
acquiring an included angle between the connecting line and the vertical direction in the effective video frame as a relative position relation between the at least two key points;
the obtaining the human body posture of the target human body according to the relative position relationship between the at least two key points includes:
and when the included angle is larger than a first angle threshold, determining that the human body posture of the target human body is a falling posture.
5. The method according to claim 2, wherein the position information of the at least two keypoints includes a right upper body keypoint of the target human body and a right crotch keypoint of the target human body; the right upper body key points comprise right eye key points or right shoulder key points;
the obtaining the relative positional relationship between the at least two key points according to the positional information of the at least two key points includes:
determining a connection line between the upper right body keypoint and the right crotch keypoint according to the position information of the upper right body keypoint and the position information of the right crotch keypoint;
acquiring an included angle between the connecting line and the vertical direction in the effective video frame as a relative position relation between the at least two key points;
the obtaining the human body posture of the target human body according to the relative position relationship between the at least two key points includes:
and when the included angle is larger than a first angle threshold, determining that the human body posture of the target human body is a falling posture.
6. The method according to any one of claims 1 to 5, wherein the acquiring valid video frames in the video stream acquired by the image acquisition device comprises:
acquiring a current video frame acquired by the image acquisition equipment in real time;
detecting a moving target of the current video frame;
and when the current video frame contains a moving target, determining the current video frame as the effective video frame acquired this time.
7. The method of claim 6, wherein the method further comprises:
when the current video frame does not contain a moving target, acquiring a current frame interval, wherein the current frame interval is a frame interval between the current video frame and the effective video frame acquired the previous time;
and when the current frame interval is greater than a frame interval threshold, determining the current video frame as the effective video frame acquired this time.
8. The method according to any one of claims 1 to 5, further comprising:
and when the human body posture of the target human body is a falling posture, sending alarm information to the user terminal.
9. The method according to claim 8, wherein the sending the alert information to the user terminal when the human body posture of the target human body is a falling posture includes:
and if the human body posture of the target human body keeps falling postures in at least two recently acquired effective video frames, sending the alarm information to the user terminal.
10. A human body posture acquisition device, the device comprising:
the video frame acquisition module is used for acquiring effective video frames in the video stream acquired by the image acquisition equipment;
the key point extraction module is used for extracting the human body key point information of the target human body in the effective video frame, wherein the human body key point information comprises the position information of at least two key points in the target human body;
the posture acquisition module is used for acquiring the human body posture of the target human body according to the human body key point information, wherein the human body posture comprises a falling posture and a non-falling posture;
The key point acquisition module is used for acquiring at least three key points of the head and shoulder parts of the target human body when the human body posture of the target human body is a non-falling posture; the at least three key points comprise a left shoulder key point and a right shoulder key point of the target human body;
the region acquisition module is used for acquiring a region corresponding to a minimum circumscribed rectangle of the at least three key points as a head-shoulder region of the target human body in the effective video frame, wherein the minimum circumscribed rectangle is a minimum rectangle containing the at least three key points;
the optical flow speed acquisition module is used for acquiring the optical flow speed of the head-shoulder area of the target human body in the effective video frame;
the head-shoulder speed acquisition module is used for dividing the optical flow speed by the area of the head-shoulder area to obtain the head-shoulder speed of the target human body;
the condition determining module is used for determining that the optical flow speed meets a preset condition when the head-shoulder speed is greater than a speed threshold;
and the posture modification module is used for modifying the human body posture of the target human body into a falling posture when the optical flow speed satisfies the preset condition.
11. A computer device, characterized in that the computer device comprises a processor and a memory, in which a program is stored, which program is executed by the processor to implement the human body posture acquisition method according to any one of claims 1 to 9.
12. A computer-readable storage medium having instructions stored therein, the instructions being executable by a processor of a computer device to implement the human posture acquisition method of any one of claims 1 to 9.
CN201910581506.9A 2019-06-29 2019-06-29 Human body posture acquisition method, device, computer equipment and storage medium Active CN110287923B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910581506.9A CN110287923B (en) 2019-06-29 2019-06-29 Human body posture acquisition method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110287923A CN110287923A (en) 2019-09-27
CN110287923B true CN110287923B (en) 2023-09-15

Family

ID=68020206

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910581506.9A Active CN110287923B (en) 2019-06-29 2019-06-29 Human body posture acquisition method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110287923B (en)

Families Citing this family (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11179064B2 (en) * 2018-12-30 2021-11-23 Altum View Systems Inc. Method and system for privacy-preserving fall detection
CN112699706A (en) * 2019-10-22 2021-04-23 广州弘度信息科技有限公司 Fall detection method, system and storage medium
CN111079518B (en) * 2019-10-31 2023-08-29 高新兴科技集团股份有限公司 Ground-falling abnormal behavior identification method based on law enforcement and case handling area scene
CN110889334A (en) * 2019-11-06 2020-03-17 江河瑞通(北京)技术有限公司 Personnel intrusion identification method and device
CN111297367A (en) * 2019-11-26 2020-06-19 北京海益同展信息科技有限公司 Animal state monitoring method and device, electronic equipment and storage medium
CN110992629A (en) * 2019-12-06 2020-04-10 江西洪都航空工业集团有限责任公司 Method for detecting static human body based on video monitoring
CN111062283A (en) * 2019-12-06 2020-04-24 湖南集思汇智电子有限公司 Nursing method of nursing robot
CN111242004A (en) * 2020-01-10 2020-06-05 厦门瑞为信息技术有限公司 Automatic alarm method and system based on elevator monitoring data processing
CN111460886B (en) * 2020-02-21 2024-01-16 珠海格力电器股份有限公司 Monitoring method, device and computer equipment
CN111444812A (en) * 2020-03-23 2020-07-24 星汉智能科技股份有限公司 Human body posture assessment method and system for daily public security training
CN111597879A (en) * 2020-04-03 2020-08-28 成都云盯科技有限公司 Gesture detection method, device and system based on monitoring video
CN111539267A (en) * 2020-04-03 2020-08-14 成都云盯科技有限公司 Human body geometric feature extraction method, device, equipment and system
CN112749658A (en) * 2020-04-30 2021-05-04 杨九妹 Pedestrian behavior analysis method and system for big data financial security system and robot
CN111582158A (en) * 2020-05-07 2020-08-25 济南浪潮高新科技投资发展有限公司 Tumbling detection method based on human body posture estimation
CN111428703B (en) * 2020-06-15 2020-09-08 西南交通大学 Method for detecting pit leaning behavior of electric power operation and inspection personnel
TWI765290B (en) * 2020-06-30 2022-05-21 奇美醫療財團法人奇美醫院 Patient Monitoring Alert System
CN111798466A (en) * 2020-07-01 2020-10-20 中国海洋石油集团有限公司 Method and system for measuring kinetic energy of drilling support platform in real time based on visual positioning
CN111914661A (en) * 2020-07-06 2020-11-10 广东技术师范大学 Abnormal behavior recognition method, target abnormal recognition method, device, and medium
CN111767888A (en) * 2020-07-08 2020-10-13 北京澎思科技有限公司 Object state detection method, computer device, storage medium, and electronic device
CN111914676A (en) * 2020-07-10 2020-11-10 泰康保险集团股份有限公司 Human body tumbling detection method and device, electronic equipment and storage medium
CN111783702A (en) * 2020-07-20 2020-10-16 杭州叙简科技股份有限公司 Efficient pedestrian tumble detection method based on image enhancement algorithm and human body key point positioning
CN111898518A (en) * 2020-07-28 2020-11-06 中移(杭州)信息技术有限公司 Tumble detection method, electronic device and storage medium
CN111931733B (en) * 2020-09-25 2021-02-26 西南交通大学 Human body posture detection method based on depth camera
CN112287759A (en) * 2020-09-26 2021-01-29 浙江汉德瑞智能科技有限公司 Tumble detection method based on key points
CN112215185B (en) * 2020-10-21 2022-08-05 成都信息工程大学 System and method for detecting falling behavior from monitoring video
CN112287867B (en) * 2020-11-10 2021-06-08 上海依图网络科技有限公司 Multi-camera human body action recognition method and device
CN112287868B (en) * 2020-11-10 2021-07-13 上海依图网络科技有限公司 Human body action recognition method and device
CN112613386B (en) * 2020-12-18 2023-12-19 宁波大学科学技术学院 Brain wave-based monitoring method and device
CN112674759B (en) * 2020-12-21 2022-04-01 西南交通大学 Baby standing state identification method and system
CN112597961B (en) * 2020-12-30 2021-09-24 上海大学 Interest target extraction method and system based on big data
CN112907892A (en) * 2021-01-28 2021-06-04 上海电机学院 Human body falling alarm method based on multiple views
CN113191319B (en) * 2021-05-21 2022-07-19 河南理工大学 Human body posture intelligent recognition method and computer equipment
CN113505735B (en) * 2021-05-26 2023-05-02 电子科技大学 Human body key point stabilization method based on hierarchical filtering
CN114783059B (en) * 2022-04-20 2022-10-25 浙江东昊信息工程有限公司 Temple incense and worship participation management method and system based on depth camera
CN116343341B (en) * 2023-05-12 2023-08-15 天津志听医疗科技有限公司 Gesture recognition-based action prediction analysis method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014010203A1 (en) * 2012-07-13 2014-01-16 日本電気株式会社 Fall detection device, fall detection method, fall detection camera, and computer program
KR20140059962A (en) * 2012-11-09 2014-05-19 오티스 엘리베이터 컴파니 Elevator crime prvent system and method of controlling the same
CN104598896A (en) * 2015-02-12 2015-05-06 南通大学 Automatic human tumble detecting method based on Kinect skeleton tracking
CN106529418A (en) * 2016-10-19 2017-03-22 中国科学院计算技术研究所 Fall detection and alarm method
CN108491762A (en) * 2018-02-27 2018-09-04 浙江大华技术股份有限公司 A kind of detection method and device that human body is fallen
CN109920208A (en) * 2019-01-31 2019-06-21 深圳绿米联创科技有限公司 Tumble prediction technique, device, electronic equipment and system
CN109919132A (en) * 2019-03-22 2019-06-21 广东省智能制造研究所 A kind of pedestrian's tumble recognition methods based on skeleton detection

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3309748A4 (en) * 2015-06-10 2018-06-06 Konica Minolta, Inc. Image processing system, image processing device, image processing method, and image processing program
CN107239728B (en) * 2017-01-04 2021-02-02 赛灵思电子科技(北京)有限公司 Unmanned aerial vehicle interaction device and method based on deep learning attitude estimation
JP6831769B2 (en) * 2017-11-13 2021-02-17 株式会社日立製作所 Image search device, image search method, and setting screen used for it

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Automatic human fall detection based on Kinect skeleton tracking; Wang Junze; Zhu Xiaolong; Qu Chang; Journal of Shanghai Jiao Tong University (09); full text *
Indoor human fall detection based on color-depth images; Jin Haoyang; China Master's Theses Full-text Database, Information Science and Technology (No. 07); full text *

Also Published As

Publication number Publication date
CN110287923A (en) 2019-09-27

Similar Documents

Publication Publication Date Title
CN110287923B (en) Human body posture acquisition method, device, computer equipment and storage medium
US11100780B2 (en) Surveillance system and method for predicting patient falls using motion feature patterns
US20190130583A1 (en) Still and slow object tracking in a hybrid video analytics system
WO2020215552A1 (en) Multi-target tracking method, apparatus, computer device, and storage medium
CN105913528A (en) Method and device for processing access control data, method and device for access control
WO2022078182A1 (en) Throwing position acquisition method and apparatus, computer device and storage medium
WO2016004673A1 (en) Intelligent target recognition device, system and method based on cloud service
WO2019129255A1 (en) Target tracking method and device
US20200184228A1 (en) People flow estimation device, display control device, people flow estimation method, and recording medium
WO2020094088A1 (en) Image capturing method, monitoring camera, and monitoring system
US20150262068A1 (en) Event detection apparatus and event detection method
EP3329422A1 (en) Computer-vision based security system using a depth camera
JP2007272488A (en) Image processor, monitor camera and image monitoring system
CN111080963A (en) Construction site warning method and device, computer equipment and storage medium
CN105574501A (en) People flow video detection and analysis system
CN108021846A (en) A kind of face identification method and device
CN112733690A (en) High-altitude parabolic detection method and device and electronic equipment
CN111767823A (en) Sleeping post detection method, device, system and storage medium
US20160217330A1 (en) Image processing system and image processing method
CN111209781A (en) Method and device for counting number of people in room
CN108289191B (en) Image recognition method and device
Pramerdorfer et al. Fall detection based on depth-data in practice
Ezatzadeh et al. Fall detection for elderly in assisted environments: Video surveillance systems and challenges
CN112700568B (en) Identity authentication method, equipment and computer readable storage medium
Hernández et al. People counting with re-identification using depth cameras

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant