CN113807289A - Human body posture detection method and device, electronic equipment and storage medium - Google Patents

Human body posture detection method and device, electronic equipment and storage medium

Info

Publication number
CN113807289A
Authority
CN
China
Prior art keywords
image
human body
angle
target
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111120416.3A
Other languages
Chinese (zh)
Inventor
项新建
潘磊
玉正英
吴海腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Sheng Guan Technology Co ltd
Zhejiang Lover Health Science and Technology Development Co Ltd
Zhejiang University of Science and Technology ZUST
Original Assignee
Hangzhou Sheng Guan Technology Co ltd
Zhejiang Lover Health Science and Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Sheng Guan Technology Co ltd, Zhejiang Lover Health Science and Technology Development Co Ltd filed Critical Hangzhou Sheng Guan Technology Co ltd
Priority to CN202111120416.3A
Publication of CN113807289A
Legal status: Pending (current)

Classifications

    • G06F18/241 Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/044 Neural networks; recurrent networks, e.g. Hopfield networks
    • G06N3/045 Neural networks; combinations of networks
    • G06N3/08 Neural networks; learning methods

Abstract

The application provides a human body posture detection method and apparatus, an electronic device, and a computer-readable storage medium. The method includes: determining person position information of a person to be detected in a target image; performing keypoint detection on the person image corresponding to the person position information in the target image to obtain a plurality of human body keypoints; determining a plurality of limb angles in the target image based on the plurality of human body keypoints; and determining the human body posture of the person to be detected according to the plurality of limb angles corresponding to consecutive frames of target images. In this scheme, after the person image containing the person to be detected has been identified, a plurality of limb angles are obtained through keypoint detection and keypoint-to-angle conversion, and the human body posture can be recognized accurately from the limb angles of multiple frames of target images. This avoids interference from complex backgrounds and, compared with recognizing the posture directly from the human body keypoints, improves the recognition accuracy.

Description

Human body posture detection method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for detecting a human body posture, an electronic device, and a computer-readable storage medium.
Background
In workplaces such as substations, unsafe behaviors by workers (such as smoking or climbing over a wall) pose potential safety hazards. It is therefore necessary to detect workers' postures and raise an alarm according to the detection result. In the related art, after a scene image is acquired, the region where a person is located is identified in the scene image, and keypoints are extracted from the image of that region to determine the person's posture. However, because workplace backgrounds are complex, person identification is often disturbed by the background and cannot be performed accurately; in addition, a human body posture determined directly from the keypoints is not accurate enough.
Disclosure of Invention
An object of the embodiments of the present application is to provide a human body posture detection method and apparatus, an electronic device, and a computer-readable storage medium for accurately recognizing human body postures.
In one aspect, the application provides a human body posture detection method, including:
determining person position information of a person to be detected in a target image;
performing keypoint detection on the person image corresponding to the person position information in the target image to obtain a plurality of human body keypoints;
determining a plurality of limb angles in the target image based on the plurality of human body keypoints;
and determining the human body posture of the person to be detected according to the plurality of limb angles corresponding to consecutive frames of target images.
In an embodiment, determining the person position information of the person to be detected in the target image includes:
taking the target image as an image to be detected, inputting the image to be detected into a trained target detection model, and obtaining the person position information output by the target detection model.
In an embodiment, the target image is a visible light image;
and determining the person position information of the person to be detected in the target image includes:
acquiring an infrared image corresponding to the visible light image as the image to be detected;
and inputting the image to be detected into a trained target detection model to obtain the person position information output by the target detection model.
In an embodiment, the target detection model comprises a first feature extraction network and a second feature extraction network;
and inputting the image to be detected into the trained target detection model to obtain the person position information output by the target detection model includes:
extracting high-resolution image features from the image to be detected based on the first feature extraction network;
extracting a first specified image feature, a second specified image feature and a third specified image feature from the high-resolution image features through the second feature extraction network;
and regressing the person position information of the person to be detected from the first specified image feature, the second specified image feature and the third specified image feature.
In an embodiment, the second feature extraction network comprises a first extraction module, a second extraction module, a third extraction module and a fourth extraction module;
and extracting the first, second and third specified image features from the high-resolution image features through the second feature extraction network includes:
extracting medium-resolution image features from the high-resolution image features based on the first extraction module;
extracting low-resolution image features from the medium-resolution image features based on the second extraction module as the first specified image feature;
extracting a first intermediate image feature from the first specified image feature based on the third extraction module, and fusing the first intermediate image feature with the medium-resolution image features to obtain the second specified image feature;
and extracting a second intermediate image feature from the second specified image feature based on the fourth extraction module, and fusing the second intermediate image feature with the high-resolution image features to obtain the third specified image feature.
In an embodiment, the plurality of limb angles includes angle information corresponding to a plurality of angle categories;
and determining the human body posture of the person to be detected according to the plurality of limb angles corresponding to consecutive frames of target images includes:
for each angle category, inputting the angle information of the consecutive frames of target images in the angle category into a bidirectional recurrent neural network corresponding to the angle category to obtain a plurality of output parameters corresponding to the angle category;
for the plurality of output parameters corresponding to each angle category, fusing them to obtain a specified output parameter corresponding to the angle category;
and classifying the human body posture based on the specified output parameters corresponding to all the angle categories.
In an embodiment, before classifying the human body posture based on the specified output parameters corresponding to all the angle categories, the method further includes:
for each angle category, determining a plurality of pieces of differential information corresponding to the angle category according to the angle information of the consecutive frames of target images in the angle category;
and for the plurality of pieces of differential information corresponding to each angle category, fusing them to obtain specified differential information corresponding to the angle category;
in which case classifying the human body posture based on the specified output parameters corresponding to all the angle categories includes:
classifying the human body posture based on the specified output parameters and the specified differential information corresponding to all the angle categories.
In another aspect, the application further provides a human body posture detection apparatus, including:
a first determining module, configured to determine person position information of a person to be detected in a target image;
a first detection module, configured to perform keypoint detection on the person image corresponding to the person position information in the target image to obtain a plurality of human body keypoints;
a second determining module, configured to determine a plurality of limb angles in the target image based on the plurality of human body keypoints;
and a second detection module, configured to determine the human body posture of the person to be detected according to the plurality of limb angles corresponding to consecutive frames of target images.
Further, the present application also provides an electronic device, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to execute the above human body posture detection method.
In addition, the present application provides a computer-readable storage medium storing a computer program that can be executed by a processor to complete the above human body posture detection method.
In the above scheme, after the person position information in the target image is determined, keypoint detection can be performed on the corresponding person image to obtain a plurality of human body keypoints, a plurality of limb angles are determined from those keypoints, and the human body posture is then recognized according to the plurality of limb angles corresponding to consecutive frames of target images. After the person image containing the person to be detected has been identified, limb angles are obtained through keypoint detection and keypoint-to-angle conversion, and the posture can be recognized accurately from the limb angles of multiple frames of target images; this avoids interference from complex backgrounds and, compared with recognizing the posture directly from the human body keypoints, improves the recognition accuracy.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below.
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a method for detecting a human body posture according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a person position detection method according to an embodiment of the present application;
fig. 4 is a schematic flow chart of a feature extraction method according to an embodiment of the present application;
fig. 5 is a schematic flowchart of a method for determining a human body posture according to an embodiment of the present application;
fig. 6 is a schematic diagram illustrating an architecture of a method for determining a human body posture according to an embodiment of the present application;
fig. 7 is a schematic flowchart of a method for determining a human body posture according to another embodiment of the present application;
fig. 8 is a schematic diagram illustrating a method for determining a human body posture according to another embodiment of the present application;
fig. 9 is a block diagram of a human body posture detection apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
As shown in fig. 1, the present embodiment provides an electronic device 1 including at least one processor 11 and a memory 12; one processor 11 is taken as an example in fig. 1. The processor 11 and the memory 12 are connected by a bus 10. The memory 12 stores instructions executable by the processor 11, and when the instructions are executed by the processor 11, the electronic device 1 can execute all or part of the flow of the methods in the embodiments described below. In an embodiment, the electronic device 1 may be an inspection robot equipped with a camera device, configured to execute the human body posture detection method.
The memory 12 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
The present application also provides a computer-readable storage medium storing a computer program executable by the processor 11 to perform the human body posture detection method provided by the present application.
Referring to fig. 2, a flowchart of a method for detecting a human body posture according to an embodiment of the present application is shown, and as shown in fig. 2, the method may include the following steps 210 to 240.
Step 210: determine the person position information of the person to be detected in the target image.
The target image is an image on which human body posture detection needs to be performed; it may be an image of the workplace acquired by the robot during inspection. A person appearing in the target image is a person to be detected.
The robot can acquire images of the workplace as target images through a camera device mounted on it (such as a bullet camera, a dome camera, or a panoramic camera). In one case, the robot may film the workplace during inspection and take each video frame as a target image. In another case, the robot may photograph a region where a person stays, continuously capturing several pictures at a time (for example, 10 pictures within 1 second) as target images.
For each frame of target image, the robot can determine the person position information of the person to be detected. The person position information may be expressed as the coordinates, in the image coordinate system, of any one of the upper-left corner, upper-right corner, lower-left corner, lower-right corner, or center point of the anchor box bounding the person, together with the length and width of the anchor box.
Step 220: perform keypoint detection on the person image corresponding to the person position information in the target image to obtain a plurality of human body keypoints.
After obtaining the person position information in the target image, the robot can determine that a person to be detected exists in the target image; in this case, the person's human body posture needs to be detected. The robot can crop a local image of the person to be detected out of the target image according to the person position information to serve as the person image, and perform keypoint detection on the person image with a keypoint detection algorithm to obtain a plurality of human body keypoints. Here, the keypoint detection algorithm may be AlphaPose, which may be trained on the MPII data set.
For example, the detected human body keypoints may include the head, neck, right shoulder, left shoulder, right elbow, left elbow, right wrist, left wrist, chest, right hip, left hip, right knee, left knee, right ankle, and left ankle. Each keypoint expresses its position in the target image through coordinates in the image coordinate system.
Performing keypoint detection only after the person image has been cropped out reduces the interference of the image background with keypoint detection and greatly reduces the amount of computation.
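For illustration, the cropping step can be sketched as follows (a minimal sketch in Python; the (x, y, w, h) box layout is one of the representations described above, and the detect_keypoints call is a hypothetical stand-in, since the patent names AlphaPose only as an example and fixes no concrete API):

    import numpy as np

    def crop_person(frame: np.ndarray, bbox) -> np.ndarray:
        """Crop the person image out of a target frame (an H x W x C array).

        bbox = (x, y, w, h): top-left corner plus anchor-box width and height
        in pixels, one of the box representations described above.
        """
        x, y, w, h = (int(v) for v in bbox)
        return frame[y:y + h, x:x + w]

    # keypoints = detect_keypoints(crop_person(frame, bbox))  # hypothetical
    # call to a keypoint detector such as an AlphaPose model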
Step 230: determine a plurality of limb angles in the target image based on the plurality of human body keypoints.
For any person image, after obtaining the human body keypoints corresponding to it, the robot can convert the keypoints into limb angles. One limb angle can be determined from three human body keypoints.
For example, the robot can determine the head-neck-right shoulder angle from the coordinates of the head, neck, and right shoulder keypoints; the head-neck-left shoulder angle from the head, neck, and left shoulder keypoints; the neck-right shoulder-right elbow angle from the neck, right shoulder, and right elbow keypoints; the neck-left shoulder-left elbow angle from the neck, left shoulder, and left elbow keypoints; the right shoulder-right elbow-right wrist angle from the right shoulder, right elbow, and right wrist keypoints; the left shoulder-left elbow-left wrist angle from the left shoulder, left elbow, and left wrist keypoints; the chest-right hip-right knee angle from the chest, right hip, and right knee keypoints; the chest-left hip-left knee angle from the chest, left hip, and left knee keypoints; the right hip-right knee-right ankle angle from the right hip, right knee, and right ankle keypoints; and the left hip-left knee-left ankle angle from the left hip, left knee, and left ankle keypoints.
A limb angle can also be determined from two human body keypoints and a horizontal line. For example, the robot may determine the right knee-right ankle horizontal angle from the coordinates of the right knee and right ankle keypoints and a horizontal line in the image, and the left knee-left ankle horizontal angle from the coordinates of the left knee and left ankle keypoints and a horizontal line in the image.
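Both kinds of limb angle can be computed from keypoint coordinates as in the following sketch (NumPy, with coordinates in the image coordinate system; the function names are illustrative, not the patent's):

    import numpy as np

    def joint_angle(a, b, c):
        """Angle at keypoint b (degrees) formed by the segments b->a and b->c."""
        v1 = np.asarray(a, float) - np.asarray(b, float)
        v2 = np.asarray(c, float) - np.asarray(b, float)
        cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
        return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

    def horizontal_angle(p, q):
        """Angle (degrees) between the segment p->q and the image horizontal."""
        dx, dy = np.asarray(q, float) - np.asarray(p, float)
        return np.degrees(np.arctan2(abs(dy), abs(dx)))

    # Examples: joint_angle(right_shoulder, right_elbow, right_wrist) gives the
    # right shoulder-right elbow-right wrist angle; horizontal_angle(right_knee,
    # right_ankle) gives the right knee-right ankle horizontal angle.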
Step 240: determine the human body posture of the person to be detected according to the plurality of limb angles corresponding to consecutive frames of target images.
The human body postures to be detected can be preset according to the requirements of the application scenario. For example, postures such as climbing over a wall, falling, smoking, running, and walking can be detected in a workplace.
For each frame of target image, the robot can determine a plurality of limb angles; after the above steps have been performed on consecutive frames of target images, the limb angles corresponding to those frames are obtained. The robot determines the human body posture of the person to be detected from the limb angles of consecutive frames rather than of a single frame, which avoids misjudgments based on the detection of a single target image.
After the human body posture is obtained, if it is a specified posture requiring early warning, the robot can report warning information containing the target image to the back-end device. Illustratively, the specified postures may include climbing over a wall, falling, and smoking; when the robot detects that the posture of the person to be detected is any one of these, it can report warning information so that violations in the workplace do not become safety hazards.
In an embodiment, when determining the person position information of the person to be detected in the target image, the robot may use the target image itself as the image to be detected, that is, the image on which target detection is to be performed. The robot inputs the image to be detected into a trained target detection model, thereby obtaining the person position information output by the model.
The target detection model may be a model such as YOLO (You Only Look Once), SSD (Single Shot MultiBox Detector), or Faster R-CNN (Faster Region-based Convolutional Neural Network).
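As a concrete illustration, person boxes can be obtained from any off-the-shelf detector; the sketch below uses torchvision's COCO-pretrained Faster R-CNN as a stand-in (the pretrained weights, score threshold, and person-class index are assumptions, not details fixed by the patent):

    import torch
    import torchvision

    # COCO-pretrained Faster R-CNN as an example target detection model
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    model.eval()

    def detect_persons(image, score_thresh=0.5):
        """Return [x1, y1, x2, y2] boxes for persons in a CHW float tensor.

        Label 1 is the 'person' class in the COCO label map used by this model.
        """
        with torch.no_grad():
            out = model([image])[0]
        keep = (out["labels"] == 1) & (out["scores"] > score_thresh)
        return out["boxes"][keep]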
In an embodiment, the robot may be equipped with an infrared camera device that captures images of the same field of view simultaneously with a conventional camera device (a visible light camera device). In this case, the robot takes the visible light image captured by the visible light camera device as the target image.
When determining the person position information of the person to be detected in the target image, the robot can acquire the infrared image corresponding to the visible light image as the image to be detected. The infrared image is collected by the infrared camera device; the infrared image collected simultaneously with the visible light image is the infrared image corresponding to it.
The robot can input the image to be detected into the trained target detection model to obtain the person position information output by the model; again, the target detection model may be YOLO, SSD, Faster R-CNN, or a similar model. Because the infrared image corresponding to the visible light image shares the visible light image's field of view and acquisition time, the person position information of the person to be detected in the infrared image is also the person position information in the visible light image (the target image).
Since target detection works better on infrared images, determining the person position information based on the infrared image avoids missed detections and improves the accuracy of human body posture detection.
In an embodiment, the target detection model may comprise a first feature extraction network and a second feature extraction network. Referring to fig. 3, a flowchart of a person position detection method according to an embodiment of the present application: as shown in fig. 3, when determining the person position information from the target image or the infrared image, the robot may perform the following steps 310 to 330.
Step 310: extract high-resolution image features from the image to be detected based on the first feature extraction network.
The first feature extraction network may be an adjusted VGG16 (Visual Geometry Group Network 16). Starting from a conventional VGG16 network, its first fully connected layer may be replaced with a convolutional layer with a 3×3 kernel, its second fully connected layer with a convolutional layer with a 1×1 kernel, and its third fully connected layer removed. The adjusted VGG16 network can attend more to the global context and to target details.
The robot inputs the image to be detected into the adjusted VGG16 network and extracts high-resolution image features from it through the network. Here, an image feature may be a feature map.
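A minimal PyTorch sketch of this adjusted backbone follows; the channel widths of the replacement layers are assumptions, since the patent specifies only their kernel sizes:

    import torch.nn as nn
    import torch.nn.functional as F
    from torchvision.models import vgg16

    class FirstFeatureNet(nn.Module):
        """Adjusted VGG16: the first two fully connected layers are replaced
        by a 3x3 and a 1x1 convolution, and the third is removed."""

        def __init__(self):
            super().__init__()
            self.backbone = vgg16(pretrained=True).features  # convolutional part
            self.conv_fc1 = nn.Conv2d(512, 1024, kernel_size=3, padding=1)
            self.conv_fc2 = nn.Conv2d(1024, 1024, kernel_size=1)

        def forward(self, x):
            x = self.backbone(x)
            x = F.relu(self.conv_fc1(x))
            return F.relu(self.conv_fc2(x))  # the "high-resolution" features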
Step 320: extract the first specified image feature, the second specified image feature, and the third specified image feature from the high-resolution image features through the second feature extraction network.
After obtaining the high-resolution image features, the robot can process them with the second feature extraction network, thereby obtaining the first, second, and third specified image features.
Step 330: regress the person position information of the person to be detected from the first, second, and third specified image features.
The robot can perform bounding-box regression based on the three specified image features and, through non-maximum suppression, obtain the anchor box of the person to be detected in the target image as the person position information.
In an embodiment, the second feature extraction network may include a first extraction module, a second extraction module, a third extraction module, and a fourth extraction module. Referring to fig. 4, a flowchart of a feature extraction method according to an embodiment of the present application: as shown in fig. 4, when the robot extracts image features with the second feature extraction network, it may perform the following steps 321 to 324.
Step 321: extract medium-resolution image features from the high-resolution image features based on the first extraction module.
The first extraction module may include a convolutional layer with a 1×1 kernel and a convolutional layer with a 3×3 kernel and a stride of 2.
After the robot inputs the high-resolution image features into the first extraction module, these two convolutional layers convolve them to produce the medium-resolution image features.
Step 322: extract low-resolution image features from the medium-resolution image features based on the second extraction module, as the first specified image feature.
The second extraction module may include a convolutional layer with a 1×1 kernel and a convolutional layer with a 3×3 kernel and a stride of 2.
After the robot inputs the medium-resolution image features into the second extraction module, the two convolutional layers convolve them to produce the low-resolution image features, which serve as the first specified image feature.
Step 323: extract a first intermediate image feature from the first specified image feature based on the third extraction module, and fuse it with the medium-resolution image features to obtain the second specified image feature.
The third extraction module may include a convolutional layer with a 1×1 kernel and an upsampling layer.
The robot inputs the first specified image feature into the third extraction module, where it undergoes convolution and then upsampling; the result is the first intermediate image feature, i.e., the image feature output by the third extraction module.
The robot may then fuse the first intermediate image feature with the medium-resolution image features to obtain the second specified image feature. Here, the robot may concatenate the first intermediate image feature and the medium-resolution image features along the channel dimension and then apply a convolution, thereby implementing the fusion.
Step 324: extract a second intermediate image feature from the second specified image feature based on the fourth extraction module, and fuse it with the high-resolution image features to obtain the third specified image feature.
The fourth extraction module may include a convolutional layer with a 1×1 kernel and an upsampling layer.
The robot inputs the second specified image feature into the fourth extraction module, performs convolution and then upsampling, and obtains the second intermediate image feature, i.e., the image feature output by the fourth extraction module.
The robot may then fuse the second intermediate image feature with the high-resolution image features to obtain the third specified image feature. Here, the robot may concatenate the second intermediate image feature and the high-resolution image features along the channel dimension and then apply a convolution, thereby implementing the fusion.
Persons to be detected appear at different distances from the robot and therefore at different sizes in the target image. Determining the specified image features used for detection from image features at three resolutions effectively combines the global features of the image with the local features of the region where the person is located, making the subsequent target detection result more accurate.
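The four extraction modules and the two fusions described above can be sketched as follows (a non-authoritative PyTorch rendering; the channel widths, activation choices, and nearest-neighbor upsampling are assumptions):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def down(cin, cout):
        # 1x1 convolution followed by a stride-2 3x3 convolution,
        # as in the first and second extraction modules
        return nn.Sequential(
            nn.Conv2d(cin, cout // 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(cout // 2, cout, 3, stride=2, padding=1),
            nn.ReLU(inplace=True))

    class SecondFeatureNet(nn.Module):
        """Sketch of the four extraction modules and the two fusions."""

        def __init__(self, c=1024):
            super().__init__()
            self.m1 = down(c, c)          # high -> medium resolution
            self.m2 = down(c, c)          # medium -> low resolution
            self.m3 = nn.Conv2d(c, c, 1)  # 1x1 conv before upsampling
            self.m4 = nn.Conv2d(c, c, 1)
            self.fuse2 = nn.Conv2d(2 * c, c, 3, padding=1)  # conv after concat
            self.fuse3 = nn.Conv2d(2 * c, c, 3, padding=1)

        def forward(self, high):
            mid = self.m1(high)
            f1 = self.m2(mid)                                # first specified feature
            up1 = F.interpolate(self.m3(f1), size=mid.shape[-2:])
            f2 = self.fuse2(torch.cat([up1, mid], dim=1))    # second specified feature
            up2 = F.interpolate(self.m4(f2), size=high.shape[-2:])
            f3 = self.fuse3(torch.cat([up2, high], dim=1))   # third specified feature
            return f1, f2, f3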
In an embodiment, referring to fig. 5, a flowchart of a method for determining a human body posture according to an embodiment of the present application: as shown in fig. 5, when the robot determines the human body posture from the plurality of limb angles corresponding to consecutive frames of target images, it may perform the following steps 510 to 550.
Step 510: for each angle category, input the angle information of the consecutive frames of target images in that category into the bidirectional recurrent neural network corresponding to the category, obtaining a plurality of output parameters corresponding to the category.
Referring to fig. 6, a schematic diagram of the architecture of a method for determining a human body posture according to an embodiment of the present application: as shown in fig. 6, each angle category has a corresponding bidirectional recurrent neural network (BRNN).
Illustratively, the angle categories may include the head-neck-right shoulder angle, the head-neck-left shoulder angle, the neck-right shoulder-right elbow angle, the neck-left shoulder-left elbow angle, the right shoulder-right elbow-right wrist angle, the left shoulder-left elbow-left wrist angle, the chest-right hip-right knee angle, the chest-left hip-left knee angle, the right hip-right knee-right ankle angle, the left hip-left knee-left ankle angle, the right knee-right ankle horizontal angle, and the left knee-left ankle horizontal angle.
For each angle category, the robot may input the angle information of a specified number of consecutive target images into the corresponding bidirectional recurrent neural network, thereby obtaining a plurality of output parameters for that category. Here, each output parameter may be a single value. For example, the robot may input the head-neck-right shoulder angle information of 10 consecutive frames of target images into the BRNN corresponding to that angle, obtaining 10 output parameters.
After the angle information of every angle category has been input into its corresponding bidirectional recurrent neural network, the output parameters corresponding to all angle categories are obtained.
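One such per-category network can be sketched as below; a bidirectional GRU is used as the recurrent cell and the hidden size is an assumption, since the patent only requires a bidirectional recurrent neural network producing one output parameter per frame:

    import torch
    import torch.nn as nn

    class AngleBRNN(nn.Module):
        """One bidirectional recurrent network per angle category."""

        def __init__(self, hidden=32):
            super().__init__()
            self.rnn = nn.GRU(input_size=1, hidden_size=hidden,
                              batch_first=True, bidirectional=True)
            self.to_scalar = nn.Linear(2 * hidden, 1)  # one value per frame

        def forward(self, angles):               # angles: (batch, frames, 1)
            h, _ = self.rnn(angles)              # (batch, frames, 2*hidden)
            return self.to_scalar(h).squeeze(-1)  # (batch, frames) output params

    # e.g. ten frames of the head-neck-right shoulder angle:
    # params = AngleBRNN()(torch.randn(1, 10, 1))  # ten output parameters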
Step 520: for the plurality of output parameters corresponding to each angle category, fuse them to obtain the specified output parameter corresponding to that category.
As shown in fig. 6, for the output parameters of any angle category, the robot may pass them through a nonlinear activation function and a fully connected layer, thereby fusing them into the specified output parameter for that category.
Illustratively, the robot fuses the 10 output parameters corresponding to the head-neck-right shoulder angle through a nonlinear activation function and a fully connected layer to obtain the specified output parameter for that angle.
After the output parameters of every angle category have been fused, the robot obtains the specified output parameters corresponding to all angle categories.
Step 550: classify the human body posture based on the specified output parameters corresponding to all the angle categories.
After obtaining the specified output parameters for all angle categories, as shown in fig. 6, the robot may fuse them through a nonlinear activation function and a fully connected layer, and normalize the fused result with a normalization function to obtain a multi-dimensional vector. The number of elements of this vector equals the number of preset posture categories; each element corresponds to one posture category and represents the confidence of that posture.
For example, if five postures are detected (climbing over a wall, falling, smoking, running, and walking), the multi-dimensional vector contains 5 elements, each being the confidence of the corresponding posture.
The robot can take the posture with the highest confidence as the human body posture of the person to be detected.
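The fusion and classification stage can be sketched as below; tanh as the nonlinear activation, the layer sizes, and softmax as the normalization function are assumptions made for illustration:

    import torch
    import torch.nn as nn

    class PostureHead(nn.Module):
        """Fuse per-category output parameters, then classify postures."""

        def __init__(self, n_categories=12, frames=10, n_postures=5):
            super().__init__()
            self.per_category = nn.Linear(frames, 1)         # fuse frame outputs
            self.classifier = nn.Linear(n_categories, n_postures)

        def forward(self, params):                # params: (batch, categories, frames)
            fused = torch.tanh(self.per_category(params)).squeeze(-1)
            return torch.softmax(self.classifier(fused), dim=-1)  # confidences

    # posture index with the highest confidence:
    # torch.argmax(PostureHead()(torch.randn(1, 12, 10)), dim=-1)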
In an embodiment, referring to fig. 7, a flowchart of a method for determining a human body posture according to another embodiment of the present application: as shown in fig. 7, when the robot determines the human body posture from the plurality of limb angles corresponding to consecutive frames of target images, it may perform the following steps 510 to 551.
Step 510: for each angle category, input the angle information of the consecutive frames of target images in that category into the bidirectional recurrent neural network corresponding to the category, obtaining a plurality of output parameters corresponding to the category.
Step 520: for the plurality of output parameters corresponding to each angle category, fuse them to obtain the specified output parameter corresponding to that category.
Referring to fig. 8, a schematic diagram of the architecture of a method for determining a human body posture according to another embodiment of the present application: as shown in fig. 8, each angle category has a corresponding bidirectional recurrent neural network.
After performing steps 510 and 520, the robot may obtain the specified output parameters corresponding to all angle categories.
Step 530: for each angle category, determine a plurality of pieces of differential information corresponding to the category according to the angle information of the consecutive frames of target images in that category.
For any angle category, the robot may subtract the angle information of the previous frame from the angle information of the next frame, obtaining a plurality of pieces of differential information. For example, when the robot performs posture detection on 10 consecutive frames of target images, it can compute 9 pieces of differential information for the head-neck-right shoulder angle.
After processing the angle information of every angle category, the robot obtains multiple groups of differential information. Illustratively, with 12 angle categories and 10 frames of target images, 12 groups of differential information can be determined, each group containing 9 pieces.
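Computing the differential information is a simple frame-to-frame difference, for example (the angle values below are made up for illustration):

    import numpy as np

    # one angle category over 10 consecutive frames (degrees)
    angles = np.array([162.0, 160.5, 158.0, 151.0, 140.0,
                       123.0, 104.0, 88.0, 75.0, 70.0])
    diffs = np.diff(angles)  # 9 differences: next frame minus previous frame
    # With 12 angle categories this yields 12 such 9-element difference groups.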
Step 540: for the plurality of pieces of differential information corresponding to each angle category, fuse them to obtain the specified differential information corresponding to that category.
As shown in fig. 8, for the differential information of any angle category, the robot may pass it through a fully connected layer, thereby fusing it into the specified differential information for that category.
Illustratively, the robot can fuse the 9 pieces of differential information corresponding to the head-neck-right shoulder angle through a fully connected layer to obtain the specified differential information for that angle.
After the differential information of every angle category has been fused, the robot obtains the specified differential information corresponding to all angle categories.
Step 551: classify the human body posture based on the specified output parameters and the specified differential information corresponding to all the angle categories.
After obtaining the specified output parameters and specified differential information for all angle categories, as shown in fig. 8, the robot may fuse them through a nonlinear activation function and a fully connected layer, pass the fused result through a further nonlinear activation function and fully connected layer, and normalize the output to obtain a multi-dimensional vector. The number of elements of this vector equals the number of preset posture categories; each element corresponds to one posture category and represents the confidence of that posture.
The robot can take the posture with the highest confidence as the human body posture of the person to be detected.
With these measures, computing differential information from the angle information of the same angle category before posture detection yields a more accurate posture. For postures whose final poses look similar but are substantively different, such as falling down versus sitting down, or lying down to rest versus collapsing after an injury, introducing the differential information of the limb angles between consecutive frames allows the motion dynamics of each posture to be fully learned, so that similar-looking but substantively different postures can be distinguished.
Fig. 9 shows a human body posture detection apparatus according to an embodiment of the present application; as shown in fig. 9, the apparatus may include:
a first determining module 910, configured to determine person position information of a person to be detected in a target image;
a first detection module 920, configured to perform keypoint detection on the person image corresponding to the person position information in the target image to obtain a plurality of human body keypoints;
a second determining module 930, configured to determine a plurality of limb angles in the target image based on the plurality of human body keypoints;
and a second detection module 940, configured to determine the human body posture of the person to be detected according to the plurality of limb angles corresponding to consecutive frames of target images.
The implementation of the functions and actions of the modules in the apparatus is detailed in the corresponding steps of the human body posture detection method above and is not repeated here.
In the embodiments provided in the present application, the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
If the functions are implemented in the form of software functional modules and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, or the portion of it that substantially contributes over the prior art, may be embodied as a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Claims (10)

1. A human body posture detection method, characterized by comprising:
determining person position information of a person to be detected in a target image;
performing keypoint detection on the person image corresponding to the person position information in the target image to obtain a plurality of human body keypoints;
determining a plurality of limb angles in the target image based on the plurality of human body keypoints;
and determining the human body posture of the person to be detected according to the plurality of limb angles corresponding to consecutive frames of target images.
2. The method according to claim 1, wherein determining the person position information of the person to be detected in the target image comprises:
taking the target image as an image to be detected, inputting the image to be detected into a trained target detection model, and obtaining the person position information output by the target detection model.
3. The method according to claim 1, wherein the target image is a visible light image;
and determining the person position information of the person to be detected in the target image comprises:
acquiring an infrared image corresponding to the visible light image as the image to be detected;
and inputting the image to be detected into a trained target detection model to obtain the person position information output by the target detection model.
4. The method according to claim 2 or 3, wherein the target detection model comprises a first feature extraction network and a second feature extraction network;
and inputting the image to be detected into the trained target detection model to obtain the person position information output by the target detection model comprises:
extracting high-resolution image features from the image to be detected based on the first feature extraction network;
extracting a first specified image feature, a second specified image feature and a third specified image feature from the high-resolution image features through the second feature extraction network;
and regressing the person position information of the person to be detected from the first specified image feature, the second specified image feature and the third specified image feature.
5. The method according to claim 4, wherein the second feature extraction network comprises a first extraction module, a second extraction module, a third extraction module and a fourth extraction module;
and extracting the first, second and third specified image features from the high-resolution image features through the second feature extraction network comprises:
extracting medium-resolution image features from the high-resolution image features based on the first extraction module;
extracting low-resolution image features from the medium-resolution image features based on the second extraction module as the first specified image feature;
extracting a first intermediate image feature from the first specified image feature based on the third extraction module, and fusing the first intermediate image feature with the medium-resolution image features to obtain the second specified image feature;
and extracting a second intermediate image feature from the second specified image feature based on the fourth extraction module, and fusing the second intermediate image feature with the high-resolution image features to obtain the third specified image feature.
6. The method according to claim 1, wherein the plurality of limb angles comprises angle information corresponding to a plurality of angle categories;
and determining the human body posture of the person to be detected according to the plurality of limb angles corresponding to consecutive frames of target images comprises:
for each angle category, inputting the angle information of the consecutive frames of target images in the angle category into a bidirectional recurrent neural network corresponding to the angle category to obtain a plurality of output parameters corresponding to the angle category;
for the plurality of output parameters corresponding to each angle category, fusing them to obtain a specified output parameter corresponding to the angle category;
and classifying the human body posture based on the specified output parameters corresponding to all the angle categories.
7. The method according to claim 6, wherein before classifying the human body posture based on the specified output parameters corresponding to all the angle categories, the method further comprises:
for each angle category, determining a plurality of pieces of differential information corresponding to the angle category according to the angle information of the consecutive frames of target images in the angle category;
and for the plurality of pieces of differential information corresponding to each angle category, fusing them to obtain specified differential information corresponding to the angle category;
and classifying the human body posture based on the specified output parameters corresponding to all the angle categories comprises:
classifying the human body posture based on the specified output parameters and the specified differential information corresponding to all the angle categories.
8. A human body posture detection apparatus, characterized by comprising:
a first determining module, configured to determine person position information of a person to be detected in a target image;
a first detection module, configured to perform keypoint detection on the person image corresponding to the person position information in the target image to obtain a plurality of human body keypoints;
a second determining module, configured to determine a plurality of limb angles in the target image based on the plurality of human body keypoints;
and a second detection module, configured to determine the human body posture of the person to be detected according to the plurality of limb angles corresponding to consecutive frames of target images.
9. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the human body posture detection method of any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the storage medium stores a computer program executable by a processor to perform the human body posture detection method of any one of claims 1-7.
CN202111120416.3A 2021-09-24 2021-09-24 Human body posture detection method and device, electronic equipment and storage medium Pending CN113807289A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111120416.3A CN113807289A (en) 2021-09-24 2021-09-24 Human body posture detection method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN113807289A 2021-12-17

Family

ID=78940245

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111120416.3A Pending CN113807289A (en) 2021-09-24 2021-09-24 Human body posture detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113807289A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114359974A (en) * 2022-03-08 2022-04-15 广东履安实业有限公司 Human body posture detection method and device and storage medium
CN115035546A (en) * 2022-05-27 2022-09-09 中国科学院半导体研究所 Three-dimensional human body posture detection method and device and electronic equipment
CN116189229A (en) * 2022-11-30 2023-05-30 中信重工开诚智能装备有限公司 Personnel tracking method based on coal mine auxiliary transportation robot
CN116189229B (en) * 2022-11-30 2024-04-05 中信重工开诚智能装备有限公司 Personnel tracking method based on coal mine auxiliary transportation robot
CN116052273A (en) * 2023-01-06 2023-05-02 北京体提科技有限公司 Action comparison method and device based on body state fishbone line
CN116052273B (en) * 2023-01-06 2024-03-08 北京体提科技有限公司 Action comparison method and device based on body state fishbone line


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination