CN112784786A - Human body posture recognition method and device - Google Patents

Human body posture recognition method and device

Info

Publication number
CN112784786A
CN112784786A (application CN202110126255.2A)
Authority
CN
China
Prior art keywords
image
information
frame
target object
human body
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110126255.2A
Other languages
Chinese (zh)
Inventor
马骁
钟诚
Current Assignee
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date
Filing date
Publication date
Application filed by Lenovo Beijing Ltd filed Critical Lenovo Beijing Ltd
Priority to CN202110126255.2A priority Critical patent/CN112784786A/en
Publication of CN112784786A publication Critical patent/CN112784786A/en
Pending legal-status Critical Current

Classifications

    • G06V 40/20 — Recognition of biometric, human-related or animal-related patterns in image or video data; movements or behaviour, e.g. gesture recognition
    • G06V 40/16 — Human faces, e.g. facial parts, sketches or expressions
    • G06F 18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/22 — Pattern recognition; matching criteria, e.g. proximity measures
    • G06N 3/045 — Neural networks; combinations of networks
    • G06N 3/08 — Neural networks; learning methods

Abstract

The application discloses a human body posture recognition method and device. Multiple frames of two-dimensional images containing a target object are acquired; a human body motion trajectory of the target object is determined based on image feature information of at least two frames among the multiple frames; first image structure information of a first image and second image structure information of a second image among the multiple frames are acquired; and the current human body posture of the target object is recognized based on the human body motion trajectory, the first image structure information, and the second image structure information. Recognition of the human body posture in three-dimensional space from multiple two-dimensional frames is thereby achieved, improving the accuracy of the recognition result.

Description

Human body posture recognition method and device
Technical Field
The application relates to the technical field of information processing, and in particular to a human body posture recognition method and device.
Background
Human body posture recognition is widely applied in fields such as human-computer interaction, intelligent control, and human behavior analysis. At present, human body posture is generally recognized using key point detection, and the instability of a single two-dimensional image during key point detection can reduce the precision of the recognition result.
Disclosure of Invention
In view of this, the present application provides the following technical solutions:
a human body posture recognition method comprises the following steps:
acquiring a multi-frame two-dimensional image containing a target object;
determining a human body motion track of the target object based on image characteristic information of at least two frames of two-dimensional images in the multiple frames of two-dimensional images;
acquiring first image structure information of a first image and second image structure information of a second image in the multi-frame two-dimensional images, wherein the number of image frames spaced between the first image and the second image meets a specific condition;
and identifying and obtaining the current human body posture of the target object based on the human body motion track, the first image structure information and the second image structure information.
Optionally, the method further comprises:
acquiring a reference image of a target object;
and detecting whether the current human body posture matches the target posture based on the current human body posture and the target posture corresponding to the reference image, to obtain a detection result.
Optionally, the method further comprises:
and if the detection result meets the prompt condition, generating prompt information, wherein the prompt information is used for prompting the target object to adjust the sub-target posture of the current human body posture.
Optionally, the acquiring a reference image including the target object includes:
determining a shot image of the target object in a target posture as a reference image;
alternatively,
generating a screening condition corresponding to the target object being in a target posture;
and determining a reference image in the multi-frame two-dimensional images based on the screening condition.
Optionally, the determining the human motion trajectory of the target object based on the image feature information of at least two frames of images of the multiple frames of two-dimensional images includes:
performing face recognition on at least two frames of two-dimensional images in the multi-frame two-dimensional images to determine face motion track information;
performing key point identification on at least two frames of two-dimensional images in the multi-frame two-dimensional images, and determining key point motion track information, wherein the key points at least comprise human face key points and body key points;
and determining the human body motion track of the target object based on the human face motion track information and the key point motion track information.
Optionally, the performing face recognition on at least two frames of the multiple frames of two-dimensional images to determine face motion trajectory information includes:
performing face recognition on at least two frames of images in the multi-frame two-dimensional images to obtain the size and position information of a face recognition frame of each recognized image;
determining face motion track information based on the size and position information of the face recognition frame of each recognized image;
the identifying key points of at least two frames of two-dimensional images in the multi-frame two-dimensional images and determining the motion trail information of the key points comprises the following steps:
performing key point identification on at least two frames of images in the multi-frame two-dimensional images to obtain the position information of the key point of each identified image;
and determining the motion trail information of the key points based on the position information of the key points of each identified image.
Optionally, the determining the human body motion trajectory of the target object based on the face motion trajectory information and the key point motion trajectory information includes:
inputting the face motion track information and the key point motion track information into a track recognition model to obtain a plurality of human motion sub-tracks and confidence degrees corresponding to each human motion sub-track;
and determining the human motion trajectory of the target object in a plurality of human motion sub-trajectories based on the confidence corresponding to each human motion sub-trajectory.
Optionally, the acquiring first image structure information of a first image and second image structure information of a second image in the multiple frames of two-dimensional images includes:
acquiring a first face recognition frame and a first group of key points of a first image in the multi-frame images;
determining the first face recognition frame and a first structural relationship formed by the first group of key points as first image structural information;
acquiring a second face recognition frame and a second group of key points of a second image in the multi-frame image;
and determining a second structural relationship formed by the second face recognition frame and the second group of key points as second image structural information.
Optionally, the identifying, based on the human motion trajectory, the first image structure information, and the second image structure information, to obtain the current human posture of the target object includes:
recognizing and obtaining the posture information of the target object based on the first image structure information and the second image structure information;
and determining the current human body posture of the target object based on the human body motion track and the posture information.
A human body posture recognition apparatus comprising:
an image acquisition unit configured to acquire a plurality of frames of two-dimensional images including a target object;
the track determining unit is used for determining the human motion track of the target object based on the image characteristic information of at least two frames of two-dimensional images in the multiple frames of two-dimensional images;
the information acquisition unit is used for acquiring first image structure information of a first image and second image structure information of a second image in the multi-frame two-dimensional images, wherein the number of image frames spaced between the first image and the second image meets a specific condition;
and the gesture recognition unit is used for recognizing and obtaining the current human body gesture of the target object based on the human body motion track, the first image structure information and the second image structure information.
According to the technical scheme, the application discloses a human body posture identification method and device, and a multi-frame two-dimensional image containing a target object is obtained; determining a human body motion track of a target object based on image characteristic information of at least two frames of two-dimensional images in a plurality of frames of two-dimensional images; acquiring first image structure information of a first image and second image structure information of a second image in a plurality of frames of two-dimensional images; and identifying and obtaining the current human body posture of the target object based on the human body motion track, the first image structure information and the second image structure information. The recognition of the human body posture in the three-dimensional space through the multi-frame two-dimensional image is realized, and the accuracy of the recognition result is improved.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a schematic flow chart of a human body posture recognition method according to an embodiment of the present application;
fig. 2 is a schematic flow chart of obtaining a human motion trajectory based on a trajectory recognition model according to an embodiment of the present application;
fig. 3 is a schematic diagram of image structure information in different frame images according to an embodiment of the present disclosure;
fig. 4 is a schematic diagram of image structure information corresponding to a target frame image according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of a human body posture recognition device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort shall fall within the protection scope of the present application.
The embodiment of the application provides a human body posture recognition method. Posture recognition is an important component of human behavior and action recognition; its goal is to obtain parameter information of the whole body or of local limbs, such as the position and orientation of the head. Recognizing the human body posture from multiple frames of two-dimensional images makes the recognition result more robust and allows the posture in three-dimensional space to be recognized, for example recognizing the bending of the cervical vertebra as seen from the side of the body from multiple frontal images of the human body.
Referring to fig. 1, which shows a schematic flow chart of a human body posture recognition method provided in an embodiment of the present application, the method may include the following steps:
s101, acquiring a multi-frame two-dimensional image containing a target object.
The target object refers to a living organism to be identified, such as a human user, and it should be noted that the target object may be one or a plurality of objects. The human body posture recognition method provided by the embodiment of the application can be applied to posture recognition of a specific user in front of the electronic equipment, and also can be used for posture recognition of a plurality of users shot by the camera, such as posture recognition of all students in a classroom.
The multiple frames of two-dimensional images may be frames taken from collected video data. A video can be understood as consisting of at least one frame of image data; to recognize the human body posture in the video, the video can be split into individual frames and each frame analyzed separately. The multiple frames may be consecutive, i.e. the acquired two-dimensional images are temporally continuous. Alternatively, to reduce the amount of computation, specific frames may be selected from the video data; for example, for a 5-second video, two frames may be sampled from each second and combined into the final set of frames to be processed. A two-dimensional image is a planar image containing no depth information: it extends only left-right and up-down, with no front-back direction. The two-dimensional image acquired in the embodiment of the present application may be an RGB (red, green and blue three-channel) color image or a grayscale image, which is not limited here.
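As an illustrative sketch of the sampling step described above (function and parameter names are our own, not from the application), keeping a fixed number of frames from each second of decoded video might look like:

```python
def sample_frames(frames, fps, per_second=2):
    """Keep `per_second` evenly spaced frames from each second of video.

    `frames` is a list of decoded frames in temporal order and `fps` is the
    capture frame rate; both names and the per-second count are assumptions
    chosen for illustration.
    """
    step = max(fps // per_second, 1)
    return frames[::step]

# A 5-second clip at 30 fps sampled at 2 frames/second keeps 10 frames.
```

Any decimation scheme would do here; the point is only that the posture pipeline operates on a reduced, temporally ordered subset of the video frames.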
S102, determining the human motion track of the target object based on the image characteristic information of at least two frames of two-dimensional images in the multi-frame two-dimensional images.
The image feature information of every frame of the multiple frames of two-dimensional images may be processed. Alternatively, to improve processing efficiency and reduce the occupation of processing resources, only at least two frames of the multiple frames may be processed; the selected frames may be two-dimensional images corresponding to temporally consecutive image frames, i.e. multiple consecutive frames, such as 8 consecutive frames. The specific choice can be made according to the application scenario and the required recognition accuracy.
The image feature information of a two-dimensional image refers to the features that can be recognized in the image, and may include face features and key point features, where the key points may be face key points or body key points, for example key points of body parts such as the neck, left shoulder, and right shoulder. Specifically, the face features can be obtained by recognizing a face recognition frame, so the face features can be represented by the size and position information of the face recognition frame; the key point features are represented by the position information of the key points. After the image feature information is obtained, the human body motion trajectory can be obtained by analyzing and plotting the face features and key point features: for example, the motion trajectories of the key points and of the face are determined from the coordinate positions of the key points, and the human body motion trajectory is obtained through comprehensive analysis. The specific implementation of the human body motion trajectory will be described in the following embodiments of the present application.
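A minimal sketch of collecting these two kinds of feature information across frames (the dict-of-keypoints structure and all names are hypothetical, chosen only to make the idea concrete):

```python
def keypoint_trajectory(per_frame_keypoints, name):
    """Collect the (x, y) path of one named key point across frames.

    `per_frame_keypoints` is a list of dicts mapping key point names
    (e.g. "neck", "left_shoulder") to pixel coordinates.
    """
    return [kps[name] for kps in per_frame_keypoints if name in kps]


def face_box_trajectory(face_boxes):
    """Track the face recognition frame's centre and area across frames.

    Each box is an (x, y, w, h) tuple; the centre gives the position
    offset and the area serves as a crude proxy for camera distance.
    """
    return [((x + w / 2, y + h / 2), w * h) for (x, y, w, h) in face_boxes]
```

The per-keypoint paths and the face-box path together are the raw material from which the trajectory analysis below proceeds.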
S103, acquiring first image structure information of a first image and second image structure information of a second image in the multi-frame two-dimensional image.
Wherein the first image and the second image both belong to an image of a certain frame of the multi-frame two-dimensional image, and the number of image frames spaced between the first image and the second image satisfies a certain condition. The particular condition may be that the number of image frames of the interval satisfies a particular number threshold, such as the number of image frames of the interval between the first image and the second image being greater than 20 frames. In another possible implementation, the specific condition may be that the number of image frames of an interval satisfies a specific acquisition time condition. For example, the first image may be any frame of image within a first second of acquisition time and the second image may be any frame of image within a last second of acquisition time. In order to distinguish the first image from the second image more clearly, the first two-dimensional image of the plurality of two-dimensional images may be used as the first image, and the last two-dimensional image of the plurality of two-dimensional images may be used as the second image.
The image structure information is information determined based on the structural relationship between target points in the current image (e.g. the first image or the second image). The target points may be key points of the target object determined in the first image or the second image, such as face key points and body key points. Correspondingly, the structural relationship between the target points is represented by the closed-loop structures formed by connecting the target points in a certain order: the face recognition frame is the closed-loop structure over the face target points, and the closed-loop structure obtained by connecting a number of body key points in sequence is the structural relationship of a group of key points. The structural relationship between the face recognition frame and a group of key points is therefore determined as the image structure information.
By analyzing the image structure information of the first image and the second image, the position change between key points in different images can be obtained, and the image depth information can be deduced. For example, the face recognition frame becomes larger gradually, and it can be concluded that the action made by the target object is approaching the camera gradually. Namely, the depth information of the forward movement of the body of the target object is obtained.
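The face-box example above can be sketched as follows; the 1.2 growth ratio is an arbitrary illustrative tolerance, not a value from the application:

```python
def depth_trend(first_box, second_box, grow_ratio=1.2):
    """Infer coarse front-back motion from the change in face-box area.

    Boxes are (x, y, w, h) tuples from the first and second images;
    comparing their areas recovers the depth direction that a single
    two-dimensional frame cannot provide.
    """
    area1 = first_box[2] * first_box[3]
    area2 = second_box[2] * second_box[3]
    if area2 > area1 * grow_ratio:
        return "toward camera"   # face grew: body moved forward
    if area1 > area2 * grow_ratio:
        return "away from camera"
    return "roughly constant depth"
```

A real system would compare full structural relationships (face box plus the keypoint closed loop), but the area comparison already shows how depth is inferred from two widely spaced frames.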
And S104, identifying and obtaining the current human body posture of the target object based on the human body motion track, the first image structure information and the second image structure information.
The human body motion trajectory can represent the trajectory of the movement of the body's center of gravity, and analyzing the first and second image structure information yields the depth information of the image, i.e. the posture information of the human body. The current human body posture of the target object, such as the direction and distance of head movement, can then be recognized from the human body motion trajectory and the posture information. Because image depth information is obtained on top of the original two-dimensional information, movement in any one or several of the front-back, left-right, and up-down directions can be obtained, so the recognized human body posture is a posture in three-dimensional space, such as whether the body leans forward, the forward-leaning amplitude, or the bending of the cervical vertebra. The specific recognition process will be described in detail in the following embodiments.
The human body posture identification method provided by the embodiment of the application obtains a plurality of frames of two-dimensional images containing a target object; determining a human body motion track of a target object based on image characteristic information of at least two frames of two-dimensional images in a plurality of frames of two-dimensional images; acquiring first image structure information of a first image and second image structure information of a second image in a plurality of frames of two-dimensional images; and identifying and obtaining the current human body posture of the target object based on the human body motion track, the first image structure information and the second image structure information. The recognition of the human body posture in the three-dimensional space through the multi-frame two-dimensional image is realized, and the accuracy of the recognition result is improved.
The human body posture recognition method provided by the embodiment of the application can be applied to recognition of human body postures, and further judges the states and behaviors of the human body through the postures obtained through recognition so as to achieve the purpose of controlling electronic equipment through the human body states or adjusting the human body postures.
Taking the control of the electronic equipment through the human body state as an example, whether a control instruction matched with the current human body posture exists can be detected, if so, the control instruction matched with the current human body posture is obtained, and the electronic equipment is controlled to reach the target state according to the control instruction. For example, if the control instruction corresponding to the shaking head is stored in advance as the page turning during reading of the reading interface, the reading interface is controlled to turn the page when the current human body posture of the target object is recognized to be the shaking head.
Taking the adjustment of the human body posture as an example, the recognized current human body posture of the target object needs to be compared with the target posture, and whether to remind the target object to adjust its current posture is judged from the comparison result. In a possible implementation, a reference image containing the target object may be acquired, and whether the current human body posture matches the target posture corresponding to the reference image is detected to obtain a detection result. Correspondingly, if the detection result satisfies the prompt condition, prompt information is generated for prompting the target object to adjust the current human body posture toward the target posture. The prompt condition may be a condition characterizing whether the current human body posture matches the target posture, such as whether the current posture is within an error range of the target posture. For example, if the target posture is an upright-head posture and the target object is detected to be only slightly head-tilted, i.e. within the error range of the upright-head posture, no prompt information is generated; if the target object is in a head-tilted posture whose amplitude exceeds the range of the upright-head posture, prompt information is generated to prompt the target object to correct the head posture.
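The error-range comparison above might be sketched as a per-keypoint pixel tolerance; the 15-pixel tolerance, the function names, and the prompt text are all assumptions, not values from the application:

```python
import math


def matches_target(current_kps, target_kps, tol=15.0):
    """Check whether every target key point is matched within `tol` pixels.

    Both arguments map key point names to (x, y) coordinates; the pixel
    tolerance stands in for the "error range" of the target posture.
    """
    for name, (tx, ty) in target_kps.items():
        if name not in current_kps:
            return False
        cx, cy = current_kps[name]
        if math.hypot(cx - tx, cy - ty) > tol:
            return False
    return True


def prompt_if_needed(current_kps, target_kps):
    """Generate prompt information only when the posture is out of range."""
    if matches_target(current_kps, target_kps):
        return None
    return "please adjust your posture toward the target posture"
```

In the head-tilt example, a slight tilt keeps the head keypoint inside the tolerance and no prompt is generated; a large tilt pushes it outside and the prompt fires.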
The target posture in the embodiment of the present application is the posture of the target object in the reference image. A captured image in which the target object is in the target posture may be determined as the reference image; for example, an image in which the target object is performing the movement correctly may be captured as the reference for subsequent processing. Alternatively, a filtering condition corresponding to the target object being in the target posture may be generated, and the reference image determined among the multiple frames of two-dimensional images based on the filtering condition; since multiple frames containing the target object are acquired anyway, the reference image can be determined there by the filtering condition. The filtering condition may include the two-dimensional coordinate information of the corresponding key points when the target posture is reached, and the reference image is determined in the acquired frames according to that coordinate information.
In order to make the human body posture recognition result more robust, in the embodiment of the present application, the human body motion trajectory is determined by the human face motion trajectory information and the key point motion trajectory information. Correspondingly, the determining the human body motion trajectory of the target object based on the image feature information of at least two frames of two-dimensional images in the multiple frames of two-dimensional images includes:
performing face recognition on at least two frames of images in the multi-frame two-dimensional images to determine face motion track information;
performing key point recognition on at least two frames of images in the multiple frames of two-dimensional images, and determining key point motion trajectory information, wherein the key points include face key points and body key points;
and determining the human body motion track of the target object based on the human face motion track information and the key point motion track information.
The face recognition of at least two frames of two-dimensional images in the multi-frame two-dimensional images to determine face motion trajectory information includes: performing face recognition on at least two frames of images in the multi-frame two-dimensional images to obtain the size and position information of a face recognition frame of each recognized image; determining face motion track information based on the size and position information of the face recognition frame of each recognized image;
the identifying key points of at least two frames of two-dimensional images in the multi-frame two-dimensional images and determining the motion trail information of the key points comprises the following steps: performing key point identification on at least two frames of images in the multi-frame two-dimensional images to obtain the position information of the key point of each identified image; and determining the motion trail information of the key points based on the position information of the key points of each identified image.
Usually, face features are recognized by means of a face recognition frame, and as the target object moves, the position of the face recognition frame changes with it. Correspondingly, the face motion trajectory information can be determined from the size and position of the face recognition frame: the distance between the target object and the camera can be judged from the size of the recognition frame, i.e. depth information in the two-dimensional image containing the target object is obtained, and analyzing the position of the recognition frame yields the face position offset information, so the face motion trajectory information can be determined from both together.
Correspondingly, the motion trajectory of each key point can be plotted from its position coordinates in different frames, giving the key point motion trajectory information. The face motion trajectory information is then analyzed to obtain the depth information and face position offset in the image, the key point motion trajectory information is analyzed to obtain the key point position offsets, and the human body motion trajectory, i.e. the trajectory of the body's center of gravity, is obtained by combining these.
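One plausible way to combine the two sources, sketched for illustration only (the application does not specify the combination rule; a simple per-frame average is our assumption):

```python
def body_center_trajectory(face_centers, keypoint_paths):
    """Fuse the face centre with body key point paths, frame by frame.

    `face_centers` is a list of (x, y) face-box centres and each entry of
    `keypoint_paths` is an equally long list of (x, y) key point positions.
    Averaging them gives a crude centre-of-gravity path per frame.
    """
    trajectory = []
    for i, (fx, fy) in enumerate(face_centers):
        xs = [fx] + [path[i][0] for path in keypoint_paths]
        ys = [fy] + [path[i][1] for path in keypoint_paths]
        trajectory.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return trajectory
```

In the described method this fusion is performed by a learned trajectory recognition model rather than a fixed formula, as the next paragraph explains.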
On the basis of the above embodiment, neural network models can also be used to process the above information. Neural network training on a sample set containing multiple face feature samples, each including the size and position information of a face recognition frame, yields a face motion trajectory recognition model that can recognize the face motion trajectory. Correspondingly, neural network training on a sample set containing multiple key point feature samples, each including two-dimensional key point coordinates, yields a key point motion trajectory recognition model used to recognize the key point trajectories. A trajectory recognition model, trained on samples of face motion trajectories and key point motion trajectories, is then used to recognize the human body motion trajectory. The present application does not limit the types or training processes of the face motion trajectory recognition model, the key point motion trajectory recognition model, or the trajectory recognition model; convolutional neural network models, recurrent neural network models, and the like may be used.
Referring to fig. 2, a schematic flow chart of obtaining a human motion trajectory based on a trajectory recognition model according to an embodiment of the present application is shown.
First, multiple frames of two-dimensional images containing the target object are acquired, image feature recognition is performed on at least two frames, and the images on which image feature recognition has been performed are determined as recognized images. The size and position information of the face recognition frame of each recognized image is then obtained, arranged as a face information queue, and input to the face motion track recognition model to obtain the face motion track information. Likewise, the key point position information of each recognized image is obtained, arranged as a key point information queue, and input to the key point motion track recognition model to obtain the key point motion track information. Finally, the face motion track information and the key point motion track information are input to the trajectory recognition model to obtain the human body motion track of the target object.
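The flow of fig. 2 can be sketched as below. The three model objects stand in for the trained networks, so their call interfaces, as well as the per-image dict layout, are assumptions for illustration only:

```python
def recognize_body_trajectory(recognized_images,
                              face_model, keypoint_model, trajectory_model):
    """Sketch of the fig. 2 pipeline (model interfaces are assumed).

    Each recognized image supplies a face box (x, y, w, h) and a dict of
    key point coordinates. The two queues are fed to their respective
    motion track models, whose outputs are combined by the trajectory
    recognition model into the human body motion track.
    """
    face_queue = [img["face_box"] for img in recognized_images]
    keypoint_queue = [img["keypoints"] for img in recognized_images]
    face_track = face_model(face_queue)
    keypoint_track = keypoint_model(keypoint_queue)
    return trajectory_model(face_track, keypoint_track)
```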
In one embodiment, the trajectory recognition model in the embodiment of the present application is a neural network model that can output a confidence level. The confidence level refers to the probability the model assigns to a candidate human motion track it outputs. The model is configured with a confidence threshold: input information consisting of the face motion track information and the key point motion track information is fed into the trajectory recognition model, which outputs a plurality of human motion sub-tracks together with a confidence level for each; when a confidence level is greater than the threshold, the corresponding human motion sub-track is determined as the human body motion track of the target object. In other words, the human body motion track of the target object is determined among the plurality of human motion sub-tracks based on the confidence level corresponding to each sub-track. For example, the confidence level is a number between 0 and 1, where 1 indicates the highest accuracy and 0 the lowest; if the recognized human motion sub-track A has a confidence of 0.9 and sub-track B a confidence of 0.3, sub-track A is determined as the human body motion track of the target object.
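The confidence-based selection step can be sketched as follows; the 0.5 default threshold and the list-of-pairs interface are illustrative assumptions:

```python
def select_trajectory(sub_tracks, threshold=0.5):
    """Pick the human body motion track among candidate sub-tracks.

    `sub_tracks` is a list of (trajectory, confidence) pairs, as output
    by the trajectory recognition model. The candidate with the highest
    confidence above `threshold` is returned, or None if none qualifies.
    """
    qualified = [(t, c) for t, c in sub_tracks if c > threshold]
    if not qualified:
        return None
    best, _ = max(qualified, key=lambda tc: tc[1])
    return best
```

With the example from the text, sub-track A (confidence 0.9) is selected over sub-track B (confidence 0.3).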
Referring to fig. 3, which shows a schematic diagram of the image structure information in images of different frames provided by an embodiment of the present application, the image structure information presented in different frames differs. In fig. 3, the image structure information is embodied in the size and position of the face recognition frame and in the shape obtained by connecting the key points; it can be seen that the structure information differs when the target object is in different states. For a detailed description, reference may be made to the embodiment of fig. 4, which is not repeated herein.
After the human body motion track is obtained in the present application, the current human body posture is not determined directly from the track alone; it is determined together with the first image structure information of a first image and the second image structure information of a second image among the multiple frames of two-dimensional images.
The image structure information is information determined based on the structural relationship between target points in the current image (e.g., the first image or the second image). A target point may be a key point determined in the target object in the first or second image, such as a face key point or a body key point. The structural relationship between target points is obtained by representing them as closed-loop structures connected in a certain order: the face recognition frame can be regarded as a closed-loop representation of the face target points, and the closed loop obtained by connecting a plurality of body key points in sequence constitutes the structural relationship of a group of key points. Therefore, the structural relationship formed by the face recognition frame and a group of key points is determined as the image structure information.
Accordingly, a first face recognition frame and a first group of key points of a first image among the multiple frames are acquired, and a first structural relationship formed by the first face recognition frame and the first group of key points is determined as the first image structure information; a second face recognition frame and a second group of key points of a second image are acquired, and a second structural relationship formed by the second face recognition frame and the second group of key points is determined as the second image structure information. The first group and the second group contain the same key point types, for example both include left and right pupil key points, a chin key point, left and right shoulder key points, and so on. The closed-loop figure obtained by connecting the first group of key points in a certain order, for example clockwise, serves as the first structural relationship; similarly, the closed-loop figure obtained by connecting the second group of key points in the same order serves as the second structural relationship.
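Building the image structure information of one frame can be sketched as follows; the dict layout and the explicit closing of the loop are illustrative assumptions:

```python
def image_structure_info(face_box, keypoints, order):
    """Build the image structure information of one frame.

    `face_box` is (x, y, w, h); `keypoints` maps key point name -> (x, y);
    `order` fixes the (e.g. clockwise) sequence in which the key points
    are connected into a closed loop. The structure information pairs the
    face recognition frame with the closed-loop polygon of key points.
    """
    loop = [keypoints[name] for name in order]
    loop.append(loop[0])  # close the loop back to the first key point
    return {"face_box": face_box, "keypoint_loop": loop}
```

Using the same `order` for the first and second images guarantees that the two closed loops are comparable vertex by vertex.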
By recognizing the first image structure information and the second image structure information, the posture information of the target object can be obtained. The first image and the second image are separated by a preset number of image frames, so they better reflect the actual motion state of the target object; for example, the first image may be the first frame and the second image the last frame of the multiple frames of two-dimensional images, and the posture information of the target object, i.e., the information describing how the posture changed, can be obtained from the image structure information in these two frames. Target object track information such as center-of-gravity offset information can be obtained from the human body motion track, while the posture information, i.e., the posture change underlying the current motion track, can be obtained from the image structure information. For example, if the center-of-gravity track of the target object is determined from the human body motion track, whether the target object leans forward or moves left and right can be obtained from the image structure information. The current human body posture obtained in this way better matches the actual situation of the target object and clarifies the actual motion state that produced the posture, so that it can be accurately compared with the target posture of the reference image.
Referring to fig. 4, a schematic diagram of the image structure information corresponding to target frame images provided in an embodiment of the present application is shown. The target frames in fig. 4 are the frame containing the reference image, the first frame, and the last frame, corresponding to the reference image, the first frame image (the first of the multiple frames of two-dimensional images, i.e., the first image in the above embodiment), and the last frame image (the last of the multiple frames, i.e., the second image in the above embodiment). In the figure, the dotted-line frame is the face recognition frame, and the solid-line frame is the closed figure obtained by connecting the key points in sequence, i.e., the structural relationship formed by the key points. The outer frames of the respective diagrams in fig. 4 are consistent, meaning the captured images are of the same size, and the human body posture information can be obtained by analyzing the different image structure information presented. In the embodiment shown in fig. 4, the key points chosen are those of the left and right eyes, the mouth, the left and right shoulders, and the chest; connecting these points yields the structural relationship of the key points, and the structural relationship formed by the face recognition frame and the key points is determined as the image structure information, represented by the dashed and solid boxes in fig. 4.
As shown in fig. 4, the postures presented in the reference image and the first frame image are close, but the face recognition frame (the dashed box in fig. 4) in the reference image is larger than that in the first frame image. Since a user normally appears larger in the captured image when approaching the camera, it can be concluded that the distance between the user and the camera in the first frame image is greater than that in the reference image, i.e., the user was farther from the camera when the first frame image was captured than when the reference image was captured. For clarity, refer to the connected key point shapes extracted on the right side of fig. 4: the shape in the reference image is basically consistent with that in the first frame image, and only the distances between key points, i.e., the side lengths of the connected shape, have changed. It can therefore be deduced that, compared with the reference image, the user has only moved in the front-back direction, namely away from the camera.
Comparing the last frame image with the first frame image, both the size and the position of the face recognition frame have changed; since the size of the frame can be embodied by its vertex coordinates, movement of the user in the front-back direction in fig. 4 can be judged, i.e., depth information is produced. The connected key point shapes extracted on the right side of fig. 4 also differ, showing that the key point positions have changed, and this change is caused not only by movement in the front-back direction but also by a change of the user's left-right or up-down posture. Therefore, the user's posture information can be obtained from the shape of the connected key points together with the size and position of the face recognition frame. Analyzing the image structure information of the first frame image and the last frame image shows that the face recognition frame has grown, i.e., the user presents a posture of approaching the camera; correspondingly, whether the user has moved toward the camera or moved left and right can be judged from the shape of the connected key points, for example the head-tilting posture the user presents in fig. 4.
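A minimal sketch of this two-frame comparison is given below. The dict layout, the 10% size tolerance, and the centroid-shift heuristic are illustrative assumptions, not the patented method itself:

```python
def compare_structures(info_a, info_b, size_tol=0.1):
    """Compare the image structure information of two frames.

    Growth of the face recognition frame suggests the subject moved
    toward the camera (a depth change); a shift of the key point loop's
    centroid suggests left/right or up/down movement. The tolerance is
    illustrative only.
    """
    wa, wb = info_a["face_box"][2], info_b["face_box"][2]
    if wb > wa * (1 + size_tol):
        depth = "closer to camera"
    elif wb < wa * (1 - size_tol):
        depth = "farther from camera"
    else:
        depth = "unchanged depth"

    def centroid(loop):
        pts = loop[:-1]  # drop the repeated closing vertex
        n = len(pts)
        return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

    ca = centroid(info_a["keypoint_loop"])
    cb = centroid(info_b["keypoint_loop"])
    return depth, (cb[0] - ca[0], cb[1] - ca[1])
```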
The size and position of the face recognition frame in the reference image, together with the structural relationship formed by the key points, represent the information of the user being in the target posture. The user's current posture is then judged from the user's motion track and the structure information in the first and last frame images; from the structure information presented in fig. 4 it can be judged that the user is in a head-tilting posture while approaching the camera, which does not match the information presented in the reference image. If the camera is located on the electronic device the user is watching, the user can be prompted to move away from the electronic device and to straighten the head, so as to ensure a good user experience.
The foregoing embodiment takes the user's forward-leaning posture as an example; the present application can likewise judge from the two-dimensional images whether the human cervical vertebrae are laterally bent, that is, the corresponding depth information can be derived from two-dimensional images. The embodiment of the present application thus recognizes the human body posture in three-dimensional space from multiple frames of two-dimensional images, improving the accuracy of the recognition result.
Referring to fig. 5, in an embodiment of the present application, there is also provided a human body posture recognition apparatus, including:
an image acquisition unit 10 for acquiring a plurality of frames of two-dimensional images including a target object;
a track determining unit 20, configured to determine a human motion track of the target object based on image feature information of at least two-dimensional images of the multiple frames of two-dimensional images;
an information obtaining unit 30 configured to obtain first image structure information of a first image and second image structure information of a second image in the plurality of frames of two-dimensional images, where the number of image frames spaced between the first image and the second image satisfies a specific condition;
and the posture recognition unit 40 is configured to recognize and obtain the current human body posture of the target object based on the human body motion trajectory, the first image structure information, and the second image structure information.
On the basis of the embodiment shown in fig. 5, the apparatus further includes:
a reference image acquisition unit configured to acquire a reference image including the target object;
and the posture detection unit is used for detecting whether the current human body posture is matched with the target posture or not based on the current human body posture and the target posture corresponding to the reference image to obtain a detection result.
Optionally, the apparatus further comprises:
and the prompt information generating unit is used for generating prompt information if the detection result meets a prompt condition, wherein the prompt information is used for prompting the target object to adjust the current human body posture to the target posture.
In one embodiment, the reference image acquiring unit is specifically configured to:
determining a shot image of the target object in a target posture as a reference image;
alternatively,
generating a screening condition corresponding to the target object being in a target posture;
and determining a reference image in the multi-frame two-dimensional images based on the screening condition.
On the basis of the above embodiment, the trajectory determination unit includes:
the first identification subunit is used for carrying out face identification on at least two frames of two-dimensional images in the multiple frames of two-dimensional images to determine face motion track information;
the second identification subunit is used for identifying key points of at least two frames of two-dimensional images in the multiple frames of two-dimensional images and determining the motion trail information of the key points, wherein the key points at least comprise human face key points and body key points;
and the first determining subunit is used for determining the human body motion track of the target object based on the human face motion track information and the key point motion track information.
Optionally, the first identification subunit is specifically configured to:
performing face recognition on at least two frames of images in the multi-frame two-dimensional images to obtain the size and position information of a face recognition frame of each recognized image;
determining face motion track information based on the size and position information of the face recognition frame of each recognized image;
the second identification subunit is specifically configured to:
performing key point identification on at least two frames of images in the multi-frame two-dimensional images to obtain the position information of the key point of each identified image;
and determining the motion trail information of the key points based on the position information of the key points of each identified image.
Optionally, the first determining subunit is specifically configured to:
inputting the face motion track information and the key point motion track information into a track recognition model to obtain a plurality of human motion sub-tracks and confidence degrees corresponding to each human motion sub-track;
and determining the human motion trajectory of the target object in a plurality of human motion sub-trajectories based on the confidence corresponding to each human motion sub-trajectory.
On the basis of the above embodiment, the information acquisition unit includes:
the first acquiring subunit is used for acquiring a first face recognition frame and a first group of key points of a first image in the multi-frame images;
a second determining subunit, configured to determine a first structural relationship formed by the first face recognition frame and the first group of key points as the first image structure information;
the second acquiring subunit is used for acquiring a second face recognition frame and a second group of key points of a second image in the multi-frame images;
and a third determining subunit, configured to determine a second structural relationship formed by the second face recognition frame and the second group of key points as the second image structure information.
Optionally, the gesture recognition unit is specifically configured to:
recognizing and obtaining the posture information of the target object based on the first image structure information and the second image structure information;
and determining the current human body posture of the target object based on the human body motion track and the posture information.
It should be noted that, for the specific implementation of each unit in the present embodiment, reference may be made to the corresponding content in the foregoing, and details are not described here.
The embodiment of the present application provides a human body posture recognition apparatus in which the image acquisition unit acquires multiple frames of two-dimensional images containing a target object; the track determining unit determines the human body motion track of the target object based on the image feature information of at least two of those frames; the information acquisition unit acquires first image structure information of a first image and second image structure information of a second image among the frames; and the posture recognition unit recognizes the current human body posture of the target object based on the human body motion track, the first image structure information, and the second image structure information. Recognition of the human body posture in three-dimensional space from multiple frames of two-dimensional images is thereby realized, improving the accuracy of the recognition result.
The embodiment of the present application further provides an electronic device. The technical solution provided by this embodiment mainly analyzes multiple frames of two-dimensional images to recognize the human body posture in three-dimensional space, improving the accuracy of the recognition result.
Specifically, the electronic device in this embodiment may include the following structure:
a memory for storing an application program and data generated by the application program running;
a processor for executing the application to implement:
acquiring a multi-frame two-dimensional image containing a target object;
determining a human body motion track of the target object based on image characteristic information of at least two frames of two-dimensional images in the multiple frames of two-dimensional images;
acquiring first image structure information of a first image and second image structure information of a second image in the multi-frame two-dimensional images, wherein the number of image frames spaced between the first image and the second image meets a specific condition;
and identifying and obtaining the current human body posture of the target object based on the human body motion track, the first image structure information and the second image structure information.
It should be noted that, the specific implementation of the processor in the present embodiment may refer to the corresponding content in the foregoing, and is not described in detail here.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A human body posture recognition method comprises the following steps:
acquiring a multi-frame two-dimensional image containing a target object;
determining a human body motion track of the target object based on image characteristic information of at least two frames of two-dimensional images in the multiple frames of two-dimensional images;
acquiring first image structure information of a first image and second image structure information of a second image in the multi-frame two-dimensional images, wherein the number of image frames spaced between the first image and the second image meets a specific condition;
and identifying and obtaining the current human body posture of the target object based on the human body motion track, the first image structure information and the second image structure information.
2. The method of claim 1, further comprising:
acquiring a reference image containing the target object;
and detecting whether the current human body posture is matched with the target posture or not based on the current human body posture and the target posture corresponding to the reference image to obtain a detection result.
3. The method of claim 2, further comprising:
and if the detection result meets a prompt condition, generating prompt information, wherein the prompt information is used for prompting the target object to adjust the current human body posture to the target posture.
4. The method of claim 2, the acquiring a reference image containing the target object, comprising:
determining a shot image of the target object in a target posture as a reference image;
alternatively,
generating a screening condition corresponding to the target object being in a target posture;
and determining a reference image in the multi-frame two-dimensional images based on the screening condition.
5. The method according to claim 1, wherein the determining the human motion trajectory of the target object based on the image feature information of at least two-dimensional images of the plurality of frames of two-dimensional images comprises:
performing face recognition on at least two frames of two-dimensional images in the multi-frame two-dimensional images to determine face motion track information;
performing key point identification on at least two frames of two-dimensional images in the multi-frame two-dimensional images, and determining key point motion track information, wherein the key points at least comprise human face key points and body key points;
and determining the human body motion track of the target object based on the human face motion track information and the key point motion track information.
6. The method according to claim 5, wherein the performing face recognition on at least two-dimensional images of the plurality of frames of two-dimensional images to determine face motion trajectory information comprises:
performing face recognition on at least two frames of images in the multi-frame two-dimensional images to obtain the size and position information of a face recognition frame of each recognized image;
determining face motion track information based on the size and position information of the face recognition frame of each recognized image;
the identifying key points of at least two frames of two-dimensional images in the multi-frame two-dimensional images and determining the motion trail information of the key points comprises the following steps:
performing key point identification on at least two frames of images in the multi-frame two-dimensional images to obtain the position information of the key point of each identified image;
and determining the motion trail information of the key points based on the position information of the key points of each identified image.
7. The method of claim 5, wherein the determining the human motion trajectory of the target object based on the face motion trajectory information and the keypoint motion trajectory information comprises:
inputting the face motion track information and the key point motion track information into a track recognition model to obtain a plurality of human motion sub-tracks and confidence degrees corresponding to each human motion sub-track;
and determining the human motion trajectory of the target object in a plurality of human motion sub-trajectories based on the confidence corresponding to each human motion sub-trajectory.
8. The method of claim 1, wherein the obtaining first image structure information of a first image and second image structure information of a second image in the plurality of frames of two-dimensional images comprises:
acquiring a first face recognition frame and a first group of key points of a first image in the multi-frame images;
determining a first structural relationship formed by the first face recognition frame and the first group of key points as first image structure information;
acquiring a second face recognition frame and a second group of key points of a second image in the multi-frame image;
and determining a second structural relationship formed by the second face recognition frame and the second group of key points as second image structure information.
9. The method of claim 8, wherein the identifying the current body posture of the target object based on the body motion trajectory, the first image structure information and the second image structure information comprises:
recognizing and obtaining the posture information of the target object based on the first image structure information and the second image structure information;
and determining the current human body posture of the target object based on the human body motion track and the posture information.
10. A human body posture recognition apparatus comprising:
an image acquisition unit configured to acquire a plurality of frames of two-dimensional images including a target object;
the track determining unit is used for determining the human motion track of the target object based on the image characteristic information of at least two frames of two-dimensional images in the multiple frames of two-dimensional images;
the information acquisition unit is used for acquiring first image structure information of a first image and second image structure information of a second image in the multi-frame two-dimensional images, wherein the number of image frames spaced between the first image and the second image meets a specific condition;
and the gesture recognition unit is used for recognizing and obtaining the current human body gesture of the target object based on the human body motion track, the first image structure information and the second image structure information.
CN202110126255.2A 2021-01-29 2021-01-29 Human body posture recognition method and device Pending CN112784786A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110126255.2A CN112784786A (en) 2021-01-29 2021-01-29 Human body posture recognition method and device

Publications (1)

Publication Number Publication Date
CN112784786A true CN112784786A (en) 2021-05-11

Family

ID=75759749




Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103513768A (en) * 2013-08-30 2014-01-15 展讯通信(上海)有限公司 Control method and device based on posture changes of mobile terminal and mobile terminal
CN109657631A (en) * 2018-12-25 2019-04-19 上海智臻智能网络科技股份有限公司 Human posture recognition method and device
CN110276298A (en) * 2019-06-21 2019-09-24 腾讯科技(深圳)有限公司 Determination method, device, storage medium and the computer equipment of user behavior
CN110633004A (en) * 2018-06-21 2019-12-31 杭州海康威视数字技术股份有限公司 Interaction method, device and system based on human body posture estimation
CN111209812A (en) * 2019-12-27 2020-05-29 深圳市优必选科技股份有限公司 Target face picture extraction method and device and terminal equipment
CN111476609A (en) * 2020-04-10 2020-07-31 广西中烟工业有限责任公司 Retail data acquisition method, system, device and storage medium
CN111931869A (en) * 2020-09-25 2020-11-13 湖南大学 Method and system for detecting user attention through man-machine natural interaction


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113313085A (en) * 2021-07-28 2021-08-27 北京奇艺世纪科技有限公司 Image processing method and device, electronic equipment and storage medium
CN115937964A * 2022-06-27 2023-04-07 北京字跳网络技术有限公司 Method, device, equipment and storage medium for pose estimation
CN115937964B * 2022-06-27 2023-12-15 北京字跳网络技术有限公司 Method, device, equipment and storage medium for pose estimation
CN115131879A (en) * 2022-08-31 2022-09-30 飞狐信息技术(天津)有限公司 Action evaluation method and device
CN115131879B (en) * 2022-08-31 2023-01-06 飞狐信息技术(天津)有限公司 Action evaluation method and device

Similar Documents

Publication Publication Date Title
CN112784786A (en) Human body posture recognition method and device
CN105184246B (en) Living body detection method and living body detection system
CN107609383B (en) 3D face identity authentication method and device
JP5629803B2 (en) Image processing apparatus, imaging apparatus, and image processing method
JP4317465B2 (en) Face identification device, face identification method, and face identification program
US8515136B2 (en) Image processing device, image device, image processing method
JP5010905B2 (en) Face recognition device
WO2019127262A1 (en) Cloud end-based human face in vivo detection method, electronic device and program product
US7970212B2 (en) Method for automatic detection and classification of objects and patterns in low resolution environments
JP2005056387A (en) Image processor, imaging apparatus and image processing method
JP2007257043A (en) Occupant state estimating device and occupant state estimating method
WO2017161734A1 (en) Correction of human body movements via television and motion-sensing accessory and system
CN112101124B (en) Sitting posture detection method and device
CN110032932B (en) Human body posture identification method based on video processing and decision tree set threshold
CN111814556A (en) Teaching assistance method and system based on computer vision
KR20130043366A (en) Gaze tracking apparatus, display apparatus and method therof
CN112132797B (en) Short video quality screening method
JP2022021537A (en) Biometric authentication device and biometric authentication method
CN113971841A (en) Living body detection method and device, computer equipment and storage medium
CN105279764B (en) Eye image processing apparatus and method
CN111539911B (en) Mouth breathing face recognition method, device and storage medium
CN111182207B (en) Image shooting method and device, storage medium and electronic equipment
CN112149517A (en) Face attendance checking method and system, computer equipment and storage medium
CN109409322B (en) Living body detection method and device, face recognition method and face detection system
CN116453230A (en) Living body detection method, living body detection device, terminal equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination