CN117456558A - Human body posture estimation and control method based on camera and related equipment - Google Patents

Human body posture estimation and control method based on camera and related equipment

Info

Publication number
CN117456558A
CN117456558A (application CN202311490551.6A)
Authority
CN
China
Prior art keywords
coordinate information
camera
human body
information
human
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311490551.6A
Other languages
Chinese (zh)
Inventor
胡潇
李毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University
Priority to CN202311490551.6A
Publication of CN117456558A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a human body posture estimation and control method based on a camera and related equipment, relating to the field of posture control, wherein the method comprises the following steps: obtaining a target picture corresponding to a camera, wherein the target picture comprises a target person; identifying human skeleton point coordinate information of a target person according to the target picture; and controlling the execution action of the controlled object based on the human skeleton point coordinate information, wherein the controlled object comprises a virtual character and a target robot.

Description

Human body posture estimation and control method based on camera and related equipment
Technical Field
The present disclosure relates to the field of posture control, and more particularly, to a camera-based human body posture estimation and control method and related apparatus.
Background
At present, most human body posture estimation applications operate on video: an input video is processed by a neural network to output human skeleton point coordinates, which are then used to drive a three-dimensional model or a robot. Because the neural networks used for posture estimation are often very large, they run slowly in practical applications; a high frame rate and stable output are difficult to guarantee in real-time detection, and the demands on device computing power are extremely high. Therefore, most posture estimation is applied only in post-processing, and real-time detection with real-time driving is rarely used.
Disclosure of Invention
This summary introduces, in simplified form, a series of concepts that are further described in the detailed description. It is not intended to identify the key or essential features of the claimed subject matter, nor is it intended to be used to determine the scope of the claimed subject matter.
In a first aspect, the present application proposes a method for estimating and controlling a human body posture based on a camera, where the method includes:
acquiring a target picture corresponding to a camera, wherein the target picture comprises a target person;
identifying human skeleton point coordinate information of a target person according to the target picture;
and controlling the execution action of the controlled object based on the human skeleton point coordinate information, wherein the controlled object comprises a virtual character and a target robot.
In one embodiment, the camera is a camera with a Kinect sensor, and the method further includes:
acquiring an RGB image and a depth image based on the camera;
and carrying out feature fusion according to the RGB image and the depth image to obtain human skeleton coordinate information.
In one embodiment, the feature fusion based on the RGB image and the depth image to obtain human skeleton coordinate information includes:
preprocessing the depth image, wherein the preprocessing comprises noise removal, hole filling, and depth information smoothing;
performing three-dimensional point cloud conversion on the depth image subjected to the preprocessing operation to obtain point cloud information;
acquiring first bone coordinate information based on the RGB image information and a lightweight pose estimation model;
acquiring second bone coordinate information based on the point cloud information and the lightweight pose estimation model;
and performing feature fusion operation based on the first bone coordinate information and the second bone coordinate information to acquire human bone coordinate information.
In one embodiment, the lightweight pose estimation model includes a BlazePose model and a Simple Yet Baseline model.
In one embodiment, the performing a feature fusion operation based on the first bone coordinate information and the second bone coordinate information to obtain human bone coordinate information includes:
and performing a weighted fusion operation on the first bone coordinate information and the second bone coordinate information to obtain the human bone coordinate information.
In one embodiment, the weight coefficient of the first bone coordinate information is determined based on the color depth information of the RGB image, and the weight coefficient of the second bone coordinate information is determined based on the depth distribution information.
In a second aspect, the present application further proposes a camera-based human body posture estimation and control device, including:
an acquisition unit, configured to acquire a target picture corresponding to the camera, wherein the target picture comprises a target person;
the identification unit is used for identifying the human skeleton point coordinate information of the target person according to the target picture;
and the control unit is used for controlling the execution action of the controlled object based on the human skeleton point coordinate information, wherein the controlled object comprises a virtual character and a target robot.
In a third aspect, the present application proposes an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program stored in the memory, implements the camera-based human body posture estimation and control method of any one of the first aspect described above.
In a fourth aspect, the present application further proposes a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the camera-based human body posture estimation and control method of any one of the first aspects.
In summary, the camera-based human body posture estimation and control method of the embodiments of the present application comprises the following steps: acquiring a target picture corresponding to a camera, wherein the target picture comprises a target person; identifying human skeleton point coordinate information of the target person according to the target picture; and controlling the execution action of a controlled object based on the human skeleton point coordinate information, wherein the controlled object comprises a virtual character and a target robot. Because the method acquires static images from the camera, real-time performance is easier to achieve, overcoming the drawback in the related art that a video stream must be processed and may incur a large processing delay. The method only needs to analyze a single static image, so its computational requirements are lower and its computational efficiency is higher than processing a video stream, which facilitates posture estimation and control on resource-constrained devices. Relative to schemes that require complex multi-camera setups (e.g., RGB-D cameras or multi-camera systems), the use of a single camera reduces the equipment requirements. The method can also be adapted more easily to different application scenarios, because it does not need to process a video stream in real time and can run on a wider range of devices.
Additional advantages, objects, and features of the present application will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the present application.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the specification. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
fig. 1 is a schematic flow chart of a method for estimating and controlling human body posture based on a camera according to an embodiment of the present application;
fig. 2 is a schematic flow chart of another method for estimating and controlling human body posture based on a camera according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a camera-based human body posture estimation and control device according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a camera-based human body posture estimation and control electronic device according to an embodiment of the present application.
Detailed Description
The terms "first," "second," "third," "fourth" and the like in the description and in the claims of this application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application.
Referring to fig. 1, which is a schematic flow chart of a camera-based human body posture estimation and control method according to an embodiment of the present application, the method may specifically include:
s110, acquiring a target picture corresponding to the camera, wherein the target picture comprises a target person;
Illustratively, an image is acquired from the camera, the image containing one or more target persons, typically an image of an actual human being. The image may be a still picture or a frame captured by the camera in real time, and records the actual pose of a person in the scene; this person is typically the user whose pose will drive the controlled object.
S120, identifying human skeleton point coordinate information of a target person according to the target picture;
Illustratively, the image is analyzed with a deep learning model or a conventional computer vision algorithm to detect the coordinate information of the human skeleton points. This coordinate information describes the locations of key body parts of the target person, such as the head, arms, and legs. By analyzing the image, the key skeletal points of the target person's head, arms, and legs can be detected, from which three-dimensional coordinate information of the target person is obtained.
S130, controlling the execution action of a controlled object based on the human skeleton point coordinate information, wherein the controlled object comprises a virtual character and a target robot.
Illustratively, the human skeleton point coordinate information is used to control the execution action of the controlled object, which may be a virtual character or a target robot. A control algorithm interprets the skeleton point coordinate information and maps it to a corresponding motion or gesture. If the controlled object is a virtual character, the skeleton point coordinate information can be mapped onto the character's actions to enable real-time imitation or interaction. If the controlled object is a target robot, the system can use the skeleton point coordinate information to control the robot's movements, gestures, or other actions.
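As an illustrative sketch only, and not the claimed implementation, steps S110 to S130 can be prototyped in Python with OpenCV and MediaPipe's BlazePose implementation; the send_to_controlled_object hook is a hypothetical stand-in for the virtual-character or robot driver:

    import cv2
    import mediapipe as mp

    def send_to_controlled_object(skeleton):
        # Hypothetical hook: forward the skeleton to the avatar/robot driver.
        pass

    mp_pose = mp.solutions.pose
    cap = cv2.VideoCapture(0)                       # S110: target picture from the camera
    with mp_pose.Pose(model_complexity=0) as pose:  # lightweight BlazePose variant
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            # S120: identify human skeleton point coordinate information.
            result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if result.pose_landmarks is not None:
                skeleton = [(lm.x, lm.y, lm.z, lm.visibility)
                            for lm in result.pose_landmarks.landmark]
                send_to_controlled_object(skeleton)  # S130: drive the controlled object
    cap.release()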
In summary, the camera-based human body posture estimation and control method provided by the embodiments of the present application acquires still images from the camera, which makes real-time performance easier to achieve and overcomes the drawback in the related art that a video stream must be processed and may incur a large processing delay. The method only needs to analyze a single static image, so its computational requirements are lower and its computational efficiency is higher than processing a video stream, which facilitates posture estimation and control on resource-constrained devices. Relative to schemes that require complex multi-camera setups (e.g., RGB-D cameras or multi-camera systems), the use of a single camera reduces the equipment requirements. The method can also be adapted more easily to different application scenarios, because it does not need to process a video stream in real time and can run on a wider range of devices.
In one embodiment, the camera is a camera with a Kinect sensor, and the method further includes:
acquiring an RGB image and a depth image based on the camera;
and carrying out feature fusion according to the RGB image and the depth image to obtain human skeleton coordinate information.
Illustratively, a camera with a Kinect sensor can capture RGB color images and depth images simultaneously. The RGB image provides the visual appearance and color information of the human body, while the depth image provides the distance from each pixel to the camera. The combination of the two provides a rich data source for subsequent human body posture estimation. The information of the RGB image and the depth image is fused to obtain more accurate and complete human skeleton coordinate information. The feature fusion can employ a variety of methods, including feature-level fusion, weighted fusion, or deep learning methods, which can generate more comprehensive human skeleton coordinate information for subsequent posture estimation and control.
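As a minimal capture sketch, assuming an OpenCV build with OpenNI2 support for Kinect-class RGB-D sensors (the backend constants below are standard OpenCV, but whether this path applies depends on the deployed sensor and drivers):

    import cv2

    # Open the RGB-D sensor through OpenCV's OpenNI2 backend.
    cap = cv2.VideoCapture(cv2.CAP_OPENNI2)
    if not cap.isOpened():
        raise RuntimeError("RGB-D sensor not available via OpenNI2")

    if cap.grab():  # one grab yields a synchronized depth/color frame pair
        ok_d, depth_mm = cap.retrieve(flag=cv2.CAP_OPENNI_DEPTH_MAP)  # uint16, millimetres
        ok_c, bgr = cap.retrieve(flag=cv2.CAP_OPENNI_BGR_IMAGE)       # registered color image
    cap.release()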
In one embodiment, the feature fusion based on the RGB image and the depth image to obtain human skeleton coordinate information includes:
preprocessing the depth image, wherein the preprocessing comprises noise removal, hole filling, and depth information smoothing;
performing three-dimensional point cloud conversion on the depth image subjected to the preprocessing operation to obtain point cloud information;
acquiring first bone coordinate information based on the RGB image information and a lightweight pose estimation model;
acquiring second bone coordinate information based on the point cloud information and the lightweight pose estimation model;
and performing feature fusion operation based on the first bone coordinate information and the second bone coordinate information to acquire human bone coordinate information.
Illustratively, preprocessing the depth image improves its quality so that the human skeleton information can be extracted more accurately. The specific preprocessing operations include noise removal, hole filling, and depth information smoothing. Noise removal eliminates clutter in the depth image by filtering or other techniques, reducing the inaccuracy of the depth measurements. Hole filling repairs the discontinuous or missing depth values that may exist in the depth image, ensuring the integrity of the depth data. Depth information smoothing applies a smoothing filter or a similar operation to reduce irregular fluctuations in the depth image, improving the stability of the depth data. The preprocessed depth image is then converted into three-dimensional point cloud data, in which each point represents a spatial point in the scene and carries X, Y, and Z coordinate values describing its position. The three-dimensional point cloud provides a spatial representation of the depth information that can be used for further analysis and processing.
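The preprocessing and back-projection just described can be sketched as follows, assuming a 16-bit depth map in millimetres and known pinhole intrinsics fx, fy, cx, cy; the filter sizes are illustrative choices rather than values fixed by this application:

    import cv2
    import numpy as np

    def depth_to_point_cloud(depth_mm, fx, fy, cx, cy):
        """Denoise, fill, and smooth a uint16 depth map (mm), then back-project
        every valid pixel to a 3D point in the camera coordinate system."""
        # Noise removal: a median filter suppresses isolated speckle.
        depth = cv2.medianBlur(depth_mm, 5).astype(np.float32)

        # Hole filling by normalized convolution: average only valid neighbours.
        valid = (depth > 0).astype(np.float32)
        num = cv2.GaussianBlur(depth * valid, (9, 9), 0)
        den = cv2.GaussianBlur(valid, (9, 9), 0)
        holes = valid == 0
        depth[holes] = (num / np.maximum(den, 1e-6))[holes]

        # Depth smoothing: a bilateral filter reduces fluctuation but keeps edges.
        depth = cv2.bilateralFilter(depth, d=5, sigmaColor=30.0, sigmaSpace=5.0)

        # Back-projection: z = d, x = (u - cx) * z / fx, y = (v - cy) * z / fy.
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w, dtype=np.float32),
                           np.arange(h, dtype=np.float32))
        z = depth / 1000.0  # millimetres to metres
        points = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=-1)
        return points.reshape(-1, 3)[z.reshape(-1) > 0]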
With RGB image information, a lightweight pose estimation model can be used to estimate initial human skeletal coordinate information. This model can be used to identify body parts such as head, hand, foot, etc. and estimate their three-dimensional coordinates.
Based on the point cloud information, a lightweight pose estimation model is likewise used to estimate the bone coordinate information of the human body. The point cloud provides additional depth information that helps improve the accuracy of the bone coordinates.
The first bone coordinate information and the second bone coordinate information are fused to generate final human bone coordinate information. This may be accomplished by weighted fusion, fusion algorithms, or other methods to provide more accurate and stable bone coordinates.
In one embodiment, the lightweight pose estimation model includes a BlazePose model and a Simple Yet Baseline model.
Illustratively, BlazePose is a lightweight human posture estimation model developed by Google that aims to detect and track the skeletal joints of the human body in real time. It is based on deep learning and uses a lightweight neural network architecture, making it suitable for embedded devices and mobile applications.
Simple Yet Baseline is a classical model for human posture estimation that aims to provide a simple but effective baseline for research and experimentation. It is a two-stage model, divided into a human key-point detection stage and a posture estimation stage; in the pipeline of fig. 2 its second stage lifts the two-dimensional skeleton coordinates produced by the first stage to three-dimensional coordinates.
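For illustration only, the lifting stage of such a baseline can be written as a small fully connected network; the joint count (17), hidden width (1024), and dropout rate are common choices from the baseline literature, not parameters disclosed by this application:

    import torch
    import torch.nn as nn

    class LiftingBaseline(nn.Module):
        """Minimal 2D-to-3D lifting stage: takes flattened 2D skeleton points
        from the detection stage and regresses their 3D coordinates."""
        def __init__(self, n_joints=17, hidden=1024):
            super().__init__()
            self.n_joints = n_joints
            self.net = nn.Sequential(
                nn.Linear(n_joints * 2, hidden), nn.ReLU(), nn.Dropout(0.5),
                nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(0.5),
                nn.Linear(hidden, n_joints * 3),
            )

        def forward(self, kp2d):  # kp2d: (batch, n_joints * 2)
            return self.net(kp2d).view(-1, self.n_joints, 3)

    # Usage: lift a batch of eight flattened 2D skeletons to 3D coordinates.
    model = LiftingBaseline()
    kp3d = model(torch.randn(8, 34))  # -> (8, 17, 3)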
In one embodiment, the performing a feature fusion operation based on the first bone coordinate information and the second bone coordinate information to obtain human bone coordinate information includes:
and performing a weighted fusion operation on the first bone coordinate information and the second bone coordinate information to obtain the human bone coordinate information.
In one embodiment, the weight coefficient of the first bone coordinate information is determined based on the color depth information of the RGB image, and the weight coefficient of the second bone coordinate information is determined based on the depth distribution information.
For example, the first bone coordinate information and the second bone coordinate information may be fused by weighting, with the weight coefficients determined from the two information sources.
For the first bone coordinate information, the weight coefficient is determined based on the color depth information of the RGB image and is positively correlated with it: the richer and more reliable the color information at a joint, the larger its weight coefficient.
For the second bone coordinate information, the weight coefficient is determined based on the depth distribution information: the more reliable the depth distribution around a joint, the greater the corresponding weight coefficient.
Determining the weight coefficient of the first bone coordinate information from the color depth information of the RGB image allows the color information to be used to better correct the uncertainty of the posture estimation. The depth distribution of a depth image is often affected by noise and other factors, and inaccurate depth values may appear in some areas; determining the weight coefficient of the second bone coordinate information from the depth distribution information weighs the quality of the depth data and reduces the influence of depth noise on the posture estimation. Fusing information from different sources also improves the stability of the system: if the information from one source becomes unreliable in some situations, the information from the other source can compensate and reduce the instability of the posture estimation. The scheme is therefore suitable for a variety of scenes, including poor lighting, complex backgrounds, and occlusion, because it integrates color information and depth information and maintains good precision in diverse environments.
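A minimal sketch of such a per-joint weighted fusion follows; it assumes the RGB branch exposes a per-landmark confidence (for example, BlazePose's visibility score) and the depth branch exposes a local depth-variance estimate, and the exact weighting formulas are assumptions since the application does not fix them:

    import numpy as np

    def fuse_skeletons(kp_rgb, kp_depth, rgb_conf, depth_var):
        """Fuse two (N, 3) skeleton estimates joint by joint.

        rgb_conf:  (N,) confidence of each RGB landmark (higher -> larger weight).
        depth_var: (N,) local depth variance at each joint (noisier -> smaller weight).
        """
        w_rgb = np.clip(rgb_conf, 0.0, 1.0)
        w_depth = 1.0 / (1.0 + depth_var)  # unreliable depth areas get down-weighted
        total = w_rgb + w_depth + 1e-8
        w_rgb, w_depth = w_rgb / total, w_depth / total
        return w_rgb[:, None] * kp_rgb + w_depth[:, None] * kp_depth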
In one implementation, as shown in fig. 2, which is a schematic flow chart of another camera-based human body posture estimation and control method provided in this application, a static human body image is obtained through an RGB camera and preprocessed (for example, grayscale conversion); the preprocessed image is input into the BlazePose model to obtain the two-dimensional coordinates of the human skeleton points in the image coordinate system; these two-dimensional coordinates are input into the Simple Yet Baseline model to obtain the three-dimensional coordinates of the human skeleton points in the camera coordinate system; a depth image is obtained through a depth camera and used for depth calibration. The skeleton point data is transmitted to the driving software (Unity or robot driving software) through a socket, and quaternions are calculated to drive the three-dimensional virtual character or robot to move.
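As a hedged sketch of the last two steps of fig. 2, the snippet below computes a shortest-arc quaternion for one bone and streams one JSON message per frame over a socket; the message layout and the endpoint 127.0.0.1:5005 are assumptions, since the application does not specify the wire format of the Unity or robot driving software:

    import json
    import socket
    import numpy as np

    def bone_quaternion(rest_dir, live_dir):
        """Shortest-arc quaternion (x, y, z, w) rotating the rest-pose bone
        axis onto the observed bone direction (both 3-vectors)."""
        a = rest_dir / np.linalg.norm(rest_dir)
        b = live_dir / np.linalg.norm(live_dir)
        axis, w = np.cross(a, b), 1.0 + float(np.dot(a, b))
        if w < 1e-8:  # opposite directions: rotate 180 degrees about any orthogonal axis
            helper = np.array([1.0, 0.0, 0.0]) if abs(a[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
            axis, w = np.cross(a, helper), 0.0
        q = np.append(axis, w)
        return q / np.linalg.norm(q)

    def send_pose(sock, joints_3d, quaternions):
        # One JSON line per frame for the Unity or robot driving software.
        msg = {"joints": joints_3d.tolist(),
               "quaternions": [q.tolist() for q in quaternions]}
        sock.sendall((json.dumps(msg) + "\n").encode("utf-8"))

    # Example: drive one bone (hip to knee) of the controlled object.
    sock = socket.create_connection(("127.0.0.1", 5005))  # assumed driver endpoint
    joints = np.array([[0.0, 0.0, 2.0], [0.05, -0.45, 2.0]])  # illustrative 3D points
    q = bone_quaternion(np.array([0.0, -1.0, 0.0]), joints[1] - joints[0])
    send_pose(sock, joints, [q])
    sock.close()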
Referring to fig. 3, an embodiment of a camera-based human body posture estimation and control device in an embodiment of the present application may include:
an obtaining unit 21, configured to obtain a target picture corresponding to the camera, where the target picture includes a target person;
an identifying unit 22 for identifying human skeleton point coordinate information of the target person based on the target picture;
and a control unit 23, configured to control the execution action of the controlled object based on the human skeleton point coordinate information, wherein the controlled object includes a virtual character and a target robot.
As shown in fig. 4, the embodiment of the present application further provides an electronic device 300, including a memory 310, a processor 320, and a computer program 311 stored in the memory 310 and capable of running on the processor, where the processor 320 implements the steps of any of the above-mentioned methods for estimating and controlling a human body posture based on a camera when executing the computer program 311.
Since the electronic device described in this embodiment is a device used to implement the camera-based human body posture estimation and control apparatus in the embodiments of the present application, based on the methods described herein, those skilled in the art can understand the specific implementation of the electronic device of this embodiment and its various modifications. Therefore, how the electronic device implements the methods of the embodiments of the present application is not described in detail here; any device used by those skilled in the art to implement those methods falls within the intended scope of protection of the present application.
In a specific implementation, the computer program 311 may implement any of the embodiments corresponding to fig. 1 when executed by a processor.
In the foregoing embodiments, the descriptions of the embodiments are focused on, and for those portions of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Embodiments also provide a computer program product comprising computer software instructions that, when run on a processing device, cause the processing device to perform the camera-based human body posture estimation and control flow of the corresponding embodiments.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. A computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. Usable media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid state drives (SSDs)), among others.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of elements is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (9)

1. A camera-based human body posture estimation and control method, characterized by comprising the following steps:
obtaining a target picture corresponding to a camera, wherein the target picture comprises a target person;
identifying human skeleton point coordinate information of a target person according to the target picture;
and controlling the execution action of the controlled object based on the human skeleton point coordinate information, wherein the controlled object comprises a virtual character and a target robot.
2. The camera-based human body posture estimation and control method of claim 1, wherein the camera is a camera with a Kinect sensor, the method further comprising:
acquiring an RGB image and a depth image based on the camera;
and carrying out feature fusion according to the RGB image and the depth image to obtain human skeleton coordinate information.
3. The camera-based human body posture estimation and control method according to claim 2, wherein the feature fusion according to the RGB image and the depth image to obtain human skeleton coordinate information includes:
preprocessing the depth image, wherein the preprocessing comprises noise removal, hole filling, and depth information smoothing;
performing three-dimensional point cloud conversion on the depth image subjected to the preprocessing operation to obtain point cloud information;
acquiring first bone coordinate information based on the RGB image information and a lightweight pose estimation model;
acquiring second bone coordinate information based on the point cloud information and the lightweight pose estimation model;
and performing feature fusion operation based on the first bone coordinate information and the second bone coordinate information to acquire human bone coordinate information.
4. The camera-based human body posture estimation and control method of claim 1, wherein the lightweight pose estimation model includes a BlazePose model and a Simple Yet Baseline model.
5. The camera-based human body posture estimation and control method according to claim 1, wherein the performing a feature fusion operation based on the first bone coordinate information and the second bone coordinate information to obtain human body bone coordinate information includes:
and carrying out weight fusion operation on the first bone coordinate information and the second bone coordinate information to obtain the human bone coordinate information.
6. The camera-based human body posture estimation and control method of claim 5, wherein the weight coefficient of the first bone coordinate information is determined based on color depth information of the RGB image, and the weight coefficient of the second bone coordinate information is determined based on depth distribution information.
7. A camera-based human body posture estimation and control device, characterized by comprising:
an acquisition unit, configured to acquire a target picture corresponding to the camera, wherein the target picture comprises a target person;
the identification unit is used for identifying the human skeleton point coordinate information of the target person according to the target picture;
and the control unit is used for controlling the execution action of the controlled object based on the human skeleton point coordinate information, wherein the controlled object comprises a virtual character and a target robot.
8. An electronic device comprising a memory and a processor, characterized in that the processor, when executing a computer program stored in the memory, carries out the steps of the camera-based human body posture estimation and control method according to any one of claims 1-6.
9. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the camera-based human body posture estimation and control method of any one of claims 1-6.
CN202311490551.6A (priority date 2023-11-09, filing date 2023-11-09): Human body posture estimation and control method based on camera and related equipment. Status: Pending. Publication: CN117456558A (en).

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202311490551.6A | 2023-11-09 | 2023-11-09 | Human body posture estimation and control method based on camera and related equipment

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202311490551.6A | 2023-11-09 | 2023-11-09 | Human body posture estimation and control method based on camera and related equipment

Publications (1)

Publication Number | Publication Date
CN117456558A | 2024-01-26

Family

Family ID: 89588988

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202311490551.6A (Pending) | Human body posture estimation and control method based on camera and related equipment | 2023-11-09 | 2023-11-09

Country Status (1)

Country Link
CN (1) CN117456558A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118411764A (en) * 2024-07-02 2024-07-30 江西格如灵科技股份有限公司 Dynamic bone recognition method, system, storage medium and electronic equipment
CN118411764B (en) * 2024-07-02 2024-10-18 江西格如灵科技股份有限公司 Dynamic bone recognition method, system, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN105550678B (en) Human action feature extracting method based on global prominent edge region
CN113706699B (en) Data processing method and device, electronic equipment and computer readable storage medium
KR101616926B1 (en) Image processing apparatus and method
CN107251096B (en) Image capturing apparatus and method
US20170024893A1 (en) Scene analysis for improved eye tracking
JP2019522851A (en) Posture estimation in 3D space
CN108958473A (en) Eyeball tracking method, electronic device and non-transient computer-readable recording medium
US20030012410A1 (en) Tracking and pose estimation for augmented reality using real features
US10602117B1 (en) Tool for onsite augmentation of past events
CN102317977A (en) Method and system for gesture recognition
CN113449570A (en) Image processing method and device
CN117456558A (en) Human body posture estimation and control method based on camera and related equipment
CN109934873B (en) Method, device and equipment for acquiring marked image
Kowalski et al. Holoface: Augmenting human-to-human interactions on hololens
CN116580169B (en) Digital man driving method and device, electronic equipment and storage medium
CN112200917A (en) High-precision augmented reality method and system
CN111192350A (en) Motion capture system and method based on 5G communication VR helmet
CN115205737B (en) Motion real-time counting method and system based on transducer model
JPH08212327A (en) Gesture recognition device
CN107993247A (en) Tracking positioning method, system, medium and computing device
Chen et al. An integrated sensor network method for safety management of construction workers
Cordea et al. 3-D head pose recovery for interactive virtual reality avatars
Mustaniemi et al. BS3D: Building-Scale 3D Reconstruction from RGB-D Images
CN111368675A (en) Method, device and equipment for processing gesture depth information and storage medium
Schreer et al. Real-time avatar animation steered by live body motion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination