CN112464791A - Gesture recognition method, device, equipment and storage medium based on two-dimensional camera - Google Patents

Gesture recognition method, device, equipment and storage medium based on two-dimensional camera

Info

Publication number
CN112464791A
Authority
CN
China
Prior art keywords
human body
information
dimensional camera
node
extracting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011339565.4A
Other languages
Chinese (zh)
Other versions
CN112464791B (en)
Inventor
颜泽龙
王健宗
吴天博
程宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202011339565.4A priority Critical patent/CN112464791B/en
Publication of CN112464791A publication Critical patent/CN112464791A/en
Priority to PCT/CN2021/084543 priority patent/WO2021208740A1/en
Application granted granted Critical
Publication of CN112464791B publication Critical patent/CN112464791B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects

Abstract

The application relates to the field of artificial intelligence and provides a gesture recognition method, device, equipment and storage medium based on a two-dimensional camera. A human body picture of a human body to be recognized is acquired; body contour information, posture information and gender information of the human body to be recognized are extracted from the human body picture; an SMPL model is generated from the body contour information, the posture information and the gender information; 3D node features of the SMPL model are acquired; 2D node features of the human body to be recognized are extracted from the human body picture; an error function is generated from the 3D node features and the 2D node features; the SMPL model is adjusted according to the error function, and target 3D joint features of the adjusted SMPL model are extracted; skeleton information is acquired from the target 3D joint features; and posture analysis is performed according to the skeleton information. The gesture recognition method, device, equipment and storage medium based on the two-dimensional camera can be applied in the blockchain field and can accurately recognize the posture of the human body to be recognized.

Description

Gesture recognition method, device, equipment and storage medium based on two-dimensional camera
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a storage medium for gesture recognition based on a two-dimensional camera.
Background
Because human body three-dimensional gesture recognition technology has wide application scenarios and value, it has attracted more and more attention in recent years. In posture analysis, the limbs of a person may be abnormally interlaced, and some postures occlude one another, especially when several people are photographed together. Current human body posture analysis is mainly a process of locating the position of the human body from a two-dimensional or three-dimensional image and extracting a skeleton in order to recognize the posture. With advances in science and technology, particularly the development of computer technology, human body three-dimensional gesture recognition has been widely applied in many fields, and the connections between real objects and mathematical models, and between virtual objects and real objects, have become close. Estimating complete three-dimensional human morphology and pose (motion) from images or video has been a challenge in the computer field for decades.
Currently, the most widely used method of human pose estimation is the marker tracking method, which requires multiple calibrated cameras and markers carefully attached to the subject's body. This technique achieves high accuracy but is costly. To reduce cost, marker-free capture methods based on multi-view two-dimensional-camera three-dimensional human reconstruction or on depth cameras have been developed over the past 20 years. However, these newer methods still require additional devices to estimate the subject's three-dimensional posture, so the technology cannot be popularized in many application scenarios. In addition, posture analysis based on a three-dimensional human body model depends on deep learning, training data for three-dimensional human motion is still very scarce, and the accuracy of posture analysis performed directly on a three-dimensional human body model with deep learning is very low.
Disclosure of Invention
The present application mainly aims to provide a gesture recognition method, device, equipment and storage medium based on a two-dimensional camera, so as to solve the technical problem of low accuracy in existing gesture recognition.
In order to achieve the above object, the present application provides a gesture recognition method based on a two-dimensional camera, including the following steps:
acquiring a human body picture of a human body to be identified through a two-dimensional camera;
extracting body contour information, posture information and gender information of the human body to be recognized according to the human body picture;
generating an SMPL model according to the body contour information, the posture information and the gender information;
acquiring 3D node characteristics in the SMPL model;
extracting 2D node characteristics of the human body to be identified in the human body picture;
generating an error function according to the 3D node characteristics and the 2D node characteristics;
adjusting the SMPL model according to the error function, and extracting target 3D joint features of the adjusted SMPL model;
acquiring skeleton information according to the target 3D joint characteristics;
and carrying out posture analysis according to the skeleton information.
Further, the step of extracting the body contour information, the posture information and the gender information of the human body to be recognized according to the human body picture comprises the following steps:
carrying out mask processing on the human body picture to obtain a mask of the human body to be identified;
extracting potential edge points of the mask by using a Sobel operator;
connecting each potential edge point by using an edge tracking algorithm with one potential edge point as a starting point to obtain a closed curve connected end to end as the body contour information of the human body to be recognized;
inputting the human body picture into a preset gesture recognition model to extract the gesture information of the human body to be recognized;
and inputting the human body picture into a pre-trained gender classifier to extract the gender information.
Further, the step of generating an error function according to the 3D node features and the 2D node features includes:
acquiring the two-dimensional camera parameters;
inputting the two-dimensional camera parameters, the 3D node characteristics and the 2D node characteristics into a preset error formula to generate the error function, wherein the error function is E_all = E_J(a, b; k, J) + E_1(b) + E_2(b; a) + E_3(a), where a is the body contour information, b is the pose information, k is the two-dimensional camera parameter, and J is the 2D node feature.
Further, the step of adjusting the SMPL model according to the error function and extracting the target 3D joint features of the adjusted SMPL model includes:
optimizing the error function through a Powell dog-leg algorithm to obtain an optimal solution of the error function;
and adjusting the SMPL model according to the optimal solution, and extracting the target 3D joint features of the adjusted SMPL model.
Further, the step of acquiring the two-dimensional camera parameters includes:
acquiring EXIF information of the human body picture;
determining a focal length of the two-dimensional camera according to the EXIF information;
and determining the shooting depth of the two-dimensional camera by a similar triangle principle.
Further, the step of extracting the 2D node features of the human body to be recognized in the human body picture includes:
extracting the 2D node characteristics through a pre-trained node feature extraction model; the node feature extraction model is trained based on a fully-connected convolutional neural network model.
The present application also provides a gesture recognition apparatus based on a two-dimensional camera, including: a first acquisition unit, configured to acquire a human body picture of a human body to be recognized through a two-dimensional camera;
the first extraction unit is used for extracting body contour information, posture information and gender information of the human body to be recognized according to the human body picture;
a generating unit, configured to generate an SMPL model according to the body contour information, the posture information, and the gender information;
a second obtaining unit, configured to obtain a 3D node feature in the SMPL model;
the second extraction unit is used for extracting the 2D node characteristics of the human body to be identified in the human body picture;
a generating unit, configured to generate an error function according to the 3D node feature and the 2D node feature;
the adjusting unit is used for adjusting the SMPL model according to the error function and extracting target 3D joint features of the adjusted SMPL model;
a third obtaining unit, configured to obtain skeleton information according to the target 3D joint feature;
and the analysis unit is used for carrying out posture analysis according to the skeleton information.
Further, the first extraction unit includes:
the mask subunit is used for performing mask processing on the human body picture to obtain a mask of the human body to be identified;
the first extraction subunit is used for extracting potential edge points of the mask by adopting a Sobel operator;
the connecting subunit is configured to use one of the potential edge points as a starting point, and connect each potential edge point by using an edge tracking algorithm to obtain an end-to-end closed curve as the body contour information of the human body to be recognized;
the second extraction subunit is used for inputting the human body picture into a preset posture recognition model to extract the posture information of the human body to be recognized;
and the third extraction subunit is used for inputting the human body picture into a pre-trained gender classifier to extract the gender information.
The present application further provides a computer device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the two-dimensional camera-based gesture recognition method according to any one of the above embodiments when executing the computer program.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the two-dimensional camera-based pose recognition method according to any one of the preceding claims.
According to the gesture recognition method, device, equipment and storage medium based on a two-dimensional camera, the human body picture is obtained by a single two-dimensional camera, so human body pictures from multiple angles are not needed. The 3D node features are combined with the 2D node features of the human body picture to generate an error function, the SMPL model is adjusted according to the error function, the target 3D joint features of the adjusted SMPL model are extracted, and skeleton information is acquired from the target 3D joint features. The obtained three-dimensional human skeleton information can well handle posture analysis under occlusion by foreign objects and crossed limbs, so the posture analysis result is more accurate.
Drawings
FIG. 1 is a schematic diagram illustrating steps of a two-dimensional camera-based gesture recognition method according to an embodiment of the present application;
FIG. 2 is a block diagram of a two-dimensional camera based gesture recognition apparatus according to an embodiment of the present disclosure;
fig. 3 is a block diagram illustrating a structure of a computer device according to an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Referring to fig. 1, an embodiment of the present application provides a gesture recognition method based on a two-dimensional camera, including the following steps:
step S1, acquiring a human body picture of a human body to be recognized through a two-dimensional camera;
step S2, extracting body contour information, posture information and gender information of the human body to be recognized according to the human body picture;
step S3, generating an SMPL model according to the body contour information, the posture information and the gender information;
step S4, acquiring 3D node characteristics in the SMPL model;
step S5, extracting the 2D node characteristics of the human body to be identified in the human body picture;
step S6, generating an error function according to the 3D node characteristics and the 2D node characteristics;
step S7, the SMPL model is adjusted according to the error function, and the target 3D joint characteristics of the adjusted SMPL model are extracted;
step S8, skeleton information is obtained according to the target 3D joint characteristics;
and step S9, performing posture analysis according to the skeleton information.
In this embodiment, as described in steps S1-S2 above, the human body picture is obtained by a single two-dimensional camera, which can be fixed in one place for shooting, and the body contour information, posture information and gender information in the human body picture are extracted. The posture information includes standing, running, jumping, squatting, crossed legs and the like, and the body contour is the bounding outline of the human body to be recognized in the picture. Specifically, the body contour information can be extracted by performing pixel-level segmentation of the human body to be recognized with the Mask R-CNN segmentation algorithm; in another embodiment, the body contour information can be extracted with an edge extraction operator such as the Canny operator; in yet another embodiment, a flood-fill algorithm can be used to fill small holes produced during edge detection, and a morphological processing algorithm is used to connect discontinuous contour boundaries to extract the body contour information, as sketched below.
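A minimal sketch of the edge-based variant of this step is given below, assuming OpenCV is available; the thresholds, kernel size and function name are illustrative choices rather than values taken from the application:

```python
import cv2
import numpy as np

def extract_body_contour(image_bgr):
    """Illustrative sketch: edge-based body contour extraction.

    Uses the Canny operator followed by morphological closing to bridge
    small gaps, then keeps the largest closed contour as the body outline.
    """
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)

    # Morphological closing connects discontinuous contour boundaries.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)

    # The largest external contour is taken as the body contour.
    contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    return max(contours, key=cv2.contourArea)
```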
As described in steps S3-S4 above, the SMPL (Skinned Multi-Person Linear) model is a three-dimensional human body model, and the joints of the three-dimensional human body are the 3D node features. The SMPL model may be represented in the form M(a, b, k), where a is the body contour information, b is the posture information, and k is a two-dimensional camera parameter.
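As an illustration of how such an M(a, b, k) model might be instantiated and its 3D node features read out, the sketch below uses the third-party smplx package as a stand-in; the package, the SMPL model files it requires and the parameter shapes are assumptions and are not part of the application:

```python
import torch
import smplx  # third-party SMPL implementation, used here only as a stand-in

def build_smpl_and_get_3d_joints(betas, body_pose, gender, model_dir="models"):
    """Sketch: instantiate an SMPL body model from shape (a), pose (b) and
    gender, then read out the 3D joint locations used as 3D node features."""
    model = smplx.create(model_dir, model_type="smpl", gender=gender)
    output = model(
        betas=torch.as_tensor(betas, dtype=torch.float32).reshape(1, -1),
        body_pose=torch.as_tensor(body_pose, dtype=torch.float32).reshape(1, -1),
        return_verts=False,
    )
    # (J, 3) array of 3D joint positions of the generated body model.
    return output.joints.detach().numpy()[0]
```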
As described in steps S5-S6 above, the 2D node features, i.e. feature points of body parts of the human body to be recognized such as the wrist, elbow, shoulder, ankle and knee, are extracted. Because the SMPL model is generated from the body contour information, posture information and gender information, which are themselves extracted from the human body picture, a certain error exists between the generated 3D node features and the 2D node features. An error function can therefore be generated from the 3D node features and the 2D node features, and the correct 3D joint features can be determined according to the error function.
As described in step S7, the SMPL model is adjusted through the error function so that the differences between the limb lengths spanned by the 3D node features and those spanned by the 2D node features are reduced, making the generated three-dimensional human body model more accurate.
As described in steps S8-S9 above, the target 3D joint features are close to the real joint features, so the skeleton information of the human body to be recognized can be accurately obtained from the target 3D joint features, and the posture of the human body to be recognized, such as standing, running or jumping, can be accurately analyzed from the skeleton information, as illustrated by the sketch below.
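A toy sketch of how skeleton information could be mapped to a posture label by examining joint angles; the joint names and angle thresholds here are illustrative assumptions, not values from the application:

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle (degrees) at joint b formed by segments b->a and b->c."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def classify_posture(joints):
    """Toy rule: nearly straight knees -> 'standing', strongly bent -> 'squatting'.
    `joints` maps joint names to 3D coordinates; the names are illustrative."""
    knee = joint_angle(joints["hip_r"], joints["knee_r"], joints["ankle_r"])
    if knee > 160:
        return "standing"
    if knee < 100:
        return "squatting"
    return "other"
```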
In this embodiment, only a single two-dimensional camera is used to obtain the human body picture, and human body pictures from multiple angles are not needed. By combining the 3D node features with the 2D node features of the human body picture, the obtained three-dimensional human skeleton information can well handle posture analysis under occlusion by foreign objects and crossed limbs.
In an embodiment, the step S2 of extracting body contour information, posture information and gender information of the human body to be recognized according to the human body picture includes:
step S21, performing mask processing on the human body picture to obtain a mask of the human body to be identified;
step S22, extracting potential edge points of the mask by using a Sobel operator;
step S23, taking one of the potential edge points as a starting point, and adopting an edge tracking algorithm to connect each potential edge point to obtain a closed curve connected end to end as the body contour information of the human body to be recognized;
step S24, inputting the human body picture into a preset gesture recognition model to extract the gesture information of the human body to be recognized;
and step S25, inputting the human body picture into a pre-trained gender classifier to extract the gender information.
In this embodiment, as described in steps S21-S22 above, the connected region with the largest area can be found by a maximum-connectivity algorithm based on four-neighbourhoods, so as to obtain the mask of the human body to be recognized. The potential edge points of the mask are then extracted with the Sobel operator. The Sobel operator is a discrete difference operator used to approximate the gradient of the image brightness function; because it incorporates an operation similar to local averaging, it smooths noise and largely eliminates its influence, so extracting the potential edge points of the mask with the Sobel operator yields more accurate edge points.
As described in step S23 above, one potential edge point is selected from the extracted potential edge points as a starting point, and the edge points are connected under the eight-neighbourhood condition to obtain an end-to-end closed curve.
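The following sketch illustrates steps S22-S23 under the assumption that the mask is a binary image; the gradient threshold and the greedy neighbour choice are simplifications of the edge tracking algorithm described above:

```python
import cv2
import numpy as np

def sobel_edge_points(mask, thresh=100):
    """Potential edge points of a binary body mask via the Sobel operator."""
    gx = cv2.Sobel(mask, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(mask, cv2.CV_64F, 0, 1, ksize=3)
    magnitude = cv2.magnitude(gx, gy)
    ys, xs = np.where(magnitude > thresh)
    return set(zip(xs.tolist(), ys.tolist()))

def trace_contour(edge_points):
    """Greedy eight-neighbourhood tracking: start at one edge point and keep
    walking to an unvisited neighbouring edge point until none remains."""
    start = next(iter(edge_points))
    contour, current, visited = [start], start, {start}
    neighbours = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    while True:
        x, y = current
        nxt = next(((x + dx, y + dy) for dx, dy in neighbours
                    if (x + dx, y + dy) in edge_points and (x + dx, y + dy) not in visited), None)
        if nxt is None:
            break
        contour.append(nxt)
        visited.add(nxt)
        current = nxt
    return contour  # approximately end-to-end closed curve
```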
As described in steps S24-S25 above, the preset posture recognition model can recognize the posture of the human body to be recognized in the two-dimensional human body picture, and the gender classifier classifies the human body to be recognized, recognizing its gender as male or female.
In this embodiment, the body contour information of the human body to be recognized can be accurately extracted using the mask and the edge tracking algorithm. Meanwhile, with the preset posture recognition model and the pre-trained gender classifier, the posture information and gender information can also be accurately extracted.
In an embodiment, the step S6 of generating an error function according to the 3D node feature and the 2D node feature includes:
step S61, acquiring the two-dimensional camera parameters;
step S62, inputting the two-dimensional camera parameters, the 3D node characteristics and the 2D node characteristics into a preset error formula to generate the error function, wherein the error function is E_all = E_J(a, b; k, J) + E_1(b) + E_2(b; a) + E_3(a), where a is the body contour information, b is the pose information, k is the two-dimensional camera parameter, and J is the 2D node feature.
In this embodiment, E_J(a, b; k, J) = Σ(Π_k(R_b(J(a))) − J), where Π_k denotes the projection onto the image plane determined by the two-dimensional camera parameters k, R_b is the rotation determined by the pose b, and J(a) are the 3D joints of the SMPL model with shape a;
E_2(b; a) = λ_b E_b(b) + λ_a E_a(a), where λ_b and λ_a are weighting coefficients.
In this embodiment, by introducing the three energy terms E_1(b), E_2(b; a) and E_3(a) to compensate the error of E_J, the problems of the limbs of the human body to be recognized being abnormally interwoven and of postures occluding one another can be avoided.
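A schematic sketch of the combined error E_all, assuming a simple pinhole projection for the camera parameters k; because the exact form of the three regularisation terms is not spelled out here, they are passed in as precomputed values:

```python
import numpy as np

def project(points_3d, focal, center):
    """Simple pinhole projection with camera parameters k = (focal, center)."""
    z = points_3d[:, 2:3]
    return focal * points_3d[:, :2] / z + center

def total_error(joints_3d, joints_2d, focal, center,
                pose_prior=0.0, interpenetration=0.0, shape_prior=0.0):
    """Sketch of E_all: the 2D reprojection term E_J plus the three
    regularisation terms E_1(b), E_2(b; a), E_3(a) described above.
    The regularisers are supplied as precomputed scalars for brevity."""
    e_j = np.sum((project(joints_3d, focal, center) - joints_2d) ** 2)
    return e_j + pose_prior + interpenetration + shape_prior
```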
In an embodiment, the step S7 of adjusting the SMPL model according to the error function and extracting the target 3D joint feature of the adjusted SMPL model includes:
step S71, optimizing the error function through a Powell dog-leg algorithm to obtain the optimal solution of the error function;
step S72, adjusting the SMPL model according to the optimal solution, and extracting the target 3D joint features of the adjusted SMPL model.
In this embodiment, as described in steps S71-S72 above, the Powell dog-leg algorithm can find the minimum point of a quadratic function in a finite number of steps. Specifically, each round of iteration has a starting point (the starting point of the first round can be chosen freely) and n linearly independent search directions. A one-dimensional search is performed sequentially along the n directions from the starting point to obtain an end point, and a new search direction is determined by the starting point and the end point. It is then judged whether one of the original directions should be replaced by the new search direction; if so, the worst direction in the original set is identified and replaced by the newly generated one, so that conjugate directions are generated successively. In this way the optimal solution of the error function can be found accurately and quickly, the SMPL model is adjusted with the optimal solution, the adjusted SMPL model can accurately simulate the three-dimensional shape of the human body to be recognized, and the target 3D joint features are extracted from the adjusted SMPL model.
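A minimal sketch of the optimisation step using SciPy; the derivative-free Powell direction-set method is used here as an off-the-shelf stand-in for the dog-leg procedure described above, and `error_fn` is assumed to evaluate E_all for a flattened vector of shape and pose parameters:

```python
import numpy as np
from scipy.optimize import minimize

def fit_smpl(initial_params, error_fn):
    """Sketch: minimise the total error E_all over the concatenated shape
    and pose parameters.  SciPy's Powell direction-set method is a
    derivative-free stand-in for the dog-leg optimiser named in the text;
    both search for the minimiser of the error function."""
    result = minimize(error_fn, np.asarray(initial_params, dtype=float), method="Powell")
    return result.x  # optimal solution used to adjust the SMPL model

# Example usage with a toy quadratic error surface:
# best = fit_smpl([0.0, 0.0], lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2)
```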
In an embodiment, the step S61 of acquiring the two-dimensional camera parameters includes:
step S611, obtaining EXIF information of the human body picture;
step S612, determining the focal length of the two-dimensional camera according to the EXIF information;
step S613, determining the shooting depth of the two-dimensional camera by the principle of similar triangles.
In this embodiment, EXIF is an abbreviation of exchangeable image file; it is defined specifically for digital camera photographs and records the attribute information and shooting data of a digital photo. EXIF data can be attached to files such as JPEG, TIFF and RIFF, adding information about the shooting conditions and version information of the thumbnail or of image processing software, and the focal length used by the two-dimensional camera at shooting time is determined from the EXIF information. The shooting depth of the two-dimensional camera, i.e. the distance between the human body to be recognized and the two-dimensional camera, is then determined by the similar-triangle principle, namely by comparing the torso length between the 3D node features with the torso length between the corresponding 2D node features.
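A sketch of reading the focal length from EXIF with Pillow and applying the similar-triangle relation is shown below, assuming a reasonably recent Pillow version; note that the EXIF focal length is in millimetres and would still need converting to pixel units from the sensor size, which the sketch leaves to the caller:

```python
from PIL import Image

FOCAL_LENGTH_TAG = 37386  # standard EXIF tag id for FocalLength
EXIF_IFD_TAG = 0x8769     # sub-IFD that holds camera settings

def read_focal_length_mm(image_path):
    """Read the focal length (in mm) recorded in the picture's EXIF data, if any."""
    exif = Image.open(image_path).getexif()
    exif_ifd = exif.get_ifd(EXIF_IFD_TAG)
    value = exif_ifd.get(FOCAL_LENGTH_TAG)
    return float(value) if value is not None else None

def estimate_depth(real_torso_m, torso_pixels, focal_pixels):
    """Similar-triangle estimate: real length / depth = pixel length / focal,
    so depth = focal * real length / pixel length (consistent units)."""
    return focal_pixels * real_torso_m / torso_pixels
```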
In an embodiment, the step of extracting the 2D node features of the human body to be recognized in the human body picture includes:
extracting the 2D node characteristics through a pre-trained node feature extraction model; the node feature extraction model is trained based on a fully-connected convolutional neural network model.
In this embodiment, every earlier layer of the fully-connected (densely connected) convolutional neural network is connected to every later layer, so features are reused by concatenating feature maps along the channel dimension and can be identified accurately. The node feature extraction model obtained by training such a network can automatically extract the positions of the feature points of the body parts, i.e. the 2D node features.
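A sketch of a heatmap-based 2D node-feature extractor built on a densely connected convolutional backbone from torchvision; the backbone choice, head layout and keypoint count are illustrative assumptions and not the trained model referred to above:

```python
import torch
import torch.nn as nn
from torchvision import models

class KeypointNet(nn.Module):
    """Sketch of a 2D node-feature extractor: a densely connected
    convolutional backbone followed by a head that predicts one heatmap
    per body keypoint.  Architecture details are illustrative."""

    def __init__(self, num_keypoints=17):
        super().__init__()
        self.backbone = models.densenet121(weights=None).features
        self.head = nn.Sequential(
            nn.Conv2d(1024, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, num_keypoints, kernel_size=1),
        )

    def forward(self, x):
        # The (x, y) of each keypoint is read off as the arg-max of its heatmap.
        return self.head(self.backbone(x))

# model = KeypointNet()
# heatmaps = model(torch.randn(1, 3, 256, 256))
```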
The gesture recognition method based on the two-dimensional camera can be applied in the blockchain field, with the trained node feature extraction model stored in a blockchain network. A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database, a series of data blocks associated by cryptographic methods; each data block contains information on a batch of network transactions, which is used to verify the validity (anti-counterfeiting) of the information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product services layer and an application services layer.
The block chain underlying platform can comprise processing modules such as user management, basic service, intelligent contract and operation monitoring. The user management module is responsible for identity information management of all blockchain participants, and comprises public and private key generation maintenance (account management), key management, user real identity and blockchain address corresponding relation maintenance (authority management) and the like, and under the authorization condition, the user management module supervises and audits the transaction condition of certain real identities and provides rule configuration (wind control audit) of risk control; the basic service module is deployed on all block chain node equipment and used for verifying the validity of the service request, recording the service request to storage after consensus on the valid request is completed, for a new service request, the basic service firstly performs interface adaptation analysis and authentication processing (interface adaptation), then encrypts service information (consensus management) through a consensus algorithm, transmits the service information to a shared account (network communication) completely and consistently after encryption, and performs recording and storage; the intelligent contract module is responsible for registering and issuing contracts, triggering the contracts and executing the contracts, developers can define contract logics through a certain programming language, issue the contract logics to a block chain (contract registration), call keys or other event triggering and executing according to the logics of contract clauses, complete the contract logics and simultaneously provide the function of upgrading and canceling the contracts; the operation monitoring module is mainly responsible for deployment, configuration modification, contract setting, cloud adaptation in the product release process and visual output of real-time states in product operation, such as: alarm, monitoring network conditions, monitoring node equipment health status, and the like.
An embodiment of the present application further provides a gesture recognition apparatus based on a two-dimensional camera, including:
a first acquisition unit 10 for acquiring a human body picture of a human body to be recognized by a two-dimensional camera;
a first extracting unit 20, configured to extract body contour information, posture information, and gender information of the human body to be recognized according to the human body picture;
a generating unit 30 for generating an SMPL model from the body contour information, the posture information, and the gender information;
a second obtaining unit 40, configured to obtain a 3D node feature in the SMPL model;
the second extraction unit 50 is configured to extract 2D node features of the human body to be identified in the human body picture;
a generating unit 60, configured to generate an error function according to the 3D node characteristics and the 2D node characteristics;
an adjusting unit 70, configured to adjust the SMPL model according to the error function, and extract a target 3D joint feature of the adjusted SMPL model;
a third obtaining unit 80, configured to obtain skeleton information according to the target 3D joint feature;
and the analysis unit 90 is used for carrying out posture analysis according to the skeleton information.
In one embodiment, the first extraction unit 20 includes:
the mask subunit is used for performing mask processing on the human body picture to obtain a mask of the human body to be identified;
the first extraction subunit is used for extracting potential edge points of the mask by adopting a Sobel operator;
the connecting subunit is configured to use one of the potential edge points as a starting point, and connect each potential edge point by using an edge tracking algorithm to obtain an end-to-end closed curve as the body contour information of the human body to be recognized;
the second extraction subunit is used for inputting the human body picture into a preset posture recognition model to extract the posture information of the human body to be recognized;
and the third extraction subunit is used for inputting the human body picture into a pre-trained gender classifier to extract the gender information.
In an embodiment, the generating unit 60 includes:
a first acquisition subunit, configured to acquire the two-dimensional camera parameters;
a generating subunit, configured to input the two-dimensional camera parameters, the 3D node features, and the 2D node features into a preset error formula to generate the error function, wherein the error function is E_all = E_J(a, b; k, J) + E_1(b) + E_2(b; a) + E_3(a), where a is the body contour information, b is the pose information, k is the two-dimensional camera parameter, and J is the 2D node feature.
In an embodiment, the adjusting unit 70 includes:
the optimization subunit is used for optimizing the error function through a Powell dog-leg algorithm to obtain an optimal solution of the error function;
and the second acquisition subunit is used for adjusting the SMPL model according to the optimal solution and extracting the target 3D joint features of the adjusted SMPL model.
In an embodiment, the first obtaining subunit includes:
the acquisition module is used for acquiring EXIF information of the human body picture;
a first determining module, configured to determine a focal length of the two-dimensional camera according to the EXIF information;
and the second determining module is used for determining the shooting depth of the two-dimensional camera through a similar triangle principle.
In an embodiment, the second extraction unit 50 includes:
the fourth extraction subunit is used for extracting the 2D node features through a pre-trained node feature extraction model; the node feature extraction model is trained based on a fully-connected convolutional neural network model.
In this embodiment, please refer to the above method embodiment for specific implementation of the above units, sub-units, and modules, which are not described herein again.
Referring to fig. 3, a computer device, which may be a server and whose internal structure may be as shown in fig. 3, is also provided in an embodiment of the present application. The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device is used to provide computation and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store human body pictures and the like. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements the two-dimensional camera-based gesture recognition method.
Those skilled in the art will appreciate that the architecture shown in fig. 3 is only a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects may be applied.
An embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing a method for gesture recognition based on a two-dimensional camera.
In summary, with the gesture recognition method, device, equipment and storage medium based on a two-dimensional camera provided in the embodiments of the present application, a human body picture of a human body to be recognized is obtained by the two-dimensional camera; body contour information, posture information and gender information of the human body to be recognized are extracted from the human body picture; an SMPL model is generated from the body contour information, the posture information and the gender information; 3D node features of the SMPL model are acquired; 2D node features of the human body to be recognized are extracted from the human body picture; an error function is generated from the 3D node features and the 2D node features; the SMPL model is adjusted according to the error function, and target 3D joint features of the adjusted SMPL model are extracted; skeleton information is acquired from the target 3D joint features; and posture analysis is performed according to the skeleton information. In this scheme, the human body picture is obtained with only a single two-dimensional camera and human body pictures from multiple angles are not needed; by combining the 3D node features with the 2D node features of the human body picture, the obtained three-dimensional human skeleton information can well handle posture analysis under occlusion by foreign objects and crossed limbs.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by instructing relevant hardware through a computer program, which may be stored on a non-volatile computer-readable storage medium and which, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database or other medium provided herein and used in the examples may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
The above description is only for the preferred embodiment of the present application and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are intended to be included within the scope of the present application.

Claims (10)

1. A gesture recognition method based on a two-dimensional camera is characterized by comprising the following steps:
acquiring a human body picture of a human body to be identified through a two-dimensional camera;
extracting body contour information, posture information and gender information of the human body to be recognized according to the human body picture;
generating an SMPL model according to the body contour information, the posture information and the gender information;
acquiring 3D node characteristics in the SMPL model;
extracting 2D node characteristics of the human body to be identified in the human body picture;
generating an error function according to the 3D node characteristics and the 2D node characteristics;
adjusting the SMPL model according to the error function, and extracting target 3D joint features of the adjusted SMPL model;
acquiring skeleton information according to the target 3D joint characteristics;
and carrying out posture analysis according to the skeleton information.
2. The two-dimensional camera-based gesture recognition method according to claim 1, wherein the step of extracting body contour information, gesture information, and gender information of the human body to be recognized according to the human body picture comprises:
carrying out mask processing on the human body picture to obtain a mask of the human body to be identified;
extracting potential edge points of the mask by using a Sobel operator;
connecting each potential edge point by using an edge tracking algorithm with one potential edge point as a starting point to obtain a closed curve connected end to end as the body contour information of the human body to be recognized;
inputting the human body picture into a preset gesture recognition model to extract the gesture information of the human body to be recognized;
and inputting the human body picture into a pre-trained gender classifier to extract the gender information.
3. The two-dimensional camera based gesture recognition method of claim 1, wherein the step of generating an error function from the 3D node features and the 2D node features comprises:
acquiring the two-dimensional camera parameters;
inputting the two-dimensional camera parameters, the 3D node characteristics and the 2D node characteristics into a preset error formula to generate the error function, wherein the error function is E_all = E_J(a, b; k, J) + E_1(b) + E_2(b; a) + E_3(a), where a is the body contour information, b is the pose information, k is the two-dimensional camera parameter, and J is the 2D node feature.
4. The two-dimensional camera-based pose recognition method of claim 1, wherein the step of adjusting the SMPL model according to the error function and extracting the target 3D joint features of the adjusted SMPL model comprises:
optimizing the error function through a Powell dog-leg algorithm to obtain an optimal solution of the error function;
and adjusting the SMPL model according to the optimal solution, and extracting the target 3D joint features of the adjusted SMPL model.
5. A two-dimensional camera based pose recognition method according to claim 3, wherein the step of obtaining the two-dimensional camera parameters comprises:
acquiring EXIF information of the human body picture;
determining a focal length of the two-dimensional camera according to the EXIF information;
and determining the shooting depth of the two-dimensional camera by a similar triangle principle.
6. The two-dimensional camera-based gesture recognition method according to claim 1, wherein the step of extracting the 2D node features of the human body to be recognized in the human body picture comprises:
extracting the 2D node characteristics through a pre-trained node feature extraction model; the node feature extraction model is trained based on a fully-connected convolutional neural network model.
7. A gesture recognition apparatus based on a two-dimensional camera, comprising:
the device comprises a first acquisition unit, a second acquisition unit and a control unit, wherein the first acquisition unit is used for acquiring a human body picture of a human body to be recognized through a two-dimensional camera;
the first extraction unit is used for extracting body contour information, posture information and gender information of the human body to be recognized according to the human body picture;
a generating unit, configured to generate an SMPL model according to the body contour information, the posture information, and the gender information;
a second obtaining unit, configured to obtain a 3D node feature in the SMPL model;
the second extraction unit is used for extracting the 2D node characteristics of the human body to be identified in the human body picture;
a generating unit, configured to generate an error function according to the 3D node feature and the 2D node feature;
the adjusting unit is used for adjusting the SMPL model according to the error function and extracting target 3D joint features of the adjusted SMPL model;
a third obtaining unit, configured to obtain skeleton information according to the target 3D joint feature;
and the analysis unit is used for carrying out posture analysis according to the skeleton information.
8. The two-dimensional camera-based gesture recognition apparatus according to claim 7, wherein the first extraction unit includes:
the mask subunit is used for performing mask processing on the human body picture to obtain a mask of the human body to be identified;
the first extraction subunit is used for extracting potential edge points of the mask by adopting a Sobel operator;
the connecting subunit is configured to use one of the potential edge points as a starting point, and connect each potential edge point by using an edge tracking algorithm to obtain an end-to-end closed curve as the body contour information of the human body to be recognized;
the second extraction subunit is used for inputting the human body picture into a preset posture recognition model to extract the posture information of the human body to be recognized;
and the third extraction subunit is used for inputting the human body picture into a pre-trained gender classifier to extract the gender information.
9. A computer device comprising a memory and a processor, the memory having stored therein a computer program, characterized in that the processor, when executing the computer program, implements the steps of the two-dimensional camera-based pose recognition method according to any one of claims 1 to 6.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the two-dimensional camera-based pose recognition method according to any one of claims 1 to 6.
CN202011339565.4A 2020-11-25 2020-11-25 Gesture recognition method, device, equipment and storage medium based on two-dimensional camera Active CN112464791B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011339565.4A CN112464791B (en) 2020-11-25 2020-11-25 Gesture recognition method, device, equipment and storage medium based on two-dimensional camera
PCT/CN2021/084543 WO2021208740A1 (en) 2020-11-25 2021-03-31 Pose recognition method and apparatus based on two-dimensional camera, and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011339565.4A CN112464791B (en) 2020-11-25 2020-11-25 Gesture recognition method, device, equipment and storage medium based on two-dimensional camera

Publications (2)

Publication Number Publication Date
CN112464791A 2021-03-09
CN112464791B 2023-10-27

Family

ID=74807909

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011339565.4A Active CN112464791B (en) 2020-11-25 2020-11-25 Gesture recognition method, device, equipment and storage medium based on two-dimensional camera

Country Status (2)

Country Link
CN (1) CN112464791B (en)
WO (1) WO2021208740A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021208740A1 (en) * 2020-11-25 2021-10-21 平安科技(深圳)有限公司 Pose recognition method and apparatus based on two-dimensional camera, and device and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120057761A1 (en) * 2010-09-01 2012-03-08 Sony Corporation Three dimensional human pose recognition method and apparatus
CN110020633A (en) * 2019-04-12 2019-07-16 腾讯科技(深圳)有限公司 Training method, image-recognizing method and the device of gesture recognition model
CN111968217A (en) * 2020-05-18 2020-11-20 北京邮电大学 SMPL parameter prediction and human body model generation method based on picture

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007310707A (en) * 2006-05-19 2007-11-29 Toshiba Corp Apparatus and method for estimating posture
CN110189397A (en) * 2019-03-29 2019-08-30 北京市商汤科技开发有限公司 A kind of image processing method and device, computer equipment and storage medium
CN112464791B (en) * 2020-11-25 2023-10-27 平安科技(深圳)有限公司 Gesture recognition method, device, equipment and storage medium based on two-dimensional camera

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120057761A1 (en) * 2010-09-01 2012-03-08 Sony Corporation Three dimensional human pose recognition method and apparatus
CN110020633A (en) * 2019-04-12 2019-07-16 腾讯科技(深圳)有限公司 Training method, image-recognizing method and the device of gesture recognition model
CN111968217A (en) * 2020-05-18 2020-11-20 北京邮电大学 SMPL parameter prediction and human body model generation method based on picture

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021208740A1 (en) * 2020-11-25 2021-10-21 平安科技(深圳)有限公司 Pose recognition method and apparatus based on two-dimensional camera, and device and storage medium

Also Published As

Publication number Publication date
CN112464791B (en) 2023-10-27
WO2021208740A1 (en) 2021-10-21

Similar Documents

Publication Publication Date Title
KR101791590B1 (en) Object pose recognition apparatus and method using the same
US20200272806A1 (en) Real-Time Tracking of Facial Features in Unconstrained Video
CN107808111A (en) For pedestrian detection and the method and apparatus of Attitude estimation
CN105138980A (en) Identify authentication method and system based on identity card information and face identification
CN109410318A (en) Threedimensional model generation method, device, equipment and storage medium
CN110874865A (en) Three-dimensional skeleton generation method and computer equipment
JP5937823B2 (en) Image collation processing apparatus, image collation processing method, and image collation processing program
JP7192872B2 (en) Iris authentication device, iris authentication method, iris authentication program and recording medium
CN112949468A (en) Face recognition method and device, computer equipment and storage medium
CN111008621B (en) Object tracking method and device, computer equipment and storage medium
CN114894337B (en) Temperature measurement method and device for outdoor face recognition
CN111680573B (en) Face recognition method, device, electronic equipment and storage medium
CN110443254A (en) The detection method of metallic region, device, equipment and storage medium in image
KR20190142553A (en) Tracking method and system using a database of a person's faces
CN112464791B (en) Gesture recognition method, device, equipment and storage medium based on two-dimensional camera
JP5503510B2 (en) Posture estimation apparatus and posture estimation program
US20230132479A1 (en) Systems and methods for personalized patient body modeling
Park et al. 3D face reconstruction from stereo video
CN113516046A (en) Method, device, equipment and storage medium for monitoring biological diversity in area
JP2019012497A (en) Portion recognition method, device, program, and imaging control system
CN111275059A (en) Image processing method and device and computer readable storage medium
CN113610969B (en) Three-dimensional human body model generation method and device, electronic equipment and storage medium
Becker et al. COMBI: artificial intelligence for computer-based forensic analysis of persons
CN110490950B (en) Image sample generation method and device, computer equipment and storage medium
Ince et al. Human Identification Using Video-Based Analysis of the Angle Between Skeletal Joints

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant