CN112183506A - Human body posture generation method and system - Google Patents

Human body posture generation method and system

Info

Publication number
CN112183506A
CN112183506A CN202011369283.9A
Authority
CN
China
Prior art keywords
key point
coordinate
human body
neural network
body posture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011369283.9A
Other languages
Chinese (zh)
Inventor
唐浩
范宇航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Tishi Technology Co ltd
Original Assignee
Chengdu Tishi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Tishi Technology Co ltd filed Critical Chengdu Tishi Technology Co ltd
Priority to CN202011369283.9A priority Critical patent/CN112183506A/en
Publication of CN112183506A publication Critical patent/CN112183506A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a human body posture generation method and system, comprising the following steps: acquiring multi-channel synchronous video stream data; grabbing, frame by frame, the set of same-frame pedestrian images contained in the synchronous video stream data; performing 2D skeleton key point detection on the pedestrian image set to generate the pedestrians' 2D skeleton key point coordinates and confidences; and constructing a coordinate transformation matrix from the projection matrices of the camera units, the 2D skeleton key point coordinates and their confidences, and generating human body posture information with a triangulation algorithm. By calculating both the coordinates and the confidences of the 2D skeleton key points and, while building the initial coordinate transformation matrix, updating the weight of each coordinate in it according to its confidence, the method solves the problem that the computed spatial 3D coordinates deviate severely from their true values when the 2D skeleton key points in the images acquired by the several camera units are occluded to different degrees.

Description

Human body posture generation method and system
Technical Field
The invention relates to the technical field of human body posture recognition, and in particular to a human body posture generation method and system.
Background
3D human body posture estimation can serve as the basis of tasks such as posture recognition, behavior recognition and human body tracking, and has high application value in fields such as medical treatment, surveillance and human-computer interaction. Current 3D human body posture estimation methods can be divided into single-camera and multi-camera methods. Single-camera 3D posture estimation infers the depth of the human body in an image from foreground-background differences and, combined with a 2D joint-point estimation algorithm, restores the position of the human posture in 3D space. Multi-camera 3D posture estimation first estimates the 2D joint-point coordinates of the human body in each camera separately, and then computes the 3D spatial coordinates of the joint points by triangulation.
With the development of deep learning centered on convolutional neural networks and the marked growth of computing power in recent years, real-time multi-camera 3D human body posture estimation has become the preferred choice for many applications. The spatial positions of the cameras can be determined through extrinsic calibration between them, so that 3D spatial positions can be obtained by triangulation.
However, traditional multi-camera spatial 3D coordinate calculation still suffers from low detection precision and poor adaptability.
Disclosure of Invention
In view of the above, the invention provides a human body posture generation method and system that, by improving the image detection method, solve the problems of low detection precision and poor adaptability in existing multi-camera spatial 3D coordinate calculation.
To solve the above problems, the invention adopts a human body posture generation method comprising the following steps: S1: acquiring multi-channel synchronous video stream data from a plurality of camera units; S2: grabbing, frame by frame, the set of same-frame pedestrian images contained in the synchronous video stream data; S3: performing 2D skeleton key point detection on the pedestrian image set and generating the pedestrians' 2D skeleton key point coordinates and confidences; S4: constructing a coordinate transformation matrix from the projection matrices of the camera units, the 2D skeleton key point coordinates and their confidences, and generating human body posture information with a triangulation algorithm.
Optionally, S4 includes: constructing an initial transformation matrix from the projection matrix and the 2D skeleton key point coordinates; updating the weight of each 2D skeleton key point coordinate in the initial transformation matrix according to the confidence corresponding to that coordinate, thereby generating the coordinate transformation matrix; and calculating the spatial 3D coordinate with the formula (A × W) × Y = 0 and generating the human body posture information, where A is the initial transformation matrix, W is the confidence corresponding to the 2D skeleton key point coordinate, Y is the spatial 3D coordinate, and (A × W) is the coordinate transformation matrix.
Optionally, S3 includes: constructing a neural network model for generating the 2D skeleton key point coordinates; constructing a regression model from a training sample set and the neural network model, and training the neural network model; and resizing the several pedestrian images contained in the pedestrian image set to a uniform size, inputting them into the trained neural network model, and obtaining the pedestrians' 2D skeleton key point coordinates and confidences.
Optionally, S1 includes: performing intrinsic calibration on the plurality of camera units to obtain their intrinsic and distortion parameters; selecting a main camera, performing extrinsic calibration on the remaining camera units to obtain their extrinsic rotation and translation vectors, and calculating the projection matrices; and acquiring the distortion-corrected multi-channel synchronous video stream data with an image processing function based on the intrinsic and distortion parameters.
Correspondingly, the invention provides a human body posture generation system comprising a camera unit and a data processing unit, the data processing unit comprising a camera driving module, an image capture module, a neural network module and a coordinate transformation module. The camera unit is used to acquire multi-channel synchronous video stream data; the camera driving module drives the camera unit and receives the synchronous video stream data; the image capture module grabs, frame by frame, the set of same-frame pedestrian images contained in the synchronous video stream data; the neural network module performs 2D skeleton key point detection on the pedestrian image set and generates the pedestrians' 2D skeleton key point coordinates and confidences; and the coordinate transformation module constructs a coordinate transformation matrix from the 2D skeleton key point coordinates and their confidences and generates human body posture information with a triangulation algorithm.
Optionally, the human body posture generation system further comprises a cross-platform computer vision library unit for providing the image processing functions required by the camera driving module and the neural network module.
Optionally, the human body posture generation system further comprises a data storage unit for storing the training sample set required by the neural network module.
Optionally, the neural network module constructs a neural network model for generating the 2D skeleton key point coordinates, constructs a regression model from the training sample set and the neural network model, trains the neural network model, resizes the several pedestrian images contained in the pedestrian image set to a uniform size, and inputs them into the trained neural network model to obtain the pedestrians' 2D skeleton key point coordinates and confidences.
Optionally, the coordinate transformation module constructs an initial transformation matrix from the 2D skeleton key point coordinates, updates the weight of each 2D skeleton key point coordinate in the initial transformation matrix according to the confidence corresponding to that coordinate to generate the coordinate transformation matrix, and calculates the spatial 3D coordinate with the formula (A × W) × Y = 0 to generate the human body posture information, where A is the initial transformation matrix, W is the confidence corresponding to the 2D skeleton key point coordinate, Y is the spatial 3D coordinate, and (A × W) is the coordinate transformation matrix.
The primary improvement of the human body posture generation method is that the coordinates and confidences of the 2D skeleton key points are calculated by the neural network module, and the weight of each coordinate in the initial coordinate transformation matrix is updated according to its confidence while that matrix is established. This solves the problem that the computed spatial 3D coordinates deviate severely from their true values when the 2D skeleton key points in the images acquired by the several camera units are occluded to different degrees, effectively improves the accuracy of the output spatial 3D coordinates, and improves the anti-interference capability and detection precision of the human body posture generation system.
Drawings
FIG. 1 is a simplified flow diagram of a human body pose generation method of the present invention;
FIG. 2 is a simplified block diagram of the human body posture generation system of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood by those skilled in the art, the present invention will be further described in detail with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, a human body posture generation method includes: S1: acquiring multi-channel synchronous video stream data from a plurality of camera units; S2: grabbing, frame by frame, the set of same-frame pedestrian images contained in the synchronous video stream data; S3: performing 2D skeleton key point detection on the pedestrian image set and generating the pedestrians' 2D skeleton key point coordinates and confidences; S4: constructing a coordinate transformation matrix from the projection matrices of the camera units, the 2D skeleton key point coordinates and their confidences, and generating human body posture information with a triangulation algorithm.
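As an illustrative sketch only, steps S2 through S4 can be wired together per frame as follows; all names are placeholders, and the detector and triangulation are stand-ins for the neural network module and coordinate transformation module described later:

```python
def process_frame(frames, projections, detect, triangulate):
    """One iteration of S2-S4 over the same-frame images of all camera units.

    frames      -- one image per camera unit (the S2 grab for this frame)
    projections -- one projection matrix per camera unit (from S1 calibration)
    detect      -- S3 stand-in: image -> list of (x, y, confidence) per key point
    triangulate -- S4 stand-in: (projections, points, confidences) -> 3D coordinate
    """
    detections = [detect(image) for image in frames]      # S3, per camera
    num_keypoints = len(detections[0])
    pose3d = []
    for k in range(num_keypoints):                        # S4, per key point
        points = [(d[k][0], d[k][1]) for d in detections]
        weights = [d[k][2] for d in detections]
        pose3d.append(triangulate(projections, points, weights))
    return pose3d
```

The loop groups per-camera detections by key point before triangulation, which is the data flow the method relies on.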
The inventor has found that when a neural network module of some neural network frameworks is used to calculate the 2D skeleton key point coordinates of a human body in an image, the module predicts the coordinates by computing a confidence for each 2D skeleton key point, and outputs both the coordinates and their confidences, the confidence reflecting the predicted degree of occlusion of the key point. The traditional method for calculating spatial 3D coordinates, however, treats every 2D skeleton key point as completely reliable. Since in most cases the 2D skeleton key point coordinates acquired by the multiple cameras are occluded to different degrees (that is, their confidences differ), the accuracy of the finally output spatial 3D coordinates is low, and the confidence information becomes redundant, wasting the computing power of the neural network units involved.
To solve this problem, the invention calculates the coordinates and confidences of the 2D skeleton key points with the neural network module and, while the initial coordinate transformation matrix is established, updates the weight of each coordinate in it according to its confidence. This solves the problem that the computed spatial 3D coordinates deviate severely from their true values when the 2D skeleton key points in the images acquired by the several camera units are occluded to different degrees, effectively improves the accuracy of the output spatial 3D coordinates, improves the anti-interference capability and detection precision of the human body posture generation system, and makes full use of information that would otherwise be redundant. Specifically, an initial transformation matrix is constructed from the projection matrix and the 2D skeleton key point coordinates; the weight of each 2D skeleton key point coordinate in the initial transformation matrix is updated according to the confidence corresponding to that coordinate, generating the coordinate transformation matrix; and the spatial 3D coordinate is calculated with the formula (A × W) × Y = 0, generating the human body posture information, where A is the initial transformation matrix corresponding to the 2D skeleton key point coordinate, W is the confidence corresponding to that coordinate, Y is the spatial 3D coordinate, and (A × W) is the coordinate transformation matrix. Multiplying the matrix A by the scalar W scales the row vectors of the initial transformation matrix A by the confidence W, generating the coordinate transformation matrix.
Specifically, constructing the initial transformation matrix from the 2D skeleton key point coordinates comprises constructing a transformation matrix from the projection matrices P of the plurality of camera units and the corresponding 2D skeleton key point coordinates (xi, yi) acquired by each camera unit. Taking the calculation of the first skeleton point as an example: the first 2D skeleton key point coordinates (x, y) and the projection matrix P of the corresponding first camera unit give the first initial transformation matrix (the standard direct-linear-transform rows)

A1 = [ x·p3 − p1 ; y·p3 − p2 ],

where p1, p2, p3 denote the three rows of P. Based on the confidence W of the first 2D skeleton key point coordinate, the first initial transformation matrix is updated to the first coordinate transformation matrix A1′ = A1 × W. This is repeated until coordinate transformation matrices A1′, A2′, ..., An′ of the first 2D skeleton key point have been generated under the projection matrices and confidences of all the camera units, and they are stacked into the complete coordinate transformation matrix

A = [ A1′ ; A2′ ; ... ; An′ ],

where 1, 2, 3, ..., n numbers the camera units. SVD is performed on the coordinate transformation matrix to obtain the spatial 3D coordinate of the skeleton key point corresponding to the first 2D skeleton key point; repeating these steps for all the 2D skeleton key point coordinates generates the complete human body posture information.
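A minimal NumPy sketch of the confidence-weighted triangulation described above; the function and variable names are illustrative, and each camera unit is assumed to contribute a 3×4 projection matrix, one detected 2D key point, and a scalar confidence:

```python
import numpy as np

def triangulate_weighted(projections, points2d, confidences):
    """Solve (A x W) x Y = 0 for one skeleton key point.

    projections -- 3x4 projection matrix per camera unit
    points2d    -- (x, y) 2D skeleton key point coordinate per camera unit
    confidences -- scalar confidence w per camera unit
    Returns the spatial 3D coordinate as a length-3 array.
    """
    rows = []
    for P, (x, y), w in zip(projections, points2d, confidences):
        # Direct-linear-transform rows x*p3 - p1 and y*p3 - p2, each
        # scaled by the confidence so occluded detections weigh less.
        rows.append(w * (x * P[2] - P[0]))
        rows.append(w * (y * P[2] - P[1]))
    A = np.stack(rows)         # the complete coordinate transformation matrix
    _, _, Vt = np.linalg.svd(A)
    Y = Vt[-1]                 # null-space vector, homogeneous coordinates
    return Y[:3] / Y[3]
```

With two cameras whose projection matrices differ by a pure translation, the function recovers the original 3D point even when one view is down-weighted.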
Further, the S3 includes:
constructing a neural network model for generating the 2D skeleton key point coordinates, where the neural network model can be a deep convolutional neural network with 153 layers; the input of the model is an RGB three-channel image, and the output is (key points, confidence), where the key points are the 2D skeleton key point coordinates (xi, yi) and the confidence is the confidence that the skeleton key point is not occluded;
constructing a regression model from a training sample set (I, key points, confidence) and the neural network model, and training the neural network model, where I is a pedestrian image, the key points are the coordinates of the 2D skeleton key points in the image, and the confidence is the confidence that the skeleton key points are not occluded. The regression model uses stochastic gradient descent: a difference L1 between the output key point coordinates and the coordinates in the ground-truth labels and a difference L2 between the predicted and labelled confidences are calculated, the total loss Loss = L1 + L2 is computed, and the network parameters are modified through gradient back-propagation, training the neural network model;
and resizing the several pedestrian images contained in the pedestrian image set to a uniform size and inputting them into the trained neural network model to obtain the pedestrians' 2D skeleton key point coordinates and confidences. Each input pedestrian image of uniform size passes through convolution, max-pooling, deconvolution and mean-pooling operations, after which the model outputs, for each 2D skeleton key point, a heat map predicting its coordinates together with the highest confidence corresponding to those coordinates. The coordinate with the highest confidence in the heat map is extracted as the predicted coordinate of the 2D skeleton key point, so the 2D skeleton key point coordinates and the corresponding confidences can be output; repeating these steps yields all the pedestrians' 2D skeleton key point coordinates and confidences.
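The heat-map decoding step can be sketched as follows; this is a simplified illustration assuming one heat map per key point, and `decode_heatmaps` is a hypothetical helper name (real models often also refine the peak to sub-pixel precision):

```python
import numpy as np

def decode_heatmaps(heatmaps):
    """Extract (x, y, confidence) for each key point from a stack of heat maps.

    heatmaps -- array of shape (num_keypoints, H, W); each cell holds the
                predicted confidence that the key point lies at that cell.
    """
    results = []
    for hm in heatmaps:
        row, col = np.unravel_index(np.argmax(hm), hm.shape)  # peak cell
        results.append((float(col), float(row), float(hm[row, col])))
    return results
```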
Further, S2 includes: grabbing, frame by frame, the set of same-frame pedestrian images contained in the synchronous video stream data. The pedestrian image set can be formed directly from the several pedestrian images acquired by the plurality of camera units, or from several second pedestrian images generated by performing pedestrian detection on those images; the user can choose the generation mode of the pedestrian image set according to the actual application scene and the requirements of the camera units.
Specifically, forming the pedestrian image set from the several second pedestrian images generated by performing pedestrian detection on the pedestrian images acquired by the plurality of camera units includes:
constructing a second neural network model for pedestrian detection, which can be a pedestrian detection model based on the YOLO detection framework; the input of the model is a 224 × 224 × 3 RGB three-channel image and the output is (x, y, w, h, confidence), where x and y are the coordinates of the upper-left corner of the target frame of a detected pedestrian, w is the width of the target frame, h is its height, and confidence represents the confidence that a pedestrian is present in the target frame;
constructing a regression model from a training sample set and the second neural network model, and training the second neural network model; the training sample set can be a picture set and label set represented as (I, x, y, w, h), where I is a complex background image containing pedestrians and (x, y, w, h) is the position of the pedestrian's target frame in the image. Specifically, after the regression model is constructed, an RGB image containing pedestrians is input and (x, y, w, h, confidence) is output; by computing the difference between the output coordinates and the coordinates in the ground-truth labels and using stochastic gradient descent, the network parameters are modified through gradient back-propagation;
inputting a plurality of pedestrian images into the trained second neural network model, and acquiring pedestrian coordinates of the plurality of pedestrian images;
and extracting, based on the several sets of pedestrian coordinates, the second pedestrian images contained in the corresponding pedestrian images, and generating the pedestrian image set. Further, this comprises: constructing an extraction frame from the pedestrian coordinates output by the second neural network model; extracting the image of the corresponding region of the pedestrian image with the extraction frame to form a second pedestrian image; and repeating these steps until every second pedestrian image containing pedestrian coordinates has been traversed, forming the pedestrian image set. When at least two second pedestrian images exist among the several pedestrian images, the several second pedestrian images are extracted to form the pedestrian image set; when only one second pedestrian image exists, the next frame of the multi-channel acquired images is obtained and pedestrian detection is performed again.
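The extraction-frame cropping can be sketched as follows, with (x, y, w, h) interpreted as the top-left corner plus width and height as defined above; the clamping to image boundaries is a practical assumption not stated in the text:

```python
import numpy as np

def crop_pedestrians(image, boxes):
    """Cut a second pedestrian image out of `image` for each (x, y, w, h) box.

    image -- a NumPy array indexed as image[row, col]
    boxes -- target frames as (x, y, w, h): top-left corner, width, height
    """
    H, W = image.shape[:2]
    crops = []
    for (x, y, w, h) in boxes:
        x0, y0 = max(0, int(x)), max(0, int(y))          # clamp to the image
        x1, y1 = min(W, int(x + w)), min(H, int(y + h))
        if x1 > x0 and y1 > y0:                          # skip empty regions
            crops.append(image[y0:y1, x0:x1])
    return crops
```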
Further, S1 includes: performing intrinsic calibration on the plurality of camera units to obtain the intrinsic parameters K and the distortion parameters, where K = (fx, fy, u0, v0), fx and fy are the focal lengths of the camera and u0 and v0 are the coordinates of the principal point, and the distortion parameters of the camera unit are (k1, k2, k3, p1, p2), where k1, k2 and k3 are the radial distortion coefficients of the camera and p1 and p2 are its tangential distortion coefficients; selecting a main camera and performing extrinsic calibration on the remaining camera units to obtain the translation vector T = (Tx, Ty, Tz), where Tx, Ty and Tz are the X-, Y- and Z-axis components of the translation when converting coordinates in a camera coordinate system to coordinates in the world coordinate system, and calculating the rotation matrix R = R(α, β, γ), where γ is the rotation angle around the Z axis of the camera coordinate system, β the rotation angle around the Y axis and α the rotation angle around the X axis, from which the projection matrix is obtained; and acquiring the distortion-corrected current-frame images of the multiple channels with an image processing function based on the intrinsic and distortion parameters.
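The distortion parameters (k1, k2, k3, p1, p2) enter the standard radial/tangential (Brown-Conrady) distortion model; a small sketch of applying it to a normalized image point is shown below (the distortion-correction step computes the inverse of this mapping, typically via a library routine; the function name is illustrative):

```python
def distort_normalized(xn, yn, k1, k2, k3, p1, p2):
    """Apply radial (k1, k2, k3) and tangential (p1, p2) distortion to a
    normalized image point (xn, yn) = ((u - u0)/fx, (v - v0)/fy)."""
    r2 = xn * xn + yn * yn
    radial = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    xd = xn * radial + 2 * p1 * xn * yn + p2 * (r2 + 2 * xn * xn)
    yd = yn * radial + p1 * (r2 + 2 * yn * yn) + 2 * p2 * xn * yn
    return xd, yd
```

With all five coefficients zero the mapping is the identity, which is a quick sanity check on a calibration.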
Correspondingly, as shown in fig. 2, the invention provides a human body posture generation system comprising a camera unit and a data processing unit, the data processing unit comprising a camera driving module, an image capture module, a neural network module and a coordinate transformation module. The camera unit is used to acquire multi-channel synchronous video stream data; the camera driving module drives the camera unit and receives the synchronous video stream data; the image capture module grabs, frame by frame, the set of same-frame pedestrian images contained in the synchronous video stream data; the neural network module performs 2D skeleton key point detection on the pedestrian image set and generates the pedestrians' 2D skeleton key point coordinates and confidences; and the coordinate transformation module constructs a coordinate transformation matrix from the 2D skeleton key point coordinates and their confidences and generates human body posture information with a triangulation algorithm. The image capture module is further used to perform image rectification, filtering and similar processing on the multi-frame image data contained in the synchronous video stream data acquired by the camera unit.
Further, the human body posture generation system further comprises a cross-platform computer vision library unit and a data storage unit, wherein the cross-platform computer vision library unit is used for providing image processing functions required by the camera driving module and the neural network module, and the data storage unit is used for storing a training sample set required by the neural network module.
Further, the neural network module constructs a neural network model for generating the 2D skeleton key point coordinates, constructs a regression model from the training sample set and the neural network model, trains the neural network model, resizes the several pedestrian images contained in the pedestrian image set to a uniform size, and inputs them into the trained neural network model to obtain the pedestrians' 2D skeleton key point coordinates and confidences.
Further, the coordinate transformation module constructs an initial transformation matrix from the 2D skeleton key point coordinates, updates the weight of each 2D skeleton key point coordinate in the initial transformation matrix according to the confidence corresponding to that coordinate to generate the coordinate transformation matrix, and calculates the spatial 3D coordinate with the formula (A × W) × Y = 0 to generate the human body posture information, where A is the initial transformation matrix, W is the confidence corresponding to the 2D skeleton key point coordinate, Y is the spatial 3D coordinate, and (A × W) is the coordinate transformation matrix.
The above is only a preferred embodiment of the present invention, and it should be noted that the above preferred embodiment should not be considered as limiting the present invention, and the protection scope of the present invention should be subject to the scope defined by the claims. It will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the spirit and scope of the invention, and these modifications and adaptations should be considered within the scope of the invention.

Claims (9)

1. A human body posture generation method is characterized by comprising the following steps:
S1: acquiring multi-channel synchronous video stream data from a plurality of camera units;
S2: grabbing, frame by frame, the set of same-frame pedestrian images contained in the synchronous video stream data;
S3: performing 2D skeleton key point detection on the pedestrian image set and generating the pedestrians' 2D skeleton key point coordinates and confidences;
S4: constructing a coordinate transformation matrix from the projection matrices of the camera units, the 2D skeleton key point coordinates and their confidences, and generating human body posture information with a triangulation algorithm.
2. The human body posture generation method according to claim 1, wherein S4 comprises:
constructing an initial transformation matrix based on the projection matrices and the 2D skeleton key point coordinates;
updating the weight of each 2D skeleton key point coordinate in the initial transformation matrix based on the confidence corresponding to that coordinate and generating the coordinate transformation matrix;
and calculating a spatial 3D coordinate using the formula (A × W) × Y = 0 and generating the human body posture information, wherein A is the initial transformation matrix, W is the confidence corresponding to the 2D skeleton key point coordinates, Y is the spatial 3D coordinate, and (A × W) is the coordinate transformation matrix.
3. The human body posture generation method according to claim 2, wherein S3 comprises:
constructing a neural network model for generating the 2D skeleton key point coordinates;
constructing a regression model based on a training sample set and the neural network model, and training the neural network model;
and inputting the pedestrian images contained in the pedestrian image set, after unifying their sizes, into the trained neural network model to obtain the 2D skeleton key point coordinates of the pedestrians and their confidences.
4. The human body posture generation method according to claim 3, wherein S1 comprises:
performing intrinsic calibration on the plurality of camera units to obtain intrinsic parameters and distortion parameters;
selecting a main camera, performing extrinsic calibration on the remaining camera units with respect to the main camera to obtain extrinsic rotation and translation vectors, and calculating the projection matrices;
and acquiring the distortion-corrected multi-channel synchronous video stream data using an image processing function based on the intrinsic parameters and the distortion parameters.
5. A human body posture generation system, characterized by comprising a camera unit and a data processing unit, wherein the data processing unit comprises a camera driving module, an image capture module, a neural network module, and a coordinate transformation module;
the camera unit is used for collecting multi-channel synchronous video stream data;
the camera driving module is used for driving the camera unit and receiving the synchronous video stream data;
the image capture module is used for capturing, frame by frame, the pedestrian image set of the same frame contained in the synchronous video stream data;
the neural network module is used for performing 2D skeleton key point detection on the pedestrian image set and generating 2D skeleton key point coordinates of the pedestrians and their confidences;
and the coordinate transformation module is used for constructing a coordinate transformation matrix from the 2D skeleton key point coordinates and their confidences and generating human body posture information using a triangulation algorithm.
6. The human body posture generation system according to claim 5, further comprising a cross-platform computer vision library unit for providing the image processing functions required by the camera driving module and the neural network module.
7. The human body posture generation system according to claim 6, further comprising a data storage unit for storing the training sample set required by the neural network module.
8. The human body posture generation system according to claim 7, wherein the neural network module obtains the 2D skeleton key point coordinates of the pedestrians and their confidences by constructing a neural network model for generating the 2D skeleton key point coordinates, constructing a regression model based on the training sample set and the neural network model, training the neural network model, and inputting the pedestrian images contained in the pedestrian image set, after unifying their sizes, into the trained neural network model.
9. The human body posture generation system according to claim 8, wherein the coordinate transformation module constructs an initial transformation matrix based on the 2D skeleton key point coordinates, updates the weight of each 2D skeleton key point coordinate in the initial transformation matrix based on the confidence corresponding to that coordinate to generate the coordinate transformation matrix, and calculates a spatial 3D coordinate using the formula (A × W) × Y = 0 to generate the human body posture information, wherein A is the initial transformation matrix, W is the confidence corresponding to the 2D skeleton key point coordinates, Y is the spatial 3D coordinate, and (A × W) is the coordinate transformation matrix.
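The calibration steps in claim 4 ultimately produce one projection matrix per camera, conventionally composed as P = K [R | t] from the intrinsic matrix K and the extrinsic rotation R and translation t relative to the main camera. A minimal sketch of that composition step, with purely hypothetical intrinsics (in practice K, R, t would come from calibration routines such as OpenCV's calibrateCamera and stereoCalibrate; the patent names neither):

```python
import numpy as np

def projection_matrix(K, R, t):
    """Compose a 3x4 projection matrix P = K [R | t] from intrinsic
    matrix K and extrinsic rotation R / translation t obtained by
    calibrating a camera against the selected main camera."""
    return K @ np.hstack([R, t.reshape(3, 1)])

# hypothetical intrinsics: focal length 800 px, principal point (320, 240)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)        # the main camera defines the world frame,
t = np.zeros(3)      # so its rotation is identity and translation zero
P_main = projection_matrix(K, R, t)

# projecting a homogeneous 3D point on the optical axis
X = np.array([0.0, 0.0, 2.0, 1.0])
x = P_main @ X
u, v = x[:2] / x[2]  # lands on the principal point (320, 240)
```

These per-camera matrices are exactly the projections consumed by the triangulation step of claim 2.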
CN202011369283.9A 2020-11-30 2020-11-30 Human body posture generation method and system Pending CN112183506A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011369283.9A CN112183506A (en) 2020-11-30 2020-11-30 Human body posture generation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011369283.9A CN112183506A (en) 2020-11-30 2020-11-30 Human body posture generation method and system

Publications (1)

Publication Number Publication Date
CN112183506A true CN112183506A (en) 2021-01-05

Family

ID=73918191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011369283.9A Pending CN112183506A (en) 2020-11-30 2020-11-30 Human body posture generation method and system

Country Status (1)

Country Link
CN (1) CN112183506A (en)



Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140355825A1 (en) * 2013-06-03 2014-12-04 Samsung Electronics Co., Ltd. Method and apparatus for estimating pose
CN107577451A (en) * 2017-08-03 2018-01-12 中国科学院自动化研究所 Multi-Kinect human skeleton coordinate transformation method, processing device, and readable storage medium
WO2019222383A1 (en) * 2018-05-15 2019-11-21 Northeastern University Multi-person pose estimation using skeleton prediction
WO2020115579A1 (en) * 2018-12-03 2020-06-11 Everseen Limited System and method to detect articulate body pose
CN110378871A (en) * 2019-06-06 2019-10-25 绍兴聚量数据技术有限公司 Game character original artwork copy detection method based on posture features
CN111291687A (en) * 2020-02-11 2020-06-16 青岛联合创智科技有限公司 3D human body action standard identification method
CN111709296A (en) * 2020-05-18 2020-09-25 北京奇艺世纪科技有限公司 Scene identification method and device, electronic equipment and readable storage medium
CN111797753A (en) * 2020-06-29 2020-10-20 北京灵汐科技有限公司 Training method, device, equipment and medium of image driving model, and image generation method, device and medium
CN111881887A (en) * 2020-08-21 2020-11-03 董秀园 Multi-camera-based motion attitude monitoring and guiding method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KARIM ISKAKOV et al.: "Learnable Triangulation of Human Pose", 2019 IEEE/CVF International Conference on Computer Vision (ICCV) *
RICHARD HARTLEY et al.: "Multiple View Geometry in Computer Vision", Cambridge University Press *
PaoPao Robot SLAM: "Human Pose Estimation Based on Learnable Triangulation" (in Chinese), https://www.sohu.com/a/359644158_715754 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344850A (en) * 2021-04-27 2021-09-03 广东工业大学 Hinge plate weld joint edge detection method
CN113762177A (en) * 2021-09-13 2021-12-07 成都市谛视科技有限公司 Real-time human body 3D posture estimation method and device, computer equipment and storage medium
CN115955603A (en) * 2022-12-06 2023-04-11 广州紫为云科技有限公司 Intelligent camera device based on somatosensory interaction of intelligent screen and implementation method
CN115955603B (en) * 2022-12-06 2024-05-03 广州紫为云科技有限公司 Intelligent camera device based on intelligent screen somatosensory interaction and implementation method
CN117314976A (en) * 2023-10-08 2023-12-29 玩出梦想(上海)科技有限公司 Target tracking method and data processing equipment
CN117314976B (en) * 2023-10-08 2024-05-31 玩出梦想(上海)科技有限公司 Target tracking method and data processing equipment
CN117557700A (en) * 2024-01-12 2024-02-13 杭州优链时代科技有限公司 Method and equipment for modeling characters
CN117557700B (en) * 2024-01-12 2024-03-22 杭州优链时代科技有限公司 Method and equipment for modeling characters

Similar Documents

Publication Publication Date Title
CN108537876B (en) Three-dimensional reconstruction method, device, equipment and storage medium
US11285613B2 (en) Robot vision image feature extraction method and apparatus and robot using the same
WO2020001168A1 (en) Three-dimensional reconstruction method, apparatus, and device, and storage medium
CN112183506A (en) Human body posture generation method and system
CN108898676B (en) Method and system for detecting collision and shielding between virtual and real objects
US20190141247A1 (en) Threshold determination in a ransac algorithm
CN112200157A (en) Human body 3D posture recognition method and system for reducing image background interference
WO2021004416A1 (en) Method and apparatus for establishing beacon map on basis of visual beacons
EP3186787A1 (en) Method and device for registering an image to a model
CN111553949B (en) Positioning and grabbing method for irregular workpiece based on single-frame RGB-D image deep learning
CN108537844B (en) Visual SLAM loop detection method fusing geometric information
Liao et al. Model-free distortion rectification framework bridged by distortion distribution map
WO2021098802A1 (en) Object detection device, method, and systerm
CN109313805A (en) Image processing apparatus, image processing system, image processing method and program
US11562489B2 (en) Pixel-wise hand segmentation of multi-modal hand activity video dataset
EP3185212B1 (en) Dynamic particle filter parameterization
CN114898407A (en) Tooth target instance segmentation and intelligent preview method based on deep learning
EP2800055A1 (en) Method and system for generating a 3D model
CN116580169B (en) Digital man driving method and device, electronic equipment and storage medium
CN113255429B (en) Method and system for estimating and tracking human body posture in video
WO2022018811A1 (en) Three-dimensional posture of subject estimation device, three-dimensional posture estimation method, and program
CN116912467A (en) Image stitching method, device, equipment and storage medium
CN115841602A (en) Construction method and device of three-dimensional attitude estimation data set based on multiple visual angles
CN107993247A (en) Tracking positioning method, system, medium and computing device
CN112818965B (en) Multi-scale image target detection method and system, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210105