CN116704097B - Digitized human figure design method based on human body posture consistency and texture mapping - Google Patents


Info

Publication number
CN116704097B
CN116704097B (application CN202310671808.1A)
Authority
CN
China
Prior art keywords
image
data
dimensional
anchor
live
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310671808.1A
Other languages
Chinese (zh)
Other versions
CN116704097A (en)
Inventor
李秀平
郑磊
黄欢
王盼
丁煜
袁鹏举
陈飞
卢云强
林礼君
梅文龙
孙颖飞
傅晨嫣
叶玲
郑敏升
徐龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Haoyigou Family Shopping Co ltd
Original Assignee
Haoyigou Family Shopping Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Haoyigou Family Shopping Co ltd filed Critical Haoyigou Family Shopping Co ltd
Priority to CN202310671808.1A priority Critical patent/CN116704097B/en
Publication of CN116704097A publication Critical patent/CN116704097A/en
Application granted granted Critical
Publication of CN116704097B publication Critical patent/CN116704097B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/04Texture mapping
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0475Generative networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/094Adversarial learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/54Extraction of image or video features relating to texture
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/70Labelling scene content, e.g. deriving syntactic or semantic representations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/08Indexing scheme for image data processing or generation, in general involving all processing steps from image acquisition to 3D model generation
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computer Graphics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a digital human figure design method based on human body posture consistency and texture mapping, which comprises the following steps: sampling image data through a live-broadcast platform video data stream interface to obtain live picture data of a real-person anchor; performing feature point detection and human body posture estimation on the live picture data to obtain anchor figure feature data; based on the anchor figure feature data, defining a parameterized human body model through stature semantics, establishing an anchor figure database, and generating an anchor figure data set; optimizing the parameterized human body model based on the feature points to obtain a three-dimensional digital human model; and generating a clothing image texture feature map from the live picture data of the real-person anchor, and mapping it onto the three-dimensional digital human model to obtain the digital character figure. By reusing the anchor's clothing, the invention realizes three-dimensional digital human figure design, which helps optimize the current three-dimensional digital human figure design flow and improves design efficiency.

Description

Digitized human figure design method based on human body posture consistency and texture mapping
Technical Field
The invention belongs to the field of computer graphics and virtual reality, and particularly relates to a digital human figure design method based on human body posture consistency and texture mapping.
Background
Internet technology continues to develop, and internet live streaming has become a new industry that drives social and economic development. In internet live streaming, the anchor's video and audio are captured by various recording devices and pushed to each user's terminal device as a video stream through a server. In recent years, however, the development of the traditional live-streaming industry has hit a bottleneck: the mode that depends on a real-person anchor appearing on camera has various shortcomings in terms of live content types, live visuals, and live persistence, so live platforms and vendors are continuously exploring new growth points for the internet live-streaming industry.
With continuous breakthroughs in virtual reality technology, concepts such as digital twins and the metaverse have gradually entered public view. Three-dimensional digital human figures designed with a real person as the prototype are now gradually being applied in the internet live-streaming industry, owing to advantages such as vivid appearance, strong reusability, and wide applicability. Replacing the real-person anchor with a three-dimensional digital human figure can effectively overcome the shortcomings of real-person live streaming; a digital anchor is highly interactive and can bring viewers a better live experience.
The most critical factor determining the live-broadcast effect of a digital anchor is its figure design, in which the design of the anchor's clothing is a key link. If each figure is designed independently, the cost is high and the design efficiency is low. Therefore, how to utilize the large number of existing video images of real-person anchors, adjust and optimize on that basis, and finally form the figure of the digital anchor is a key problem in the current industrialization of digital anchors.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a digitized human figure design method based on human body posture consistency and texture mapping, to solve the problem of how to utilize the large number of existing video images of real-person anchors, adjust and optimize on that basis, and finally form the digital anchor figure.
In order to achieve the above object, the present invention provides a method for designing a digitized portrait based on human body posture consistency and texture mapping, comprising the steps of:
sampling image data through a live-broadcast platform video data stream interface to obtain live picture data of a real-person anchor;
performing feature point detection and human body posture estimation on the live picture data of the real-person anchor to obtain anchor figure feature data;
based on the anchor figure feature data, defining a parameterized human body model through stature semantics, establishing an anchor figure database, and generating an anchor figure data set;
optimizing the parameterized human body model based on the feature points to obtain a three-dimensional digital human model;
and generating a clothing image texture feature map based on the live picture data of the real-person anchor, and mapping the clothing image texture feature map onto the three-dimensional digital human model to obtain the digital character figure.
Preferably, the method for obtaining the live picture data of the real-person anchor comprises:
accessing a data stream interface of the live broadcast platform through a network protocol and implementing an error processing mechanism;
setting sampling time, sampling frames from the live video stream according to the sampling time, and processing by using an asynchronous processing technology;
and performing unified preprocessing of image size on the frames processed by the asynchronous processing technique to obtain the live picture data of the real-person anchor.
Preferably, the method for obtaining the image characteristic data of the anchor comprises the following steps:
detecting feature points of the image picture by a label distribution learning method to obtain feature point data;
the human body posture is captured through a human body posture estimation algorithm based on the enhanced channel and the space information, and simultaneously, the posture of the anchor image is captured through a characteristic point thermodynamic diagram and a characteristic point correlation field, so that posture estimation data are obtained;
and combining the feature point data and the gesture estimation data to generate the anchor image feature data.
Preferably, the method of generating an anchor figure data set comprises:
extracting stature semantic features and parameterizing the human body model to obtain a digital human body model;
based on the digital human body model, generating a three-dimensional image of the anchor, and performing texture mapping on the three-dimensional image;
creating an anchor figure database to store and manage the anchor three-dimensional figures defined based on stature semantics;
and generating the anchor figure data set based on the stature-semantics-defined anchor three-dimensional figures in the database.
Preferably, the method for obtaining a three-dimensional digital human model comprises:
calculating three-dimensional posture parameters according to human biomechanics and kinematics principles, based on the anchor figure feature data extracted from the live picture data of the real-person anchor;
reconstructing the three-dimensional shape of the anchor by a shape reconstruction method based on spatial deformation, according to the positions and relative relations of the feature points in the anchor's live picture;
and obtaining a three-dimensional digital human model based on the three-dimensional attitude parameters and the three-dimensional shape.
Preferably, the method of mapping onto a three-dimensional digital human model comprises:
determining a region to be mapped, and calculating vertex projection coordinates of the three-dimensional digital human model in a plane of the region to be mapped;
establishing a texture coordinate system, and calculating vertex texture coordinates of the three-dimensional digital human model;
and mapping the clothing image texture feature map onto a three-dimensional digital human model through a control point consistency rule based on the projection coordinates and the texture coordinates.
Preferably, the method for determining the area to be mapped includes:
by designating a certain triangular patch on the three-dimensional digital human model as an initial seed patch for region growing, sequentially searching adjacent triangular patches with common vertexes according to the sequence of three vertexes of the initial seed patch, and obtaining a triangular patch sequence according to a region growing rule, wherein the sequence is a region to be mapped.
Preferably, the method for calculating the vertex texture coordinates of the three-dimensional digital human model comprises the following steps:
projecting vertexes in the region to be mapped to a reference plane to obtain a coplanar three-dimensional point set;
establishing a two-dimensional coordinate system according to the coplanar point set, and obtaining initial coordinates of projection points based on the two-dimensional coordinate system;
and carrying out normalization processing on the initial coordinates to obtain vertex texture coordinates.
Compared with the prior art, the invention has the following advantages and technical effects:
according to the digital portrait design method based on human body posture consistency and texture mapping, live broadcast picture data of a live host are collected through video stream interfaces of all platforms. And carrying out feature point detection and human body posture estimation based on the enhanced channel and space information on the anchor by a mark distribution learning method to obtain anchor image data. Then parameterized mannequins are defined based on stature semantics and a database of anchor images is built. Aiming at the anchor image selected by the user, the system adjusts the digital person by adopting the characteristic point thermodynamic diagram and the characteristic point correlation field according to the characteristic point detection and posture estimation parameters so that the posture and the shape of the digital person accord with the anchor image. Finally, the invention uses the characteristic points as constraint conditions, adopts a domain migration generation countermeasure method to map the anchor image to the digital human model, and realizes the digital human avatar based on the reuse of the anchor clothing image. The invention forms a database of image images of the anchor, and realizes three-dimensional digital portrait design through the reuse of the anchor clothes. The invention is beneficial to optimizing the design flow of the current three-dimensional digital human figure, and improving the design efficiency, thereby promoting the development of industries such as Internet live broadcast and the like.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application, illustrate and explain the application and are not to be construed as limiting the application. In the drawings:
FIG. 1 is a flow chart of a method for designing a digitized portrait according to an embodiment of the present invention;
fig. 2 is a coordinate system diagram of an embodiment of the present invention.
Detailed Description
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
Example 1
As shown in FIG. 1, the invention provides a digitized human figure design method based on human body posture consistency and texture mapping. The method collects live picture data of real-person anchors through the video stream interfaces of the various platforms. Feature point detection by a label distribution learning method and human body posture estimation based on enhanced channel and spatial information are performed on the anchor to obtain anchor figure data. Parameterized human body models are then defined based on stature semantics, and an anchor figure database is built. For the anchor figure selected by the user, the system adjusts the digital human according to the feature point detection and posture estimation parameters, using the feature point heat map and feature point affinity field method, so that the posture and shape of the digital human conform to the anchor figure. Finally, with the feature points as constraint conditions, a domain-migration generative adversarial method maps the anchor figure onto the digital human model, realizing a digital human figure based on the reuse of the anchor's clothing image. The invention forms an anchor figure database and realizes three-dimensional digital human figure design through the reuse of anchor clothing. The specific method comprises the following steps:
step 1, sampling image data based on a live broadcast platform video data stream interface to obtain live broadcast picture data of a live host;
step 1.1, accessing a data stream interface of a live broadcast platform by using a specific network protocol; and writing a network request code to realize the access to the data stream interface of the live broadcast platform. To ensure stable data acquisition, error handling mechanisms are implemented, such as attempting to reconnect or waiting for delay.
Step 1.2, sampling frames from a live video stream according to a preset time interval; and setting a sampling time interval, and reducing the size of the data volume of subsequent processing. To avoid delayed accumulation of video frame processing, asynchronous processing techniques are used to process the video stream data.
Step 1.3, performing image preprocessing on the sampled frames. Because the video stream resolutions of the various live platforms differ, the sizes of the sampled images are unified to facilitate subsequent processing.
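Steps 1.2 and 1.3 can be sketched as follows. This is a minimal illustration of interval-based frame sampling and size normalization, not the patent's implementation; the function names and the 512-pixel target side are assumptions for the example.

```python
def sample_frame_indices(fps: float, duration_s: float, interval_s: float):
    """Return the frame indices to grab when sampling one frame
    every `interval_s` seconds from a stream running at `fps`."""
    step = max(1, round(fps * interval_s))
    total = int(fps * duration_s)
    return list(range(0, total, step))

def normalize_size(width: int, height: int, target: int = 512):
    """Scale so the longer side equals `target`, keeping the aspect
    ratio (unified preprocessing across platforms with different
    stream resolutions)."""
    scale = target / max(width, height)
    return round(width * scale), round(height * scale)

indices = sample_frame_indices(fps=30.0, duration_s=10.0, interval_s=3.0)
print(indices)                      # [0, 90, 180, 270]
print(normalize_size(1920, 1080))   # (512, 288)
```

In a real pipeline the indices would drive an asynchronous reader on the platform's video-stream API, and the resized frames would feed the feature-point detector of step 2.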
Step 2, feature point detection is performed on the anchor image picture based on a label distribution learning method, and anchor figure feature data are obtained by a human body posture estimation method based on enhanced channel and spatial information;
step 2.1: and detecting the characteristic points of the image picture based on a mark distribution learning method. Given a face picture, starting from a low resolution picture input layer, the initial predicted shape is madeContinuously approximates the true shape s * . First, binary label distribution regression is performed on a low resolution layer, then, the input resolution of a picture is doubled in each layer, and binary label distribution regression of a plurality of stages is performed on each layer.
The training of the model comprises m different resolution layers in total, each layer in turn comprising n stages, so there are T = m×n training stages in total. The optimization objective of each stage is to learn, independently for each feature point, the mapping function from a pixel block to that pixel block's binary label distribution. Then, based on the annotated feature points provided by the dataset, a binary label distribution is formed for the pixel block around each feature point. Thereafter, the mapping function Θ_t from pixel block to binary label distribution is optimized.
In the test stage of the model, given an image the model has not seen, the model starts from the coarse initial shape s₀ and cuts out the pixel block of each feature point centered on the current predicted point. The mapping function Θ_t trained at the current stage then predicts a binary label distribution for each pixel block. These predicted binary label distributions are passed through a deformable template matching algorithm to obtain updated predicted feature points. The updated feature points serve as input for the next stage, and testing continues until all T stages are finished.
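The coarse-to-fine idea of step 2.1 — T = m×n stages, each refining the predicted shape toward the true shape s* — can be illustrated with a toy loop. The fractional-residual update below is a stand-in for the learned per-stage regressors and template matching, purely for illustration; m, n, and the step size are arbitrary.

```python
import numpy as np

def cascade_refine(s0, s_true, m=3, n=2, step=0.5):
    """Run T = m*n refinement stages; each stage stands in for one
    binary-label-distribution regression + template-matching update."""
    T = m * n
    s = s0.astype(float)
    for _ in range(T):
        s = s + step * (s_true - s)   # move toward the true shape s*
    return s

s0 = np.zeros((5, 2))            # coarse initial shape (5 feature points)
s_true = np.full((5, 2), 10.0)   # ground-truth feature point locations
s = cascade_refine(s0, s_true)
print(np.abs(s - s_true).max())  # residual 10 / 2**6 = 0.15625
```

The point is only that each stage shrinks the residual between predicted and true shapes; the real per-stage update comes from the predicted label distributions.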
Step 2.2, capturing the posture using a human body posture estimation algorithm based on enhanced channel and spatial information, and capturing the posture of the anchor figure using the feature point heat map and the feature point affinity field. The method comprises three parts: key point detection, inter-part connection generation, and posture estimation. Furthermore, to improve the accuracy of posture estimation, channel communication operations are used to facilitate cross-channel communication between feature maps of different scales.
Step 2.3, combining the feature point data and the gesture estimation data to generate the image feature data of the anchor; in order to better extract and fuse the characteristics, a residual error module based on a space and channel attention mechanism is introduced, and the space and channel attention mechanism is fused into the original residual error module. To reduce the overhead of complex high-dimensional data analysis, principal component analysis is used to convert high-dimensional feature data into low-dimensional features.
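The dimensionality-reduction part of step 2.3 can be sketched with a plain SVD-based principal component analysis. This is a minimal sketch, not the patent's implementation; the 32-dimensional random features and the choice of k = 4 are illustrative only.

```python
import numpy as np

def pca_reduce(X: np.ndarray, k: int) -> np.ndarray:
    """Project the rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                      # center each feature
    # SVD of the centered data; rows of Vt are the principal axes,
    # ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                         # (n_samples, k) scores

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 32))   # stand-in for combined point + pose features
Z = pca_reduce(X, k=4)
print(Z.shape)                   # (100, 4)
```

The combined feature-point and posture features would take the place of `X`, giving a low-dimensional representation that is cheaper for the subsequent database and matching steps.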
Step 3, based on the characteristic data in the step 2, defining a parameterized human body model based on stature semantics, establishing a host image database, and forming a host image data set;
step 3.1, extracting stature semantic features and parameterizing a human body model; the avatar characteristic data of the anchor is used to define a parameterized mannequin based on stature semantics. These parameters are then used to drive the digital deformations.
Step 3.2, generating a three-dimensional image of the anchor based on the parameterized digital human body model; a parameterized mannequin based on stature semantic definitions is used to generate a three-dimensional representation of the anchor. And simultaneously, performing texture mapping on the three-dimensional model so as to keep the sense of reality of the three-dimensional image.
Step 3.3, creating and managing a main broadcasting image database; creating a host image database for storing and managing all generated host three-dimensional images based on stature semantic definition.
Step 3.4, establishing a data set of the image of the anchor; all generated anchor three-dimensional figures defined based on stature semantics are arranged into an anchor figure data set. This dataset will include avatar characteristic data for each anchor, as well as the corresponding 3D avatar. This dataset will be used for subsequent avatar reuse and digitizing of the avatar design.
Step 4, adjusting the digital human model for the selected anchor figure based on the feature parameters obtained in step 2, so that the posture and shape of the three-dimensional digital human model better fit the anchor figure;
step 4.1, adjusting attitude parameters of the three-dimensional model by utilizing the feature point data; and calculating corresponding three-dimensional posture parameters according to the biomechanics and kinematics principles of the human body by utilizing the characteristic point data extracted from the anchor image. These parameters, such as joint angle and relative position of the body part, will be used to adjust the pose of the digitized model.
Step 4.2, adjusting the shape of the digital human model by using the characteristic point data; and reconstructing the three-dimensional shape of the anchor according to the position and the relative relation of the feature points in the image by using the feature point data and adopting a shape reconstruction method based on spatial deformation. Then, the shape is applied to the digitized human model to adjust the shape so that the shape of the model is more close to the image of the anchor.
And 5, anchor image mapping based on feature point constraint.
Step 5.1, set the feature points determined in step 3 as seed vertices; the patches associated with the seed vertices are seed patches. A certain triangular patch on the three-dimensional model is designated as the initial seed patch for region growing; adjacent triangular patches sharing a common vertex are searched in the order of the three vertices of the seed patch, and a triangular patch sequence is obtained according to the region-growing rule; this sequence is the region to be mapped. The seed patch can be selected through user interaction: the user picks a two-dimensional screen coordinate point, the two-dimensional coordinate is converted into three-dimensional world coordinates, and the patch containing that point is the initial seed patch. Define the seed patch as baseID, the linked list storing its three vertices as pids, the candidate-region linked list of patches as ids, and the linked list of temporarily stored patches as cids. The initial seed patch selected by the user is marked baseID, and the three vertices of the marked seed patch are point A, point B, and point C. Searching starts from point A: all triangular patches with point A as a vertex are found; if a patch already exists in the candidate-region linked list, nothing is done; if it is not in the candidate-region list, it is inserted into the list and marked as existing. The same algorithm is applied to point B and point C, so that after one round of diffusion from the reference patch a patch linked list is obtained; the patches stored in this list constitute the region to be mapped.
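The region-growing procedure of step 5.1 can be sketched on a toy triangle mesh. The breadth-first search below follows the same rule — visit the seed's vertices in order and add every patch sharing a vertex that is not yet in the candidate list — but the mesh, function names, and data layout are hypothetical stand-ins for the linked-list bookkeeping described above.

```python
def grow_region(triangles, seed_idx):
    """triangles: list of 3-tuples of vertex ids.
    Returns the indices of all triangles vertex-connected to the seed,
    in discovery order (the region to be mapped)."""
    region = [seed_idx]
    in_region = {seed_idx}        # marks patches already in the candidate list
    frontier = [seed_idx]
    while frontier:
        tri = frontier.pop(0)
        for v in triangles[tri]:                  # points A, B, C in turn
            for j, other in enumerate(triangles):
                if j not in in_region and v in other:
                    in_region.add(j)              # insert and mark as existing
                    region.append(j)
                    frontier.append(j)
    return region

# Two vertex-connected triangles plus a fan, and one isolated triangle.
mesh = [(0, 1, 2), (1, 2, 3), (3, 4, 5), (6, 7, 8)]
print(grow_region(mesh, 0))   # [0, 1, 2] — triangle (6, 7, 8) is untouched
```

Starting from patch 0, the search reaches patch 1 through shared vertex 1 and patch 2 through shared vertex 3; the isolated patch never enters the candidate list.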
Step 5.2, plane projection of the region to be mapped. Take three non-collinear points in the region to be mapped: point P, point M, and point N, where point P controls the source point of the texture mapping (corresponding to the coordinate origin of the two-dimensional texture image), and the vectors PM and PN control the directions of the u axis and v axis of the texture image, respectively. From the plane equation A(x−x₀) + B(y−y₀) + C(z−z₀) = 0, determining a plane requires the coordinates P(x₀, y₀, z₀) of any point on the plane and a plane normal vector M{A, B, C}. Given the three vertices P, M, N of the plane, let m1 = M−P and m2 = N−P; the plane normal vector M = m1×m2 is obtained by computing the cross product of these vectors. Thus, the plane formed by point P, point M, and point N is taken as the projection reference plane T.
Knowing that the coordinates of any vertex in the region to be mapped are Q(x, y, z), the coordinates Q′(x′, y′, z′) of its projection on the reference plane T need to be obtained. Writing the projected point as Q′ = Q + kM and substituting it into the plane equation of T, the simultaneous equations give

k = −M·(Q−P) / |M|²,

from which the projection coordinate Q′ᵢ of each vertex Qᵢ is obtained.
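Step 5.2's projection onto the reference plane T can be sketched with NumPy: the normal is the cross product m1×m2, and the foot of the perpendicular follows from substituting Q′ = Q + kM into the plane equation, giving k = −M·(Q−P)/|M|². The point values below are illustrative only.

```python
import numpy as np

def project_to_plane(P, M_pt, N_pt, Q):
    """Project vertex Q onto the plane through P, M_pt, N_pt."""
    m1, m2 = M_pt - P, N_pt - P
    normal = np.cross(m1, m2)                       # plane normal M = m1 x m2
    k = -np.dot(normal, Q - P) / np.dot(normal, normal)
    return Q + k * normal                           # foot of perpendicular Q'

P  = np.array([0.0, 0.0, 0.0])
Mp = np.array([1.0, 0.0, 0.0])
Np = np.array([0.0, 1.0, 0.0])
Q  = np.array([0.3, 0.4, 2.0])
print(project_to_plane(P, Mp, Np, Q))   # (0.3, 0.4, 0.0): the z offset is removed
```

Applying this to every vertex of the region to be mapped yields the coplanar point set used in step 5.3.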
And 5.3, calculating texture coordinates. And projecting the vertexes in the region to be mapped to the reference plane T to obtain a coplanar three-dimensional point set, and establishing a two-dimensional coordinate system called S-T according to the coplanar point set. As shown in fig. 2.
The origin of the S-T coordinate system is point P, the first of the three vertices in the reference plane; the edge PM = M−P is taken as the horizontal coordinate axis, whose range length is |PM|, and the range length of the vertical coordinate axis is |PN|·cos a, where a is the angle between the vector PN and the vertical coordinate axis. Knowing the projected coordinates of point Q(x, y, z) on the plane as Q′(x′, y′, z′), the vector PQ′ can be calculated. Letting θ be the angle between the vector PQ′ and the forward direction S of the horizontal coordinate axis of the texture coordinate system, the coordinates of Q′ in the two-dimensional coordinate system are

S_Q′ = |PQ′|·cos θ,  T_Q′ = |PQ′|·sin θ.
Through the above calculation, each projected point in the plane obtains a coordinate point in the two-dimensional coordinate system S-T. Since the texture coordinate system u-v is constructed with both the horizontal and vertical axes ranging over [0, 1], the coordinate points need to be normalized. Let S_max and T_max be the maximum values of S_Qi and T_Qi respectively; the texture coordinates corresponding to the model vertices, finally obtained by proportional conversion, are

u_i = S_Qi / S_max,  v_i = T_Qi / T_max.
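Step 5.3 can be sketched as: express each coplanar projected point in a 2-D frame with origin P and horizontal axis along PM, then normalize each axis by its maximum so the coordinates land in [0, 1]. Building the vertical axis from two cross products is one standard way to get an in-plane axis perpendicular to PM; the data are illustrative.

```python
import numpy as np

def st_coords(P, M_pt, N_pt, points):
    """2-D S-T coordinates of coplanar points: origin P, S axis along PM."""
    s_axis = (M_pt - P) / np.linalg.norm(M_pt - P)
    n = np.cross(M_pt - P, N_pt - P)                   # plane normal
    t_axis = np.cross(n / np.linalg.norm(n), s_axis)   # in-plane, perpendicular to S
    d = points - P
    return np.stack([d @ s_axis, d @ t_axis], axis=1)

def normalize_uv(st):
    """u = S / S_max, v = T / T_max: proportional scaling into [0, 1]."""
    return st / st.max(axis=0)

P  = np.array([0.0, 0.0, 0.0])
Mp = np.array([2.0, 0.0, 0.0])
Np = np.array([0.0, 1.0, 0.0])
pts = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
st = st_coords(P, Mp, Np, pts)
print(normalize_uv(st))   # rows: (0, 0), (1, 0), (0.5, 1)
```

The normalized rows are the per-vertex (u, v) texture coordinates used when the clothing texture is applied to the mesh.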
Step 5.4, generating the texture mapping of the anchor image using feature-point-constrained domain migration. First, with the control points as constraints, a texture feature map is generated from the clothing image. Control-point consistency rules then map the texture feature map onto the digitized human surface. During training, the loss function is defined using a cycle consistency loss: to realize a transition from one domain to another and back to the original domain, the consistency of the original input must be maintained. For example, if the anchor image is mapped to the digitized human surface by generator G and then mapped back to the anchor image by generator F, the resulting image should be as close as possible to the original anchor image. After training of the domain-migration generative adversarial network is completed, the trained generator maps the texture information of the anchor image onto the digitized human model to obtain the final digitized figure.
Example two
This embodiment provides a method for reusing an anchor's garments that combines human body posture consistency and texture mapping, comprising the following steps:
step 1, sampling image data based on a live broadcast platform video data stream interface to obtain live broadcast picture data of a live host;
Step 1.1, accessing the data stream interface of the live broadcast platform using a specific network protocol. Based on the API (application programming interface) specification of the target live platform, a corresponding network protocol (e.g., HTTP or WebSocket) is selected, and network request code is written to access the platform's data stream interface. To ensure stable data acquisition, error handling mechanisms such as reconnection attempts or delayed retries are implemented.
Step 1.2, sampling frames from the live video stream at a preset time interval. Analysis of a large amount of data shows that the action-switching frequency of a live anchor's picture is generally about 3-5 seconds, so the sampling interval is set to 3 seconds; this design greatly reduces the amount of data for subsequent processing. To avoid delay accumulation in video frame processing, asynchronous processing techniques are used to handle the video stream data.
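The 3-second sampling rule of step 1.2 can be sketched as a simple frame-index selector; the function name and signature are assumptions for illustration:

```python
def sample_frame_indices(total_frames, fps, interval_s=3.0):
    """Indices of frames to keep when sampling one frame every
    `interval_s` seconds from a stream of `total_frames` frames at `fps`.
    interval_s=3 follows the 3-5 s action-switch rate noted above."""
    step = max(1, round(fps * interval_s))  # never skip below one frame
    return list(range(0, total_frames, step))
```

For a 10-second clip at 30 fps (300 frames), this keeps frames 0, 90, 180 and 270.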
And 1.3, carrying out image preprocessing on the sampled frames. Because the video stream images of each live broadcast platform have different sizes, in order to facilitate subsequent processing, the sampled image data is uniformly scaled to 1920×1080 (the unit is pixels).
Step 2, feature point detection is performed on the anchor image picture based on a label distribution learning method, and anchor image feature data are obtained by a human body posture estimation method based on enhanced channel and spatial information;
step 2.1: and detecting the characteristic points of the image picture based on a mark distribution learning method. Given a face picture, starting from a low resolution picture input layer, the initial predicted shape is madeContinuously approximates the true shape s * . First, binary label distribution regression is performed on a low resolution layer, then, the input resolution of a picture is doubled in each layer, and binary label distribution regression of a plurality of stages is performed on each layer.
Training of the model comprises m different resolution layers in total, each layer in turn comprising n stages, so there are T = m × n training stages in total. The optimization objective of each stage is to learn, independently for each feature point, the mapping function from a pixel block to that pixel block's binary label distribution. For example, in the t-th training stage, a pixel block is first cropped centered on the current predicted feature point x_t. Then, based on the annotated feature points provided by the dataset, a binary label distribution is formed for each feature point's pixel block. Thereafter, the mapping function Θ_t from the pixel block to the binary label distribution is optimized, with the aim of making the predicted and true binary label distributions as close as possible.
In the test stage of the model, given an image the model has not seen, the model starts from a rough initial shape s_0, and the pixel block of each feature point is cropped centered on the current predicted point. Then the mapping function Θ_t trained at the current stage is used to predict the binary label distribution of each pixel block. These predicted binary label distributions are passed through a deformable template matching algorithm to obtain updated predicted feature points. The updated feature points serve as input for the next stage, and the process continues until all T stages are finished.
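A minimal sketch of the binary label distribution for one pixel block, assuming a "within radius of the annotated point" labeling rule (the exact rule in the patent may differ):

```python
def binary_label_distribution(block_size, center, annotated, radius=1):
    """Binary label distribution for a pixel block cropped around the
    current predicted point `center`: pixels within `radius` (Chebyshev
    distance) of the annotated feature point get label 1, the rest 0."""
    cx, cy = center
    ax, ay = annotated
    half = block_size // 2
    dist = []
    for y in range(cy - half, cy + half + 1):
        row = [1 if max(abs(x - ax), abs(y - ay)) <= radius else 0
               for x in range(cx - half, cx + half + 1)]
        dist.append(row)
    return dist
```

When the prediction already coincides with the annotation, the 1-labels concentrate at the block center, which is the signal the regression stages are trained to reproduce.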
Step 2.2, capturing the posture using a human posture estimation algorithm based on enhanced channel and spatial information. Posture capture of the anchor image is performed with OpenPose, a real-time multi-person keypoint detection algorithm based on convolutional neural networks (CNN). The method comprises three parts: keypoint detection, generation of connecting lines between parts, and posture estimation. Furthermore, to improve the accuracy of posture estimation, channel communication operations are used to facilitate cross-channel information exchange between feature maps of different scales.
Step 2.3, combining the feature point data and the posture estimation data to generate the anchor image feature data. To better extract and fuse features, a residual module based on spatial and channel attention mechanisms, the Spatial, Channel-wise Attention Residual Bottleneck (SCARB), is introduced. The SCARB module integrates spatial and channel attention mechanisms into the original residual module and can better handle the correlation between complex spatial information and channels. To reduce the overhead of complex high-dimensional data analysis, Principal Component Analysis (PCA) is used to convert the high-dimensional feature data into low-dimensional visual features. These visual features better represent the posture and appearance of the anchor and provide input for the subsequent steps.
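The PCA reduction of step 2.3 can be illustrated with a tiny power-iteration sketch that finds the leading principal direction; a real pipeline would call a numerical library instead:

```python
def top_principal_component(data, iters=100):
    """Power iteration on the covariance matrix to find the leading
    principal direction of row-vector `data` (toy stand-in for PCA)."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    X = [[row[j] - means[j] for j in range(d)] for row in data]  # center
    cov = [[sum(X[i][a] * X[i][b] for i in range(n)) / n
            for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v
```

For points spread only along the x axis, the leading direction converges to (1, 0).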
Step 3, based on the feature data in step 2, defining a parameterized human body model based on stature semantics, establishing an anchor image database, and forming an anchor image data set;
step 3.1, extracting stature semantic features and parameterizing a human body model; the avatar characteristic data of the anchor is used to define a parameterized mannequin based on stature semantics. Specifically, the size parameters of height, weight, shoulder width and the like of the anchor, as well as the posture and shape parameters are introduced. These parameters are then used to adjust the digital manikin.
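A minimal sketch of stature-semantic parameters driving a template model; the field names and base values here are assumptions, not the patent's exact schema:

```python
from dataclasses import dataclass

@dataclass
class StatureParams:
    """Illustrative stature-semantic parameters (height, weight,
    shoulder width) used to adjust a parameterized body model."""
    height_cm: float
    weight_kg: float
    shoulder_width_cm: float

    def scale_factors(self, base_height_cm=175.0, base_shoulder_cm=45.0):
        # Simple per-axis scale factors applied to a template mesh.
        return (self.shoulder_width_cm / base_shoulder_cm,  # width (x)
                self.height_cm / base_height_cm)            # height (y)
```

An anchor matching the template exactly yields unit scale factors; taller or broader anchors stretch the template proportionally.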
Step 3.2, generating a three-dimensional image of the anchor based on the parameterized digital human model; a parameterized digital manikin defined based on stature semantics is used to generate a three-dimensional avatar of the anchor.
Step 3.3, creating and managing a main broadcasting image database; creating a host image database for storing and managing all generated host three-dimensional images based on stature semantic definition. Specifically, the data fields stored using the database system include platform name, anchor stature data (size parameters such as height, weight, shoulder width, etc.), live broadcast category, entry time, data storage location.
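The database of step 3.3 might be sketched with the standard library's sqlite3 as follows; the schema mirrors the fields listed above, but the exact column names are assumptions:

```python
import sqlite3

def create_anchor_db(path=":memory:"):
    """Create the anchor-avatar table described in step 3.3."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS anchors (
            id INTEGER PRIMARY KEY,
            platform_name TEXT,
            height_cm REAL,
            weight_kg REAL,
            shoulder_width_cm REAL,
            live_category TEXT,
            entry_time TEXT,
            data_location TEXT
        )""")
    return conn

conn = create_anchor_db()
# Insert one illustrative record (values are placeholders).
conn.execute("INSERT INTO anchors (platform_name, height_cm, weight_kg, "
             "shoulder_width_cm, live_category, entry_time, data_location) "
             "VALUES (?, ?, ?, ?, ?, ?, ?)",
             ("demo", 170.0, 60.0, 42.0, "fashion", "2023-06-07", "/data/a1"))
row = conn.execute("SELECT platform_name, live_category FROM anchors").fetchone()
```

The `data_location` column holds only a pointer to the stored three-dimensional avatar, keeping the relational table small.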
Step 3.4, establishing a data set of the image of the anchor; all generated anchor figures defined based on stature semantics are arranged into an anchor figure data set. This dataset will include avatar characteristic data for each anchor, as well as the corresponding three-dimensional avatar. This dataset will be used for subsequent avatar reuse and digitizing of the avatar design.
Step 4, based on the parameters obtained in the step 2, adjusting the selected anchor image model to enable the posture and the shape of the anchor image model to be more fit with the three-dimensional digital human model;
Step 4.1, adjusting the posture parameters of the three-dimensional model using the feature point data. Using the feature point data extracted from the anchor image, the corresponding three-dimensional posture parameters are calculated according to the principles of human biomechanics and kinematics. Considering the human body structure, the elbow joints, knee joints, hip joints and shoulder joints are selected as the feature points for marking the figure's posture. Then, by calculating the relative positions and angles between the feature points, the rotation angle of each joint and the relative position of each body part can be obtained. The relative position (distance) between feature points i and j is calculated as:
d_ij = sqrt((x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2)
and the angle at joint j formed by feature points i, j and k is calculated as:
θ_ijk = arccos(((i - j) · (k - j)) / (|i - j|·|k - j|))
After the relative distances and angles are obtained, a rotation matrix and displacement vectors are introduced to obtain the absolute position and posture of each part of the human body (such as each joint and bone) in three-dimensional space. A right-handed coordinate system is defined with j as the origin, i - j as the x axis and k - j as the y axis, and the coordinates of the other feature points in this coordinate system are calculated, giving the displacement vector of each feature point relative to j. The rotation matrix is then calculated as:
R_ijk = [cos θ_ijk, -sin θ_ijk, 0; sin θ_ijk, cos θ_ijk, 0; 0, 0, 1]
R_ijk is the rotation matrix for a rotation by the angle θ_ijk about the z axis. Multiplying a displacement vector by this rotation matrix gives the rotated displacement vector, i.e., the new position of the feature point relative to j. Assuming that the coordinates of a feature point l in this coordinate system are (x_l, y_l, z_l), its displacement vector relative to the origin j is (x_l, y_l, z_l).
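The distance, angle, and z-axis rotation of step 4.1 can be sketched directly in a few stdlib functions:

```python
import math

def joint_distance(i, j):
    """Euclidean distance between feature points i and j
    (the relative-position formula of step 4.1)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(i, j)))

def joint_angle(i, j, k):
    """Angle at joint j formed by segments j->i and j->k, via dot product."""
    v1 = [a - b for a, b in zip(i, j)]
    v2 = [a - b for a, b in zip(k, j)]
    cos_t = sum(a * b for a, b in zip(v1, v2)) / (
        math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(a * a for a in v2)))
    return math.acos(max(-1.0, min(1.0, cos_t)))  # clamp for safety

def rotate_z(v, theta):
    """Rotate displacement vector v by theta about the z axis (R_ijk)."""
    x, y, z = v
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)
```

A right angle at the origin between (1,0,0) and (0,1,0) evaluates to π/2, and rotating (1,0,0) by π/2 about z lands on (0,1,0) up to floating-point error.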
step 4.2, adjusting the shape of the digital human model by using the characteristic point data; and reconstructing the three-dimensional shape of the anchor according to the position and the relative relation of the feature points in the image by using the feature point data and adopting a shape reconstruction method based on spatial deformation. Then, the shape is applied to the digitized human model to adjust the shape so that the shape of the model is more close to the image of the anchor.
Step 5, anchor image mapping based on feature point constraints.
Step 5.1, the feature points determined in step 3 are set as seed vertices, and the patches associated with the seed vertices are seed patches. A certain triangular patch on the three-dimensional model is designated as the initial seed patch for region growing; adjacent triangular patches sharing a common vertex are searched in the order of the three vertices of the seed patch, and a triangular patch sequence is obtained according to the region-growing rule — this sequence is the region to be mapped. The seed patch can be selected through user interaction: the user selects a two-dimensional screen coordinate point, the two-dimensional coordinate is converted into a three-dimensional world coordinate, and the patch containing that point, i.e., the initial seed patch, is obtained. Define the seed patch as baseID, the linked list storing its three vertices as pids, the candidate-region linked list of patches as ids, and the linked list of temporarily stored patches as cids. The initial seed patch selected by the user is recorded as baseID, and the three vertices of the marked seed patch are point A, point B and point C respectively. Starting the search from point A, all triangular patches with point A as a vertex are found; if a patch already exists in the candidate-region linked list, no processing is performed; if the patch is not in the candidate-region list, it is inserted into the list and marked as existing. The same algorithm is applied to point B and point C respectively, so that after one round of diffusion from the reference patch a patch linked list is obtained, and the patches stored in the list form the region to be mapped.
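The vertex-sharing region growing of step 5.1 can be sketched as a breadth-first traversal; the data layout (faces as vertex-index triples) and names are illustrative:

```python
from collections import deque

def grow_region(faces, seed_face):
    """Region growing of step 5.1: starting from the seed patch,
    repeatedly add every patch that shares a vertex with an already
    accepted patch. `faces` is a list of vertex-index triples; returns
    face indices in discovery order (the region to be mapped)."""
    vert_to_faces = {}
    for fi, face in enumerate(faces):
        for v in face:
            vert_to_faces.setdefault(v, []).append(fi)
    seen = {seed_face}
    order = [seed_face]
    queue = deque([seed_face])
    while queue:
        fi = queue.popleft()
        for v in faces[fi]:                 # points A, B, C in turn
            for nb in vert_to_faces[v]:
                if nb not in seen:          # skip patches already listed
                    seen.add(nb)
                    order.append(nb)
                    queue.append(nb)
    return order
```

On a strip of three connected triangles plus one disconnected triangle, growing from face 0 collects exactly the connected strip.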
Step 5.2, plane projection of the region to be mapped. Take three non-collinear points P, M and N in the region to be mapped, where the point P controls the source point of the texture mapping, the source point corresponds to the coordinate origin of the two-dimensional texture image, and the vectors PM and PN control the directions of the u axis and the v axis of the texture image, respectively. From the plane equation A(x - x0) + B(y - y0) + C(z - z0) = 0, determining a plane requires the coordinates P(x0, y0, z0) of a point on the plane and the plane normal vector m = {A, B, C}. Given that the three vertices of the plane are P, M and N, let the vectors m1 = M - P and m2 = N - P; the plane normal vector m = m1 × m2 is then obtained by calculating the cross product of these vectors. Thus, the plane formed by the points P, M and N is obtained as the projection reference plane T.
Given that the coordinates of any vertex in the region to be mapped are Q(x, y, z), the coordinates Q'(x', y', z') of its projected point on the reference plane T are required. The projected point lies on the line through Q along the normal, Q' = Q + k·m, and must also satisfy the plane equation of T. Combining these two equations,
k = -[A(x - x0) + B(y - y0) + C(z - z0)] / (A^2 + B^2 + C^2)
can be obtained; thereby the projection coordinate
Q' = (x + kA, y + kB, z + kC)
can be obtained for each vertex Qi.
Step 5.3, calculating texture coordinates. The vertices in the region to be mapped are projected onto the reference plane T to obtain a coplanar three-dimensional point set, and a two-dimensional coordinate system, called S-T, is established from this coplanar point set, as shown in fig. 2.
The origin of the S-T coordinate system is the point P, the first of the three vertices in the reference plane. The edge PM = M - P is taken as the transverse coordinate axis; the range length of the transverse axis is |PM|, and the range length of the longitudinal axis is |PN|·cos a, where a is the angle between the vector PN and the ordinate axis. Given the projected coordinates of a point Q(x, y, z) on the plane as Q'(x', y', z'), the vector PQ' can be calculated. Let the angle between the vector PQ' and the positive direction S of the transverse coordinate axis of the texture coordinate system be θ; the coordinates of Q'(x', y', z') in the two-dimensional coordinate system are then:
S_Q' = |PQ'|·cos θ, T_Q' = |PQ'|·sin θ,
where cos θ = (PQ' · S) / (|PQ'|·|S|).
Through the above calculation, each projected point in the plane obtains a coordinate point in the two-dimensional coordinate system S-T. Since the texture coordinate system u-v is constructed with both the transverse and longitudinal axes ranging over [0, 1], the coordinate points need to be normalized. Let S_max and T_max be the maximum values of S_Qi and T_Qi respectively; the texture coordinates corresponding to the model vertices, finally obtained by proportional conversion, are:
u_i = S_Qi / S_max, v_i = T_Qi / T_max.
Step 5.4, generating the texture mapping of the anchor image using feature-point-constrained domain migration. The CycleGAN network consists of two generators (G and F) and two discriminators (D_X and D_Y). Generators G and F convert between the two image domains (anchor images and digitized human models), while the discriminators D_X and D_Y distinguish whether a generated image comes from the target domain. The training goal of the network is to preserve the key features of the input image while minimizing the difference between the generated image and the target image; the overall adversarial loss is expressed as:
L_GAN(G, D_Y, X, Y) = E_y[log D_Y(y)] + E_x[log(1 - D_Y(G(x)))].
during training, a loop consistency penalty is introduced for bringing the generated texture image as close as possible to the target domain.
To effect a transition from one domain to another domain to the original domain, it must be able to maintain consistency of the original input. By calculating the cyclic consistency loss, the distance between the generated image and the target domain can be measured. For example, converting the anchor avatar to a digitized mannequin by generator G and then back to the anchor avatar by generator F, the resulting image should be as close as possible to the original anchor avatar. The total loss function can be expressed as:
L(G,F,D X ,D Y )=L GAN (G,D Y ,X,Y)+L GAN (F,D X ,Y,X)+λL cyc (G,F)
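The cycle-consistency term can be illustrated with toy "generators" operating on plain vectors; real CycleGAN generators are neural networks, so this is only a shape-of-the-computation sketch:

```python
# Toy illustration of the cycle-consistency loss: G maps domain X to Y,
# F maps back, and L_cyc penalizes the L1 error |F(G(x)) - x|.

def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cycle_consistency_loss(G, F, xs, ys):
    loss_x = sum(l1(F(G(x)), x) for x in xs) / len(xs)   # X -> Y -> X
    loss_y = sum(l1(G(F(y)), y) for y in ys) / len(ys)   # Y -> X -> Y
    return loss_x + loss_y

# A perfectly inverse generator pair gives zero cycle loss:
G = lambda v: [2 * x for x in v]
F = lambda v: [x / 2 for x in v]
zero = cycle_consistency_loss(G, F, [[1.0, 2.0]], [[4.0, 6.0]])
```

A non-inverse pair leaves a positive residual, which is exactly the signal that pushes the trained generators toward preserving the original anchor image.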
After training of the feature-point-constrained domain-migration generative adversarial network is finished, the trained generator G is used to map the texture information of the anchor image onto the digitized human model, obtaining the final digitized figure.
The foregoing is merely a preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions easily conceivable by those skilled in the art within the technical scope of the present application should be covered in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (7)

1. The digitized portrait design method based on the consistency of human body posture and texture mapping is characterized by comprising the following steps:
image data sampling is carried out through a live broadcast platform video data stream interface, and live broadcast picture data of a live host are obtained;
performing feature point detection and human body posture estimation on the live image data of the live host to obtain host image feature data;
based on the anchor image characteristic data, establishing an anchor image database by parameterizing a human body model through stature semantic definition, and generating an anchor image data set;
optimizing the parameterized human model based on the characteristic points to obtain a three-dimensional digital human model;
generating a clothing image texture feature map based on live broadcast picture data of a live man, and mapping the clothing image texture feature map to a three-dimensional digital human model to obtain a digital character image;
the method for obtaining the image characteristic data of the anchor comprises the following steps:
detecting characteristic points of the image picture by a mark distribution learning method to obtain characteristic point data;
the human body posture is captured through a human body posture estimation algorithm based on the enhanced channel and the space information, and simultaneously, the posture of the anchor image is captured through a characteristic point thermodynamic diagram and a characteristic point correlation field, so that posture estimation data are obtained;
and combining the feature point data and the gesture estimation data to generate the anchor image feature data.
2. The method for digitized portrait design based on human body posture consistency and texture mapping of claim 1,
the method for obtaining live broadcast picture data of the live host comprises the following steps:
accessing a data stream interface of the live broadcast platform through a network protocol and implementing an error processing mechanism;
setting sampling time, sampling frames from the live video stream according to the sampling time, and processing by using an asynchronous processing technology;
and carrying out unified preprocessing on the image size of the frames processed by the asynchronous processing technology to obtain live broadcast picture data of the live broadcasting of the live persons.
3. The method for digitized portrait design based on human body posture consistency and texture mapping of claim 1,
the method for generating the anchor image data set comprises the following steps:
extracting stature semantic features and parameterizing the human body model to obtain a digital human body model;
based on the digital human body model, generating a three-dimensional image of the anchor, and performing texture mapping on the three-dimensional image;
creating a main cast image database to store and manage main cast three-dimensional images defined based on stature semantics;
and generating a data set of the anchor image based on the anchor three-dimensional image defined based on stature semantics in the database.
4. The method for digitized portrait design based on human body posture consistency and texture mapping of claim 1,
the method for obtaining the three-dimensional digital human model comprises the following steps:
based on the image characteristic data of the live broadcast extracted from live broadcast picture data of the live broadcast of the live man, calculating three-dimensional attitude parameters according to the biomechanics and kinematics principle of the human body;
reconstructing the three-dimensional shape of the live broadcast by a shape reconstruction method based on spatial deformation according to the position and the relative relation of the feature points in the live broadcast picture of the live broadcast;
and obtaining a three-dimensional digital human model based on the three-dimensional attitude parameters and the three-dimensional shape.
5. The method for digitized portrait design based on human body posture consistency and texture mapping of claim 1,
the method for mapping onto the three-dimensional digital human model comprises the following steps:
determining a region to be mapped, and calculating vertex projection coordinates of the three-dimensional digital human model in a plane of the region to be mapped;
establishing a texture coordinate system, and calculating vertex texture coordinates of the three-dimensional digital human model;
and mapping the clothing image texture feature map onto a three-dimensional digital human model through a control point consistency rule based on the projection coordinates and the texture coordinates.
6. The method for digitized portrait design based on human body posture consistency and texture mapping of claim 5,
the method for determining the area to be mapped comprises the following steps:
by designating a certain triangular patch on the three-dimensional digital human model as an initial seed patch for region growing, sequentially searching adjacent triangular patches with common vertexes according to the sequence of three vertexes of the initial seed patch, and obtaining a triangular patch sequence according to a region growing rule, wherein the sequence is a region to be mapped.
7. The method for digitized portrait design based on human body posture consistency and texture mapping of claim 5,
the method for calculating the vertex texture coordinates of the three-dimensional digital human model comprises the following steps:
projecting vertexes in the region to be mapped to a reference plane to obtain a coplanar three-dimensional point set;
establishing a two-dimensional coordinate system according to the coplanar point set, and obtaining initial coordinates of projection points based on the two-dimensional coordinate system;
and carrying out normalization processing on the initial coordinates to obtain vertex texture coordinates.
CN202310671808.1A 2023-06-07 2023-06-07 Digitized human figure design method based on human body posture consistency and texture mapping Active CN116704097B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310671808.1A CN116704097B (en) 2023-06-07 2023-06-07 Digitized human figure design method based on human body posture consistency and texture mapping


Publications (2)

Publication Number Publication Date
CN116704097A CN116704097A (en) 2023-09-05
CN116704097B true CN116704097B (en) 2024-03-26

Family

ID=87825243

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310671808.1A Active CN116704097B (en) 2023-06-07 2023-06-07 Digitized human figure design method based on human body posture consistency and texture mapping

Country Status (1)

Country Link
CN (1) CN116704097B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229246A (en) * 2016-12-14 2018-06-29 上海交通大学 Real-time three-dimensional human face posture method for tracing based on vehicle computing machine platform
CN112950769A (en) * 2021-03-31 2021-06-11 深圳市慧鲤科技有限公司 Three-dimensional human body reconstruction method, device, equipment and storage medium
CN113012282A (en) * 2021-03-31 2021-06-22 深圳市慧鲤科技有限公司 Three-dimensional human body reconstruction method, device, equipment and storage medium
CN113297988A (en) * 2021-05-28 2021-08-24 东南大学 Object attitude estimation method based on domain migration and depth completion
CN115147899A (en) * 2022-06-30 2022-10-04 广西师范大学 Head posture estimation method based on label distribution and supervised space transformation network
CN115379269A (en) * 2022-08-17 2022-11-22 咪咕文化科技有限公司 Live broadcast interaction method of virtual image, computing equipment and storage medium
KR102506352B1 (en) * 2022-06-07 2023-03-06 주식회사 엑스바디 Digital twin avatar provision system based on 3D anthropometric data for e-commerce


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Schrum, M. L. et al. MIND MELD: Personalized Meta-Learning for Robot-Centric Imitation Learning. Proceedings of the 2022 17th ACM/IEEE International Conference on Human-Robot Interaction (HRI '22), 2022, full text. *
Research on 2D-to-3D facial beauty prediction; Liu Shu; China Doctoral Dissertations Full-text Database, Information Science and Technology; full text *
Research on the dressing effect of three-dimensional garment styles based on texture mapping; Yang Tianhong; China Master's Theses Full-text Database (05); full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant