CN114863005A - Rendering method and device for limb special effect, storage medium and equipment - Google Patents


Info

Publication number
CN114863005A
Authority
CN
China
Prior art keywords
limb
special effect
target object
model
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210412285.4A
Other languages
Chinese (zh)
Inventor
钱立辉
韩欣彤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Foshan Huya Huxin Technology Co ltd
Original Assignee
Foshan Huya Huxin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Foshan Huya Huxin Technology Co ltd filed Critical Foshan Huya Huxin Technology Co ltd
Priority to CN202210412285.4A
Publication of CN114863005A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 - 3D [Three Dimensional] image rendering
    • G06T 15/10 - Geometric effects
    • G06T 15/20 - Perspective computation
    • G06T 15/205 - Image-based rendering
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/047 - Probabilistic or stochastic networks
    • G06N 3/08 - Learning methods
    • G06T 13/00 - Animation
    • G06T 13/20 - 3D [Three Dimensional] animation
    • G06T 13/40 - 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G06T 17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning
    • G06T 2207/20084 - Artificial neural networks [ANN]

Abstract

The application provides a rendering method, apparatus, storage medium, and device for a limb special effect. The rendering method includes: acquiring an image of a target object and a reference limb model, where the image contains a limb of the target object, the reference limb model presents a rendering result of a specified limb special effect on a reference three-dimensional model, and the reference limb model includes a mapping relation between the limb special effect and the reference three-dimensional model; generating a three-dimensional model corresponding to the target object based on the image, and acquiring camera parameters corresponding to the image; determining a rendering result of the limb special effect on the target object's three-dimensional model according to the mapping relation between the limb special effect and the reference three-dimensional model, and generating a limb special-effect model in which the limb special effect corresponds to the target object; and rendering the limb special effect in the image according to the limb special-effect model and the camera parameters. The rendered limb special effect fits the target object closely and has a good three-dimensional appearance.

Description

Rendering method and device for limb special effect, storage medium and equipment
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to a rendering method, an apparatus, a storage medium, and a device for a limb special effect.
Background
At present, many mature video and picture special effects, such as expression effects, headgear, and stickers, exist on platforms for short video, live streaming, beauty cameras, and the like. However, most effects on the market are head effects based on the human face or head pose; there has been little research on limb-based effects built on a structural understanding of the human body, and almost no such products have reached the market.
Disclosure of Invention
In order to overcome the problems in the related art, the application provides a rendering method, apparatus, storage medium, and device for a limb special effect, so as to remedy the deficiencies in the related art.
According to a first aspect of the present application, there is provided a rendering method of a limb effect, the method including:
acquiring an image of a target object and a reference limb model, wherein the image comprises a limb of the target object, the reference limb model is used for presenting a rendering result of a specified limb special effect on a reference three-dimensional model, and the reference limb model comprises a mapping relation between the limb special effect and the reference three-dimensional model;
generating a three-dimensional model corresponding to the target object based on the image, and acquiring camera parameters corresponding to the image;
determining a rendering result of the limb special effect on the three-dimensional model of the target object according to the mapping relation between the limb special effect and the reference three-dimensional model, and generating a limb special-effect model in which the limb special effect corresponds to the target object;
rendering the limb special effect in the image according to the limb special effect model and the camera parameters.
According to a second aspect of the present application, there is provided a rendering apparatus for limb effects, the apparatus comprising:
an input module, configured to acquire an image of a target object and a reference limb model, wherein the image comprises a limb of the target object, the reference limb model is used for presenting a rendering result of a specified limb special effect on a reference three-dimensional model, and the reference limb model comprises a mapping relation between the limb special effect and the reference three-dimensional model;
the human body modeling module is used for generating a three-dimensional model corresponding to the target object based on the image and acquiring camera parameters corresponding to the image;
the special effect modeling module is used for determining a rendering result of the limb special effect on the three-dimensional model of the target object according to the mapping relation between the limb special effect and the reference three-dimensional model, and generating a limb special-effect model in which the limb special effect corresponds to the target object;
and the rendering module is used for rendering the limb special effect in the image according to the limb special effect model and the camera parameters.
According to a third aspect of the present application, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of the embodiments described above.
According to a fourth aspect of the present application, there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any of the above embodiments when executing the program.
The three-dimensional model of the target object generated by the method has the same pose as the target object in the input image, and the limb special-effect model obtained after combining the limb special effect with the three-dimensional model therefore also matches that pose. As a result, the limb special effect rendered from the limb special-effect model fits the target object in the image more closely and has a better three-dimensional appearance.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
Fig. 1 is a flowchart illustrating a rendering method of a limb effect according to an embodiment of the present application.
FIG. 2 is a network architecture diagram of a model prediction and camera prediction network according to one embodiment of the present application.
Fig. 3 is a block diagram of a rendering apparatus for limb effects according to an embodiment of the present application.
FIG. 4 is a block diagram of computing device hardware, shown in accordance with one embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "upon", "when", or "in response to determining", depending on the context.
In this application, a limb special effect is an effect built around a person's limbs, for example a vest effect or a tattoo effect, which moves along with the person's limbs. Such effects apply to many scenarios: in fields such as live streaming and short video they add entertainment value, while in the clothing industry a garment effect can simulate how a specified garment looks when worn by the user, making it easier to choose well-fitting clothes. Limb special effects include texture-type effects and joint-binding-type effects. A texture-type limb special effect is expressed as a texture over a designated region of a limb, for example a pattern of various colors or shapes attached to the user's body. Unlike a joint-binding-type effect, a texture-type effect must adhere to a specified position on the limb itself rather than a specified position in the image, and therefore requires more accurate human-body model prediction. However, obtaining an accurate human-body model from structural pose prediction remains relatively difficult, and real-time detection is not easy to achieve. It is also difficult to display the special-effect object naturally and attractively on the predicted human-body model; for sticker effects, the placement problem must be solved so that the result differs from a simple two-dimensional sticker and has real depth. Consequently, hardly any application software on the market today achieves a good limb special effect.
In view of the above, the present application provides a rendering method for limb special effects to solve the above problems.
As shown in fig. 1, fig. 1 is a flowchart of a rendering method for a limb special effect according to an embodiment of the present application, including the following steps:
step S101, acquiring an image of a target object and a reference limb model of a limb special effect;
step S102, generating a three-dimensional model corresponding to a target object based on an image of the target object, and acquiring camera parameters corresponding to the image;
step S103, determining a rendering result of the limb special effect on the three-dimensional model of the target object according to the mapping relation between the limb special effect and the reference three-dimensional model, and generating a limb special effect model of the target object corresponding to the limb special effect;
and step S104, rendering the limb special effect in the image according to the limb special effect model and the camera parameters.
In step S101, the acquired image, i.e., the input image, includes a limb of the target object, so that the specified limb special effect can be rendered according to the current position and motion of that limb. The reference limb model is a three-dimensional model used for presenting the rendering result of the specified limb special effect on the reference three-dimensional model. The specified limb special effect is an effect designed by a special-effect designer around a specified limb of a person; it may be a texture-type effect or a joint-binding-type effect, which the present application does not limit. In some embodiments, the specified limb special effect may be static, such as a necklace effect bound to the neck or a tattoo effect drawn on an arm, or dynamic, such as a flame effect bound to an arm; this is likewise not limited. The reference three-dimensional model is the standard three-dimensional human-body model that the designer refers to when designing the effect. It is a standard body model designed according to the structure of human limbs, is not generated from any particular target object, and carries no individual user's limb characteristics. In some embodiments, to let the designer observe the designed effect intuitively, the reference three-dimensional model may be presented in a specified posture, for example the T-pose, i.e., standing with the feet together and both arms extended horizontally.
In some embodiments, the body shape of the reference three-dimensional model can be designed for a specific application scenario; for example, when designing a clothing effect for people of larger build, the reference model can itself have a larger build. Because the reference limb model binds the display position of the limb special effect to a specified region of the human-body model, the position of the effect under the target object's current posture can be determined from the mapping relation between the effect and the reference three-dimensional model even when that posture differs from the reference model's posture.
In some embodiments, the acquired input image may be a single image or a video frame, i.e., one frame of a video stream. For example, the rendering method can be applied to live streaming and short video; in a live scenario, after the streamer enables a limb special effect, the effect is rendered once for each video frame before the frame is output to the client.
In step S102, a three-dimensional model corresponding to the target object is generated from the acquired input image, and the camera parameters corresponding to the input image are obtained. The three-dimensional model has the same pose as the target object currently has in the input image; it may be produced by a three-dimensional reconstruction algorithm, which the present application does not limit. The camera parameters are the position parameters of the camera corresponding to the input image in the coordinate system of the three-dimensional model. When the three-dimensional model is viewed through these camera parameters, the position and pose of the target object in the generated view match those in the input image, so that the final rendering of the limb special effect can be accurately aligned with the corresponding human-body pixels.
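The role of the camera parameters can be illustrated with a weak-perspective projection, a common choice when a network predicts only a scale and a 2D translation. The patent does not name its camera model, so this sketch is an assumption, and the function name and parameters are illustrative only:

```python
def project(points3d, s, tx, ty):
    """Weak-perspective projection: (x, y, z) -> (s*x + tx, s*y + ty).

    `s`, `tx`, `ty` stand in for the three predicted camera parameters
    (an assumption, not the patent's stated model); depth z is dropped,
    which is what makes the projection "weak" perspective.
    """
    return [(s * x + tx, s * y + ty) for x, y, _z in points3d]
```

Under this model, aligning the rendered effect with the human-body pixels amounts to choosing s, tx, ty so that the projected model vertices land on the corresponding image positions.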
In some embodiments, the three-dimensional model of the target object may instead be generated from the target object's limb key points in the image: the limb key points are obtained first, the three-dimensional model corresponding to the target object is then generated from them, and the camera parameters corresponding to the input image are acquired. The limb key points may be the key points of the body's joints; for example, the target object may have 17 limb key points: nose, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left palm, right palm, left hip, right hip, left knee, right knee, left ankle, right ankle, left toe, and right toe, which together describe the target object's current pose. Predicting the human-body model directly from the image usually requires heavy computation and a demanding computing environment, yet many users, such as streamers on a live platform, mostly use consumer-grade computers that struggle with computation-heavy vision algorithms. Heavy computation also means a long processing time for the whole special-effect rendering, making real-time rendering hard to achieve and degrading scenarios with strict real-time requirements such as live video. Predicting the model from key points, by contrast, only requires processing a small number of points, which markedly reduces the computation of the model-generation algorithm, lets most consumer-grade devices handle the rendering, and is fast enough for real-time special-effect rendering.
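The 17 key points listed above can be written out directly. This sketch shows the list and how their (x, y) coordinates flatten into a 34-dimensional input vector; the identifier names are hypothetical, not from the patent:

```python
# The 17 limb key points described in the text, in the order given there.
KEYPOINT_NAMES = [
    "nose",
    "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow",
    "left_wrist", "right_wrist",
    "left_palm", "right_palm",
    "left_hip", "right_hip",
    "left_knee", "right_knee",
    "left_ankle", "right_ankle",
    "left_toe", "right_toe",
]

def flatten_keypoints(keypoints):
    """Flatten 17 (x, y) pairs into one 34-dimensional input vector."""
    if len(keypoints) != len(KEYPOINT_NAMES):
        raise ValueError("expected 17 key points")
    flat = []
    for x, y in keypoints:
        flat.extend([x, y])
    return flat
```

Processing 17 coordinate pairs instead of a full image is what makes the key-point route so much cheaper than image-based model prediction.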
In some embodiments, the limb key points of the target object may be obtained by feeding the input image to a designated neural network. The designated neural network is a model for predicting key points; it may be an existing open-source key-point detection model, such as MobileNet V3, or another key-point detection model, which the present application does not limit.
In some embodiments, generating the three-dimensional model from the limb key points and acquiring the camera parameters may proceed as follows: first obtain a pose feature of the target object from the limb key points; then, from the pose feature, separately generate the pose parameters and shape parameters corresponding to the target object and the camera parameters corresponding to the input image; finally, generate the three-dimensional model of the target object through parametric human-body modeling from the pose and shape parameters. Both obtaining the pose feature from the key points and generating the three parameter sets from it can be implemented with one or more neural-network layers or a neural-network model. The pose feature is a feature vector describing the pose of the target object in the input image; it may be produced by one or more layers, for example by passing the coordinates of each limb key point through a fully connected layer that outputs a feature vector representing the current limb pose. In some embodiments with 17 limb key points, the input to the fully connected layer is 34-dimensional, containing the horizontal and vertical coordinates of the 17 key points, and the output may be a 256-dimensional pose feature.
In some embodiments, the pose feature may have another dimensionality, for example 128 or 512. A higher-dimensional pose feature can generally express a more precise pose, giving the final human-body model higher accuracy, but it also increases the computation needed to generate the three-dimensional model, which affects the real-time performance of the whole rendering; the dimensionality can therefore be set according to actual requirements, and the present application does not limit it. The pose parameters are a parameter set representing the target object's current pose, such as the rotation angle of each joint, while the shape parameters are a set representing the target object's build, such as height, girth, and head-to-body ratio. From the pose feature, the pose parameters and shape parameters of the target object and the camera parameters of the input image can be predicted by three different sub-networks. In some embodiments, each of these three sub-networks may be a fully connected layer.
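As a shape check, the 34-to-256 fully connected mapping described above can be sketched in plain Python. The weights here are random placeholders standing in for a trained layer; this illustrates only the dimensions, not the network itself:

```python
import random

def linear_layer(x, weights, bias):
    """One fully connected layer without activation: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b0
            for row, b0 in zip(weights, bias)]

# Dimensions from the description: 34-dim key-point input -> 256-dim pose feature.
IN_DIM, OUT_DIM = 34, 256
random.seed(0)  # placeholder weights; a real layer's weights come from training
W = [[random.gauss(0.0, 0.01) for _ in range(IN_DIM)] for _ in range(OUT_DIM)]
b = [0.0] * OUT_DIM

pose_feature = linear_layer([0.5] * IN_DIM, W, b)
```

Swapping OUT_DIM for 128 or 512 reproduces the alternative feature sizes the text mentions.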
Parametric human-body modeling is the process of generating a human-body model from input parameters, such as the pose and shape parameters, through the program's internal logic. In parametric modeling, the human body is understood as the sum of a base model, which may be called the parametric human-body template, and a deformation applied on top of it: the shape parameters control the amount of deformation of the base model and thereby the shape of the final model, while the pose parameters control its posture. The posture of the body can be represented by a kinematic tree formed by the body's joints, in which each joint is a parent node or a child node according to the connection relations among joints; when a parent node moves, it drives its child nodes along with it. For example, when the elbow moves, the wrist on the same arm moves with it, so the elbow is the parent node of the wrist. In the kinematic tree, the rotation of each joint relative to its parent can be represented by a rotation vector, and the local rotation vectors of all joints together describe the current posture of the body; that is, the per-joint local rotation vectors constitute the pose parameters of the human-body model, and changing the rotation vector of any joint changes the model's posture.
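The parent-child linkage described above can be illustrated with a toy 2D kinematic chain. The joint names and single-angle-per-joint representation are simplifications for illustration; the patent's model uses full rotation vectors:

```python
import math

# Toy chain: shoulder -> elbow -> wrist.
PARENT = {"elbow": "shoulder", "wrist": "elbow"}
LOCAL_ANGLE = {"shoulder": 0.0, "elbow": 0.0, "wrist": 0.0}

def global_angle(joint):
    """Accumulate local rotations from the joint up to the root of the tree."""
    angle = LOCAL_ANGLE[joint]
    while joint in PARENT:
        joint = PARENT[joint]
        angle += LOCAL_ANGLE[joint]
    return angle

# Rotating the parent (shoulder) drives every descendant: the wrist's global
# rotation changes even though its own local rotation stays zero.
LOCAL_ANGLE["shoulder"] = math.pi / 2
wrist_angle = global_angle("wrist")
```

This is exactly why editing one joint's rotation vector in the pose parameters repositions every joint below it in the tree.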
In some embodiments, the parametric human-body model used in this application may be the SMPL parametric model. Accordingly, the deformation of the body model is controlled by ten increments, i.e., the shape parameter is 10-dimensional, and the kinematic tree consists of 24 joints; if the rotation of each joint is represented by a 6D rotation signal, the pose parameter is 144-dimensional.
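The parameter sizes quoted above are mutually consistent, as this small sanity check shows. The constant names are illustrative, not identifiers from the SMPL library:

```python
NUM_JOINTS = 24   # joints in the SMPL kinematic tree
ROT_DIM = 6       # dimensions of the 6D rotation signal per joint
SHAPE_DIM = 10    # incremental shape parameters ("ten increments")

POSE_DIM = NUM_JOINTS * ROT_DIM  # 24 * 6 = 144, the pose-parameter size
```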
Because the joints of the human-body model are connected, their motions influence one another, and nearby joints influence each other more strongly: the shoulder's motion affects the elbow and wrist, and the elbow's motion affects the wrist, while distant joints interact very little, the elbow hardly affecting the knee at all. The correlation between joints, i.e., the degree to which one joint influences another, therefore describes the body's posture more accurately. So in some embodiments, after the pose feature of the target object is obtained from the limb key points, local features of the target object may be extracted from the pose feature, and the pose feature optimized according to the correlations among those local features; the pose parameters, shape parameters, and camera parameters of the input image are then generated from the optimized pose feature, so that the three-dimensional human-body model generated from the pose and shape parameters has higher accuracy.
In some embodiments, extracting the local features of the target object and optimizing the pose feature according to their correlations may be implemented with an attention mechanism. After the pose feature is obtained from the limb key points, query features (Q), key features (K), and value features (V) are first extracted from it by three sub-network feature extractors. K is then transposed, and Q is matrix-multiplied with the transposed K to obtain an attention feature matrix; a softmax operator normalizes this matrix, which is then matrix-multiplied with V, finally yielding the attention feature vector F, i.e., the optimized pose feature. The process can be expressed as:
F = softmax(QK^T)V
Compared with the pose feature before optimization, the optimized pose feature F makes full use of the local features of the target object through the attention mechanism, so the three-dimensional model generated from the pose and shape parameters derived from it is more accurate. The three sub-network feature extractors for the Q, K, and V features may be denoted the Q layer, K layer, and V layer; all may be fully connected layers, and when the input pose feature is 256-dimensional, the Q, K, and V features they output are also 256-dimensional. In some embodiments, the three extractors may be trained jointly, end to end.
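The attention step above, F = softmax(QK^T)V with column vectors Q, K, V as in the patent's figure, can be sketched in plain Python. A 2-dimensional toy example is used in place of the 256-dimensional features:

```python
import math

def softmax(matrix):
    """Row-wise softmax over a 2D list: each output row sums to 1."""
    out = []
    for row in matrix:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out

def attention(q, k, v):
    """F = softmax(Q K^T) V for column vectors q, k, v of equal length n."""
    n = len(q)
    # Q (n x 1) times K^T (1 x n) gives an (n x n) attention matrix.
    attn = softmax([[q[i] * k[j] for j in range(n)] for i in range(n)])
    # (n x n) matrix times V (n x 1) gives the optimized feature F (n x 1).
    return [sum(attn[i][j] * v[j] for j in range(n)) for i in range(n)]

# Toy 2-dimensional example (the network described here uses 256 dimensions).
f = attention([1.0, 0.0], [1.0, 0.0], [2.0, 4.0])
```

Because each softmax row sums to 1, every entry of F is a convex combination of the entries of V, i.e., a re-weighting of the value feature by the Q-K correlations.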
Since rendering the limb special-effect model does not demand a highly detailed three-dimensional human-body mesh, in some embodiments the three-dimensional model may be downsampled one or more times after it is generated. Downsampling uniformly keeps roughly half, or a specified number, of the model's three-dimensional points and discards the rest, reducing the point count and hence the subsequent computation. For example, the three-dimensional model inferred from the SMPL parametric model has 6890 three-dimensional points in total; downsampling it twice can leave a final model of only 1200 points.
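A minimal index-strided version of the uniform point selection described above is shown below. The patent does not specify its sampling scheme; simple halving twice, as here, leaves 1723 points rather than the 1200 the text cites, so the actual ratio is presumably tuned:

```python
def downsample(points, keep_ratio=0.5):
    """Uniformly keep roughly `keep_ratio` of the points by index striding."""
    step = max(1, round(1.0 / keep_ratio))
    return points[::step]

mesh = list(range(6890))    # stand-in for the 6890 SMPL mesh vertices
once = downsample(mesh)     # one pass: 3445 points remain
twice = downsample(once)    # two passes: 1723 points remain
```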
In some embodiments, step S102 may be implemented by a single overall neural network, a model-prediction and camera-prediction network, whose structure may be as shown in fig. 2. The first layer of the network is a fully connected layer (FC): its input is the horizontal and vertical coordinates of the 17 limb key points, 34 dimensions in total, and its output is the 256-dimensional pose feature. From the pose feature, three sub-network feature extractors, the Q layer, K layer, and V layer, extract the Q, K, and V features respectively; all three are fully connected layers, with 256-dimensional input and 256-dimensional output. The K feature is then transposed, and the Q feature (256 × 1) is matrix-multiplied with the transposed K feature (1 × 256) to obtain a 256 × 256 attention feature matrix. This matrix is normalized with a softmax operator and matrix-multiplied with the V feature (256 × 1), finally yielding the attention-weighted pose feature (256 × 1). This whole attention process can be viewed as the second layer of the model-prediction and camera-prediction network, used to optimize the pose feature.
The third layer of the neural network is composed of three sub-networks, all fully connected layers, which respectively predict the pose parameters and shape parameters of the target object and the camera parameters corresponding to the input image. The sub-network FC1 for predicting the pose parameters takes a 256-dimensional input and outputs 144 dimensions, comprising 6D rotation driving signals for 24 joints; the sub-network FC2 for predicting the shape parameters takes a 256-dimensional input and outputs 10 dimensions, comprising 10 incremental parameters that control the degree of deformation of the base model; and the sub-network FC3 for predicting the camera parameters takes a 256-dimensional input and outputs 3 dimensions. Finally, the pose parameters and shape parameters are passed through SMPL parametric model inference to obtain a three-dimensional model of the target object containing 6890 three-dimensional points. It is noted that, in some embodiments, the 17 limb key points are selected at the joints of the human body, so the 24 joint points in the pose parameters corresponding to the SMPL model include the 17 limb key points. In addition, in some embodiments, each layer in fig. 2 may include only one neural network layer in order to keep the network small and reduce computation and rendering time; in other embodiments, each layer in fig. 2 may include multiple neural network layers in order to obtain higher accuracy, which is not limited in this application. Moreover, as can be seen from fig. 2, the model prediction and camera prediction network used in the present application has a small number of layers, and the parameter dimension of each network layer is relatively small, so the computation scale of the whole neural network is small and its processing speed is high; the network can therefore predict the three-dimensional human body model in real time.
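The three prediction heads can be sketched as three matrix multiplications with the stated dimensions. The weight values are random placeholders, and reading the 144 pose outputs as 24 rows of 6D rotations follows the description above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder weights for the three fully connected prediction heads.
W_fc1 = rng.normal(size=(144, 256)) * 0.1  # FC1, pose: 6D rotation per 24 joints (24*6=144)
W_fc2 = rng.normal(size=(10, 256)) * 0.1   # FC2, shape: 10 deformation increments
W_fc3 = rng.normal(size=(3, 256)) * 0.1    # FC3, camera: 3 parameters

pose_feature = rng.normal(size=(256,))     # attention-weighted pose feature

pose_params = W_fc1 @ pose_feature         # (144,)
shape_params = W_fc2 @ pose_feature        # (10,)
camera_params = W_fc3 @ pose_feature       # (3,)

rotations_6d = pose_params.reshape(24, 6)  # one 6D rotation signal per SMPL joint
```

The `pose_params` and `shape_params` would then be fed to SMPL parametric model inference to produce the 6890-point three-dimensional model, while `camera_params` is kept for the projection and rendering steps.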
Since the three-dimensional model obtained in step S102 is predicted by a neural network, some error is inevitable. In some embodiments, the three-dimensional model may therefore be further optimized based on the limb key points of the target object to obtain a more accurate result. Specifically, first, the two-dimensional coordinates of each limb key point of the target object acquired from the input image are taken as the actual values of the limb key points, denoted P_pred. Then the three-dimensional coordinates of the three-dimensional points corresponding to the limb key points are taken from the generated three-dimensional model, and a projection algorithm, such as a weak perspective projection algorithm, is used together with the camera parameters obtained above to compute the two-dimensional coordinates of those three-dimensional points after projection; these are taken as the predicted values of the limb key points, denoted P_proj, i.e., the positions of the limb key points when the three-dimensional model generated in step S102 is projected onto the original input image. An energy equation is then defined:
E(pose) = ||P_pred - P_proj||
where pose is the predicted pose of the three-dimensional model, such as the pose parameters produced by the model prediction and camera prediction network. The energy equation represents the error between the key points obtained from the input image, i.e., the limb key points predicted by the prediction model, and the key points obtained by projecting the three-dimensional model. By iteratively reducing this error with an optimization algorithm, such as a steepest descent algorithm, a more accurate three-dimensional model can be obtained.
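Under the assumption that the 3 camera parameters are a weak-perspective scale plus a 2D translation (the patent does not fix this parameterization), the projection and the energy minimization can be sketched on a toy pose variable. Numerical steepest descent on the squared energy, which is smooth and has the same minimizer, stands in for the patent's optimizer:

```python
import numpy as np

def weak_perspective(points3d, cam):
    """Weak perspective projection: scale the x/y coordinates and translate.
    cam = (s, tx, ty) is an assumed reading of the 3 camera parameters."""
    s, tx, ty = cam
    return s * points3d[:, :2] + np.array([tx, ty])

def energy(offset, base_points, cam, p_pred):
    """Squared form of E(pose) = ||P_pred - P_proj||; here 'pose' is a toy
    global 3D offset rather than full joint rotations."""
    p_proj = weak_perspective(base_points + offset, cam)
    return np.linalg.norm(p_pred - p_proj) ** 2

rng = np.random.default_rng(2)
base = rng.normal(size=(17, 3))            # 3D points of the 17 limb key points
cam = (2.0, 0.1, -0.2)
true_offset = np.array([0.3, -0.5, 0.0])
p_pred = weak_perspective(base + true_offset, cam)  # "actual" 2D key points

# Steepest descent with central-difference numerical gradients.
offset = np.zeros(3)
lr, eps = 0.01, 1e-6
for _ in range(200):
    grad = np.zeros(3)
    for i in range(3):
        d = np.zeros(3)
        d[i] = eps
        grad[i] = (energy(offset + d, base, cam, p_pred)
                   - energy(offset - d, base, cam, p_pred)) / (2 * eps)
    offset -= lr * grad

residual = np.sqrt(energy(offset, base, cam, p_pred))  # final E(pose)
```

The descent drives the reprojection error toward zero and recovers the x/y components of the true offset; the z component is unobservable under weak perspective, which is why the optimization in the patent relies on multiple joints and rotation parameters rather than a single translation.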
In step S103, the reference three-dimensional model is generated in the same way as the three-dimensional model of the target object, so the two models share the same structure. Therefore, according to the mapping relationship between the limb special effect and the reference three-dimensional model, the rendering result of the limb special effect on the three-dimensional model of the target object can be determined, and a limb special effect model of the target object corresponding to the limb special effect can be generated; that is, the limb special effect model changes along with the posture of the target object, so the limb special effect looks more stereoscopic and has a better presentation effect. For example, a limb special effect with a waistcoat effect may be designed on a reference three-dimensional model (a standard model, for example, a three-dimensional model in T-pose), and a three-dimensional point mapping relationship is bound to the reference three-dimensional model. This mapping relationship may be denoted as an f-function: its input is a three-dimensional model, and its output is the corresponding position of each three-dimensional point of the waistcoat special effect relative to the input model. The coordinates of all three-dimensional points of the model predicted in step S102 can thus be used as the input of the f-function, whose output gives the corresponding position of each three-dimensional point of the waistcoat special effect. Through step S103, the three-dimensional coordinate information of the limb special effect is therefore obtained for the final rendering step.
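One minimal way to realize such an f-function is to bind each special-effect vertex to a vertex of the reference model and carry a fixed offset. This is an illustrative binding scheme under stated assumptions, not the patent's exact mapping; function names and the toy "waistcoat" geometry are invented for the sketch:

```python
import numpy as np

def bind_effect(effect_verts, ref_body_verts):
    """Bind each special-effect vertex to its nearest reference-body vertex
    and record the residual offset. Done once, on the reference (T-pose) model."""
    idx = np.array([np.argmin(np.linalg.norm(ref_body_verts - v, axis=1))
                    for v in effect_verts])
    offsets = effect_verts - ref_body_verts[idx]
    return idx, offsets

def f_function(body_verts, idx, offsets):
    """Given any posed body model, return the corresponding positions of the
    special-effect vertices (the mapping relationship applied per frame)."""
    return body_verts[idx] + offsets

# Design the effect once on the reference model...
ref_body = np.random.rand(1200, 3)
vest = ref_body[:100] + np.array([0.0, 0.0, 0.02])  # toy "waistcoat" shell
idx, offs = bind_effect(vest, ref_body)

# ...then apply it to the model predicted in step S102 (here a toy pose change):
posed_body = ref_body + np.array([0.5, 0.0, 0.0])
vest_posed = f_function(posed_body, idx, offs)
```

Because the binding is computed once on the reference model and only `f_function` runs per frame, the special effect follows whatever posture the predicted model takes, which is the behavior described above.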
In step S104, after the three-dimensional coordinate information of the limb special effect is obtained, the limb special effect may be rendered at its corresponding position in the input image according to the camera parameters obtained in step S102, finally yielding an output image containing the limb special effect corresponding to the target object.
If rendering were performed according to the limb special effect model alone, the entire projection of the limb special effect would be rendered into the output image. In some images, however, a limb of the target object may occlude part of the special effect, and rendering the occluded part would look unrealistic. Therefore, in some embodiments, the three-dimensional model of the target object may be rendered in a transparent color and the limb special effect model in a preset color; whether the three-dimensional model of the target object occludes a specified local area of the limb special effect model is determined according to the spatial relationship between the two models, and only the non-occluded areas of the limb special effect model are rendered, while the occluded areas are not. The purpose of rendering the three-dimensional model of the target object in a transparent color is that the rendering target should contain only the limb special effect: the three-dimensional model itself is not visibly rendered, yet the transparent model still occludes the non-transparent limb special effect model, so the rendered limb special effect retains the occluded, three-dimensional appearance.
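This transparent-body occlusion test amounts to a depth-buffer pass in which the body writes depth but no color. A point-wise toy sketch follows (a real renderer rasterizes triangles; points here are assumed already projected to pixel coordinates with a depth value):

```python
import numpy as np

def render_with_occlusion(body_pts, effect_pts, size=32):
    """Toy depth-buffered rendering: body points update the depth buffer only
    (the 'transparent color' pass), effect points are drawn in a preset color
    only where no body point is closer to the camera."""
    depth = np.full((size, size), np.inf)
    image = np.zeros((size, size), dtype=np.uint8)
    for x, y, z in body_pts:               # transparent pass: depth, no color
        if z < depth[y, x]:
            depth[y, x] = z
    for x, y, z in effect_pts:             # effect pass: depth-tested color
        if z < depth[y, x]:
            image[y, x] = 255              # preset effect color
    return image

# The body point at pixel (5, 5) is nearer (z=1.0) than the effect point there
# (z=2.0), so that effect pixel is occluded; the effect at (10, 10) is visible.
body = [(5, 5, 1.0)]
effect = [(5, 5, 2.0), (10, 10, 2.0)]
img = render_with_occlusion(body, effect)
```

The body never contributes color, yet its depth entries mask any effect fragment behind it, which is exactly the "transparent model still occludes the non-transparent special effect" behavior described above.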
Corresponding to the embodiment of the rendering method of the limb special effect, the application also provides a rendering device of the limb special effect.
As shown in fig. 3, which is a block diagram of a rendering apparatus for limb special effects according to an embodiment of the present application, the apparatus includes the following modules:
the input module 310: used for acquiring an image of a target object and a reference limb model;
the human body modeling module 320: used for generating a three-dimensional model corresponding to the target object based on the image of the target object, and acquiring camera parameters corresponding to the image;
the special effect modeling module 330: used for determining a rendering result of the limb special effect on the three-dimensional model of the target object according to the mapping relationship between the limb special effect and the reference three-dimensional model, and generating a limb special effect model of the target object corresponding to the limb special effect;
the rendering module 340: used for rendering the limb special effect in the image according to the limb special effect model and the camera parameters.
The implementation processes of the functions and roles of the units in the above apparatus are described in detail in the implementation processes of the corresponding steps in the above method, and are not repeated here.
For the device embodiment, since it basically corresponds to the method embodiment, reference may be made to the partial description of the method embodiment for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
The present application also provides a computer device comprising at least a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method according to any of the preceding embodiments when executing the program.
Fig. 4 is a more specific hardware structure diagram of a computing device provided in the present application, where the device may include: a processor 401, a memory 402, an input/output interface 403, a communication interface 404, and a bus 405. Wherein the processor 401, the memory 402, the input/output interface 403 and the communication interface 404 are communicatively connected to each other within the device by a bus 405.
The processor 401 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solution provided by the present application. The processor 401 may further include a graphics card, such as an Nvidia Titan X or 1080Ti graphics card.
The Memory 402 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 402 may store an operating system and other application programs, and when the technical solution provided by the present application is implemented by software or firmware, the relevant program codes are stored in the memory 402 and called to be executed by the processor 401.
The input/output interface 403 is used for connecting an input/output module to realize information input and output. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 404 is used to connect a communication module (not shown in the figure) to implement communication interaction between the present device and other devices. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
The bus 405 includes a path that transfers information between the various components of the device, such as the processor 401, memory 402, input/output interface 403, and communication interface 404.
It should be noted that although the above-mentioned device only shows the processor 401, the memory 402, the input/output interface 403, the communication interface 404 and the bus 405, in a specific implementation, the device may also include other components necessary for normal operation. Furthermore, it will be understood by those skilled in the art that the apparatus described above may also include only the components necessary to implement the solution of the present application, and not necessarily all of the components shown in the figures.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of the preceding embodiments.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus a necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, it is relatively simple to describe, and reference may be made to some descriptions of the method embodiment for relevant points. The above-described apparatus embodiments are merely illustrative, and the modules described as separate components may or may not be physically separate, and the functions of the modules may be implemented in one or more software and/or hardware when implementing the solution of the present application. And part or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the embodiment. One of ordinary skill in the art can understand and implement without inventive effort.
The foregoing is directed to embodiments of the present application and it is noted that numerous modifications and adaptations may be made by those skilled in the art without departing from the principles of the present application and are intended to be within the scope of the present application.

Claims (13)

1. A rendering method of limb special effects is characterized by comprising the following steps:
acquiring an image of a target object and a reference limb model, wherein the image comprises a limb of the target object, the reference limb model is used for presenting a rendering result of a specified limb special effect on a reference three-dimensional model, and the reference limb model comprises a mapping relation between the limb special effect and the reference three-dimensional model;
generating a three-dimensional model corresponding to the target object based on the image, and acquiring camera parameters corresponding to the image;
determining a rendering result of the limb special effect on the three-dimensional model of the target object according to the mapping relation between the limb special effect and the reference three-dimensional model, and generating a limb special effect model of which the limb special effect corresponds to the target object;
rendering the limb special effect in the image according to the limb special effect model and the camera parameters.
2. The method of claim 1, wherein the step of generating a three-dimensional model corresponding to the target object based on the image and acquiring camera parameters corresponding to the image comprises:
acquiring limb key points of the target object, generating a three-dimensional model corresponding to the target object based on the limb key points, and acquiring camera parameters corresponding to the image.
3. The method of claim 2, wherein the limb key points are obtained by inputting the image into a trained neural network.
4. The method of claim 2, wherein the step of generating a three-dimensional model corresponding to the target object based on the extremity keypoints and acquiring camera parameters corresponding to the image comprises:
acquiring the posture characteristic of the target object based on the limb key point;
respectively generating a posture parameter and a shape parameter corresponding to the target object and a camera parameter corresponding to the image based on the posture characteristic;
and generating a three-dimensional model corresponding to the target object by human body parametric modeling according to the posture parameters and the shape parameters.
5. The method of claim 4, wherein after the step of obtaining pose features of the target object based on the limb keypoints, the method further comprises:
extracting local features of the target object based on the attitude features of the target object, and optimizing the attitude features according to the correlation among the local features of the target object so as to respectively generate attitude parameters and shape parameters corresponding to the target object and camera parameters corresponding to the image based on the optimized attitude features.
6. The method according to claim 5, wherein the step of extracting local features of the target object based on the pose features of the target object and optimizing the pose features according to the correlation between the respective local features of the target object is realized by an attention mechanism.
7. The method of claim 2, wherein the step of generating a three-dimensional model corresponding to the target object and acquiring camera parameters corresponding to the image is followed by the step of:
taking the two-dimensional coordinates of the limb key points as actual values of the limb key points;
acquiring a three-dimensional coordinate corresponding to the limb key point in the three-dimensional model;
according to the three-dimensional coordinates and the camera parameters, obtaining two-dimensional coordinates of the three-dimensional points after projection based on the camera parameters through a projection algorithm, and taking the two-dimensional coordinates as predicted values of the limb key points;
and reducing the error between the actual value and the predicted value of the limb key point through an optimization algorithm so as to optimize the three-dimensional model.
8. The method of claim 1, wherein after the step of generating a three-dimensional model corresponding to the target object based on the image and acquiring camera parameters corresponding to the image, further comprising:
and performing at least one down-sampling on the three-dimensional model.
9. The method of claim 1, wherein the rendering the limb special effect in the image according to the limb special effect model and the camera parameters comprises:
in the image, rendering the three-dimensional model by using transparent color, rendering the limb special effect model by using preset color, and determining whether the three-dimensional model covers a specified local area of the limb special effect model according to the spatial relationship between the three-dimensional model and the limb special effect model.
10. The method of claim 1, wherein the image is a video frame.
11. An apparatus for rendering a body effect, the apparatus comprising:
the system comprises an input module, a display module and a display module, wherein the input module is used for acquiring an image of a target object and a reference limb model, the image comprises a limb of the target object, the reference limb model is used for presenting a rendering result of a specified limb special effect on a reference three-dimensional model, and the reference limb model comprises a mapping relation between the limb special effect and the reference three-dimensional model;
the human body modeling module is used for generating a three-dimensional model corresponding to the target object based on the image and acquiring camera parameters corresponding to the image;
the special effect modeling module is used for determining a rendering result of the limb special effect on the three-dimensional model of the target object according to the mapping relation between the limb special effect and the reference three-dimensional model, and generating a limb special effect model of which the limb special effect corresponds to the target object;
and the rendering module is used for rendering the limb special effect in the image according to the limb special effect model and the camera parameters.
12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 10.
13. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any one of claims 1-10 when executing the program.
CN202210412285.4A 2022-04-19 2022-04-19 Rendering method and device for limb special effect, storage medium and equipment Pending CN114863005A (en)


Publications (1)

Publication Number Publication Date
CN114863005A 2022-08-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination