CN114399536A - Virtual human video generation method and device

Info

Publication number
CN114399536A
CN114399536A
Authority
CN
China
Prior art keywords
virtual human
added
frame
position information
human image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210059239.0A
Other languages
Chinese (zh)
Inventor
李甫
林天威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210059239.0A
Publication of CN114399536A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/70 Denoising; Smoothing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/269 Analysis of motion using gradient-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74 Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20024 Filtering details
    • G06T2207/20032 Median filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The disclosure provides a method and a device for generating a virtual human video, relating to the field of artificial intelligence and in particular to computer vision, image recognition and deep learning technologies. The specific implementation scheme is as follows: acquire a virtual human video and a material to be added, wherein the virtual human video comprises a plurality of frames of virtual human images; acquire first position information of the material to be added in the first frame of virtual human image of the virtual human video; determine a plurality of tracking points in the first frame of virtual human image according to the first position information; track the plurality of tracking points to obtain their position information in the other frames of virtual human images; and generate the material-added virtual human video according to the material to be added and the position information of the tracking points in the other frames. This saves resources (both human and material), improves the efficiency of generating the material-added virtual human video, and helps improve the interaction experience between the user and the virtual human.

Description

Virtual human video generation method and device
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to computer vision, image recognition and deep learning technologies, and specifically to a method and a device for generating a virtual human video.
Background
Nowadays, virtual humans play an increasingly large role in people's daily lives, for example in mall navigation and bank interaction; in particular, with the proposal of the "metaverse" concept, virtual humans have become even more popular.
A virtual human video comprises a plurality of frames of virtual human images. After the virtual human video has been produced, it may be necessary to add material to the virtual human images, such as adding a work badge to them.
However, existing approaches to doing so consume a large amount of manpower and material resources, and the efficiency of generating the virtual human video is low.
Disclosure of Invention
The present disclosure provides a method and an apparatus for generating a virtual human video, which improve the efficiency of generating the virtual human video.
According to a first aspect of the present disclosure, a method for generating a virtual human video is provided, including:
acquiring a virtual human video and a material to be added, wherein the virtual human video comprises a plurality of frames of virtual human images;
acquiring first position information of the material to be added in a first frame of virtual human image of the virtual human video, and determining a plurality of tracking points in the first frame of virtual human image according to the first position information;
and tracking the plurality of tracking points to obtain the position information of the plurality of tracking points in other frames of virtual human images in the multi-frame virtual human image, and generating the virtual human video with the added material according to the material to be added and the position information of the plurality of tracking points in other frames of virtual human images.
According to a second aspect of the present disclosure, there is provided a virtual human video generation apparatus, including:
the first acquisition unit is used for acquiring a virtual human video and a material to be added, wherein the virtual human video comprises a plurality of frames of virtual human images;
the second acquisition unit is used for acquiring first position information of the material to be added in a first frame of virtual human image of the virtual human video;
a determining unit, configured to determine a plurality of tracking points in the first frame of virtual human image according to the first position information;
the tracking unit is used for tracking the plurality of tracking points to obtain the position information of the plurality of tracking points in other frames of virtual human images in the multi-frame virtual human image;
and the generating unit is used for generating the virtual human video added with the material according to the material to be added and the position information of the tracking points in the other frame virtual human images.
According to a third aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method according to the first aspect.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program stored in a readable storage medium; at least one processor of an electronic device can read the computer program from the readable storage medium, and execution of the computer program by the at least one processor causes the electronic device to perform the method of the first aspect.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic illustration of types of virtual humans according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of comparison of virtual human images before and after adding materials according to an embodiment of the disclosure;
FIG. 3 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 5 is a schematic diagram of determining tracking points according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of determining position information of a material to be added in an image of a virtual human according to an embodiment of the disclosure;
FIG. 7 is a schematic diagram of a virtual human image according to one embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a virtual human image according to another embodiment of the present disclosure;
FIG. 9 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 10 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 11 is a schematic diagram according to a fifth embodiment of the present disclosure;
fig. 12 is a block diagram of an electronic device for implementing a method for generating a virtual human video according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The metaverse is a virtual world, linked to and created by technological means, that maps onto and interacts with the real world; it is a digital living space with a novel social system. With the proposal of the metaverse, virtual humans have become even more popular. For example, virtual humans are widely used in video products and interactive products, among others.
Illustratively, as shown in fig. 1, the types of virtual humans may include: cartoon virtual humans, anthropomorphic virtual humans, and real-person virtual humans.
A cartoon virtual human is generated based on a cartoon character. An anthropomorphic virtual human is generated virtually with reference to a real person. A real-person virtual human is generated by modeling a real person.
At present, virtual humans are common in malls and banks. For example, a virtual human may be used for mall navigation, quickly locating a user's destination and providing a travel route from the user's current location to that destination. As another example, a virtual human may be used in banking interactions to provide autonomous banking services, such as deposit and withdrawal services, for the user.
It should be understood that virtual humans are also widely applied in other scenarios, such as introducing the features of virtual or physical products, which are not listed here one by one.
After the virtual human image is made, a virtual human video can be generated based on the virtual human image, and the virtual human video comprises a plurality of frames of virtual human images.
After the virtual human video is generated, it may be necessary to add material to each virtual human image in the video, where material refers to an object used to characterize the virtual human, such as a badge characterizing the virtual human's identity.
Illustratively, as shown in fig. 2, after the virtual human image has been produced, a badge may be added on the virtual human image to characterize the identity of the virtual human.
In the related art, if material needs to be added to the virtual human images in a virtual human video, the video may be regenerated by shooting again, or regenerated by modeling again, so that each frame of the regenerated video includes the material that needs to be added.
However, generating the material-containing virtual human video by re-shooting or re-modeling requires a large amount of manpower and material resources, and the efficiency of generating the video is low.
In other embodiments, the virtual human video may be generated by a "fixed mapping" method: for each frame of virtual human image in the constructed virtual human video, the material is mapped onto the frame at a preset fixed position (determined based on the material to be added), yielding each frame of virtual human image with the material added and hence the material-added virtual human video.
For convenience of distinction, the virtual human video before the material is added may be called the first virtual human video, and the virtual human video after the material is added may be called the second virtual human video.
However, because of the "fixed mapping", the position of the material is exactly the same in every frame of the second virtual human video, while the pose of the virtual human usually differs slightly between frames. With "fixed mapping" the material cannot move along with the virtual human; since the material sits at the same position in every frame, the generated second virtual human video has low flexibility and low realism, and the user's interaction experience with the virtual human is poor.
To avoid at least one of the above problems, the inventors of the present disclosure arrived, through creative effort, at the inventive concept of the present disclosure: with a virtual human video comprising a plurality of frames of virtual human images, obtain first position information of the material to be added in the first frame of virtual human image; determine a plurality of tracking points in the first frame according to the first position information; track the tracking points to obtain their position information in the other frames of virtual human images; and generate the material-added virtual human video based on the first position information and the tracking points' position information in the other frames.
Based on this inventive concept, the present disclosure provides a method and a device for generating a virtual human video, applied to the field of artificial intelligence, in particular to computer vision, image recognition and deep learning technologies, so as to improve the reliability of the virtual human video.
Fig. 3 is a schematic diagram according to a first embodiment of the disclosure, and as shown in fig. 3, a method for generating a virtual human video includes:
s301: and acquiring the virtual human video and the material to be added.
The virtual human video comprises a plurality of frames of virtual human images.
For example, the execution subject of this embodiment is a virtual human video generation device (hereinafter simply referred to as the generation device); the generation device may be a server (such as a cloud server or a local server), a computer, a terminal device, a processor, a chip, or the like, which is not limited in this embodiment.
The virtual human video may be acquired, for example, in the following ways:
in one example, the generating device may be connected with the video capturing device and receive the avatar video transmitted by the video capturing device.
In another example, the generation device may provide a video-loading tool, through which the user can transmit the virtual human video to the generation device.
The video-loading tool may be an interface for connecting with an external device, such as an interface for connecting with another storage device, through which the virtual human video transmitted by the external device is acquired. The video-loading tool may also be a display device; for example, the generation device may display an interface with a video-loading function on the display device, the user may import the virtual human video to the generation device through that interface, and the generation device acquires the imported video.
Similarly, the obtaining of the material to be added can also be realized by the method, which is not limited in this embodiment and is not described in detail again. Based on the understanding of the materials in the above embodiments, the materials to be added may be different based on different application scenarios.
In some embodiments, different materials can be stored in the generating device in advance, and when the avatar video is acquired, the materials to be added are determined from the materials based on the avatar image in the avatar video. Or, the generating device may provide an operable interface, and the user may select the material to be added based on the operable interface, and accordingly, the generating device may determine the material to be added based on the selection operation of the user on the operable interface.
S302: acquiring first position information of the material to be added in a first frame of virtual human image of the virtual human video, and determining a plurality of tracking points in the first frame of virtual human image according to the first position information.
It should be understood that "first" in the first location information is used for distinguishing from other location information in the following, such as for distinguishing from the second location information, and the like, and is not to be construed as a limitation on the location information. Similarly, the "first" in the first frame virtual human image is used for distinguishing from other frame virtual human images in the following text, such as for distinguishing from the second frame virtual human image and the like, and cannot be understood as a definition of the virtual human image.
The tracking points are pixel points in the first frame of virtual human image, the number of the tracking points is multiple, and the number of the tracking points can be determined based on the needs, the history, the tests and other manners, which is not limited in this embodiment.
Illustratively, taking the demand-based manner as an example, the determination of the number of tracking points is set forth as follows:
For higher-quality demands, i.e., scenes with higher requirements on the reliability and realism of the virtual human video with the added material, a relatively large number of tracking points may be selected; for lower-quality demands, i.e., scenes with lower such requirements, a relatively small number of tracking points may be selected.
S303: tracking the plurality of tracking points to obtain the position information of the plurality of tracking points in the other frames of virtual human images in the multi-frame virtual human image, and generating the material-added virtual human video according to the material to be added and the position information of the plurality of tracking points in the other frames of virtual human images.
Illustratively, if the virtual human video comprises N frames of virtual human images, and the N frames of virtual human images are respectively a first frame of virtual human image, a second frame of virtual human image and up to an nth frame of virtual human image, after a plurality of tracking points are determined in the first frame of virtual human image, the plurality of tracking points are tracked in the second frame of virtual human image and up to the nth frame of virtual human image.
Accordingly, the plurality of tracking points are subjected to tracking processing in the second frame virtual human image to obtain position information of the plurality of tracking points in the second frame virtual human image (for convenience of distinction, the position information is referred to as second position information), and so on, and the plurality of tracking points are subjected to tracking processing in the nth frame virtual human image to obtain position information of the plurality of tracking points in the nth frame virtual human image (for convenience of distinction, the position information is referred to as nth position information).
After the position information corresponding to each frame of the virtual human image of the plurality of tracking points in the virtual human video is obtained through tracking, mapping processing can be performed on the material to be added on the basis of the position information corresponding to each frame of the virtual human image of the plurality of tracking points, so that the virtual human video with the material added is obtained.
On the one hand, because the position information of the plurality of tracking points in the other frames of virtual human images is obtained by tracking, it flexibly reflects the dynamics of the virtual human in the video. This avoids the drawback of fixed mapping in the related art, where the material cannot move along with the virtual human, resulting in low flexibility and low realism; instead, the material to be added and the virtual human are organically combined into a whole, the reliability and realism of the material-added virtual human video are improved, and the interaction experience between the user and the virtual human is further improved.
On the other hand, this embodiment implements the inventive concept without re-shooting or re-modeling. It therefore avoids the manpower and material cost of re-shooting or re-modeling, saves labor and resources, and avoids the drawback that human subjective factors introduced by re-shooting reduce the accuracy and reliability of the material-added virtual human video; the accuracy and reliability of the generated video are thus improved. Moreover, since re-shooting or re-modeling takes time, the scheme of this embodiment also improves the efficiency of generating the material-added virtual human video.
Based on the above analysis, this embodiment provides a method for generating a virtual human video: acquiring a virtual human video and a material to be added, the virtual human video comprising a plurality of frames of virtual human images; acquiring first position information of the material to be added in the first frame of virtual human image; determining a plurality of tracking points in the first frame according to the first position information; tracking the tracking points to obtain their position information in the other frames of virtual human images; and generating the material-added virtual human video according to the material to be added and that position information. By determining a plurality of tracking points from the first position information, tracking them to obtain their position information in the other frames, and generating the material-added video from all the position information, this embodiment saves resources (including human and material resources), improves the efficiency of generating the material-added virtual human video, and helps improve the interaction experience between the user and the virtual human.
Fig. 4 is a schematic diagram according to a second embodiment of the present disclosure, and as shown in fig. 4, a method for generating a virtual human video includes:
s401: and acquiring the virtual human video and the material to be added.
The virtual human video comprises a plurality of frames of virtual human images.
In order to avoid repetitive descriptions, the same technical features of the present embodiment as those of the above embodiments are not described again.
S402: acquiring first position information of the material to be added in a first frame of virtual human image of the virtual human video.
The first position information may be acquired in any of the following ways:
Method 1: receiving coordinate information, input by a user, for adding the material to be added on the first frame of virtual human image, and determining the coordinate information as the first position information.
For example, the generating device may be connected to an input device, the input device may be a mouse and/or a keyboard, and the like, and the user may input coordinate information based on the input device, so that the generating device receives the coordinate information transmitted by the input device.
For another example, the generation device may be connected to a display device that supports touch operations; the user may input coordinate information on the display device by touch, so that the generation device receives the coordinate information transmitted by the display device.
Method 2: obtaining a selection operation performed by the user on the first frame of virtual human image, determining coordinate information corresponding to the selection operation, and determining that coordinate information as the first position information.
The selecting operation may be used to select a pixel point of the material to be added in the first frame of the virtual human image, and may also be used to select an enclosure frame of the material to be added in the first frame of the virtual human image, and the like.
Similarly, the selection operation may be implemented by the input device being a mouse and/or a keyboard.
Illustratively, the user moves the mouse cursor in the first frame of virtual human image to the position of the material to be added and performs a selection operation by clicking the mouse, such as the point shown in fig. 5; the coordinate information of that point is the first position information.
Method 3: determining, according to the material to be added, position information corresponding to the material to be added from a pre-constructed mapping relation, and determining that position information as the first position information.
The mapping relation is used for representing the corresponding relation between the material and the position information.
It should be understood that the above examples are only illustrative of possible ways of acquiring the first position information in this embodiment and are not to be construed as limiting it. Combining the different acquisition methods set forth above achieves flexibility and diversity in acquiring the first position information.
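As an illustration of methods 1 and 2, the following sketch (Python with OpenCV; the file path and window name are hypothetical) records a mouse click on the first frame of virtual human image as the first position information:

```python
import cv2

clicked = []  # will hold (x, y) pixel coordinates chosen by the user

def on_mouse(event, x, y, flags, param):
    # Record a left-button click as the candidate first position information.
    if event == cv2.EVENT_LBUTTONDOWN:
        clicked.append((x, y))

# "first_frame.png" is a hypothetical path to the first virtual human image.
first_frame = cv2.imread("first_frame.png")
cv2.namedWindow("select material position")
cv2.setMouseCallback("select material position", on_mouse)
cv2.imshow("select material position", first_frame)
cv2.waitKey(0)
cv2.destroyAllWindows()

first_position = clicked[-1] if clicked else None  # first position information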
S403: determining a plurality of tracking points in the first frame of virtual human image according to the first position information.
A tracking point is a pixel point that has relatively distinct features compared with the other pixel points in the first frame of virtual human image and is therefore relatively easy to track, and that lies relatively close to the first position information (as above, the distance may be determined based on demand, history, experiment, and the like, which is not limited in this embodiment); examples include corner points and inflection points near the first position information.
Selecting pixel points with relatively distinct features as tracking points improves the effectiveness and reliability of the tracking processing; selecting pixel points at a relatively short distance keeps the tracking points from deviating far from the first position information, avoiding large deviations in the tracked position information and improving the reliability and accuracy of the material-added virtual human video.
For example, the type attribute of each pixel point in the first frame of virtual human image may be determined, and at least part of the pixel points with attribute types of inflection points and corner points may be determined as the tracking points.
On this basis, a selection range of the tracking points may be predetermined; after candidate tracking points are determined based on the type attributes, they may be screened against the selection range to obtain the final plurality of tracking points. For example, at least some of the pixel points within the selection range whose attribute types are inflection points or corner points are determined as the tracking points.
In some embodiments, S403 may include the steps of:
the first step is as follows: and selecting a plurality of pixel points with curvatures larger than a preset curvature threshold value from the first frame of virtual human image according to the first position information.
The second step is as follows: and determining a plurality of tracking points according to the selected pixel points with the curvatures larger than the preset curvature threshold.
Similarly, the curvature threshold may be set based on a demand, a history, a test, and the like, and the present embodiment is not limited.
Curvature represents the degree to which the image contour at a pixel point deviates from a straight line; correspondingly, inflection points and corner points have large curvature.
This step can be understood as follows: the curvature of each pixel point in the first frame of virtual human image is determined first, the curvatures are traversed to select those larger than the curvature threshold, and a plurality of tracking points are determined from the pixel points corresponding to the selected curvatures. Each tracking point is thus a pixel point with relatively distinct features, which improves the effectiveness and reliability of the subsequent tracking of the tracking points.
In some embodiments, the first step may comprise: determining a selection area according to the first position information, and selecting, from the selection area, a plurality of pixel points whose curvature is greater than the preset curvature threshold.
And the distance between each pixel point in the selected area and the first position information is smaller than a preset distance threshold value.
Similarly, the distance threshold may be set based on a demand, a history, a test, and the like, which is not limited in this embodiment.
Illustratively, a circular area is determined in the first frame of virtual human image with the first position information as its center and the distance threshold as its radius; this circular area is the selection area. The curvature of each pixel point in the selection area is calculated relative to the pixel point of the first position information to obtain the pixel points whose curvature is greater than the curvature threshold, thereby obtaining the plurality of tracking points.
For example, after determining a point corresponding to the position of the material to be added based on the selection operation, 5 tracking points as shown in fig. 5 may be determined based on the position information (i.e., first position information) of the point.
In this embodiment, the selection area is determined first so that the tracking points are chosen within it. This restricts the selection to a smaller range and improves the efficiency of determining the tracking points, and the resulting tracking points are both relatively close to the first position information and relatively distinct in their features, which facilitates effective and reliable tracking.
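A minimal sketch of this selection step, assuming OpenCV is available: the Shi-Tomasi corner response of cv2.goodFeaturesToTrack stands in for the patent's curvature-threshold criterion (corners and inflection points are exactly the high-curvature pixels the text describes), and a circular mask implements the selection area; all threshold values are illustrative.

```python
import cv2
import numpy as np

def select_tracking_points(gray, first_pos, dist_thresh=40, max_points=5):
    # Restrict the search to the circular selection area around the
    # first position information (radius = distance threshold).
    mask = np.zeros(gray.shape, dtype=np.uint8)
    cv2.circle(mask, (int(first_pos[0]), int(first_pos[1])),
               dist_thresh, 255, -1)
    # Shi-Tomasi corners approximate "pixels with curvature above a
    # threshold"; qualityLevel plays the role of the curvature threshold.
    corners = cv2.goodFeaturesToTrack(gray, maxCorners=max_points,
                                      qualityLevel=0.05, minDistance=5,
                                      mask=mask)
    if corners is None:
        return np.empty((0, 2), dtype=np.float32)
    return corners.reshape(-1, 2)

# Hypothetical usage with the first frame and clicked position from above:
# gray0 = cv2.cvtColor(first_frame, cv2.COLOR_BGR2GRAY)
# tracking_points = select_tracking_points(gray0, first_position)
```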
In other embodiments, a plurality of tracking points can be selected from the first frame of virtual human image by the user. The implementation principle of the method can be described in the above point for determining the material to be added based on the selection operation, and details are not described here.
S404: tracking the plurality of tracking points in the ith frame of virtual human image of the multi-frame virtual human images to obtain the position information of the plurality of tracking points in the (i+1)th frame of virtual human image.
Where 1 ≤ i ≤ N-1, i is a positive integer, and N is the total number of frames of the multi-frame virtual human images.
Illustratively, the plurality of tracking points are tracked in the first frame of virtual human image to obtain their corresponding position information in the second frame of virtual human image (for convenience of distinction, the second position information); they are tracked in the second frame to obtain their corresponding position information in the third frame; and so on, until they are tracked in the (N-1)th frame to obtain their corresponding position information in the Nth frame.
For example, in combination with the above analysis, if there are five tracking points, each with corresponding position information in the first frame of virtual human image, then for each of the five tracking points, tracking processing is performed based on its position information in the first frame to determine its second position information, and so on, until the corresponding position information of the five tracking points in the other frames of virtual human images is obtained.
In this embodiment, by determining the position information of each tracking point in the virtual human image of the next frame based on the position information of each tracking point in the virtual human image of the previous frame, the determined position information of the virtual human image of the next frame has stronger consistency and rationality compared with the position information of the virtual human image of the previous frame, so as to realize high fit between the virtual human images of the adjacent frames, and conform to the motion trajectory of the limb movement of the virtual human, thereby improving the technical effects of accuracy and reliability of the determined position information of the virtual human image of the next frame.
In some embodiments, the tracking process may be performed on a plurality of tracking points based on an optical flow tracking (Lucas-Kanade) algorithm.
For example, for any one of the five tracking points, its second position information is calculated from its position information in the first frame virtual human image based on the optical flow tracking algorithm. For the specific implementation principle of the optical flow tracking algorithm, reference may be made to the related art, which is not repeated here.
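A sketch of this frame-by-frame tracking using OpenCV's pyramidal Lucas-Kanade implementation, cv2.calcOpticalFlowPyrLK; the frames list and function name are illustrative assumptions, not part of the patent.

```python
import cv2
import numpy as np

def track_points(frames, points_frame1):
    """Propagate tracking points from frame i to frame i+1 with Lucas-Kanade.

    frames: list of N BGR virtual human images; points_frame1: (M, 2) array.
    Returns a list where entry i holds the (M, 2) point positions in frame i.
    """
    positions = [np.asarray(points_frame1, dtype=np.float32)]
    prev_gray = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    pts = positions[0].reshape(-1, 1, 2)
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # The positions in frame i seed the optical-flow search in frame i+1.
        pts, _status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, gray,
                                                      pts, None)
        positions.append(pts.reshape(-1, 2).copy())
        prev_gray = gray
    return positions
```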
S405: determining the position information of the material to be added in the other frames of virtual human images according to the position information of the plurality of tracking points in the other frames of virtual human images.
For example, in combination with the above embodiment, the position information of the material to be added in the second frame virtual human image may be determined from the second position information corresponding to each of the five tracking points, and so on, until the position information of the material to be added in the Nth frame virtual human image is determined from the Nth position information corresponding to each of the five tracking points.
In some embodiments, the position information of the material to be added in the other frames of virtual human images can be determined based on a Mean Value Coordinates (MVC) algorithm.
Taking as an example the determination of the position information of the material to be added in the second frame virtual human image from the position information of three tracking points in that frame, an exemplary description follows:
under the condition that the second position information corresponding to each of the three tracking points is known, the three second position information can be substituted into the algorithm of the mean value coordinate, so as to obtain the position information of the material to be added in the second frame virtual human image.
As shown in FIG. 6, the three tracking points are v_{i-1}, v_i and v_{i+1}. A set of weight values λ_i is determined such that the weights satisfy the following formulas 1 and 2:

Formula 1: $\sum_i \lambda_i v_i = v_0$

Formula 2: $\sum_i \lambda_i = 1$

where, as shown in FIG. 6, v_0 is the point of the material to be added in the second frame virtual human image, and the position information of v_0 is the position information of the material to be added in the second frame virtual human image.

That is to say, the point v_0 of the material to be added in the second frame virtual human image can be obtained by linearly weighting the three tracking points with this set of weight values. Accordingly, writing v_0 = (x_0, y_0) and v_i = (x_i, y_i), v_0 can be obtained by the following formulas 3 and 4:

Formula 3: $x_0 = \sum_i \lambda_i x_i$

Formula 4: $y_0 = \sum_i \lambda_i y_i$
it should be understood that the above example is only used for exemplary illustration, and the implementation manner of determining the position information of the material to be added in the other virtual human images by using the position information of the plurality of tracking points in the other frame virtual human images, which may be adopted in the embodiment, is not to be construed as a limitation on the implementation manner.
S406: mapping the material to be added according to the position information of the material to be added in each frame of virtual human image, to obtain each frame of virtual human image with the material added.
In some embodiments, after the position information of the material to be added in a certain frame of the virtual human image is determined, the material to be added can be subjected to mapping processing directly based on the position information, so that the material to be added is added in the certain frame of the virtual human image.
In other embodiments, after the position information of the material to be added in a certain frame of virtual human image is determined, a smoothing method is adopted to perform mapping processing on the material to be added at the position information, so as to realize the addition of the material to be added in the certain frame of virtual human image.
Illustratively, the smoothing process may include: and smoothing the position information of the material to be added in each frame of virtual human image to obtain smoothed position information, and performing mapping processing on the material to be added according to the smoothed position information.
In this embodiment, smoothing before mapping reduces jitter, so that the material to be added and the virtual human image are fused more naturally and realistically, matching user expectations and improving the realism and reliability of the material-added virtual human video.
This embodiment does not limit the specific implementation of the smoothing; for example, it may be implemented using mean filtering (mean smoothing), median filtering (median smoothing), bilateral filtering (bilateral smoothing), Gaussian filtering (Gaussian smoothing), and so on, which are not listed here one by one.
Illustratively, taking the smoothing process implemented by using gaussian filtering as an example, the following exemplary steps are performed:
after the position information of the material to be added in a certain frame of virtual human image is determined, the pixel value corresponding to the position information and the pixel values of other pixel points in the neighborhood are weighted and averaged to obtain a target pixel value, and the pixel value corresponding to the position information is replaced based on the target pixel value.
Similarly, the neighborhood may be determined based on the demand, history, and experiment, which is not limited in this embodiment.
Of course, the smoothing process may also be implemented by determining a gaussian smoothing parameter to implement gaussian filtering based on the gaussian smoothing parameter. Illustratively, such smoothing may be implemented based on the following steps:
the first step is as follows: and determining a radial center line of the position information of the material to be added in each frame of the virtual human image, and determining a Gaussian smoothing parameter according to the radial center line.
The radial center line can be understood as the center line of a Gaussian distribution curve, where the Gaussian distribution curve, i.e., the curve corresponding to a Gaussian kernel function, is the standard curve of a normal distribution. That is, the radial center line is the center line of a standard normal-distribution curve determined with the position information of the material to be added in each frame of virtual human image as its center point.
The second step is as follows: performing Gaussian smoothing on the position information of the material to be added in each frame of virtual human image according to the Gaussian smoothing parameter to obtain the smoothed position information.
That is to say, in this embodiment, a gaussian kernel function (that is, a gaussian distribution curve, which is a standard curve in normal distribution) may be first constructed, and the determined position information of the material to be added in a certain frame of the virtual human image is weighted and calculated based on the gaussian kernel function, so as to obtain new position information, thereby reducing jitter and improving the technical effects of smoothness and reliability.
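A sketch of the Gaussian smoothing described above, assuming the tracked anchor positions are collected per frame into an (N, 2) array; the kernel width sigma is an illustrative parameter, not a value from the patent.

```python
import numpy as np

def gaussian_smooth_track(track, sigma=2.0):
    """Smooth an (N, 2) per-frame anchor trajectory with a 1-D Gaussian kernel.

    The kernel is centred on each frame (the "radial center line" of the
    Gaussian distribution curve) and convolved along the time axis.
    """
    track = np.asarray(track, dtype=np.float64)
    radius = max(1, int(3 * sigma))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    # Edge padding keeps the first/last frames from drifting toward zero.
    padded = np.pad(track, ((radius, radius), (0, 0)), mode="edge")
    return np.stack([np.convolve(padded[:, k], kernel, mode="valid")
                     for k in range(track.shape[1])], axis=1)
```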
In still other embodiments, after the position information of the material to be added in a certain frame of the virtual human image is determined, mapping processing may be performed on the material to be added in a manner of combining resolution processing and smoothing processing, so as to add the material to be added in the certain frame of the virtual human image.
For example, the size of the frame of virtual human image may be enlarged, e.g., by a factor of four, whereby the number of pixel points in the frame is correspondingly quadrupled; smoothing is then performed, e.g., using the smoothing methods above, on the pixel points of the position information of the material to be added in the enlarged frame to obtain the smoothed position information; the material is added at the smoothed position information; and the frame is then reduced by a factor of four, i.e., back to its original size.
In combination with the above analysis, the smoothing can be implemented using Gaussian smoothing, which in turn can be implemented by determining the radial center line. Accordingly, when combined with resolution processing, the implementation may be as follows:
enlarge the size of each frame of virtual human image, and the position information within it, by a preset multiple, and determine the radial center line of the position information of the material to be added in each frame enlarged by the preset multiple.
Similarly, the preset multiple may be determined based on a demand, a history, a test, and the like, and this embodiment is not limited.
In this embodiment, mapping the material to be added by combining resolution processing with smoothing reduces jitter along multiple dimensions, namely the resolution dimension and the smoothing dimension, further improving the fit between the material and the virtual human image and improving realism and reliability.
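A sketch of the resolution-plus-smoothing combination, under the assumptions that images are BGR arrays, the material patch fits inside the frame at the anchor, and the factor of four is the preset multiple:

```python
import cv2

def paste_with_supersampling(frame, material, anchor, scale=4):
    """Enlarge the frame, paste the material at the scaled anchor, shrink back.

    Working at 4x resolution turns sub-pixel jitter of the anchor into
    sub-pixel offsets after downscaling, which reads as smoother motion.
    """
    h, w = frame.shape[:2]
    big = cv2.resize(frame, (w * scale, h * scale),
                     interpolation=cv2.INTER_CUBIC)
    mat_big = cv2.resize(material, None, fx=scale, fy=scale,
                         interpolation=cv2.INTER_CUBIC)
    x, y = int(anchor[0] * scale), int(anchor[1] * scale)
    mh, mw = mat_big.shape[:2]
    big[y:y + mh, x:x + mw] = mat_big  # naive paste; see edge blending below
    return cv2.resize(big, (w, h), interpolation=cv2.INTER_AREA)
```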
It should be noted that, during the mapping process, the method can be implemented in a manner of replacing based on pixel values, for example, according to the position information of the material to be added in each frame of the virtual human image, the pixel value of the corresponding position information is replaced with the pixel value of the material to be added.
However, although this method is simple and convenient to implement, its effect is relatively poor, and the degree of fit between the material to be added and the virtual human image is low.
For example, as shown in fig. 7, in a material-added virtual human image generated by directly replacing pixel values, the mask of the material to be added takes only the two values 0 and 1; at the edge of the material, the mask jumps directly from 0 to 1, which makes the edge particularly sharp and lowers the degree of fit between the material and the virtual human image.
In order to avoid the above problem, the method may be implemented by using an edge transition process, for example, the edge transition process includes the following steps:
the first step is as follows: and determining edge pixel information when the material to be added is subjected to mapping processing according to the position information of the material to be added in each frame of virtual human image.
For example, after determining the position information of the material to be added in a frame of the virtual human image, the position information of the edge where the frame of the virtual human image is attached to the material to be added may be determined, and the pixel information, such as the pixel value, of the pixel point corresponding to the position information of the edge may be determined.
The second step is as follows: constructing an edge gradient pixel matrix according to the edge pixel information.
The edge gradient pixel matrix is a pixel matrix used to apply a gradient to the edge pixel information, so as to smooth the pixel values at the edge.
In some embodiments, an edge gradient pixel matrix may be constructed by using a gaussian smoothing method, for example, a gaussian kernel function (for the description of the gaussian kernel function, see the above embodiments, which is not described herein) may be determined, so as to perform smoothing processing on pixel values at an edge based on the gaussian kernel function, where the edge gradient pixel matrix is a set of smooth gradient values from 0 to 1.
The Gaussian kernel function can be determined based on the edge smoothing requirement and the content of the material to be added, so that the content of the material to be added is not obscured while the edge smoothing requirement is met, namely the content of the material to be added is not influenced.
The third step is as follows: mapping the material to be added according to the edge gradient pixel matrix to obtain each frame of virtual human image with the material added.
In combination with the above analysis, after a set of smooth gradient values from 0 to 1 is determined, the material to be added may be mapped onto the virtual human image based on that set of values; the effect after mapping is shown in fig. 8.
As shown in fig. 8, the mask of the material to be added after the mapping process takes gradient values; compared with fig. 7, the material-added virtual human image shown in fig. 8 is more realistic, and the degree of fit between the material and the virtual human image is higher.
Therefore, by constructing the edge gradient pixel matrix and mapping based on it, the defects of an abrupt edge and a low degree of fit are avoided, and the reliability and realism of the fusion between the material to be added and the virtual human image are improved.
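A sketch of the edge gradient pixel matrix realized as Gaussian feathering of a hard 0/1 mask, which matches the Gaussian-smoothing construction described above; the mask is assumed to have the same height and width as the material, and the blur kernel size is illustrative.

```python
import cv2
import numpy as np

def paste_with_soft_edges(frame, material, mask, anchor, edge_blur=7):
    """Alpha-composite the material using a Gaussian-feathered 0/1 mask.

    Blurring the hard mask produces the edge gradient pixel matrix: instead
    of jumping from 0 to 1 in a single pixel, the mask ramps smoothly, so
    the material edge blends into the virtual human image.
    """
    x, y = int(anchor[0]), int(anchor[1])
    mh, mw = material.shape[:2]          # mask must be (mh, mw) as well
    alpha = cv2.GaussianBlur(mask.astype(np.float32),
                             (edge_blur, edge_blur), 0)
    alpha = np.clip(alpha, 0.0, 1.0)[..., None]   # broadcast over channels
    roi = frame[y:y + mh, x:x + mw].astype(np.float32)
    blended = alpha * material.astype(np.float32) + (1.0 - alpha) * roi
    frame[y:y + mh, x:x + mw] = blended.astype(np.uint8)
    return frame
```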
S407: generating the material-added virtual human video according to each frame of virtual human image with the material added.
Illustratively, after the material to be added has been added to each of the N frames of virtual human images, N material-added frames of virtual human images are obtained, and the material-added virtual human video is generated based on these N frames.
For example, the N material-added frames of virtual human images are spliced to obtain the material-added virtual human video.
In some embodiments, S407 may include the steps of:
the first step is as follows: the method comprises the steps of obtaining time information corresponding to each frame of virtual human image of a material to be added, and determining the time information corresponding to each frame of virtual human image after the material is added according to the time information corresponding to each frame of virtual human image of the material to be added.
The second step is as follows: and sequencing and splicing the frames of virtual human images added with the materials according to the time information corresponding to the frames of virtual human images added with the materials to obtain the virtual human video added with the materials.
For any frame of virtual human image, its time information characterizes that frame's place in the temporal order of the virtual human video. For example, the time information of the first frame indicates that it is the first image in the video, the time information of the second frame indicates that it is the second, and so on.
The order of the frames in the material-added virtual human video is the same as in the original virtual human video (i.e., the video before the material was added). Therefore, after the material is added to each frame, the material-added frames can still be sorted and spliced on the basis of that order to obtain the material-added virtual human video, which gives the material-added video higher accuracy and reliability.
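A sketch of this assembly step, assuming the material-added frames are kept as (time information, image) pairs; the output path, codec and frame rate are illustrative assumptions.

```python
import cv2

def write_video(frames_with_time, path="virtual_human_with_material.mp4",
                fps=25):
    """Sort material-added frames by their time information and write a video."""
    ordered = sorted(frames_with_time, key=lambda item: item[0])
    h, w = ordered[0][1].shape[:2]
    writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (w, h))
    for _t, image in ordered:
        writer.write(image)
    writer.release()
```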
Fig. 9 is a schematic diagram according to a third embodiment of the present disclosure, and as shown in fig. 9, a virtual human video generation apparatus 900 includes:
the first obtaining unit 901 is configured to obtain a virtual human video and a material to be added, where the virtual human video includes multiple frames of virtual human images.
The second obtaining unit 902 is configured to obtain first position information of the material to be added in a first frame of virtual human image of the virtual human video.
A determining unit 903, configured to determine a plurality of tracking points in the first frame virtual human image according to the first position information.
And the tracking unit 904 is configured to perform tracking processing on the multiple tracking points to obtain position information of the multiple tracking points in the virtual human images of other frames in the multi-frame virtual human image.
And the generating unit 905 is used for generating the virtual human video with the material added according to the material to be added and the position information of the plurality of tracking points in the virtual human images of other frames.
Fig. 10 is a schematic diagram according to a fourth embodiment of the present disclosure, and as shown in fig. 10, a virtual human video generation apparatus 1000 includes:
the first obtaining unit 1001 is configured to obtain a virtual human video and a material to be added, where the virtual human video includes a plurality of frames of virtual human images.
The second obtaining unit 1002 is configured to obtain first position information of a first frame virtual human image of a material to be added in the virtual human video.
A determining unit 1003, configured to determine a plurality of tracking points in the first frame virtual human image according to the first position information.
As can be seen in fig. 10, in some embodiments, the determining unit 1003 includes:
the selecting subunit 10031 is configured to select, according to the first position information, a plurality of pixel points whose curvatures are greater than a preset curvature threshold from the first frame of virtual human image.
In some embodiments, the selecting subunit 10031 includes:
and the second determining module is used for determining the selected area according to the first position information.
The selecting module is used for selecting a plurality of pixel points with curvatures larger than a preset curvature threshold from the selected area, wherein the distance between each pixel point in the selected area and the position indicated by the first position information is smaller than a preset distance threshold.
The second determining subunit 10032 is configured to determine a plurality of tracking points according to the selected pixel point with the curvature greater than the preset curvature threshold.
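The disclosure does not tie the curvature measure to one specific operator; as an illustrative sketch, the Shi-Tomasi corner response (a curvature-like measure of local image structure) can stand in for it, with the selection area realized as a circular mask around the first position. The radius, quality threshold, and point count below are assumptions for the example:

```python
# Sketch of tracking-point selection, approximating the curvature criterion
# with the Shi-Tomasi corner response; radius and thresholds are illustrative.
import cv2
import numpy as np

def select_tracking_points(first_frame_bgr, position_xy, radius=60, max_points=30):
    gray = cv2.cvtColor(first_frame_bgr, cv2.COLOR_BGR2GRAY)
    # Restrict the search to a selection area around the first position:
    # only pixels within `radius` of position_xy are considered.
    mask = np.zeros_like(gray)
    cv2.circle(mask, tuple(int(v) for v in position_xy), radius, 255, -1)
    pts = cv2.goodFeaturesToTrack(gray, maxCorners=max_points,
                                  qualityLevel=0.05, minDistance=5, mask=mask)
    return pts  # (K, 1, 2) float32 array, or None if nothing qualifies
```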
And the tracking unit 1004 is used for tracking the plurality of tracking points to obtain the position information of the plurality of tracking points in the virtual human images of other frames in the multi-frame virtual human image.
In some embodiments, the tracking unit 1004 is configured to perform tracking processing on multiple tracking points of an ith frame of virtual human image in multiple frames of virtual human images to obtain position information of the multiple tracking points in an (i + 1) th frame of virtual human image of the multiple frames of virtual human images, where i is greater than 1 and less than or equal to N-1, i is a positive integer, and N is a total number of frames of the multiple frames of virtual human images.
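As a sketch of one possible realization, the frame-by-frame tracking can be performed with a pyramidal Lucas-Kanade optical flow tracker; the disclosure does not mandate a particular tracker, so the use of `calcOpticalFlowPyrLK` here is an assumption:

```python
# Sketch: track points from frame i to frame i+1 with pyramidal Lucas-Kanade.
import cv2

def track_points(prev_bgr, next_bgr, prev_pts):
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_bgr, cv2.COLOR_BGR2GRAY)
    # The positions of the tracking points in frame i serve as the starting
    # estimate for frame i+1; status marks points that were tracked reliably.
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray,
                                                      prev_pts, None)
    return next_pts, status
```

Points whose status flag is 0 were lost between the two frames and can be dropped before the next iteration.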
As can be appreciated in conjunction with fig. 10, in some embodiments, the tracking unit 1004 includes:
the first determining subunit 10041 is configured to determine, according to the position information of the multiple tracking points in the other frame virtual human image, the position information of the material to be added in the other frame virtual human image.
The mapping subunit 10042 is configured to perform mapping processing on the material to be added according to the position information of the material to be added in each frame of the virtual human image, so as to obtain each frame of the virtual human image to which the material is added.
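One way the first determining subunit might derive the material position from the tracked points, assuming the material moves rigidly with them, is to estimate a similarity transform between the first frame's tracking points and the current frame's, and apply it to the material's corner coordinates; this rigid-motion assumption and the helper below are illustrative only:

```python
# Sketch: infer the material position in a later frame from tracked points,
# assuming the material moves rigidly with them. Needs enough inlier points
# for the estimation to succeed (M is None otherwise).
import cv2
import numpy as np

def material_position_in_frame(first_pts, frame_pts, material_corners):
    """first_pts/frame_pts: (K, 1, 2) float32 tracking points;
    material_corners: (4, 2) float32 corners of the material in frame 1."""
    # Robustly estimate how the tracked region moved between the frames.
    M, _inliers = cv2.estimateAffinePartial2D(first_pts, frame_pts)
    corners = material_corners.reshape(-1, 1, 2)
    # Apply the same motion to the material's position.
    return cv2.transform(corners, M).reshape(-1, 2)
```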
In some embodiments, the mapping subunit 10042 comprises:
and the smoothing module is used for smoothing the position information of the material to be added in each frame of the virtual human image to obtain the position information after smoothing.
In some embodiments, the smoothing module comprises:
and the first determining submodule is used for determining the radial central line of the position information of the material to be added in each frame of the virtual human image.
In some embodiments, the first determining submodule is configured to enlarge each frame of virtual human image, and the position information in each frame, by a preset multiple, and to determine the radial center line of the position information of the material to be added in each frame of virtual human image after the enlargement.
And the second determining submodule is used for determining the Gaussian smoothing parameter according to the radial central line.
And the smoothing submodule is used for performing Gaussian smoothing on the position information of the material to be added in each frame of the virtual human image according to the Gaussian smoothing parameters to obtain the position information after smoothing.
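A minimal sketch of such smoothing follows, under the assumptions that the radial center line is taken as the per-frame center of the material position and that the Gaussian parameter is derived from the jitter of that center line; the scale factor and the sigma heuristic are illustrative, not values prescribed by this disclosure:

```python
# Sketch: smooth the per-frame material positions along the time axis.
import numpy as np
from scipy.ndimage import gaussian_filter1d

def smooth_positions(positions, scale=4.0):
    """positions: (N, 4, 2) array of material corners, one set per frame."""
    up = np.asarray(positions, dtype=np.float64) * scale   # amplify for sub-pixel precision
    centers = up.mean(axis=1)                              # radial center line: one point per frame
    # Derive the Gaussian parameter from how jittery the center line is.
    jitter = np.linalg.norm(np.diff(centers, axis=0), axis=1).std()
    sigma = max(1.0, jitter)
    smoothed = gaussian_filter1d(up, sigma=sigma, axis=0)  # smooth along time
    return smoothed / scale                                # undo the amplification
```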
In some embodiments, the mapping subunit 10042 comprises:
the first determining module is used for determining edge pixel information when the material to be added is subjected to mapping processing according to the position information of the material to be added in each frame of virtual human image.
And the construction module is used for constructing an edge gradient pixel matrix according to the edge pixel information.
And the mapping module is used for mapping the material to be added according to the edge gradient pixel matrix to obtain each frame of virtual human image added with the material.
In the embodiments that include the smoothing module, the mapping module is used for mapping the material to be added according to the smoothed position information.
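The construction of the edge gradient pixel matrix is likewise not fixed to one formula; as an illustrative sketch, it can be realized as a feathered alpha mask, built from a distance transform, that ramps from 0 at the material border to 1 in its interior, so that the pasted material blends into the frame without a hard boundary. The feather width, and the assumption that the material lies fully inside the frame, are simplifications for the example:

```python
# Sketch: paste material into a frame with a feathered (edge-gradient) border.
import cv2
import numpy as np

def paste_with_edge_gradient(frame_bgr, material_bgr, top_left, feather=7.0):
    h, w = material_bgr.shape[:2]
    x, y = top_left  # (x, y) of the material's upper-left corner, in pixels
    # Edge gradient pixel matrix: distance of each material pixel to the
    # material border, clipped and normalised to [0, 1].
    inner = np.zeros((h, w), np.uint8)
    inner[1:-1, 1:-1] = 1
    dist = cv2.distanceTransform(inner, cv2.DIST_L2, 3)
    alpha = np.clip(dist / feather, 0.0, 1.0)[..., None]
    # Blend so the material's edges fade smoothly into the frame.
    roi = frame_bgr[y:y + h, x:x + w].astype(np.float32)
    out = alpha * material_bgr.astype(np.float32) + (1.0 - alpha) * roi
    frame_bgr[y:y + h, x:x + w] = out.astype(np.uint8)
    return frame_bgr
```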
And the generating subunit 10043 is configured to generate the material-added virtual human video according to each frame of virtual human image to which the material is added.
In some embodiments, the generating subunit 10043 comprises:
and the acquisition module is used for acquiring time information corresponding to each frame of virtual human image of the material to be added.
And the determining module is used for determining the time information corresponding to each frame of virtual human image after the material is added according to the time information corresponding to each frame of virtual human image of the material to be added.
And the processing module is used for sequencing and splicing the frames of virtual human images added with the materials according to the time information corresponding to the frames of virtual human images added with the materials to obtain the virtual human video added with the materials.
And the generating unit 1005 is used for generating the virtual human video with the material added according to the material to be added and the position information of the plurality of tracking points in the virtual human images of other frames.
Fig. 11 is a schematic diagram according to a fifth embodiment of the present disclosure, and as shown in fig. 11, an electronic device 1100 in the present disclosure may include: a processor 1101 and a memory 1102.
A memory 1102 for storing programs; the memory 1102 may include a volatile memory, such as a static random access memory (SRAM) or a double data rate synchronous dynamic random access memory (DDR SDRAM); the memory may also include a non-volatile memory, such as a flash memory. The memory 1102 is used to store computer programs (e.g., applications or functional modules that implement the above-described methods), computer instructions, and the like, which may be stored in one or more memories 1102 in a partitioned manner and may be invoked by the processor 1101.
A processor 1101 for executing the computer program stored in the memory 1102 to implement the steps of the method according to the above embodiments.
Reference may be made in particular to the description relating to the preceding method embodiment.
The processor 1101 and the memory 1102 may be separate structures or may be integrated together into one structure. When the processor 1101 and the memory 1102 are separate structures, the memory 1102 and the processor 1101 may be coupled by a bus 1103.
The electronic device of this embodiment may execute the technical solution in the method, and the specific implementation process and the technical principle are the same, which are not described herein again.
It should be noted that the head model of the virtual human in this embodiment is not a head model of a specific user and cannot reflect the personal information of any specific user, and that the virtual human images in this embodiment come from a public data set.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the personal information of the related user are all in accordance with the regulations of related laws and regulations and do not violate the good customs of the public order.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
According to an embodiment of the present disclosure, the present disclosure also provides a computer program product comprising: a computer program, stored in a readable storage medium, from which at least one processor of the electronic device can read the computer program, the at least one processor executing the computer program causing the electronic device to perform the solution provided by any of the embodiments described above.
FIG. 12 shows a schematic block diagram of an example electronic device 1200, which can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 12, the apparatus 1200 includes a computing unit 1201 which can perform various appropriate actions and processes in accordance with a computer program stored in a read-only memory (ROM) 1202 or a computer program loaded from a storage unit 1208 into a random access memory (RAM) 1203. In the RAM 1203, various programs and data required for the operation of the device 1200 may also be stored. The computing unit 1201, the ROM 1202, and the RAM 1203 are connected to each other by a bus 1204. An input/output (I/O) interface 1205 is also connected to the bus 1204.
Various components in the device 1200 are connected to the I/O interface 1205, including: an input unit 1206 such as a keyboard or a mouse; an output unit 1207 such as various types of displays and speakers; a storage unit 1208 such as a magnetic disk or an optical disk; and a communication unit 1209 such as a network card, a modem, or a wireless communication transceiver. The communication unit 1209 allows the device 1200 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 1201 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1201 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 1201 executes the respective methods and processes described above, such as the virtual human video generation method. For example, in some embodiments, the virtual human video generation method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1208. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1200 via the ROM 1202 and/or the communication unit 1209. When the computer program is loaded into the RAM 1203 and executed by the computing unit 1201, one or more steps of the virtual human video generation method described above may be performed. Alternatively, in other embodiments, the computing unit 1201 may be configured by any other suitable means (e.g., by means of firmware) to perform the virtual human video generation method.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server (also called a cloud computing server or a cloud host), a host product in the cloud computing service system that overcomes the defects of high management difficulty and weak service scalability in traditional physical hosts and VPS services ("Virtual Private Server", or simply "VPS"). The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (23)

1. A method for generating a virtual human video comprises the following steps:
acquiring a virtual human video and a material to be added, wherein the virtual human video comprises a plurality of frames of virtual human images;
acquiring first position information of a first frame virtual human image of the material to be added in the virtual human video, and determining a plurality of tracking points in the first frame virtual human image according to the first position information;
and tracking the plurality of tracking points to obtain the position information of the plurality of tracking points in other frames of virtual human images in the multi-frame virtual human image, and generating the virtual human video with the added material according to the material to be added and the position information of the plurality of tracking points in other frames of virtual human images.
2. The method according to claim 1, wherein tracking the plurality of tracking points to obtain the position information of the plurality of tracking points in the other frame virtual human image comprises:
tracking a plurality of tracking points of the ith frame of virtual human image in the plurality of frames of virtual human images to obtain the position information of the plurality of tracking points in the (i + 1) th frame of virtual human image in the plurality of frames of virtual human images, wherein i is greater than 1 and less than or equal to N-1, i is a positive integer, and N is the total frame number of the plurality of frames of virtual human images.
3. The method of claim 2, wherein generating the avatar video with the material added according to the material to be added and the position information of the tracking points in the other frames of avatar images comprises:
determining the position information of the material to be added in the other frame virtual human images according to the position information of the tracking points in the other frame virtual human images;
mapping the material to be added according to the position information of the material to be added in each frame of virtual human image to obtain each frame of virtual human image after the material is added;
and generating the virtual human video added with the material according to the virtual human image of each frame added with the material.
4. The method of claim 3, wherein the mapping processing of the material to be added according to the position information of the material to be added in each frame of the virtual human image comprises the following steps:
and smoothing the position information of the material to be added in each frame of virtual human image to obtain smoothed position information, and performing mapping processing on the material to be added according to the smoothed position information.
5. The method of claim 4, wherein smoothing the position information of the material to be added in each frame of the virtual human image to obtain smoothed position information comprises:
determining a radial center line of the position information of the material to be added in each frame of the virtual human image, determining a Gaussian smoothing parameter according to the radial center line, and performing Gaussian smoothing on the position information of the material to be added in each frame of the virtual human image according to the Gaussian smoothing parameter to obtain the position information after smoothing.
6. The method of claim 5, wherein determining the radial center line of the position information of the material to be added in each frame of the virtual human image comprises the following steps:
and amplifying the size of each frame of virtual human image and the position information in each frame of virtual human image by preset times, and determining the radial center line of the position information of the material to be added in each frame of virtual human image after the preset times are amplified.
7. The method according to any one of claims 3 to 6, wherein the step of mapping the material to be added according to the position information of the material to be added in each frame of the virtual human image to obtain each frame of the virtual human image after the material is added comprises the following steps:
determining edge pixel information when the material to be added is subjected to mapping processing according to the position information of the material to be added in each frame of virtual human image;
and constructing an edge gradient pixel matrix according to the edge pixel information, and performing mapping processing on the material to be added according to the edge gradient pixel matrix to obtain each frame of virtual human image added with the material.
8. The method according to any one of claims 1-7, wherein determining a plurality of tracking points from the first frame virtual human image according to the first position information comprises:
selecting a plurality of pixel points with curvatures larger than a preset curvature threshold value from the first frame of virtual human image according to the first position information;
and determining the plurality of tracking points according to the selected pixel points with the curvatures larger than a preset curvature threshold.
9. The method of claim 8, wherein selecting a plurality of pixel points with curvatures larger than a preset curvature threshold value from the first frame of virtual human image according to the first position information comprises:
determining a selection area according to the first position information, and selecting a plurality of pixel points with curvatures larger than a preset curvature threshold value from the selection area, wherein the distance between each pixel point in the selection area and the first position information is smaller than a preset distance threshold value.
10. The method according to any one of claims 3 to 9, wherein generating the material-added avatar video from each frame of the material-added avatar image comprises:
acquiring time information corresponding to each frame of virtual human image of a material to be added, and determining the time information corresponding to each frame of virtual human image after the material is added according to the time information corresponding to each frame of virtual human image of the material to be added;
and sequencing and splicing the frames of virtual human images added with the materials according to the time information corresponding to the frames of virtual human images added with the materials to obtain the virtual human video added with the materials.
11. A virtual human video generation device comprises:
the first acquisition unit is used for acquiring a virtual human video and a material to be added, wherein the virtual human video comprises a plurality of frames of virtual human images;
the second acquisition unit is used for acquiring first position information of a first frame virtual human image of the material to be added in the virtual human video;
a determining unit, configured to determine a plurality of tracking points in the first frame of virtual human image according to the first position information;
the tracking unit is used for tracking the plurality of tracking points to obtain the position information of the plurality of tracking points in other frames of virtual human images in the multi-frame virtual human image;
and the generating unit is used for generating the virtual human video added with the material according to the material to be added and the position information of the tracking points in the other frame virtual human images.
12. The device of claim 11, wherein the tracking unit is configured to perform tracking processing on multiple tracking points of an ith frame of virtual human image in the multiple frames of virtual human images to obtain position information of the multiple tracking points in an (i + 1) th frame of virtual human image in the multiple frames of virtual human images, where 1 < i ≤ N-1, i is a positive integer, and N is a total frame number of the multiple frames of virtual human images.
13. The apparatus of claim 12, wherein the tracking unit comprises:
the first determining subunit is used for determining the position information of the material to be added in the other frame virtual human images according to the position information of the tracking points in the other frame virtual human images;
the mapping subunit is used for mapping the material to be added according to the position information of the material to be added in each frame of virtual human image to obtain each frame of virtual human image after the material is added;
and the generating subunit is used for generating the virtual human video with the added materials according to the virtual human images with the added materials.
14. The apparatus of claim 13, wherein the mapping subunit comprises:
the smoothing module is used for smoothing the position information of the material to be added in each frame of the virtual human image to obtain the position information after smoothing;
and the mapping module is used for mapping the material to be added according to the position information after the smoothing processing.
15. The apparatus of claim 14, wherein the smoothing module comprises:
the first determining submodule is used for determining a radial central line of the position information of the material to be added in each frame of the virtual human image;
a second determining submodule for determining a gaussian smoothing parameter based on the radial centerline;
and the smoothing submodule is used for performing Gaussian smoothing on the position information of the material to be added in each frame of the virtual human image according to the Gaussian smoothing parameters to obtain the position information after smoothing.
16. The apparatus according to claim 15, wherein the first determining submodule is configured to enlarge the size of each frame of the virtual human image and the position information in each frame of the virtual human image by a preset factor, and determine a radial center line of the position information of the material to be added in each frame of the virtual human image, which is enlarged by the preset factor.
17. The apparatus of any of claims 13-16, wherein the mapping subunit comprises:
the first determining module is used for determining edge pixel information when the material to be added is subjected to mapping processing according to the position information of the material to be added in each frame of virtual human image;
the construction module is used for constructing an edge gradient pixel matrix according to the edge pixel information;
and the mapping module is used for mapping the material to be added according to the edge gradient pixel matrix to obtain each frame of virtual human image added with the material.
18. The apparatus according to any one of claims 11-17, wherein the determining unit comprises:
the selecting subunit is used for selecting a plurality of pixel points with curvatures larger than a preset curvature threshold value from the first frame of virtual human image according to the first position information;
and the second determining subunit is used for determining the plurality of tracking points according to the selected pixel points with the curvatures larger than the preset curvature threshold.
19. The apparatus of claim 18, wherein the selecting subunit comprises:
the second determining module is used for determining a selected area according to the first position information;
the selecting module is used for selecting a plurality of pixel points with curvatures larger than a preset curvature threshold value from the selecting area, wherein the distance between each pixel point in the selecting area and the first position information is smaller than a preset distance threshold value.
20. The apparatus of any one of claims 13-19, wherein the generating subunit comprises:
the acquisition module is used for acquiring time information corresponding to each frame of virtual human image of the material to be added;
the determining module is used for determining the time information corresponding to each frame of virtual human image after the material is added according to the time information corresponding to each frame of virtual human image of the material to be added;
and the processing module is used for sequencing and splicing the frames of virtual human images added with the materials according to the time information corresponding to the frames of virtual human images added with the materials to obtain the virtual human video added with the materials.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.
22. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-10.
23. A computer program product comprising a computer program which, when executed by a processor, carries out the steps of the method of any one of claims 1 to 10.
CN202210059239.0A 2022-01-19 2022-01-19 Virtual human video generation method and device Pending CN114399536A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210059239.0A CN114399536A (en) 2022-01-19 2022-01-19 Virtual human video generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210059239.0A CN114399536A (en) 2022-01-19 2022-01-19 Virtual human video generation method and device

Publications (1)

Publication Number Publication Date
CN114399536A true CN114399536A (en) 2022-04-26

Family

ID=81231696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210059239.0A Pending CN114399536A (en) 2022-01-19 2022-01-19 Virtual human video generation method and device

Country Status (1)

Country Link
CN (1) CN114399536A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106611412A (en) * 2015-10-20 2017-05-03 成都理想境界科技有限公司 Map video generation method and device
CN108289180A (en) * 2018-01-30 2018-07-17 广州市百果园信息技术有限公司 Method, medium and the terminal installation of video are handled according to limb action
CN111258413A (en) * 2018-11-30 2020-06-09 北京字节跳动网络技术有限公司 Control method and device of virtual object
CN111063008A (en) * 2019-12-23 2020-04-24 北京达佳互联信息技术有限公司 Image processing method, device, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIU Xiangchun, "Binocular Vision Systems" (《双目视觉系统》), Minzu University of China Press, 31 December 2020 *
XU Jing et al., "Motion Detection and Tracking in Computer Vision" (《计算机视觉中的运动检测与跟踪》), National Defense Industry Press, 30 September 2012 *

Similar Documents

Publication Publication Date Title
CN113643412B (en) Virtual image generation method and device, electronic equipment and storage medium
CN114550177B (en) Image processing method, text recognition method and device
CN113658309A (en) Three-dimensional reconstruction method, device, equipment and storage medium
CN113642491A (en) Face fusion method, and training method and device of face fusion model
CN114549710A (en) Virtual image generation method and device, electronic equipment and storage medium
EP4123595A2 (en) Method and apparatus of rectifying text image, training method and apparatus, electronic device, and medium
CN114187459A (en) Training method and device of target detection model, electronic equipment and storage medium
CN115861498A (en) Redirection method and device for motion capture
CN114792355B (en) Virtual image generation method and device, electronic equipment and storage medium
CN115205925A (en) Expression coefficient determining method and device, electronic equipment and storage medium
CN112541876A (en) Satellite image processing method, network training method, related device and electronic equipment
CN113393371A (en) Image processing method and device and electronic equipment
CN113705362A (en) Training method and device of image detection model, electronic equipment and storage medium
CN114708374A (en) Virtual image generation method and device, electronic equipment and storage medium
CN112634366A (en) Position information generation method, related device and computer program product
CN115359166B (en) Image generation method and device, electronic equipment and medium
CN114399536A (en) Virtual human video generation method and device
CN113610856B (en) Method and device for training image segmentation model and image segmentation
CN115775300A (en) Reconstruction method of human body model, training method and device of human body reconstruction model
CN115393514A (en) Training method of three-dimensional reconstruction model, three-dimensional reconstruction method, device and equipment
CN115457365A (en) Model interpretation method and device, electronic equipment and storage medium
CN113361535A (en) Image segmentation model training method, image segmentation method and related device
CN113421335A (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN113139463A (en) Method, apparatus, device, medium and program product for training a model
CN113608615B (en) Object data processing method, processing device, electronic device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220426