CN109064548A - Video generation method, device, equipment and storage medium - Google Patents

Video generation method, device, equipment and storage medium

Info

Publication number
CN109064548A
Authority
CN
China
Prior art keywords
model
face
expression
transformed
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810719201.5A
Other languages
Chinese (zh)
Other versions
CN109064548B (en)
Inventor
乔慧
李伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810719201.5A priority Critical patent/CN109064548B/en
Publication of CN109064548A publication Critical patent/CN109064548A/en
Application granted granted Critical
Publication of CN109064548B publication Critical patent/CN109064548B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present invention provides a video generation method, device, equipment and storage medium. The method comprises: obtaining a target face 3D model; obtaining an intermediate face 3D model according to the target face 3D model and a preset face 3D model corresponding to an expression to be transformed, the expression to be transformed being any one of multiple expressions; extracting key point coordinates from the intermediate face 3D model, the key point coordinates including partial coordinates of facial feature contours; gradually adjusting the key point coordinates of the target face 3D model to the key point coordinates of the intermediate face 3D model according to a preset gradual-change step size, and capturing the 2D image obtained after each adjustment to obtain N 2D images in total, where N is an integer greater than 1; and generating a video according to the N 2D images. The present invention can produce a smooth video from only a few static 2D images, improves the efficiency of video production, and can create an expression-change video for any static face.

Description

Video generation method, device, equipment and storage medium
Technical field
The present invention relates to the technical field of image processing, and in particular to a video generation method, device, equipment and storage medium.
Background technique
With the development of terminal technology, more and more terminals are equipped with a video capture function, and people can use a terminal to make animated images or short videos.
Currently, videos are mainly created by shooting with a video camera, or by continuously capturing multiple frames with a camera and then generating the video through manual processing. This approach places high requirements on the quality and quantity of the video frames.
However, when the camera obtains only a few video frames, or when some frames are missing, it is difficult to produce a smooth, high-quality video.
Summary of the invention
The present invention provides a video generation method, device, equipment and storage medium, which can produce a smooth video from only a few static 2D images, improves the efficiency of video production, and can create an expression-change video for any static face.
In a first aspect, an embodiment of the present invention provides a video generation method, comprising:
obtaining a target face 3D model;
obtaining an intermediate face 3D model according to the target face 3D model and a preset face 3D model corresponding to an expression to be transformed, the expression to be transformed being any one of multiple expressions;
extracting key point coordinates from the intermediate face 3D model, the key point coordinates including partial coordinates of facial feature contours;
gradually adjusting the key point coordinates of the target face 3D model to the key point coordinates of the intermediate face 3D model according to a preset gradual-change step size, and capturing the 2D image obtained after each adjustment to obtain N 2D images in total, where N is an integer greater than 1; and
generating a video according to the N 2D images.
In one possible design, the generating a video according to the N 2D images comprises:
synthesizing the N 2D images into a video according to the order in which they were captured.
In one possible design, the obtaining an intermediate face 3D model according to the target face 3D model and the preset face 3D model corresponding to the expression to be transformed comprises:
adjusting, according to the target face 3D model and the expression to be transformed, the preset face 3D model corresponding to the expression to be transformed so that it is closest to the target face 3D model, and determining the adjusted preset face 3D model corresponding to the expression to be transformed as the intermediate face 3D model.
In one possible design, the obtaining a target face 3D model comprises:
obtaining at least one static 2D image, each static 2D image containing a target face image;
establishing the target face 3D model according to the at least one static 2D image.
In one possible design, before the obtaining an intermediate face 3D model according to the target face 3D model and the preset face 3D model corresponding to the expression to be transformed, the method further comprises:
obtaining, according to M sample pictures, a face 3D model corresponding to each sample picture, where M is an integer greater than 0 and the facial expression in each sample picture is the expression to be transformed;
processing the M face 3D models by weighted calculation to obtain the preset face 3D model corresponding to the expression to be transformed.
In one possible design, the processing the M face 3D models by weighted calculation to obtain the face 3D model corresponding to the expression to be transformed comprises:
obtaining the face 3D model corresponding to the expression to be transformed by the formula F = F0 + a1·F1 + a2·F2 + … + aM·FM;
where F denotes the face 3D model corresponding to the expression to be transformed, F0 denotes a preset initial face 3D model, Fi denotes the face 3D model corresponding to the i-th sample picture, and ai denotes the weight coefficient of the face 3D model corresponding to the i-th sample picture.
In a second aspect, an embodiment of the present invention provides a video generation device, comprising:
an obtaining module, configured to obtain a target face 3D model;
a processing module, configured to obtain an intermediate face 3D model according to the target face 3D model and a preset face 3D model corresponding to an expression to be transformed, the expression to be transformed being any one of multiple expressions;
an extraction module, configured to extract key point coordinates from the intermediate face 3D model, the key point coordinates including partial coordinates of facial feature contours;
an adjustment module, configured to gradually adjust the key point coordinates of the target face 3D model to the key point coordinates of the intermediate face 3D model according to a preset gradual-change step size, and capture the 2D image obtained after each adjustment to obtain N 2D images in total, where N is an integer greater than 1;
a video generation module, configured to generate a video according to the N 2D images.
In one possible design, the video generation module is specifically configured to:
synthesize the N 2D images into a video according to the order in which they were captured.
In one possible design, the processing module is specifically configured to:
adjust, according to the target face 3D model and the expression to be transformed, the preset face 3D model corresponding to the expression to be transformed so that it is closest to the target face 3D model, and determine the adjusted preset face 3D model corresponding to the expression to be transformed as the intermediate face 3D model.
In one possible design, the obtaining module is specifically configured to:
obtain at least one static 2D image, each static 2D image containing a target face image;
establish the target face 3D model according to the at least one static 2D image.
In one possible design, the processing module is further configured to: before obtaining the intermediate face 3D model according to the target face 3D model and the preset face 3D model corresponding to the expression to be transformed, obtain, according to M sample pictures, a face 3D model corresponding to each sample picture, where M is an integer greater than 0 and the facial expression in each sample picture is the expression to be transformed;
and process the M face 3D models by weighted calculation to obtain the preset face 3D model corresponding to the expression to be transformed.
In one possible design, the processing the M face 3D models by weighted calculation to obtain the face 3D model corresponding to the expression to be transformed comprises:
obtaining the face 3D model corresponding to the expression to be transformed by the formula F = F0 + a1·F1 + a2·F2 + … + aM·FM;
where F denotes the face 3D model corresponding to the expression to be transformed, F0 denotes a preset initial face 3D model, Fi denotes the face 3D model corresponding to the i-th sample picture, and ai denotes the weight coefficient of the face 3D model corresponding to the i-th sample picture.
In a third aspect, an embodiment of the present invention provides video generation equipment, comprising: a processor and a memory, the memory storing instructions executable by the processor; wherein the processor is configured to execute, via the executable instructions, the video generation method according to any one of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium having a computer program stored thereon, where the program, when executed by a processor, implements the video generation method according to any one of the first aspect.
In a fifth aspect, an embodiment of the present invention provides a program product, the program product comprising a computer program stored in a readable storage medium; at least one processor of a server can read the computer program from the readable storage medium, and the at least one processor executes the computer program to cause the server to implement the video generation method according to any embodiment of the first aspect.
In the video generation method, device, equipment and storage medium provided by the present invention, a target face 3D model is obtained; an intermediate face 3D model is obtained according to the target face 3D model and a preset face 3D model corresponding to an expression to be transformed, the expression to be transformed being any one of multiple expressions; key point coordinates are extracted from the intermediate face 3D model, the key point coordinates including partial coordinates of facial feature contours; the key point coordinates of the target face 3D model are gradually adjusted to the key point coordinates of the intermediate face 3D model according to a preset gradual-change step size, and the 2D image obtained after each adjustment is captured to obtain N 2D images in total, where N is an integer greater than 1; and a video is generated according to the N 2D images. The present invention can produce a smooth video from only a few static 2D images, improves the efficiency of video production, and can create an expression-change video for any static face.
Detailed description of the invention
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a schematic diagram of an application scenario of the present invention;
Fig. 2 is a flowchart of the video generation method provided by Embodiment 1 of the present invention;
Fig. 3 is a schematic diagram of the key point coordinates of a face's lips;
Fig. 4 is a schematic structural diagram of the video generation device provided by Embodiment 2 of the present invention;
Fig. 5 is a schematic structural diagram of the video generation equipment provided by Embodiment 3 of the present invention.
Specific embodiment
In order to make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only a part, rather than all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
The terms "first", "second", "third", "fourth", etc. (if any) in the description, the claims and the above drawings are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable under appropriate circumstances, so that the embodiments of the present invention described herein can be implemented, for example, in an order other than those illustrated or described herein. In addition, the terms "comprising" and "having" and any variations thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to the process, method, product or device.
The technical solutions of the present invention are described in detail below with specific embodiments. The following specific embodiments can be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 1 is a schematic diagram of an application scenario of the present invention. As shown in Fig. 1, at least one static 2D image 10 is first obtained, the static 2D image 10 containing a face image. Using existing 2D-to-3D conversion technology, a target face 3D model 20 is established for the face image in the static 2D image 10. Based on the target face 3D model 20, a preset face 3D model 30 is adjusted to obtain an intermediate face 3D model 40 closest to the target face 3D model 20. Since the key point coordinates corresponding to the facial features in the intermediate face 3D model 40 already correspond to the facial expression to be transformed, the key point coordinates of the target face 3D model 20 can be gradually adjusted, according to a preset gradual-change step size, to the key point coordinates of the intermediate face 3D model 40, and the 2D image obtained after each adjustment is captured. The captured 2D images are assembled, in capture order, into a video stream 50; when the video stream 50 is played, a video in which the static 2D image 10 is transformed into a different expression is obtained. This embodiment can produce a smooth video from only a few static 2D images, improves the efficiency of video production, and can create an expression-change video for any static face.
The following describes in detail, with specific embodiments, how the technical solutions of the present invention and of the present application solve the above technical problem. The following specific embodiments can be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. The embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 2 is a flowchart of the video generation method provided by Embodiment 1 of the present invention. As shown in Fig. 2, the method in this embodiment may include the following steps.
S101: obtain a target face 3D model.
In this embodiment, at least one static 2D image may be obtained, each static 2D image containing a target face image, and the target face 3D model is established according to the at least one static 2D image. Specifically, the 3D model of the target face can be built based on existing 2D-to-3D conversion technology, for example using PhotoAnim animation software, tikuwa software, and the like. This embodiment does not limit the technology used to convert a 2D image into a 3D image or animation.
S102: obtain an intermediate face 3D model according to the target face 3D model and a preset face 3D model corresponding to the expression to be transformed.
In an optional embodiment, a face 3D model corresponding to each sample picture can be obtained according to M sample pictures, where M is an integer greater than 0 and the facial expression in each sample picture is the expression to be transformed; the M face 3D models are then processed by weighted calculation to obtain the preset face 3D model corresponding to the expression to be transformed.
Optionally, the face 3D model corresponding to the expression to be transformed is obtained by the formula F = F0 + a1·F1 + a2·F2 + … + aM·FM, where F denotes the face 3D model corresponding to the expression to be transformed, F0 denotes a preset initial face 3D model, Fi denotes the face 3D model corresponding to the i-th sample picture, and ai denotes the weight coefficient of the face 3D model corresponding to the i-th sample picture.
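Purely as an illustration of this weighted calculation, the following Python sketch blends M sample face 3D models into a preset face 3D model under the linear form F = F0 + a1·F1 + … + aM·FM given above; the array shapes, function name and example weights are assumptions for illustration, not the patent's reference implementation.

```python
import numpy as np

def blend_preset_face_model(f0, sample_models, weights):
    """Weighted combination of sample face 3D models (minimal sketch).

    f0:            (V, 3) vertex array of the preset initial face 3D model F0.
    sample_models: list of M (V, 3) vertex arrays, one per sample picture (Fi).
    weights:       list of M weight coefficients (ai).
    Returns F = F0 + sum_i ai * Fi as a (V, 3) vertex array.
    """
    f = f0.astype(float).copy()
    for a_i, f_i in zip(weights, sample_models):
        f += a_i * np.asarray(f_i, dtype=float)
    return f

# Hypothetical usage: blend three "happy" sample models onto an initial model.
V = 1000                                            # number of vertices (assumed)
f0 = np.zeros((V, 3))
samples = [np.random.rand(V, 3) for _ in range(3)]  # stand-ins for per-sample models
preset_happy = blend_preset_face_model(f0, samples, weights=[0.5, 0.3, 0.2])
```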
In this embodiment, the expression to be transformed is any one of multiple expressions, for example happiness, sadness, anger, sorrow, and so on. In an optional embodiment, according to the target face 3D model and the expression to be transformed, the preset face 3D model corresponding to the expression to be transformed can be adjusted to be closest to the target face 3D model, and the adjusted preset face 3D model corresponding to the expression to be transformed is determined as the intermediate face 3D model. Specifically, a detailed explanation is given below taking happiness as the expression to be transformed. The corresponding preset face 3D model has a happy expression, and the key point coordinates in the preset face 3D model are known; the key point coordinates include partial coordinates of the facial feature contours (such as lips, eyes, nose, eyebrows, face contour, and so on). Fig. 3 is a schematic diagram of the key point coordinates of a face's lips. As shown in Fig. 3, there are 9 key points 60 on the lips in total, and the coordinates of each key point 60 are known. This embodiment takes only the lips as an example; the key points of the other facial features are divided in a way similar to that of the lips. It should be noted that this embodiment does not limit the positions or the number of the key points, and those skilled in the art can adjust the key point coordinates of different facial features according to the expression characteristics. On the basis of the preset face 3D model, the preset face 3D model is adjusted, according to the target face 3D model, into the intermediate face 3D model closest to the target face 3D model. The key point coordinates in the adjusted intermediate face 3D model are known, because during the adjustment of the preset face 3D model the key point coordinates of the preset face 3D model change with the adjustment actions (corresponding adjustment parameters); when the adjustment is completed, the key point coordinates of the intermediate face 3D model are determined accordingly.
S103: extract the key point coordinates from the intermediate face 3D model.
In this embodiment, the key point coordinates in the intermediate face 3D model can be extracted and used as the basis for adjusting the target face 3D model; the key point coordinates include partial coordinates of the facial feature contours. Specifically, assuming the expression to be transformed is happiness, the key point coordinates of the intermediate face 3D model are obtained separately (for example the key point coordinates of the lips, the key point coordinates of the eyes, and so on).
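As a purely illustrative sketch of how the extracted key point coordinates might be organized per facial feature, only the 9 lip key points follow from Fig. 3; the other feature names, point counts and the helper function below are assumptions, not part of the patent.

```python
import numpy as np

# Key point coordinates of the intermediate face 3D model, grouped by facial feature.
intermediate_keypoints = {
    "lips":     np.zeros((9, 3)),   # 9 lip key points (see Fig. 3)
    "eyes":     np.zeros((12, 3)),  # placeholder count
    "nose":     np.zeros((7, 3)),   # placeholder count
    "eyebrows": np.zeros((10, 3)),  # placeholder count
    "contour":  np.zeros((17, 3)),  # placeholder count
}

def extract_keypoints(model_keypoints, features):
    """Stack the requested facial-feature key points into one (K, 3) array."""
    return np.concatenate([model_keypoints[f] for f in features], axis=0)

happy_targets = extract_keypoints(intermediate_keypoints, ["lips", "eyes"])
```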
S104: gradually adjust the key point coordinates of the target face 3D model to the key point coordinates of the intermediate face 3D model according to a preset gradual-change step size, and capture the 2D image obtained after each adjustment to obtain N 2D images in total, where N is an integer greater than 1.
In this embodiment, the key point coordinates of the target face 3D model can be gradually adjusted, according to a preset gradual-change step size, to the key point coordinates of the intermediate face 3D model, where the step size can be set arbitrarily. For example, assume a key point coordinate of the target face 3D model is (2, 5, 6) and the corresponding key point coordinate of the intermediate face 3D model is (3, 7, 8); then the coordinate (2, 5, 6) can first be changed to (2.5, 6, 7), and the coordinate (2.5, 6, 7) can then be changed to (3, 7, 8), as sketched in the code below. It should be noted that this embodiment does not limit the specific value of the step size; in theory, the smaller the step size, the more 2D images are obtained and the more coherent the final video will be.
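A minimal sketch of this gradual adjustment, assuming the preset gradual-change step size is expressed as a number of interpolation steps N and that some renderer re-poses the 3D model and captures a 2D image after each step (the function names and the render_2d callable are assumptions, not the patent's implementation):

```python
import numpy as np

def gradual_keypoint_adjustment(src_keypoints, dst_keypoints, n_steps, render_2d):
    """Move key points from the target model to the intermediate model in n_steps.

    src_keypoints: (K, 3) key point coordinates of the target face 3D model.
    dst_keypoints: (K, 3) key point coordinates of the intermediate face 3D model.
    n_steps:       N, the number of adjustments (an integer greater than 1).
    render_2d:     callable mapping a (K, 3) key point array to a 2D frame;
                   stands in for re-posing the 3D model and capturing the image.
    Returns the list of N 2D frames, one per adjustment.
    """
    frames = []
    for step in range(1, n_steps + 1):
        t = step / n_steps                       # fraction of the way to the target expression
        keypoints = (1.0 - t) * src_keypoints + t * dst_keypoints
        frames.append(render_2d(keypoints))      # capture the 2D image after this adjustment
    return frames
```

With n_steps = 2, a key point at (2, 5, 6) whose counterpart in the intermediate model is (3, 7, 8) passes through (2.5, 6, 7) before reaching (3, 7, 8), matching the example above; a smaller step size (a larger N) yields more frames and a smoother video.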
S105: generate a video according to the N 2D images.
In this embodiment, the N 2D images can be synthesized into a video according to the order in which they were captured. For the detailed process, reference may be made to the related discussion of the embodiment shown in Fig. 1, and details are not repeated here; a minimal sketch of this step is given below.
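Purely as an illustration of this synthesis step (the patent does not name a library; OpenCV's VideoWriter, the output file name and the frame rate below are assumptions), the N frames could be written out in capture order as follows:

```python
import cv2  # OpenCV; assumed available for this sketch

def frames_to_video(frames, out_path="expression_change.mp4", fps=25):
    """Synthesize the captured 2D frames into one video, in capture order."""
    height, width = frames[0].shape[:2]
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
    for frame in frames:                 # frames are already in acquisition order
        writer.write(frame)              # each frame is an HxWx3 BGR uint8 image
    writer.release()
    return out_path
```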
In an optional embodiment, an animated image can also be generated according to the N 2D images; for the detailed process, reference may be made to the related discussion of the embodiment shown in Fig. 1, and details are not repeated here.
In this embodiment, a target face 3D model is obtained; an intermediate face 3D model is obtained according to the target face 3D model and a preset face 3D model corresponding to the expression to be transformed, the expression to be transformed being any one of multiple expressions; key point coordinates are extracted from the intermediate face 3D model, the key point coordinates including partial coordinates of the facial feature contours; the key point coordinates of the target face 3D model are gradually adjusted to the key point coordinates of the intermediate face 3D model according to a preset gradual-change step size, and the 2D image obtained after each adjustment is captured to obtain N 2D images in total, where N is an integer greater than 1; and a video is generated according to the N 2D images. The present invention can produce a smooth video from only a few static 2D images, improves the efficiency of video production, and can create an expression-change video for any static face.
Fig. 4 is a schematic structural diagram of the video generation device provided by Embodiment 2 of the present invention. As shown in Fig. 4, the video generation device of this embodiment may include:
an obtaining module 71, configured to obtain a target face 3D model;
a processing module 72, configured to obtain an intermediate face 3D model according to the target face 3D model and a preset face 3D model corresponding to an expression to be transformed, the expression to be transformed being any one of multiple expressions;
an extraction module 73, configured to extract key point coordinates from the intermediate face 3D model, the key point coordinates including partial coordinates of facial feature contours;
an adjustment module 74, configured to gradually adjust the key point coordinates of the target face 3D model to the key point coordinates of the intermediate face 3D model according to a preset gradual-change step size, and capture the 2D image obtained after each adjustment to obtain N 2D images in total, where N is an integer greater than 1;
a video generation module 75, configured to generate a video according to the N 2D images.
In one possible design, the video generation module 75 is specifically configured to:
synthesize the N 2D images into a video according to the order in which they were captured.
In one possible design, the processing module 72 is specifically configured to:
adjust, according to the target face 3D model and the expression to be transformed, the preset face 3D model corresponding to the expression to be transformed so that it is closest to the target face 3D model, and determine the adjusted preset face 3D model corresponding to the expression to be transformed as the intermediate face 3D model.
In one possible design, the obtaining module 71 is specifically configured to:
obtain at least one static 2D image, each static 2D image containing a target face image;
establish the target face 3D model according to the at least one static 2D image.
In one possible design, the processing module is further configured to: before obtaining the intermediate face 3D model according to the target face 3D model and the preset face 3D model corresponding to the expression to be transformed, obtain, according to M sample pictures, a face 3D model corresponding to each sample picture, where M is an integer greater than 0 and the facial expression in each sample picture is the expression to be transformed;
and process the M face 3D models by weighted calculation to obtain the preset face 3D model corresponding to the expression to be transformed.
In one possible design, the processing the M face 3D models by weighted calculation to obtain the face 3D model corresponding to the expression to be transformed comprises:
obtaining the face 3D model corresponding to the expression to be transformed by the formula F = F0 + a1·F1 + a2·F2 + … + aM·FM;
where F denotes the face 3D model corresponding to the expression to be transformed, F0 denotes a preset initial face 3D model, Fi denotes the face 3D model corresponding to the i-th sample picture, and ai denotes the weight coefficient of the face 3D model corresponding to the i-th sample picture.
The video generation device of this embodiment can execute the technical solutions in any of the above method embodiments; its implementation principles and technical effects are similar, and details are not repeated here.
Fig. 5 is a schematic structural diagram of the video generation equipment provided by Embodiment 3 of the present invention. As shown in Fig. 5, the video generation equipment 80 of this embodiment may include a processor 81 and a memory 82.
The memory 82 is configured to store computer programs (for example the application programs and functional modules implementing the above video generation method), computer instructions and the like; the above computer programs, computer instructions and so on may be stored in partitions in one or more memories 82, and can be called by the processor 81.
The processor 81 is configured to execute the computer program stored in the memory 82 to implement the steps of the methods in the above embodiments; for details, reference may be made to the related description in the foregoing method embodiments. The memory 82 and the processor 81 may be coupled via a bus 83.
The server of this embodiment can execute the technical solutions in any of the above method embodiments; its implementation principles and technical effects are similar, and details are not repeated here.
In addition, an embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions; when at least one processor of a user equipment executes the computer-executable instructions, the user equipment performs the above various possible methods.
The computer-readable medium includes computer storage media and communication media, where communication media include any medium that facilitates transferring a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be an integral part of the processor. The processor and the storage medium may be located in an ASIC, and the ASIC may be located in the user equipment. Of course, the processor and the storage medium may also exist as discrete components in a communication device.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be implemented by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium; when the program is executed, the steps of the above method embodiments are performed. The aforementioned storage medium includes various media capable of storing program code, such as a ROM, a RAM, a magnetic disk or an optical disc.
Finally, it should be noted that the above embodiments are merely intended to illustrate, rather than limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some or all of the technical features therein; and these modifications or replacements do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (14)

1. A video generation method, characterized by comprising:
obtaining a target face 3D model;
obtaining an intermediate face 3D model according to the target face 3D model and a preset face 3D model corresponding to an expression to be transformed, the expression to be transformed being any one of multiple expressions;
extracting key point coordinates from the intermediate face 3D model, the key point coordinates including partial coordinates of facial feature contours;
gradually adjusting the key point coordinates of the target face 3D model to the key point coordinates of the intermediate face 3D model according to a preset gradual-change step size, and capturing the 2D image obtained after each adjustment to obtain N 2D images in total, where N is an integer greater than 1; and
generating a video according to the N 2D images.
2. The method according to claim 1, characterized in that the generating a video according to the N 2D images comprises:
synthesizing the N 2D images into a video according to the order in which they were captured.
3. The method according to claim 1 or 2, characterized in that the obtaining an intermediate face 3D model according to the target face 3D model and the preset face 3D model corresponding to the expression to be transformed comprises:
adjusting, according to the target face 3D model and the expression to be transformed, the preset face 3D model corresponding to the expression to be transformed so that it is closest to the target face 3D model, and determining the adjusted preset face 3D model corresponding to the expression to be transformed as the intermediate face 3D model.
4. The method according to claim 1, characterized in that the obtaining a target face 3D model comprises:
obtaining at least one static 2D image, each static 2D image containing a target face image;
establishing the target face 3D model according to the at least one static 2D image.
5. The method according to claim 1, characterized in that before obtaining the intermediate face 3D model according to the target face 3D model and the preset face 3D model corresponding to the expression to be transformed, the method further comprises:
obtaining, according to M sample pictures, a face 3D model corresponding to each sample picture, where M is an integer greater than 0 and the facial expression in each sample picture is the expression to be transformed;
processing the M face 3D models by weighted calculation to obtain the preset face 3D model corresponding to the expression to be transformed.
6. The method according to claim 5, characterized in that the processing the M face 3D models by weighted calculation to obtain the face 3D model corresponding to the expression to be transformed comprises:
obtaining the face 3D model corresponding to the expression to be transformed by the formula F = F0 + a1·F1 + a2·F2 + … + aM·FM;
where F denotes the face 3D model corresponding to the expression to be transformed, F0 denotes a preset initial face 3D model, Fi denotes the face 3D model corresponding to the i-th sample picture, and ai denotes the weight coefficient of the face 3D model corresponding to the i-th sample picture.
7. A video generation device, characterized by comprising:
an obtaining module, configured to obtain a target face 3D model;
a processing module, configured to obtain an intermediate face 3D model according to the target face 3D model and a preset face 3D model corresponding to an expression to be transformed, the expression to be transformed being any one of multiple expressions;
an extraction module, configured to extract key point coordinates from the intermediate face 3D model, the key point coordinates including partial coordinates of facial feature contours;
an adjustment module, configured to gradually adjust the key point coordinates of the target face 3D model to the key point coordinates of the intermediate face 3D model according to a preset gradual-change step size, and capture the 2D image obtained after each adjustment to obtain N 2D images in total, where N is an integer greater than 1; and
a video generation module, configured to generate a video according to the N 2D images.
8. The device according to claim 7, characterized in that the video generation module is specifically configured to:
synthesize the N 2D images into a video according to the order in which they were captured.
9. The device according to claim 7 or 8, characterized in that the processing module is specifically configured to:
adjust, according to the target face 3D model and the expression to be transformed, the preset face 3D model corresponding to the expression to be transformed so that it is closest to the target face 3D model, and determine the adjusted preset face 3D model corresponding to the expression to be transformed as the intermediate face 3D model.
10. The device according to claim 7, characterized in that the obtaining module is specifically configured to:
obtain at least one static 2D image, each static 2D image containing a target face image;
establish the target face 3D model according to the at least one static 2D image.
11. The device according to claim 7, characterized in that the processing module is further configured to: before obtaining the intermediate face 3D model according to the target face 3D model and the preset face 3D model corresponding to the expression to be transformed, obtain, according to M sample pictures, a face 3D model corresponding to each sample picture, where M is an integer greater than 0 and the facial expression in each sample picture is the expression to be transformed;
and process the M face 3D models by weighted calculation to obtain the preset face 3D model corresponding to the expression to be transformed.
12. The device according to claim 11, characterized in that the processing the M face 3D models by weighted calculation to obtain the face 3D model corresponding to the expression to be transformed comprises:
obtaining the face 3D model corresponding to the expression to be transformed by the formula F = F0 + a1·F1 + a2·F2 + … + aM·FM;
where F denotes the face 3D model corresponding to the expression to be transformed, F0 denotes a preset initial face 3D model, Fi denotes the face 3D model corresponding to the i-th sample picture, and ai denotes the weight coefficient of the face 3D model corresponding to the i-th sample picture.
13. Video generation equipment, characterized by comprising: a memory and a processor, the memory storing instructions executable by the processor; wherein the processor is configured to execute, via the executable instructions, the video generation method according to any one of claims 1-6.
14. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the video generation method according to any one of claims 1-6.
CN201810719201.5A 2018-07-03 2018-07-03 Video generation method, device, equipment and storage medium Active CN109064548B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810719201.5A CN109064548B (en) 2018-07-03 2018-07-03 Video generation method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810719201.5A CN109064548B (en) 2018-07-03 2018-07-03 Video generation method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109064548A true CN109064548A (en) 2018-12-21
CN109064548B CN109064548B (en) 2023-11-03

Family

ID=64818450

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810719201.5A Active CN109064548B (en) 2018-07-03 2018-07-03 Video generation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109064548B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109901720A (en) * 2019-03-19 2019-06-18 江苏皓之睿数字科技有限公司 A kind of clothes custom-built system based on 3D manikin
CN110322416A (en) * 2019-07-09 2019-10-11 腾讯科技(深圳)有限公司 Image processing method, device and computer readable storage medium
CN110443872A (en) * 2019-07-22 2019-11-12 北京科技大学 A kind of countenance synthesis method having dynamic texture details
CN110458751A (en) * 2019-06-28 2019-11-15 广东智媒云图科技股份有限公司 A kind of face replacement method, equipment and medium based on Guangdong opera picture
CN114626979A (en) * 2022-03-18 2022-06-14 广州虎牙科技有限公司 Face driving method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017026839A1 (en) * 2015-08-12 2017-02-16 트라이큐빅스 인크. 3d face model obtaining method and device using portable camera
CN107203961A (en) * 2016-03-17 2017-09-26 掌赢信息科技(上海)有限公司 A kind of method and electronic equipment of migration of expressing one's feelings
WO2018103220A1 (en) * 2016-12-09 2018-06-14 武汉斗鱼网络科技有限公司 Image processing method and device
CN108229269A (en) * 2016-12-31 2018-06-29 深圳市商汤科技有限公司 Method for detecting human face, device and electronic equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017026839A1 (en) * 2015-08-12 2017-02-16 트라이큐빅스 인크. 3d face model obtaining method and device using portable camera
CN107203961A (en) * 2016-03-17 2017-09-26 掌赢信息科技(上海)有限公司 A kind of method and electronic equipment of migration of expressing one's feelings
WO2018103220A1 (en) * 2016-12-09 2018-06-14 武汉斗鱼网络科技有限公司 Image processing method and device
CN108229239A (en) * 2016-12-09 2018-06-29 武汉斗鱼网络科技有限公司 A kind of method and device of image procossing
CN108229269A (en) * 2016-12-31 2018-06-29 深圳市商汤科技有限公司 Method for detecting human face, device and electronic equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
丁宾等: "基于单张图像的三维人脸建模与表情动画", 《计算机工程与设计》 *
丁宾等: "基于单张图像的三维人脸建模与表情动画", 《计算机工程与设计》, no. 07, 16 July 2012 (2012-07-16), pages 232 - 235 *
张泽强 等: "基于Candide-3模型的人脸表情动画系统设计与实现", 《福建电脑》, no. 2, pages 13 - 11 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109901720A (en) * 2019-03-19 2019-06-18 江苏皓之睿数字科技有限公司 A kind of clothes custom-built system based on 3D manikin
CN109901720B (en) * 2019-03-19 2022-10-11 北京迷姆数字科技有限公司 Clothes customization system based on 3D human body model
CN110458751A (en) * 2019-06-28 2019-11-15 广东智媒云图科技股份有限公司 A kind of face replacement method, equipment and medium based on Guangdong opera picture
CN110458751B (en) * 2019-06-28 2023-03-24 广东智媒云图科技股份有限公司 Face replacement method, device and medium based on Guangdong play pictures
CN110322416A (en) * 2019-07-09 2019-10-11 腾讯科技(深圳)有限公司 Image processing method, device and computer readable storage medium
CN110322416B (en) * 2019-07-09 2022-11-18 腾讯科技(深圳)有限公司 Image data processing method, apparatus and computer readable storage medium
CN110443872A (en) * 2019-07-22 2019-11-12 北京科技大学 A kind of countenance synthesis method having dynamic texture details
CN114626979A (en) * 2022-03-18 2022-06-14 广州虎牙科技有限公司 Face driving method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN109064548B (en) 2023-11-03

Similar Documents

Publication Publication Date Title
CN109064548A (en) Video generation method, device, equipment and storage medium
Liu et al. Semantic-aware implicit neural audio-driven video portrait generation
US10540817B2 (en) System and method for creating a full head 3D morphable model
US8624901B2 (en) Apparatus and method for generating facial animation
CN109147017A (en) Dynamic image generation method, device, equipment and storage medium
CN105144247B (en) The generation of the three dimensional representation of user
CN112889092A (en) Textured neural avatar
CN108960020A (en) Information processing method and information processing equipment
EP3660663B1 (en) Delivering virtualized content
JP2024522287A (en) 3D human body reconstruction method, apparatus, device and storage medium
KR20210040555A (en) Apparatus, method and computer program for providing facial motion retargeting of virtual character based on basis model
CN110796593A (en) Image processing method, device, medium and electronic equipment based on artificial intelligence
KR20220075339A (en) Face reconstruction method, apparatus, computer device and storage medium
CN110958469A (en) Video processing method and device, electronic equipment and storage medium
CN111402394B (en) Three-dimensional exaggerated cartoon face generation method and device
CN113538659A (en) Image generation method and device, storage medium and equipment
CN114339409A (en) Video processing method, video processing device, computer equipment and storage medium
CN111612878A (en) Method and device for making static photo into three-dimensional effect video
DE102021109050A1 (en) VIDEO COMPRESSION AND TRANSMISSION SUPPORTED BY A NEURONAL GENERATIVE ADVERSARIAL NETWORK
CN117496072A (en) Three-dimensional digital person generation and interaction method and system
CN115393480A (en) Speaker synthesis method, device and storage medium based on dynamic nerve texture
RU2713695C1 (en) Textured neural avatars
Tej et al. Enhancing perceptual loss with adversarial feature matching for super-resolution
CN114359453A (en) Three-dimensional special effect rendering method and device, storage medium and equipment
CN114708636A (en) Dense face grid expression driving method, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant